
Communications201112 DL

COMMUNICATIONS OF THE ACM
CACM.ACM.ORG 12/2011 VOL.54 NO.12

Visual Crowd Surveillance Is Like Hydrodynamics
The Legacy of Steve Jobs
Answer Set Programming
How Will Astronomy Archives Survive the Data Tsunami?
Formal Analysis of MPI-based Parallel Programs
Brave NUI World

Association for
Computing Machinery
ADVANCE YOUR CAREER WITH ACM TECH PACKS…

For Serious Computing Professionals.

Searching through technology books, magazines, and websites to find the authoritative information you need takes time. That’s why ACM created “Tech Packs.”

• Compiled by subject matter experts to help serious computing practitioners, managers, academics, and students understand today’s vital computing topics.
• Comprehensive annotated bibliographies: from ACM Digital Library articles, conference proceedings, and videos to ACM Learning Center Online Books and Courses to non-ACM books, tutorials, websites, and blogs.
• Peer-reviewed for relevance and usability by computing professionals, and updated periodically to ensure currency.

Access to ACM resources in Tech Packs is available to ACM members at http://techpack.acm.org or http://learning.acm.org.

Current topics include Cloud Computing, Parallel Computing and Enterprise Architecture. In development are Mobility, Business Intelligence/Data Management, Security, and more.

Suggestions for future Tech Packs? Contact:
Yan Timanovsky
ACM Education Manager
timanovsky@hq.acm.org
ACM’s Career & Job Center!
Are you looking for your next IT job?
Do you need Career Advice?

Visit ACM’s Career & Job Center at:


http://www.acm.org/careercenter

◆ ◆ ◆ ◆ ◆

The ACM Career & Job Center offers ACM members a host of career-enhancing benefits:
➜ A highly targeted focus on job opportunities in the computing industry
➜ Access to hundreds of corporate job postings
➜ Resume posting keeping you connected to the employment market while letting you maintain full control over your confidential information
➜ An advanced Job Alert system that notifies you of new opportunities matching your criteria
➜ Career coaching and guidance from trained experts dedicated to your success
➜ A content library of the best career articles compiled from hundreds of sources, and much more!

The ACM Career & Job Center is the perfect place to begin searching for your next employment opportunity!
Visit today at
http://www.acm.org/careercenter
communications of the acm

Departments

5 Editor’s Letter
Computing for Humans
By Moshe Y. Vardi

7 Letters To The Editor
To Boost Presentation Quality, Ask Questions

8 BLOG@CACM
Conferences and Video Lectures; Scientific Educational Games
John Langford analyzes whether conferences should offer video lectures. Judy Robertson discusses the merits of the Game Design Through Mentoring and Collaboration project.

10 Nominees for Elections and Report of the ACM Nominating Committee

25 Calendar

104 Careers

Last Byte

142 Solutions and Sources
By Peter Winkler

144 Q&A
Scaling Up
M. Frans Kaashoek talks about multicore computing, security, and operating system design.
By Leah Hoffmann

News

11 The Rise of Molecular Machines
The field of molecular computing is achieving new levels of control over biochemical processes and fostering sophisticated connections between computer science and the biological sciences.
By Kirk L. Kroeker

14 Brave NUI World
Natural user interface developments, such as Microsoft’s Kinect, may indicate the beginning of the end for the mouse.
By Gregory Goth

17 Activism Vs. Slacktivism
Today’s activists are highly plugged into social media, mobile apps, and other digital tools. But does this make a difference where it matters most?
By Dennis McCafferty

20 CSEdWeek Takes Hold
Groups in more than 130 countries will participate in Computer Science Education Week this year.
By Samuel Greengard

21 Dennis Ritchie, 1941–2011
Colleagues recall the creator of C and codeveloper of Unix, an unassuming but brilliant man who enjoyed playing practical jokes on his coworkers.
By Paul Hyman

Viewpoints

22 The Most Ancient Marketing
By Jaron Lanier

24 Life, Death, and the iPad: Cultural Symbols and Steve Jobs
By Genevieve Bell

26 Technology Strategy and Management
The Legacy of Steve Jobs
Reflecting on the career and contributions of the Apple cofounder.
By Michael A. Cusumano

29 Emerging Markets
On Turbocharged, Heat-Seeking, Robotic Fishing Poles
Applying a well-known proverb to socio-technical transformation.
By Kentaro Toyama

32 Kode Vicious
Debugging on Live Systems
It is more of a social than a technical problem.
By George V. Neville-Neil

34 Broadening Participation
Data Trends on Minorities and People with Disabilities in Computing
Seeking a comprehensive view of minority student demographics to determine what programs and policies are needed to promote diversity.
By Valerie Taylor and Richard Ladner

38 The Profession of IT
The Grounding Practice
The skill of making and recognizing grounded claims is essential for professional practice. Getting objective data to support your conclusions is not enough.
By Peter J. Denning

41 Viewpoint
Doctoral Program Rankings for U.S. Computing Programs: The National Research Council Strikes Out
A proposal for improving doctoral program ranking strategy.
By Andrew Bernat and Eric Grimson

Association for Computing Machinery
Advancing Computing as a Science & Profession

2 communications of the acm | december 2011 | vol. 54 | no. 12


12/2011 vol. 54 no. 12

Practice

44 Postmortem Debugging in Dynamic Environments
Many modern dynamic languages lack tools for understanding complex failures.
By David Pacheco

52 How Will Astronomy Archives Survive the Data Tsunami?
Astronomers are collecting more data than ever. What practices can keep them ahead of the flood?
By G. Bruce Berriman and Steven L. Groom

57 Coding Guidelines: Finding the Art in the Science
What separates good code from great code?
By Robert Green and Henry Ledgard

Articles’ development led by queue.acm.org

Contributed Articles

64 Visual Crowd Surveillance through a Hydrodynamics Lens
People in high-density crowds appear to move with the flow of the crowd, like particles in a liquid.
By Brian E. Moore, Saad Ali, Ramin Mehran, and Mubarak Shah

74 License Risks from Ad Hoc Reuse of Code from the Internet
Software developers’ reuse of code from the Internet bears legal and economic risks for their employers.
By Manuel Sojer and Joachim Henkel

82 Formal Analysis of MPI-based Parallel Programs
The goal is reliable parallel simulations, helping scientists understand nature, from how foams compress to how ribosomes construct proteins.
By Ganesh Gopalakrishnan, Robert M. Kirby, Stephen Siegel, Rajeev Thakur, William Gropp, Ewing Lusk, Bronis R. de Supinski, Martin Schulz, and Greg Bronevetsky

Review Articles

92 Answer Set Programming at a Glance
The motivation and key concepts behind answer set programming, a promising approach to declarative problem solving.
By Gerhard Brewka, Thomas Eiter, and Mirosław Truszczyński

Research Highlights

122 Technical Perspective
Safety First!
By Xavier Leroy

123 Safe to the Last Instruction: Automated Verification of a Type-Safe Operating System
By Jean Yang and Chris Hawblitzel

132 Technical Perspective
Anonymity Is Not Privacy
By Vitaly Shmatikov

133 Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography
By Lars Backstrom, Cynthia Dwork, and Jon Kleinberg

About the Cover:
This month’s cover story (p. 64) investigates the challenges of video surveillance of crowded scenes. The authors propose a framework that treats the interactions of people in a scene like moving particles in a liquid, thus considering techniques often found in the study of hydrodynamics.

Photograph courtesy of NASA/JPL-Caltech, illustration by Metropolis, illustration by Gwen Vanhee



communications of the acm
Trusted insights for computing’s leading professionals.

Communications of the ACM is the leading monthly print and online magazine for the computing and information technology fields.
Communications is recognized as the most trusted and knowledgeable source of industry information for today’s computing professional.
Communications brings its readership in-depth coverage of emerging areas of computer science, new trends in information technology,
and practical applications. Industry leaders use Communications as a platform to present and debate various technology implications,
public policies, engineering challenges, and market trends. The prestige and unmatched reputation that Communications of the ACM
enjoys today is built upon a 50-year commitment to high-quality editorial content and a steadfast dedication to advancing the arts,
sciences, and applications of information technology.

ACM, the world’s largest educational and scientific computing society, delivers resources that advance computing as a science and profession. ACM provides the computing field’s premier Digital Library and serves its members and the computing profession with leading-edge publications, conferences, and career resources.

Executive Director and CEO
John White
Deputy Executive Director and COO
Patricia Ryan
Director, Office of Information Systems
Wayne Graves
Director, Office of Financial Services
Russell Harris
Director, Office of SIG Services
Donna Cappo
Director, Office of Publications
Bernard Rous
Director, Office of Group Publishing
Scott E. Delman

ACM Council
President
Alain Chesnais
Vice-President
Barbara G. Ryder
Secretary/Treasurer
Alexander L. Wolf
Past President
Wendy Hall
Chair, SGB Board
Vicki Hanson
Co-Chairs, Publications Board
Ronald Boisvert and Jack Davidson
Members-at-Large
Vinton G. Cerf; Carlo Ghezzi; Anthony Joseph; Mathai Joseph; Kelly Lyons; Mary Lou Soffa; Salil Vadhan
SGB Council Representatives
G. Scott Owens; Andrew Sears; Douglas Terry

Board Chairs
Education Board
Andrew McGettrick
Practitioners Board
Stephen Bourne

Regional Council Chairs
ACM Europe Council
Fabrizio Gagliardi
ACM India Council
Anand S. Deshpande, PJ Narayanan
ACM China Council
Jiaguang Sun

Publications Board
Co-Chairs
Ronald F. Boisvert; Jack Davidson
Board Members
Nikil Dutt; Carol Hutchins; Joseph A. Konstan; Ee-Peng Lim; Catherine McGeoch; M. Tamer Ozsu; Vincent Shen; Mary Lou Soffa

ACM U.S. Public Policy Office
Cameron Wilson, Director
1828 L Street, N.W., Suite 800
Washington, DC 20036 USA
T (202) 659-9711; F (202) 667-1066

Computer Science Teachers Association
Chris Stephenson, Executive Director

Staff

Director of Group Publishing
Scott E. Delman
publisher@cacm.acm.org

Executive Editor
Diane Crawford
Managing Editor
Thomas E. Lambert
Senior Editor
Andrew Rosenbloom
Senior Editor/News
Jack Rosenberger
Web Editor
David Roman
Editorial Assistant
Zarina Strakhan
Rights and Permissions
Deborah Cotton

Art Director
Andrij Borys
Associate Art Director
Alicia Kubista
Assistant Art Directors
Mia Angelica Balaquiot
Brian Greenberg
Production Manager
Lynn D’Addesio
Director of Media Sales
Jennifer Ruzicka
Public Relations Coordinator
Virgina Gold
Publications Assistant
Emily Williams

Columnists
Alok Aggarwal; Phillip G. Armour; Martin Campbell-Kelly; Michael Cusumano; Peter J. Denning; Shane Greenstein; Mark Guzdial; Peter Harsha; Leah Hoffmann; Mari Sako; Pamela Samuelson; Gene Spafford; Cameron Wilson

Contact Points
Copyright permission
permissions@cacm.acm.org
Calendar items
calendar@cacm.acm.org
Change of address
acmhelp@acm.org
Letters to the Editor
letters@cacm.acm.org

Web Site
http://cacm.acm.org

Author Guidelines
http://cacm.acm.org/guidelines

ACM Advertising Department
2 Penn Plaza, Suite 701, New York, NY 10121-0701
T (212) 869-7440
F (212) 869-0481

Director of Media Sales
Jennifer Ruzicka
jen.ruzicka@hq.acm.org

Media Kit acmmediasales@acm.org

Association for Computing Machinery (ACM)
2 Penn Plaza, Suite 701
New York, NY 10121-0701 USA
T (212) 869-7440; F (212) 869-0481

Editorial Board

Editor-in-chief
Moshe Y. Vardi
eic@cacm.acm.org

News
Co-chairs
Marc Najork and Prabhakar Raghavan
Board Members
Hsiao-Wuen Hon; Mei Kobayashi; William Pulleyblank; Rajeev Rastogi; Jeannette Wing

Viewpoints
Co-chairs
Susanne E. Hambrusch; John Leslie King; J Strother Moore
Board Members
P. Anandan; William Aspray; Stefan Bechtold; Judith Bishop; Stuart I. Feldman; Peter Freeman; Seymour Goodman; Shane Greenstein; Mark Guzdial; Richard Heeks; Rachelle Hollander; Richard Ladner; Susan Landau; Carlos Jose Pereira de Lucena; Beng Chin Ooi; Loren Terveen

Practice
Chair
Stephen Bourne
Board Members
Eric Allman; Charles Beeler; David J. Brown; Bryan Cantrill; Terry Coatta; Stuart Feldman; Benjamin Fried; Pat Hanrahan; Marshall Kirk McKusick; Erik Meijer; George Neville-Neil; Theo Schlossnagle; Jim Waldo

The Practice section of the CACM Editorial Board also serves as the Editorial Board of .

Contributed Articles
Co-chairs
Al Aho and Georg Gottlob
Board Members
Robert Austin; Yannis Bakos; Elisa Bertino; Gilles Brassard; Kim Bruce; Alan Bundy; Peter Buneman; Andrew Chien; Peter Druschel; Blake Ives; James Larus; Igor Markov; Gail C. Murphy; Shree Nayar; Bernhard Nebel; Lionel M. Ni; Sriram Rajamani; Marie-Christine Rousset; Avi Rubin; Krishan Sabnani; Fred B. Schneider; Abigail Sellen; Ron Shamir; Marc Snir; Larry Snyder; Veda Storey; Manuela Veloso; Michael Vitale; Wolfgang Wahlster; Hannes Werthner; Andy Chi-Chih Yao

Research Highlights
Co-chairs
Stuart J. Russell and Gregory Morrisett
Board Members
Martin Abadi; Stuart K. Card; Jon Crowcroft; Shafi Goldwasser; Monika Henzinger; Maurice Herlihy; Dan Huttenlocher; Norm Jouppi; Andrew B. Kahng; Daphne Koller; Michael Reiter; Mendel Rosenblum; Ronitt Rubinfeld; David Salesin; Lawrence K. Saul; Guy Steele, Jr.; Madhu Sudan; Gerhard Weikum; Alexander L. Wolf; Margaret H. Wright

Web
Co-chairs
James Landay and Greg Linden
Board Members
Gene Golovchinsky; Marti Hearst; Jason I. Hong; Jeff Johnson; Wendy E. MacKay

ACM Copyright Notice
Copyright © 2011 by Association for Computing Machinery, Inc. (ACM). Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from permissions@acm.org or fax (212) 869-0481.

For other copying of articles that carry a code at the bottom of the first or last page or screen display, copying is permitted provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center; www.copyright.com.

Subscriptions
An annual subscription cost is included in ACM member dues of $99 ($40 of which is allocated to a subscription to Communications); for students, cost is included in $42 dues ($20 of which is allocated to a Communications subscription). A nonmember annual subscription is $100.

ACM Media Advertising Policy
Communications of the ACM and other ACM Media publications accept advertising in both print and electronic formats. All advertising in ACM Media publications is at the discretion of ACM and is intended to provide financial support for the various activities and services for ACM members. Current Advertising Rates can be found by visiting http://www.acm-media.org or by contacting ACM Media Sales at (212) 626-0686.

Single Copies
Single copies of Communications of the ACM are available for purchase. Please contact acmhelp@acm.org.

Communications of the ACM (ISSN 0001-0782) is published monthly by ACM Media, 2 Penn Plaza, Suite 701, New York, NY 10121-0701. Periodicals postage paid at New York, NY 10001, and other mailing offices.

POSTMASTER
Please send address changes to
Communications of the ACM
2 Penn Plaza, Suite 701
New York, NY 10121-0701 USA

Printed in the U.S.A.

Please recycle this magazine



editor’s letter

DOI:10.1145/2043174.2043175 Moshe Y. Vardi

Computing for Humans


Gottfried Wilhelm Leibniz (1646−1716) has been called the “Patron Saint of Computing.” Leibniz is most famous for the development, independently of Isaac Newton, of the infinitesimal calculus. (In fact, it is his mathematical notation we use today, rather than Newton’s.) He was also a prolific inventor of mechanical calculators and developed the binary number system.

Leibniz conceived of a universal mathematical language, lingua characteristica universalis, in which all human knowledge can be expressed, and calculational rules, calculus ratiocinator, carried out by machines to derive all logical relationships. Leibniz’s goal was nothing short of prophetic: “Once the characteristic numbers are established for most concepts, mankind will then possess a new instrument that will enhance the capabilities of the mind to a far greater extent than optical instruments strengthen the eyes.”

This definition of computing, as an “instrument for the human mind,” captures, I believe, the essence of our field. On one hand, our discipline is a technical one, focusing on hardware, software, and their theoretical foundations. On the other hand, the artifacts we build are meant to enhance the human mind. This duality of our field is witnessed by the two pioneers we lost last October: Steve Jobs and Dennis Ritchie, who passed away within a week of each other.

Dennis MacAlistair Ritchie (September 9, 1941−October 12, 2011) was the techies’ techie, as the creator of the C programming language and the codeveloper of the Unix operating system. The C language paved the way for C++ and Java, while Unix was the basis for many of today’s most widely used operating systems. Before the development of C and Unix, programming, especially systems programming, was tightly connected to the underlying hardware. C and Unix, in contrast, were highly portable. The citation for the 1983 Turing Award that Ritchie received together with Ken Thompson refers succinctly to “their development of generic operating systems theory.” There is no computer scientist who is not familiar with C and Unix, but it is unlikely your cousin heard about them, unless she is also a computer scientist. Undoubtedly, however, your cousin is familiar with Steve Jobs.

Steven Paul “Steve” Jobs (February 24, 1955−October 5, 2011) was the founder of Apple, NeXT, and Pixar. His death received a tremendous amount of worldwide news coverage and is addressed by three articles in this issue of Communications. It is hard to think of anyone in recent memory whose passing received so much global attention. This level of interest is by itself worthy of observation. As Jaron Lanier makes clear in his essay, “The Most Ancient Marketing,” Jobs was very much not an engineer. In fact, the title of one of the many essays published in the wake of Jobs’ death is “Why Jobs Is No Edison.” Yet, it is difficult to point to anyone who had as much impact on computing over the last 30 years as Jobs. In fact, as Genevieve Bell points out in her essay, “Life, Death, and the iPad: Cultural Symbols and Steve Jobs,” his impact goes beyond the world of computing, well into the realm of culture. (For a discussion of Jobs’ business strategy, see Michael A. Cusumano’s “Technology Strategy and Management.”)

Undoubtedly, Jobs’ uniqueness was his relentless and singular focus on the human side of computing. To start with, the Apple and Apple II were personal computers, and the Mac’s claim to fame was its user interface. The sequence of products that revolutionized computing over the past 10 years, the iPod, iPhone, and iPad, was unique in its focus on user experience. In fact, the very term “user experience” was coined at Apple in the mid-1990s. The success of Apple’s products over the past decade has made this term quite fashionable lately.

Yet the user has not always been and is probably still not at the center of our discipline. A quick perusal of ACM’s Special Interest Groups shows that their general focus tends to be quite technical. In fact, one often encounters among computing professionals an attitude that regards the field of human-computer interaction as “soft,” implying it is less worthy than the “harder” technical areas. In my own technical areas, databases and formal methods, I almost never encounter papers that pay attention to usability issues.

The almost simultaneous departure of Jobs and Ritchie should remind us of the fundamental duality of computing. As Leibniz prophesied, computing is “an instrument for the human mind.” Let us keep the human in the center!

Moshe Y. Vardi, editor-in-chief



membership application & digital library order form
Advancing Computing as a Science & Profession
Priority Code: AD10

You can join ACM in several easy ways:

Online: http://www.acm.org/join
Phone: +1-800-342-6626 (US & Canada), +1-212-626-0500 (Global)
Fax: +1-212-944-1318
Or, complete this application and return with payment via postal mail

Special rates for residents of developing countries: http://www.acm.org/membership/L2-3/
Special rates for members of sister societies: http://www.acm.org/membership/dues.html

Please print clearly

Name
Address
City State/Province Postal code/Zip
Country E-mail address
Area code & Daytime phone Fax Member number, if applicable

Purposes of ACM
ACM is dedicated to:
1) advancing the art, science, engineering, and application of information technology
2) fostering the open interchange of information to serve both professionals and the public
3) promoting the highest professional and ethics standards

I agree with the Purposes of ACM:
Signature

ACM Code of Ethics: http://www.acm.org/serving/ethics.html

choose one membership option:

PROFESSIONAL MEMBERSHIP:
o ACM Professional Membership: $99 USD
o ACM Professional Membership plus the ACM Digital Library: $198 USD ($99 dues + $99 DL)
o ACM Digital Library: $99 USD (must be an ACM member)

STUDENT MEMBERSHIP:
o ACM Student Membership: $19 USD
o ACM Student Membership plus the ACM Digital Library: $42 USD
o ACM Student Membership PLUS Print CACM Magazine: $42 USD
o ACM Student Membership w/Digital Library PLUS Print CACM Magazine: $62 USD

All new ACM members will receive an ACM membership card.
For more information, please visit us at www.acm.org

Professional membership dues include $40 toward a subscription to Communications of the ACM. Student membership dues include $15 toward a subscription to XRDS. Member dues, subscriptions, and optional contributions are tax-deductible under certain circumstances. Please consult with your tax advisor.

payment:
Payment must accompany application. If paying by check or money order, make payable to ACM, Inc. in US dollars or foreign currency at current exchange rate.

o Visa/MasterCard o American Express o Check/money order

o Professional Member Dues ($99 or $198) $ ______________________
o ACM Digital Library ($99) $ ______________________
o Student Member Dues ($19, $42, or $62) $ ______________________
Total Amount Due $ ______________________

Card # Expiration date
Signature

RETURN COMPLETED APPLICATION TO:
Association for Computing Machinery, Inc.
General Post Office
P.O. Box 30777
New York, NY 10087-0777

Questions? E-mail us at acmhelp@acm.org
Or call +1-800-342-6626 to speak to a live representative

Satisfaction Guaranteed!


letters to the editor

DOI:10.1145/2043174.2043176

To Boost Presentation Quality, Ask Questions

Moshe Y. Vardi’s Editor’s Letter “Are You Talking to Me?” (Sept. 2011) said conference attendees are sometimes unable to follow speakers’ presentations and eventually give up trying. So how about if ACM and IEEE would run an experimental conference where session chairs are expected to ask questions during presentations when they themselves lose track or when audience members clearly stop paying attention. Note such an experiment would have to be done without undue disruption and not allowed to reflect on a particular speaker.

The biggest trade-offs would be the extra time presentations might require and the possibility of upsetting overly sensitive speakers. However, they could be addressed experimentally, initially at small, highly technical conferences with flexible break periods and by selecting only expert, personable chairs to manage the sessions.

Robin Williams, San Jose, CA

Author’s Response:
I agree that ACM and IEEE conferences should experiment to improve the quality of their talks. Some ideas can be implemented fairly easily, as in, say, asking conference attendees to give anonymous feedback to speakers. However, one must keep in mind that conferences are grassroots operations, and experiments cannot be dictated by association governing bodies. Rather, the effort to improve conference talks must be undertaken by conferences on their own initiative.

Moshe Y. Vardi, Editor-in-Chief

Adopt the End-to-End Principle in Home Networks

To address the user-experience concerns raised in “Advancing the State of Home Networking” by W. Keith Edwards et al. (June 2011), we must first understand why home networks have been so successful despite the very real difficulties cited in the article. In attempting to do better for users, we might, in fact, do just the opposite. The authors recognized that developers treat networks as opaque infrastructure, which is the fundamental architectural principle that has made the Internet so generative.

Classic telecommunications is the business of providing services like the public switched telephone network, or PSTN. The Internet is a different concept, providing a common infrastructure for all services. Yet the very power of the Internet, which allows us to tunnel through legacy telecom, has also led us to accept the idea that it is just another service, like PSTN.

In the 1990s this was the plan for home computers, too. Working at Microsoft (Jan. 1995), I realized that home networking could be do-it-yourself rather than a service with a monthly bill and restrictions on what we do. I took the approach of removing complexity rather than adding solutions. Windows 98se supported the necessary protocols to “just work.” This involved the requirement that the user would not have to buy any service beyond a single IP but share a single IP address. I wanted to use IPv6 so each device would have a first-class presence. But because IPv6 was not available at the time, I used Network Address Translation to share a single IPv4 address.

Rather than make the home network smarter and more cognizant of the particulars of the home, we must honor the end-to-end principle and treat the Internet as infrastructure. Developers would thus be relieved of the impossible burden of having to understand the home environment and its inhabitants. Any number of approaches could coexist.

Today’s Internet protocols date from when big computers were immobile and relationships could be defined through fixed IP addresses. To preserve this simplicity, we need stable relationships for our untethered devices. This way, we could address sources of complexity rather than their symptoms.

Bob Frankston, Newton, MA

Fewer Lines of Code for More Results

In Poul-Henning Kamp’s article “The Most Expensive One-Byte Mistake” (Sept. 2011), did Ken, Dennis, and Brian indeed choose wrong with NUL-terminated text strings? I say they chose correctly, then and now. The reason C is dying and nobody has used PL/I, Algol, or Pascal for real work for the past 30 years is that C makes it possible to accomplish a lot in a few lines of intuitive code despite requiring little memory or CPU power. Searching and comparing NUL-terminated strings can be accomplished with such short code segments; programmers hardly need a standard library, and code compiles into a few PDP-11 machine instructions. Failing to check untrusted data is fatal in any language.

C allows fast simple code written by competent programmers, and simple code tends to be less buggy and more readable than complex code. For programmers who still want to use address + length strings, such use can be accomplished in just a few lines. There is, of course, the strlen() function to measure the string’s length and the fgets() function to limit how many characters to read into a string from a file.

Sure, copying large strings can run faster with newer hardware if the string lengths are known. This is a trade-off, and programmers can, if desired, use address + length strings in C and even word-align them. For others, there is always “C with Training Wheels,” a.k.a. Pascal or Java, if one is in no special hurry for results.

Good programmers write secure code; bad programmers write insecure, buggy code. Good practices are more valuable than “magic” language features. The largest Java application I know is also the buggiest application I know.

Bob Toxen, Atlanta, GA

Communications welcomes your opinion. To submit a Letter to the Editor, please limit yourself to 500 words or less, and send to letters@cacm.acm.org.

© 2011 ACM 0001-0782/11/12 $10.00



The Communications Web site, http://cacm.acm.org,
features more than a dozen bloggers in the BLOG@CACM
community. In each issue of Communications, we’ll publish
selected posts or excerpts.

Follow us on Twitter at http://twitter.com/blogCACM

doi:10.1145/2043174.2043178 http://cacm.acm.org/blogs/blog-cacm

Conferences and Video Lectures; Scientific Educational Games

John Langford analyzes whether conferences should offer video lectures. Judy Robertson discusses the merits of the Game Design Through Mentoring and Collaboration project.

John Langford
“To Video Lecture or Not”
http://cacm.acm.org/blogs/blog-cacm/100785
October 29, 2010

For the first time in several years, ICML 2010 did not have VideoLectures.net attending. Luckily, the tutorial on exploration and learning that Alina Beygelzimer and I put together can be viewed since we also presented at KDD 2010, which included video lecture support.

ICML didn’t cover the cost of a video lecture, because PASCAL didn’t provide a grant for it this year. On the other hand, KDD covered it out of registration costs. The cost of video lectures isn’t cheap. For a workshop the baseline quote we have is 270 euros per hour, plus a similar cost for the cameraman’s travel and accommodations. This can be reduced substantially by having a volunteer with a camera handle the cameraman duties, uploading the video and slides to be processed for a quoted 216 euros per hour.

YouTube is the most predominant free video site with a cost of $0, but it turns out to be a poor alternative. Fifteen-minute upload limits do not match typical talk lengths. Video lectures also have side-by-side synchronized slides and video that allows quick navigation of the video stream and acceptable resolution of typical talk slides. Overall, these differences are substantial enough that YouTube is not presently a serious alternative.

So, if we can’t avoid paying the cost, is it worthwhile? One way to judge this is by comparing how much authors currently spend traveling to a conference and presenting research vs. the size of the audience. In general, costs vary wildly, but for a typical academic international conference, airfare, hotel, and registration are commonly at least $1,000 even after scrimping. The size of audiences also varies substantially, but something in the 30–100 range is a typical average. For KDD 2010, the average number of views per presentation is 14.6, but this is misleadingly low, as KDD presentations were just put up. A better number is for KDD 2009, where the average view number is presently 74.2. This number is reasonably representative with ICML 2009 presently at 115.8.

We can argue about the relative merits of online vs. in-person viewing, but the order of their value is at least unclear, since in an online system people specifically seek out lectures to view while at the conference itself people are often opportunistic viewers. Valuing these equally, we see that video lectures increase the size of the audience, and the value to authors by perhaps a factor of two for a cost about 1/3 of current presentation costs.

This conclusion is conservative, because a video lecture is almost surely viewed over more than a year, costs of conference attendance are often higher, and the cost in terms of a presenter’s time is not accounted for. Overall, video lecture coverage seems quite worthwhile. Since authors also typically are the attendees of a conference, increasing the registration fees to cover the cost of video lectures seems reasonable. A video lecture is simply a new publishing format.

We can hope that the price will drop over time as it’s not clear to me that the 216 euros per hour reflects the real costs of VideoLectures.net. Some competition of a similar quality would be the surest way to do that. But in the near future, whether or not a conference has video lecture support substantially impacts its desirability as a place to send papers.

Reader’s comment
I share your emphasis on the importance of collecting public video lectures from academic conferences. However, your main reason for not using YouTube reflects a misconception, and you ignore the many significant advantages in features and reach that make YouTube worth a closer look.
1. “Partner channels” do not have the 15-minute upload limit as can be seen with various academic conference channels such as USENIX¹, as well as various open source conference channels. The limit seems to be getting conference organizers to upload content.
2. YouTube provides automatic machine-generated captions and machine translation into ~50 languages, which greatly expands the reach to the hearing impaired, non-English speakers, etc.
3. The captioned text of conference videos on YouTube is indexed by the Google search engine, so a query for an obscure technical concept that is verbally mentioned in a lecture will show up in search results for related queries. This is a huge benefit for the dissemination of knowledge.
In any case, I hope more sites such as VideoLectures.net add captions that can be indexed by search engines in the future so that there is further competition and feature development to make it easier and cheaper in this space.
—Murray Stokely

Reference
1. http://www.youtube.com/user/USENIXAssociation

Judy Robertson
“Game Design Through Mentoring and Collaboration”
http://cacm.acm.org/blogs/blog-cacm/101956
November 19, 2010

I was interested to read about a really fantastic National Science Foundation-funded project called Game Design Through Mentoring and Collaboration (GDMC). Taking place at McKinley Tech and George Mason University, the project encourages young people into STEM careers through weekend and summer courses in computer game design. I particularly like two aspects of GDMC. First, the students learn with slightly more experienced peer mentors as well as an instructor. This can be a very effective model because both the mentor and the mentee can learn a lot, and it gives the teacher much-needed assistance in a busy class full of temperamental computers and children. (If you want to know more about different models for effective mentoring, Kafai et al.¹ is a good place to start.) Second, the students also learn about science subjects and integrate their new knowledge into their games, e.g., after input from a Federation of American Scientists biologist, the students’ games included accurate information about antibiotics, glial cells, and neurotransmitters. If we can’t convince every child to become a computer scientist, any kind of scientist will have to do.

Game design projects are increasingly popular in education, and the evidence is starting to accumulate about the effectiveness of such schemes. In an article published earlier this year in Computers and Education, Vos and colleagues² compared students’ motivation and use of strategies for deep learning when they either played a simple memory drag-and-drop game or constructed their own such game. The children enjoyed making games more than playing them, and were more likely to use deep learning strategies while doing so. A notable finding from this study was that the children were less motivated in the play condition than by their normal classroom lessons. This just goes to show that if you’re going to spend classroom time on a game, it had better be good or you might as well not bother. Or perhaps a more positive way of looking at that would be to say it takes a high-quality game to beat an enthusiastic teacher.

Game genre and graphical quality are likely to be factors here. The simple 2D board game style application in this study looks rather dull in comparison to the sort of action game you might find gracing the screen of a Wii. It may well be that making an online board game is more fun than playing it only because playing it isn’t that exciting to start with. In contrast, the GDMC students learn a wider range of technical skills that enable them to make 3D games with proper physics. I think this is pretty important because, in my experience, kids want to make games that look and feel as good as the games they play at home. After all, they want their friends to be impressed when they play them. So, hats off to the students on GDMC: Your counterparts across the pond in Scotland (see http://www.adventure-author.org) salute you!

References
1. Kafai, Desai, Peppler, Chiu, and Moya, “The multiple roles of mentoring,” The Computer Clubhouse: Constructionism and Creativity in Youth Communities, Kafai, Peppler, Chapman (Eds.), Teachers College Press, New York, 2009.
2. Vos, van der Meijden, Denessen, “Effects of constructing versus playing an educational game on student motivation and deep learning strategy use,” Computers & Education 56, 1, January 2011; DOI: 10.1016/j.compedu.2010.08.013.

John Langford is a senior researcher at Yahoo! Research in New York. Judy Robertson is a lecturer at Heriot-Watt University.

© 2011 ACM 0001-0782/11/12 $10.00
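Langford's cost-benefit estimate can be retraced as rough arithmetic. The sketch below is only one plausible reading of his numbers, not his actual calculation: the exchange rate, the half-hour talk length, and the use of the midpoint of the 30–100 audience range are all assumptions introduced here for illustration.

```python
# Rough retrace of the video-lecture estimate. Constants marked "assumption"
# are illustrative and do not come from the post itself.
EUR_TO_USD = 1.4          # assumption: approximate late-2010 exchange rate
TALK_HOURS = 0.5          # assumption: one talk fills about half an hour
IN_PERSON_AUDIENCE = 65   # assumption: midpoint of the post's 30-100 range

RATE_EUR_PER_HOUR = 270 + 270  # baseline quote plus "a similar cost" for the cameraman's travel
TRAVEL_COST_USD = 1000         # the post's lower bound for attending and presenting
ONLINE_VIEWS = 74.2            # average views per KDD 2009 talk, from the post


def video_cost_usd(rate_eur_per_hour, talk_hours, eur_to_usd):
    """Cost in dollars of recording a single talk."""
    return rate_eur_per_hour * talk_hours * eur_to_usd


def audience_factor(in_person, online):
    """How many times larger the audience gets once online views are added."""
    return (in_person + online) / in_person


cost = video_cost_usd(RATE_EUR_PER_HOUR, TALK_HOURS, EUR_TO_USD)
cost_share = cost / TRAVEL_COST_USD
factor = audience_factor(IN_PERSON_AUDIENCE, ONLINE_VIEWS)

print(f"video cost per talk: ${cost:.0f}")
print(f"share of presentation cost: {cost_share:.2f}")
print(f"audience multiplier: {factor:.2f}")
```

Under these assumptions the recording costs a bit more than a third of the $1,000 travel figure and the audience roughly doubles, which is in line with the post's "factor of two for a cost about 1/3". Different assumptions about talk length or audience size would shift both numbers.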



acm member news

DOI:10.1145/2043174.2043179

Nominees for Elections and Report of the ACM Nominating Committee

In accordance with the Constitution and Bylaws of the ACM, the Nominating Committee hereby submits the following slate of nominees for ACM’s officers. In addition to the officers of the ACM, five Members at Large will be elected. The names of the candidates for each office are presented in random order below:

President (7/1/12–6/30/14):
Barbara G. Ryder, Virginia Tech
Vinton G. Cerf, Google

Vice President (7/1/12–6/30/14):
Mathai Joseph, Advisor, Tata Consultancy Services
Alexander L. Wolf, Imperial College London

Secretary/Treasurer (7/1/12–6/30/14):
George V. Neville-Neil, Neville-Neil Consulting
Vicki L. Hanson, University of Dundee

Members at Large (7/1/12–6/30/16):
Radia Perlman, Intel
Ricardo Baeza-Yates, Yahoo! Research, Barcelona/Santiago
Feng Zhao, Microsoft Research, Beijing
Eric Allman, Sendmail Inc.
Mary Lou Soffa, University of Virginia
P.J. Narayanan, IIIT-Hyderabad
Eugene Spafford, Purdue University

The Constitution and Bylaws provide that candidates for elected offices of the ACM may also be nominated by petition of one percent of the Members who as of November 1 are eligible to vote for the nominee. Such petitions must be accompanied by a written declaration that the nominee is willing to stand for election. The number of Member signatures required for the offices of President, Vice President, Secretary/Treasurer, and Members at Large, is 683.

The Bylaws provide that such petitions must reach the Elections Committee before January 31. Original petitions for ACM offices are to be submitted to the ACM Elections Committee, c/o Pat Ryan, COO, ACM Headquarters, 2 Penn Plaza, Suite 701, New York, NY 10121, USA, by January 31, 2012. Duplicate copies of the petitions should also be sent to the Chair of the Elections Committee, Gerry Segal, c/o ACM Headquarters. All candidates nominated by petition are reminded of the requirements stated in the Policy and Procedures on Nominations and Elections that a candidate for high office must meet in order to serve with distinction. Copies of this document are available from Rosemary McGuinness, Office of Policy and Administration, ACM Headquarters. Statements and biographical sketches of all candidates will appear in the May 2012 issue of Communications.

The Nominating Committee would like to thank all those who helped us with their suggestions and advice.

Dame Wendy Hall, Chair
Anand Deshpande, Ben Fried, Cherri Pancake, and Robert A. Walker

ACM Member News

Mentoring With Mary Fernández

Before Mary Fernández arrived at Brown University as a freshman in 1981, she had never seen a computer. Then, an influential Brown professor, Andries van Dam, told her introduction to computer science class that everyone would have one in their home someday. She was astonished.

“He said we’d carry them around in our pockets,” Fernández says. “This seemed ludicrous. At that time, computers were still the size of trucks.”

Van Dam’s teaching captured her imagination to the point where she was “hooked,” says Fernández, who switched her major to computer science.

As executive director of distributed computing research for AT&T Labs Research in Florham Park, NJ, Fernández is immersed in cloud infrastructure research, but her other passion is inspiring young women and minority students to pursue a career in computer science. In 1998, Fernández joined MentorNet, which matches mentors and protégés, and became a board member in 2008. She was elected chair earlier this year.

For her professional accomplishments and work with MentorNet, Fernández was recently honored with the Outstanding Technical Achievement Award from HENAAC/Great Minds in STEM.

One of Fernández’s key goals now is to raise awareness about opportunities in STEM and computer science in her own ethnic community. “The Latino immigrant population, in particular, often doesn’t even know that computer science is an option for them,” she says. “Latino parents should know that STEM fields will provide their children with stable careers, excellent starting salaries, and ample room for growth. It is important for them to understand that their child’s university education will pay huge dividends in the future.”
—Dennis McCafferty

Photograph courtesy of Mary Fernandez, AT&T Labs Research



news

Science | doi:10.1145/2043174.2043180
Kirk L. Kroeker

The Rise of Molecular Machines

The field of molecular computing is achieving new levels of control over biochemical processes and fostering sophisticated connections between computer science and the biological sciences.

Taking cues from both speculative fiction and hard science, today’s most prolific futurists have envisioned a point in the future when developments in genetics, nanotechnology, and robotics make it possible to sidestep the constraints of human durability and intelligence. Controversial assumptions notwithstanding, even the most optimistic speculation about the future symbiotic convergence of humans and technology is deriving at least some measure of credibility from emerging work in molecular computing. Researchers in this field are achieving new levels of control over biological processes and fostering sophisticated crossovers between computer science and the biological sciences.

In one recent development, scientists in the department of molecular computing at the California Institute of Technology (Caltech) have built what they are calling the most complex biochemical circuit ever created from scratch. These circuits, the Caltech researchers say, will allow scientists to explore the principles of information processing in biological systems and design biochemical pathways with decision-making capabilities. Such circuits, they say, will give biochemists unprecedented control over chemical reactions for biological and chemical engineering and may even lead to the proliferation of molecular-scale biological machines.

Figure: Wiring diagram specifying a biochemical circuit that consists of 74 different DNA molecules. The circuit, developed at Caltech, demonstrates an approach for implementing arbitrary digital logic in biochemical systems. The lines correspond to single-stranded oligonucleotides, while the nodes correspond to partially double-stranded molecules.

Lulu Qian, a senior postdoctoral scholar in bioengineering at Caltech, and Erik Winfree, a Caltech professor of bioengineering and computer science, computation, and neural systems, used DNA-based components to build the circuit. Instead of depending on electron flows through transistors, the DNA logic gates receive and produce molecules as their signals. The molecular signals travel from one gate to another, connecting the circuit as if the molecules were wires.

With his colleagues Georg Seelig, David Soloveichik, and David Zhang, Winfree first built a biochemical circuit in 2006. In that work, DNA signal molecules connected several DNA logic gates to each other, forming a multilayered circuit consisting of 12 molecules. In the new design, Qian and Winfree made the logic gates from pieces of single- and double-stranded DNA. The two researchers have made several circuits with this approach; the largest, containing 74 different DNA molecules, can compute the square root of any number up to 15 and round the answer down to the nearest integer.

During the calculation process, the custom-built molecules float around the solution and bump into each other, prompting strands with a certain DNA sequence to zip themselves to compatible strands while simultaneously unzipping other strands. The unzipped strands are released back into the solution to continue the cycle until the calculation process is complete. The researchers simply monitor the concentrations of output molecules to determine the answer, which takes some 10 hours to compute.

Figure: Heading to the instrument room at Caltech. In the box are the pipettes and test tubes containing molecules for a biochemical circuit: the 74 tubes each contain one type of DNA logic gate or fuel, and the eight tubes contain input signal strands. To get the DNA circuit running, the 74 gate and fuel molecules must be mixed to form the functional circuit, and a combination of the eight input signal strands must be added. Fluorescence must then be monitored during the next 10 hours to determine the circuit’s output.

While the logic gates have identical structures and can therefore be standardized, Winfree says the research still faces several significant challenges, such as automating the design and analysis process and finding ways to control the inevitable faulty behavior of the floating molecules. In addition, Winfree says it is not clear how well such molecular systems can be scaled to take on tasks more complex than relatively basic math. According to Winfree, the millions of logic gates that are common in silicon-based computing will not be possible for DNA circuits, at least in the near term.

Despite the evident limitations of DNA circuits, researchers working in this area are already achieving more control over biochemical processes than has ever before been possible. “We’re trying to design DNA systems that self-assemble, implement biochemical circuitry, and act as molecular robots,” says Winfree, who is modest in his assessment of the science and engineering involved in creating the molecular circuit with Qian. “It took previously demonstrated proofs of principle, simplified them, automated them, and got them to work at a much larger scale,” he says. “Doing so, of course, involved some conceptual advances that might be considered a breakthrough.”

To Winfree, the potential applications for such programmable biochemical systems are limitless. Eventually being able to use such systems for diagnosing disease or delivering custom-designed cellular therapies, for example, is a common refrain in discussions about the utility of molecular computing. “You wouldn’t say that the purpose of electronic computers is to play video games and keep track of banks’ financial records,” he says. “The same goes for circuitry at the molecular scale; there are a lot of things you can do with molecules and chemistry.”

DNA-based Data Storage
In another molecular-computing project, researchers at the Chinese University of Hong Kong (CUHK) are using DNA for data storage. The project, which recently won the gold medal at the International Genetically Engineered Machine (iGEM) competition at the Massachusetts Institute of Technology, is said to be the first to use the DNA of E. coli to store and encrypt text and images. The researchers working on the project, headed by Ting Fung Chan, estimate that one gram of bacteria can store up to 650,000 gigabytes, roughly equivalent to 325 hard drives, each with two terabytes of capacity.

Chan, a professor in the school of life sciences and the deputy director of the center for microbial genomics and proteomics at CUHK, explains that the idea for the project initially came from a conversation about Assassin’s Creed, a video game in which the protagonist has inherited his ancestor’s memory through DNA. In the original iGEM competition, Chan and his team encoded a short text message in a single piece of DNA and inserted it into bacterial cells, where the DNA was reshuffled, then retrieved and decoded.

Storing small bits of data, according to Chan, is not the project’s ultimate goal. Currently, he and his team are designing a parallel storage system in which the DNA sequence would be cut into segments and stored in multiple cells. “We are pursuing a true massively parallel storage system,” Chan says. “Storing a large piece of information, such as a photograph or a dictionary, is impossible to do within a single piece of DNA because of the limits associated with current DNA synthesis technology.”

Simply fragmenting the information and inserting it into multiple cells without having an effective method for rebuilding the original file would destroy the data because the order of fragments would be unknown, explains Chan. So he and his team devised a biological system that uses standard storage mechanisms—such as headers and checksums—so each piece of information can be mapped and located for retrieval. The method involves removing DNA from the cells, altering it with enzymes, and returning it to new cells where the DNA sequence will be shuffled. “The idea is to demonstrate that we can, in principle, store any digital information,” says Chan.

Chan notes that while he and his team have made progress at encoding pictures in several DNA sequences and storing them in parallel in multiple cells, research in this area has essentially just begun. He says it will take at least another five to 10 years for the core ideas to develop. “Synthetic biology is an emerging field, and there is special research funding only in some parts of the world,” says Chan. “This is multidisciplinary research, and training of a new generation of young researchers is very much needed.”

While most research in molecular computing has focused on constructing small modules in which individual genetic units carry out a particular task, Chan says developing a complete machine will require genome-scale engineering that depends not on the individual building blocks, but on a deep understanding of how these individual building blocks interact with each other. To this end, he says, a key challenge to overcome in this area is to gain a much more complete understanding of individual cellular components so they can be accurately modeled as an integrated system.

Despite such challenges, the tools and techniques used by scientists working in molecular computing are beginning to mature, suggesting the possibility that some of these projects may soon form the groundwork for commercial applications. Caltech’s Winfree, for his part, predicts the field will gain significant momentum during the next 10 years. “Molecular computing is beginning to be a real contender for some simple applications,” he says, noting that every few years there is a breakthrough that changes what people inside and outside the field think is possible.

By way of example, he cites Paul Rothemund’s DNA origami project, the goal of which was to fold DNA into tiny shapes and images. “Before DNA origami, most folks, including me, would not have thought it possible, and afterwards it was routine,” says Winfree. “What’s next is impossible to predict.”

Still, as the pace of innovation in this area of research continues to accelerate, more crossovers between molecular computing and traditional computing will emerge. Many principles from electrical engineering and computer science are already being used to design molecular systems, as was done in the Caltech project, which used the principles of digital abstraction and signal restoration to make the circuits capable of withstanding imperfections, and in the CUHK project, which used traditional storage and cryptography paradigms to encode data in bacteria.

Winfree points out that if the pace of innovation in molecular computing continues, in 30 years scientists will be able to make systems built with 20,000,000 nucleotides, which is larger than E. coli’s complete genome. “It’s hard to imagine what such designed-from-scratch molecular systems will be capable of doing,” says Winfree. While it remains to be seen whether such systems will eventually be able to augment human capabilities in a manner consistent with the visions of today’s futurists, Chan, like Winfree, remains optimistic about the potential of molecular computing. “I do believe that this is no longer science fiction, as I would have thought when I was a kid,” he says.

Figure: An overview of the biological data storage system developed at the Chinese University of Hong Kong. In the bacteria-based storage system, binary files are compressed and split into data packets. Each packet contains a payload, an address, error-correction code, and an optional encryption marker. A binary-to-quaternary base conversion is performed on the encoded data, followed by substituting the quaternary numbers for the four DNA bases.

Further Reading
Qian, L. and Winfree, E. Scaling up digital circuit computation with DNA strand displacement cascades, Science 332, 6034, June 3, 2011.
Ran, T., Kaplan, S., and Shapiro, E. Molecular implementation of simple logic programs, Nature Nanotechnology 4, 10, October 2009.
Rothemund, P. Folding DNA to create nanoscale shapes and patterns, Nature 440, 7083, March 16, 2006.
Storm, D. Unhackable data in a box of bacteria: Future of InfoSec? Computerworld, January 18, 2011.
Zyga, L. Biomolecular computer can autonomously sense multiple signs of disease, PhysOrg.com, July 6, 2011.

Kirk L. Kroeker works in communications and has written extensively about the impact of emerging technologies.

© 2011 ACM 0001-0782/11/12 $10.00
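To make the Caltech circuit's task concrete: numbers up to 15 fit in four bits, and their rounded-down square roots (0 to 3) fit in two, so the whole function can be written as a small piece of digital logic. The sketch below is ordinary Boolean logic in Python, derived here from the truth table. It is not the gate layout Qian and Winfree used, and the function name is made up for illustration.

```python
import math


def floor_sqrt_4bit(n):
    """Rounded-down square root of a 4-bit number, using only AND/OR/NOT."""
    if not 0 <= n <= 15:
        raise ValueError("input must fit in four bits (0..15)")

    # Split the input into its four bits, most significant first.
    b3, b2, b1, b0 = (bool((n >> i) & 1) for i in (3, 2, 1, 0))

    # High output bit: the root is 2 or 3 exactly when n >= 4.
    y1 = b3 or b2

    # Low output bit: the root is odd for n in 1..3 and for n in 9..15.
    y0 = (not b3 and not b2 and (b1 or b0)) or (b3 and (b2 or b1 or b0))

    return (int(y1) << 1) | int(y0)


# Check the logic against Python's exact integer square root.
for n in range(16):
    assert floor_sqrt_4bit(n) == math.isqrt(n)
```

Two output bits built from a handful of gates is a tiny circuit by silicon standards, which gives a sense of scale for Winfree's remark that millions of gates are out of reach for DNA circuits in the near term.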

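The CUHK storage scheme described in the article (split the data into packets, give each an address and a checksum, convert binary to base 4, then substitute DNA bases) can be sketched in a few lines. Everything specific below is an assumption made for illustration: the digit-to-base mapping, the one-byte address, the simple additive checksum, and the function names. The team's real packet format also involves compression and optional encryption, which this sketch leaves out.

```python
import random

BASES = "ACGT"  # assumption: quaternary digits 0..3 map to A, C, G, T


def bytes_to_dna(data):
    """Binary-to-quaternary conversion: each byte becomes four bases."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(seq):
    """Inverse of bytes_to_dna."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def make_packets(data, payload_size=4):
    """Split data into DNA packets laid out as address + payload + checksum."""
    packets = []
    for address, start in enumerate(range(0, len(data), payload_size)):
        if address > 255:
            raise ValueError("one-byte address allows at most 256 packets")
        payload = data[start:start + payload_size]
        checksum = (address + sum(payload)) % 256  # hypothetical checksum
        packets.append(bytes_to_dna(bytes([address]) + payload + bytes([checksum])))
    return packets


def reassemble(packets):
    """Verify each packet and rebuild the data in address order."""
    chunks = {}
    for seq in packets:
        raw = dna_to_bytes(seq)
        address, payload, checksum = raw[0], raw[1:-1], raw[-1]
        if (address + sum(payload)) % 256 != checksum:
            raise ValueError("checksum mismatch in packet %d" % address)
        chunks[address] = payload
    return b"".join(chunks[a] for a in sorted(chunks))


message = b"Hello, E. coli!"
packets = make_packets(message)
random.shuffle(packets)  # stands in for fragments scattered across many cells
restored = reassemble(packets)
```

The shuffle illustrates Chan's point about ordering: without the address in each packet, the fragments could not be put back in sequence, and without the checksum a damaged fragment would go unnoticed.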


news

Technology | doi:10.1145/2043174.2043181
Gregory Goth

Brave NUI World

Natural user interface developments, such as Microsoft’s Kinect, may indicate the beginning of the end for the mouse.

Colorado Springs, CO-based independent software developer Kevin Connolly became a minor YouTube celebrity over the past several months with the demonstration of his natural user interface (NUI) hack of Microsoft’s Kinect software development kit (SDK). Connolly demonstrated moving different images on a small bank of screens up and down, in and out, and sorting through three-dimensional image arrays, all through gesture alone. It was, he notes, very similar to the image manipulation featured in Steven Spielberg’s 2002 futuristic film, Minority Report.

“I’m just some guy,” Connolly says. “I made that work in a matter of hours. Imagine what we have the technology to do if one guy in his apartment can do that in a few hours.”

Indeed, Microsoft’s decision to release the Kinect SDK in June has garnered much attention in the technical and technology trade press and among researchers and enthusiasts. Anoop Gupta, a Microsoft distinguished scientist, says more than 100,000 individuals downloaded the SDK in the first six weeks after its release. However, the terms of the release forbid any commercial use of the SDK, and Connolly says he halted work on his nascent NUI after he got it working to his satisfaction. Yet, the release of the SDK was a signal that low-cost motion and depth-sensing technology may soon herald epochal changes in the way humans and computers interact. The Kinect hardware, for instance, is manufactured by Tel Aviv-based PrimeSense, and lists for about $200. Researchers in numerous disciplines call such technology for such a price absolutely revolutionary.

Figure: Audrey Penven created this photograph and similar ones by using Kinect’s infrared structured light as a source of light. Photograph by Audrey Penven.

NUI Research
John Underkoffler’s experience with NUI research goes far beyond hobbyist SDKs. In fact, Underkoffler is the man behind the Minority Report interface and has commercialized it via Los Angeles-based Oblong Industries, at which he is chief scientist. Underkoffler calls the Microsoft SDK release a “rhetorical event we all love.” He says these events, like the 2006 release of Nintendo’s Wii, “puts in the foreground for different sets of eyes—the end consumer for the Wii, the home or dorm room hacker for the Kinect SDK—the idea that it isn’t going to be mouse and keyboard forever. We’ve seen the dialogue go from one of doubt or questioning to a kind of acceptance. Everyone now knows it isn’t going to be mouse and keyboard forever, but the real question is, What is it going to be?”

Some of the next steps of NUI research are likely to be through gaming applications, which are increasingly dependent upon motion-sensing technology. Recently, four California Institute of Technology students undertook a study of motion-sensing technologies originating in gaming, and their possible uses in other computationally intensive fields. The students explored the depth camera technology in Kinect, the inertial sensors of the Wii, and electromagnetic sensing technology developed by Sixense Entertainment and Razer. One of the students, Peter Ngo, believes the idea of the NUI as a fully toolless interface is overstated, as does Amir Rubin, CEO of Sixense.

“People don’t buy motion control,” Rubin says. “They don’t buy PCs. They don’t buy consoles. They buy the experience being delivered to them. The best input device is something you don’t remember is on you, and that is true even for the slogan sold by Microsoft—‘You are the controller.’ I agree, but if I have to alter the way I use my body in order to interact with Kinect, then it’s the worst controller.”


Even the most ardent NUI advocates Another NUI developer, Evan Lang
agree with Rubin. Oblong’s Under- of Seattle-based UI design firm Iden-
koffler, for instance, says that writers “People don’t buy tityMine, says his work with trying to
will likely be well served by the key- motion control,” develop Kinect NUIs (on PrimeSense
board for the foreseeable future, but drivers) similar to current GUI com-
that those who design ship hulls or air- Amir Rubin says. mands revealed vexing user issues. In
plane wings would be better served by “They buy the developing a Web button, for instance,
three-dimensional NUIs. Moreover, he Lang says, “I programmed it to rec-
says it is vital not to graft a notion of a new interface design by simply extending two-dimensional GUI concepts onto the prototypes of three-dimensional applications. These applications will need computational capabilities along not just the flat x and y axes (an example might be a wall-sized but still two-dimensional application for designing the very three-dimensional ship’s hull he mentioned) but will also need to compute the depth of the z axis. Oblong’s g-speak platform, based on work Underkoffler pioneered in the 1990s at the Massachusetts Institute of Technology’s Media Lab, computes this spatial environment via networked computers and screens that allow rich three-dimensional interaction. Ultimately, Underkoffler thinks a hybrid UI ecosystem will evolve.

“We’re not out to replace the keyboard; let it do what it’s best at,” he says, “but when it comes to designing airplane wings, you do need a spatial UI. So, it’s about situating the right activities in the right interaction zone.”

Homebrewed Algorithms

Since the Microsoft SDK precludes commercial use, many early academic and enterprise projects using PrimeSense and/or stripped down Kinect hardware use either homebrewed algorithms or open source drivers and middleware released by consortia such as OpenCV or OpenNI, the natural interface forge formed in November 2010, by PrimeSense and robotics pioneer Willow Garage. OpenNI leverages the PrimeSense depth-sensing technology, which is processed in parallel by PrimeSense’s system-on-a-chip processor after receiving coded near-infrared light from its partnered CMOS sensor.

OpenNI supplies a set of APIs to be implemented by the sensor devices, and a set of APIs to be implemented by the middleware components. Thus, OpenNI’s API enables applications to be written and ported to operate on top of different middleware modules. It also enables middleware developers to write algorithms on top of raw data formats, regardless of which sensor device has produced them, and offers sensor manufacturers the capability to build sensors that power any OpenNI-compliant application.

In fact, one nascent healthcare industry application partially built on open source stacks by a team of surgeons and engineers at Sunnybrook Health Sciences Center in Toronto for the Kinect camera is already drawing attention.

Allowing surgeons access to medical images while not having to touch a controller, and thereby saving them the necessity to re-scrub in order to preserve sterility around the patient, is an early enterprise triumph for the NUI concept. Computer vision specialist Jamie Tremaine says the gesture-based UI he and his colleagues developed has proven exceptionally robust and enables surgeons to view through MRI and CT scan samples that can run from 4,000 to 10,000 slides without ever having to re-scrub.

For such an application, the hand and arm gestures recognized by the Kinect camera are suitable, but Tremaine says “a lot of the work we’ve done hasn’t even been on the technical side as much as creating gestures in the operating room that allow very fine-grained control, but which have to be larger.”

experience being delivered to them. The best input device is something you don’t remember is on you.”

ognize a poking gesture, where you move your hand quickly forward and quickly back. When I got some test users to try it out, and said ‘Poke it or press it,’ everybody had a very different idea of what that actually meant. Some did a kind of poking thing. Other people moved their hand forward but wouldn’t move it back, and others, who were very cautious and deliberate about it, the machine wouldn’t register as a poke.”

Oblong’s Underkoffler says problems such as Lang encountered are emblematic of grafting current GUI-based mechanics on an idea that needs something else.

“We believe it’s not appropriate to start talking about NUIs until you have a complete solution,” Underkoffler says. “If you flash back 30 years, it’s like dropping an early prototype of a mouse in everybody’s lap and saying, ‘We have a new interface.’ You don’t, because you’ve just got a new input device. So, really it’s a full loop proposition. What’s the input modality? What shows up on screen? What’s the analogue of the windows and scroll bars and radio buttons? Until that’s not only been answered in a way to allow real work to happen, but has become kind of a standard, and more in the cognitive sense, recognizably and pervasively present, then you don’t have a new interface.”

Sixense’s Rubin predicts the next-generation standard UI device would not be a question of which technology is most elegant, but rather, that which meets three criteria: a consumer-friendly price, an intuitive UI design, and ease of software development on top of that device.

“If you can meet the combination of those three,” he says, “then you will have the next-generation standard of input devices.”

Output Perceptions

Robotics researchers such as Nicholas Roy, associate professor of aeronautics at the Massachusetts Institute of Technology, were among the first to adopt the PrimeSense sensor, and it may be their work that shows the longest-range potential for, and a new concept of, what a NUI will be.

Vehicles equipped with the three-dimensional sensors, among them a helicopter Roy and his students programmed, gain what Roy calls a “whole new sense of the human-centered environment,” and are able to sense things such as drop-offs in stairs, table legs, and so on. And now, with more researchers exploiting the extremely cheap sensor technology, it is likely that more UI work will be done exploring how a robot, or any other computer, will “think” its way to interacting with humans.

“A lot of the estimation and planning algorithms my students have developed for the helicopter, we actually reused in the context of interacting with robots,” Roy says. “If the problem is no longer how the vehicle plans to get from Point A to Point B, but the problem is how does the vehicle understand what the human wants in terms of some task, then our research programs have a lot of commonality between those two seemingly very different domains.”

Further Reading

Cohn, G., Morris, D., Patel, S.N., and Tan, D.S.
Your noise is my command: Sensing gestures using the body as an antenna, Proceedings of the 2011 Annual Conference on human factors in computing systems, Vancouver, British Columbia, Canada, May 7–12, 2011.

Gallo, L., Placitelli, A.P., and Ciampi, M.
Controller-free exploration of medical image data: Experiencing the Kinect, Proceedings of the 24th IEEE International Symposium on Computer-Based Medical Systems, Los Alamitos, CA, June 27–30, 2011.

Henry, P., Krainin, M., Herbst, E., Ren, X., and Fox, D.
RGB-D mapping: Using depth cameras for dense 3D modeling of indoor environments, Proceedings of the International Symposium on Experimental Robotics, New Delhi, India, Dec. 18–21, 2010.

Shotton, J., et al.
Real-time human pose recognition in parts from single depth images, Proceedings of the 24th IEEE Computer Vision and Pattern Recognition, Colorado Springs, CO, June 20–25, 2011.

Underkoffler, J., Ullmer, B., and Ishii, H.
Emancipated pixels: Real-world graphics in the luminous room, Proceedings of the Special Interest Group on Computer Graphics, Los Angeles, CA, August 8–13, 1999.

Gregory Goth is an Oakville, CT-based writer who specializes in science and technology.

© 2011 ACM 0001-0782/11/12 $10.00
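The poke-recognition trouble Lang describes in the article above can be illustrated with a toy detector. This is only a sketch under stated assumptions, not Lang's code or any Kinect or OpenNI API: the function name `detect_poke`, the 30 frames-per-second rate, and the travel and timing thresholds are all hypothetical. It treats a poke as a quick move toward the sensor followed by a quick return, which is enough to show why a push with no return, or a slow and deliberate press, would go unregistered.

```python
def detect_poke(depths_m, fps=30, min_travel_m=0.10, max_duration_s=0.5):
    """Return True if the hand moves toward the sensor and back quickly.

    depths_m: hand-to-sensor distances in meters, one value per frame.
    All thresholds are illustrative assumptions, not values from any SDK.
    """
    max_frames = int(max_duration_s * fps)
    for start in range(len(depths_m)):
        window = depths_m[start:start + max_frames + 1]
        if len(window) < 3:
            break
        nearest = min(window)
        i_near = window.index(nearest)
        forward = window[0] - nearest  # travel toward the sensor
        if i_near == 0 or forward < min_travel_m:
            continue
        # Require the hand to come back most of the way inside the window.
        back = max(window[i_near:]) - nearest
        if back >= 0.8 * forward:
            return True
    return False


# Three hypothetical users, echoing the behaviors described in the article.
quick_poke = [0.60] * 3 + [0.56, 0.50, 0.45, 0.50, 0.56, 0.60] + [0.60] * 3
push_only = [0.60] * 3 + [0.56, 0.50, 0.45] + [0.45] * 10
slow_press = ([0.60 - 0.005 * i for i in range(31)]
              + [0.45 + 0.005 * i for i in range(1, 31)])

for name, seq in [("quick poke", quick_poke),
                  ("push without return", push_only),
                  ("slow, deliberate press", slow_press)]:
    print(name, detect_poke(seq))
```

Only the first sequence counts as a poke under these thresholds. Loosening them would accept more users' gestures at the cost of more false triggers, which is the kind of design trade-off the article attributes to gesture design rather than to sensing hardware.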

Milestones

Katayanagi Prizes and Other CS Awards


President Obama, ACM, IEEE, Carnegie Mellon University, and Tokyo University of Technology recently honored some of the nation’s leading computer scientists.

National Medals of Science, Technology, and Innovation

President Obama honored Richard A. Tapia, director of the Center for Excellence and Equity in Education at Rice University, with the 2010 National Medal of Science for “his pioneering and fundamental contributions in optimization theory and numerical analysis and for his dedication and sustained efforts in fostering diversity and excellence in mathematics and science education.”

The President also presented National Medals of Technology and Innovation, recognizing scientists who have made lasting contributions to America’s competitiveness and quality of life and helped strengthen the nation’s technological work force. The recipients were Rakesh Agrawal, Purdue University; B. Jayant Baliga, North Carolina State University; C. Donald Bateman, Honeywell; Yvonne C. Brill, RCA Astro Electronics (retired); and Michael F. Tompsett, TheraManager.

Katayanagi Prizes

Carnegie Mellon University in cooperation with the Tokyo University of Technology presented the Katayanagi Prize for Research Excellence to Barbara Liskov, Institute Professor at the Massachusetts Institute of Technology, for her record of outstanding, sustained achievement. Scott Klemmer, associate professor of computer science at Stanford University, received the Katayanagi Emerging Leadership Prize for his demonstration of leadership promise.

ACM and IEEE Awards

Susan L. Graham, the Pehong Chen Distinguished Professor of Electrical Engineering and Computer Science Emerita of the University of California, Berkeley, received the ACM-IEEE Computer Society Ken Kennedy Award “for contributions to computer programming tools that have significantly advanced software development.” Cleve Moler, chairman of MathWorks, was this year’s winner of the IEEE Computer Society Sidney Fernbach Award “for fundamental contributions to linear algebra, mathematical software, and enabling tools for computational science.” Charles Seitz, former president of Myricom, Inc., was awarded the IEEE Computer Society’s 2011 Seymour Cray Computer Engineering Award “for innovations in high-performance message-passing architectures and networks.”

SIGSAC Awards

ACM’s Special Interest Group on Security, Audit and Control (SIGSAC) presented its top honors to Virgil Gligor, co-director of Carnegie Mellon University’s CyLab, and Ravishankar Iyer, director of the Center for Reliable and High-Performance Computing at the University of Illinois. Gligor received the SIGSAC Outstanding Innovation Award for innovations in secure operating systems as well as covert channel analysis, intrusion detection, and secure wireless sensor networks. Iyer received the SIGSAC Outstanding Contributions Award for his fundamental contributions to the assessment and design of secure, dependable computing systems.

16 Communications of the ACM | December 2011 | Vol. 54 | No. 12


news

Society | doi:10.1145/2043174.2043182 Dennis McCafferty

Activism Vs. Slacktivism


Today’s activists are highly plugged into social media, mobile apps, and
other digital tools. But does this make a difference where it matters most?

If you need convincing that the state of activism in the digital age is alive and well, look no further than the Web site for the Program on Liberation Technology at Stanford University. On the program’s high-profile email list group, a consumer advocate gives updates about the California Public Utilities Commission’s investigation into a proposed merger of AT&T and T-Mobile. Another user promotes a letter-writing campaign to urge federal lawmakers to protect funding for the Directorate for Social, Behavioral and Economic Sciences at the National Science Foundation. And a third offers cautionary advice to fellow organizers: “Don’t type anything you wouldn’t want snooped on your iPad. Someone has developed software which uses computer vision to do keylogging.”

Engaged or disengaged? A pair of protesters with their smartphones at an anti-Al Khalifa protest last February in London, England. Photograph by Gail Orenstein, courtesy of the Web 3.0 Lab/Clima

Other postings focus on the Arab spring, environmental sustainability, and a host of other progressive causes, which is understandable since the Stanford program’s stated purpose is “to understand how information technology can be used to defend human rights, improve governance, empower the poor, promote economic development, and pursue a variety of other social goods.”

Of course, there’s plenty to find on the right-leaning side of the ideological table. At TeaPartyPatriots.org, for example, you can use a locator to track down events scheduled in your city or state, buy a Tea Party Patriots coloring book, and join a Government Accountability Project group.

The upshot is no matter what your cause is, you can find a great way to connect these days. Activists are making full use of blogs, social media sites, mobile apps, and other tools to promote their message and gain support. Nothing grabs the heartstrings like video, and participants are producing streaming content to take advantage of this. It makes one think of how effective technology could have been through history. Consider how the U.S. founding fathers would have tweeted Paul Revere’s famous cry as “Brits R Coming,” posted real-time video of his nighttime ride on Facebook, and solicited the French and other sympathetic European supporters for financial and participatory support through Facebook, Kickstarter, and other sites.

Yet, while no one disputes that online initiatives like these draw greater attention to a cause, opinion varies with respect to whether they make a significant, lasting impact. A number of respected thinkers say technology does not really advance activism to achieve its most critical goals: to change the hearts and minds of the public, and effect real change.

On the other side of the debate are activists and other influencers who counter that the impact on hearts and minds cannot be measured. What can be measured are user-traffic numbers generated, e-petition signatures delivered, Facebook “like” counts, and other metrics that convey growing support.

A Contrarian View

The conversation here is essentially positioned as a debate over activism versus slacktivism. The latter term refers to people who are happy to click a “like” button about a cause and may make other nominal, supportive gestures. But they’re hardly inspired with the kind of emotional fire that forces a shift in public perception. A telling, supportive anecdote: A popular technique of organizers on all sides of the political spectrum is an online letter-writing campaign in which supporters are encouraged to simply copy and paste from a template form of the letter. Participants aren’t asked to come up with their own words. It’s not even clear if they read the entire content of the letters they send. Does a simple “copy/paste/send” act constitute activism at its finest?

In one of the more widely discussed articles casting doubt, New Yorker contributor Malcolm Gladwell maintains that successful efforts must engage participants by convincing them that they have a great personal stake in the consequences. Traditionally, highly effective movements evolved from within parties built upon “strong tie” personal connections, such as those among classmates and church members. Activism associated with social media, however, is dependent upon “weak tie” relationships, writes Gladwell. Organizers seek involvement from Twitter followers they have never met or Facebook friends with whom they would never otherwise stay in touch, according to Gladwell. These are loose networks, whereas meaningful activism requires strong, robust organizational structure.

A protester captures the scene at an Occupy Portland rally in Portland, OR last October. Photograph by William Walsh

Even in the case of the Arab spring (arguably the political movement most enhanced by multiple digital means), those casting doubt upon the influence of technology contend that the events would have mattered little if old-fashioned principles of activism were not applied: effectively planned mass assemblies in which passionate pleas for change were expressed. The fact that the Arab spring demonstrations got YouTubed, Facebooked, and tweeted is simply a logical progression in the continuing advancement of multimedia, just as broadcasting civil rights demonstrations on TV news during the 1960s at one time seemed novel in its ability to connect a cause with a nationwide audience.

In the end, activism has always been, and will always be, about people. Specifically, people who show up in person. Just witness the protests over collective-bargaining rights for state union employees in Wisconsin, as the liberal public-policy group MoveOn.org led a solidarity day in which 50,000 supporters turned out in all 49 other state capitals and raised more than $3 million to support Wisconsin Democrats.

“The Wisconsin protest was old-school organizing, with a digital edge,” says Dave Karpf, an assistant professor in communications/information at Rutgers University and a leading researcher on political blogs and Internet-mediated activist organizations. “Angry citizens felt their rights were being trampled, so they showed up and demonstrated. It was the largest extended labor action in a generation, and it was led by labor organizations, fighting for collective bargaining rights.”

Similarly, the Tea Party isn’t a new social movement either, according to Karpf. It’s traditional conservatism that intelligently embraces new-media technologies. “The Tea Party’s biggest successes (disrupting health-care town hall meetings, winning Republican primaries) were a boots-on-the-ground affair, with people arriving and causing a ruckus,” says Karpf. “Web sites and Twitter were useful in helping activists identify those meetings more easily. But they’re basically acting as much more efficient phone trees.”

Some of those downplaying the impact of online activism will even argue that its ability to generate “boots-on-the-ground” user engagement is overstated. Tufts University sociology professor Sarah Sobieraj likens modern efforts to an infatuation with technology, with little to show for it. For her book, Soundbitten: The Perils of Media-Centered Political Activism, Sobieraj researched the methods of more than 50 different groups focused on shaping discourse (including United for Change, Pre-Born Protectors, and the Freedom and Equality League) and concluded that their Internet strategies have done little to influence the public.

Perhaps the greatest irony? As much as these groups enjoy beating up the mainstream media or claim that their use of new media is infinitely more effective than traditional media, these same groups covet coverage from major journalism outlets. “They’re very old-media-centric,” Sobieraj says. “When they talk about strategies, they’re most focused on broadcast TV and even newspapers. If they get mentioned in a New York Times or Boston Globe feature, that’s what they’re really after.”

Committed to Tech

People both involved with and supportive of online activism concede that they really cannot measure how much technology inspires people to “do something.” But they say any kind of attention generated, either by mainstream press or otherwise, increases the opportunity to change minds and instigate action. The Internet has established platform upon platform to present a position in multiple formats. It allows for the exchange of views on that position. It increases the capability for calls to action and pure organizational logistics. In other words, if the new techniques of activism serve to amplify and even help better organize the old, what is wrong with that?

Besides, technology and activism are a perfect match, says Brie Rogers Lowery, a contributing strategist for FairSay, an eCampaign consultancy. The very founding principle of Web 2.0 itself is based upon the same ideas that fuel efforts toward change. Those principles include the need to interact, share, and pursue goals.

“Technology offers huge potential to connect,” says Rogers Lowery. “An obvious example is Obama’s election campaign, which was mobilized primarily online and utilized the full range of new media. But the use of technology in activism extends to all kinds of campaigns, such as the use of SMS in South Africa to report cases of child abuse in remote communities.”

Rogers Lowery, who organized a digital activism debate at Oxford University earlier this year, says it is time to move the discussion from the “cyber-skeptic view” that online activism is somehow less legitimate and inferior to older approaches. “Instead,” she says, “there’s a need to show how ‘old’ and ‘new’ activism can work together to serve.”

It is not simply a matter of using technology in greater numbers. It is about everyday citizens finding creative ways to exploit it in ways previously not conceived to advance a cause, supporters say. Pachube.com, for example, links activists to data tools that can help establish, manage, and share the quantified basis of their positions.

“We’ve had a Brooklyn user who built an alert system to help monitor sewage and other contaminants in an effort to get citizens to keep the harbor cleaner,” says Ed Borden, who oversees technology and business development for Pachube. “We have another New Yorker who’s collecting data to support his contention of noise pollution created by the Federal Aviation Administration. In Japan, the citizens crowdsourced to come up with radiation data after the Fukushima disaster in March, self-organizing using Twitter, blogs, and wikis. Data drives activism. The dialogue has reached a deafening point online and everyone has a cause. So it takes hard evidence to turn heads.”

Whether those heads remain turned, and join the cause, is subject to continued debate.

Further Reading

Durbin, P.T.
Philosophy, activism, and computer and information specialists, Ubiquity, November 2007.

GetInvolved.ca
Social Media: Politics 2.0 - The Power of the Citizen, http://www.youtube.com/watch?v=1vrczoLm7Es&feature=autoplay&list=PLE8382F8E085EFF12&index=3&playnext=2, Jan. 21, 2010.

Gladwell, M.
Small change: Why the revolution will not be tweeted, The New Yorker, Oct. 4, 2010.

Karpf, D.
Wisconsin and the limits of web power, The Guardian, Feb. 25, 2011.

Land, M.B.
Networked Activism, Harvard Human Rights Journal 22, 9/10, Sept. 28, 2009.

Dennis McCafferty is a Washington, D.C.-based technology writer.

© 2011 ACM 0001-0782/11/12 $10.00

Technology

Low-Cost Robots Could Transform Science


A new generation of inexpensive robots could make the machines ubiquitous, opening up robotics to new areas of research, says James McLurkin, assistant professor of computer science and director of the robotics lab at Rice University. “I wanted to have something the community could use to do research,” McLurkin says. “In order for this to have an impact, it has to be low cost.”

McLurkin studies multi-robot systems in which swarms of robots work together to perform a task, like searching a building for earthquake survivors. Much of the work in such systems has been done through computer simulations, because building many robots is too expensive. But now McLurkin has built a robot for about $280, compared to $2,000 for the previous version. He is hoping for funding to allow him to sell his R-one machines to researchers and educators at cost.

“I think what he’s doing is great,” says Rodney Brooks, professor emeritus of robotics at the Massachusetts Institute of Technology, and McLurkin’s undergraduate advisor. Brooks thinks cheap robots could have the same effect on his field that moving from expensive mainframes to desktops had in computing. “Every student having a robot, and then being able to get them to work together, will unleash creativity on the physical world in the way that the PC did on the cyberworld.”

The robots are inexpensive mainly because the spread of smartphones has driven down the cost of sophisticated electronics. They contain integrated radio communications, infrared sensors, motors, and an embedded Python interpreter for programming.

McLurkin is interested in physical data structures, using robots as elements in an algorithm. A robot’s position in space can be a unit of information that can be manipulated by moving it around or keeping it in a particular orientation. A simple bubble sort algorithm, which sorts a list into the right order, can be rendered physical with robots. “Their position in the world indicates the state of the sort,” he explains.

This approach could provide a new way of thinking about the behavior of multi-robot systems. Instead of modeling the individual motions of hundreds or thousands of robots, a daunting computational task, he hopes a physical algorithm of a handful of robots can act as an accurate representation of a larger group. That in turn will let him write new algorithms so the swarms can perform complex tasks.

Going from a handful to a large number of robots can actually transform a problem, he says. Instead of four robots wandering through a building with laser scanners to make a map, for instance, he could send in hundreds and make a map simply by noting the positions of the machines.

Having many affordable robots will let him test his ideas. “Until you put your robots where your mouth is, you really don’t know if you’ve got something,” McLurkin says.

Neil Savage
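The idea of a bubble sort "rendered physical," where each robot's position carries the state of the sort, can be sketched in a few lines of Python. This is a minimal simulation under stated assumptions, not McLurkin's code or the R-one's programming interface: the `Robot` class, its `key` and `position` fields, and both function names are hypothetical. The list of robots is never reordered; only positions change, so the arrangement in space is the data structure.

```python
from dataclasses import dataclass


@dataclass
class Robot:
    """Hypothetical stand-in for one robot standing in a slot on a line."""
    key: int       # value being sorted (for example, a robot ID)
    position: int  # index of the slot the robot currently occupies


def physical_bubble_sort(robots):
    """Bubble sort in which neighboring robots trade places.

    Only each robot's `position` is changed. Returns the number of swaps.
    """
    n = len(robots)
    swaps = 0
    for sweep in range(n - 1):
        swapped = False
        # Look up which robot is standing in each slot right now.
        by_slot = {r.position: r for r in robots}
        for slot in range(n - 1 - sweep):
            left, right = by_slot[slot], by_slot[slot + 1]
            if left.key > right.key:
                # The two neighbors physically exchange positions.
                left.position, right.position = right.position, left.position
                by_slot[slot], by_slot[slot + 1] = right, left
                swaps += 1
                swapped = True
        if not swapped:
            break  # a full sweep with no swaps means the line is sorted
    return swaps


def layout(robots):
    """Read the state of the sort off the robots' positions, left to right."""
    return [r.key for r in sorted(robots, key=lambda r: r.position)]


robots = [Robot(key=k, position=i) for i, k in enumerate([4, 1, 3, 2])]
swap_count = physical_bubble_sort(robots)
print(layout(robots), swap_count)  # prints [1, 2, 3, 4] 4
```

In a real swarm the comparison and swap would be negotiated between neighbors over radio or infrared, and each swap would cost travel time, which is one reason the number of swaps is tracked here.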


Education | doi:10.1145/2043174.2043183 Samuel Greengard

CSEdWeek Takes Hold


Groups in more than 130 countries will participate
in Computer Science Education Week this year.

It is nothing short of ironic that in the digital age, instruction about computers and computing is woefully lacking. A 2010 ACM report, Running on Empty, found that only nine states in the U.S. count computer science courses as a core academic subject in high school graduation requirements and the total number of courses offered by secondary schools has declined over the last several years. Yet, by 2018, a projected 1.4 million new computing jobs will exist and the current pipeline of graduates will fill only about half of these positions.

MIT’s Leah Buechley engages students with her method of creating cloth printed circuit boards in the form of wallpaper. Photograph courtesy of Leah Buechley, MIT

“In some schools where AP computer science was once taught, classes have been eliminated,” says Debra Richardson, a professor of informatics at University of California, Irvine. “In others, real computer science has never been taught and what is called computing or computer science is just literacy in technology and applications. The difference is whether you understand how to create computing technology or are just able to use it.”

As a result, computing scientists, educators, and others are banding together to raise awareness about the impact of computing in today’s society. CSEdWeek, which originated in 2009, focuses on how computer science education prepares today’s youth for the digital age. The December 4–10 event features programs at businesses, universities, and K–12 schools that are designed to stimulate interest in computing sciences and show the viability of careers in the field.

According to Richardson, who chairs CSEdWeek, the U.S. and other countries are falling further behind the computing curve. From 2005 to 2009, the percentage of U.S. high schools offering classes in computing sciences fell from 40% to 27%. In addition, only 17% of those taking advanced placement computing science tests are women and 11% are minorities.

The fallout is significant, says Ruthe Farmer, director of the National Center for Women & Information Technology and vice chair for CSEdWeek. Businesses and other institutions consistently lose out on talent as individuals who could find work as computer engineers, designers, and developers stream into other fields. The lack of women and minorities compounds the problem, particularly as organizations focus on designing better products and solutions across a diverse group of consumers.

CSEdWeek aims to address these gaps and social inequities. The organization has asked individuals from around the world to pledge support and develop an educational activity or program in their community. Groups from more than 130 countries, including Brazil, India, and Kenya, are now involved in the initiative.

At the University of California, Berkeley, more than 50 high school students visited the campus in 2010 to learn about robotics, animation, artificial intelligence, game analysis, and other topics. “It’s important to get students exposed to computing sciences at a young age,” says Joshua Paley, a computer science and mathematics teacher at Henry M. Gunn High School in Palo Alto, CA. Last year, he led the visit to Berkeley and helped develop a programming contest that attracted nearly 50 students. “It offered a window into computer science puzzles and problems,” he explains.

At the University of Puerto Rico, Mayaguez Campus, a group of professors and students focused on the theme “Our Lives Without Computer Science.” They created an award-winning video as well as a demonstration, with both hardware and software, of the classic Simon Says game. School children could play with the hardware, edit the software, and learn about how everything interconnects. “It’s powerful because they can see themselves as future computer scientists,” says Nayda G. Santiago, an associate professor in the school’s electrical and computer engineering department.

“Schools must move beyond basic technology literacy curriculum and add courses that warm students up to computer science,” concludes Richardson. “The future depends on it.”

Samuel Greengard is an author and journalist based in West Linn, OR.

© 2011 ACM 0001-0782/11/12 $10.00


In Memoriam | doi:10.1145/2043174.2043202 Paul Hyman

Dennis Ritchie, 1941–2011


Colleagues recall the creator of C and codeveloper of Unix, an unassuming
but brilliant man who enjoyed playing practical jokes on his coworkers.

Of the three giants in the computer industry who passed away last October, Steve Jobs was easily the most recognizable one. And that is exactly how Dennis Ritchie preferred it.

Even though much of today’s digital world is built from tools he created, Ritchie, who authored the C programming language and cocreated Unix with Ken Thompson, never sought the spotlight.

Ken Thompson (left) and Dennis Ritchie received the National Medal of Technology in 1999 from President Clinton. Photograph courtesy of Bell Labs / Lucent Technologies

Brian Kernighan, who worked at Bell Labs alongside Ritchie and Thompson for more than 30 years and is now a computer science professor at Princeton University, observes, “Jobs was very out in public, which was one of his strengths. Dennis was a private person and didn’t do any self-salesmanship. But the work Jobs did at NeXT and Apple built on what Dennis did because all those programs are fundamentally written in C or derivatives like C++ and Java. Life would be very different without the work Dennis did singlehandedly in just a few months.”

C might be Ritchie’s crowning achievement, as it is regarded as one of the world’s two most influential programming languages (the other is Fortran). C, of course, is not a very large language, mainly because the DEC PDP-11 minicomputer Ritchie ran it on was technologically constrained, so there wasn’t much room to get fancy, which, Kernighan notes, was fine given Ritchie’s minimalistic approach.

“Dennis and Ken worked together on Unix,” says Kernighan, for which the duo received the ACM A.M. Turing Award in 1983. “He always said Ken did most of the work with just some of his assistance, but that’s characteristically modest on Dennis’ part.”

Last May, Bell Labs hosted a ceremony in Murray Hill, NJ, in honor of Ritchie and Thompson, who had won the Japan Prize. One speaker was Douglas McIlroy, an adjunct professor of computer science at Dartmouth College, who had been a manager at Bell Labs and had known Ritchie for nearly 50 years, ever since Ritchie’s first summer job there in 1962.

“Dennis was a fixture at meetings of the Usenix users group,” McIlroy noted. “Crowds networking in the corridors would break to pack his talks. Every newcomer wanted to see and hear the man behind the system. Old hands came to listen to the master perhaps even more eagerly. If you read one of his papers, you’ll see why. Dennis combines a perfect control of the technical matter with a polished but easy writing style, and an unerring sense of how much to say. That felicity is also on display on his home page, which offers engaging pieces about his work.”

But not everything on his Bell Labs home page relates to work. A practical joker, Ritchie also details “Labscam,” an elaborate prank that he and colleague Rob Pike pulled on their boss, Nobel prize winner Arno Penzias, in 1989 with the help of magicians Penn and Teller. [See the prank at http://www.youtube.com/watch?v=fxMKuv0A6z4.]

His sense of humor also shows in his work. “In perhaps the trickiest part of the Unix code,” notes McIlroy, “where a couple of instructions play with hardware registers as if by magic, there is a comment by Dennis that says: ‘You are not expected to understand this.’ That’s been published over and over again on T-shirts.”

Ritchie was 70 when he was found dead in his Berkeley Heights, NJ, home. He had been in frail health in recent years after treatment for prostate cancer and heart disease.

“Dennis was thoughtful, he was totally approachable,” says McIlroy, “but I think he will best be remembered as an extremely talented, bright guy who created something we absolutely all use, and he never really sought credit for it.”

Paul Hyman is a science and technology writer based in Great Neck, NY.

© 2011 ACM 0001-0782/11/12 $10.00

December 2011 | Vol. 54 | No. 12 | Communications of the ACM 21


viewpoints

doi:10.1145/2043174.2043204 Jaron Lanier

The Most Ancient Marketing

Before Apple, Steve Jobs famously went to India with his college friend Dan Kottke. While I never had occasion to talk to Jobs about it, I did hear many a tale from Kottke, and I have a theory I wish I had a chance to try out on Jobs.

Jobs loved the Beatles and referred to them fairly often, so I’ll use some Beatles references. John Lennon once recalled that, as a boy, he saw Elvis in a movie and suddenly thought to himself, “I want that job!” The theory is that Jobs saw gurus in India, focal points of love and respect, surrounded by devotees, and he similarly thought to himself, “I want that job!”

This observation is not meant as a criticism, and certainly not as an insult. It simply provides an explanatory framework for what made Jobs a unique figure.

For instance, he liberally used the guru’s tactic of treating certain devotees badly from time to time as a way of making them more devoted. I heard members of the original Macintosh team confess that they succumbed. They were tangibly stunned by it, repeatedly. They recognized it happening in real time, and yet they consented. Jobs would scold and humiliate people and somehow elicit an ever more intense determination to attempt to win his approval, or more precisely, his pleasure.

The process is described in an essay by Alan Watts on how to be a guru that was well known around the time Apple was first taking off. The successful guru is neither universally nor arbitrarily scornful to followers, but there should be enough randomness to keep them guessing and off guard. When praise comes, it should be utterly piercing and luminous, so as to make the recipient feel as though they’ve never known love before that moment.

Apple’s relationship with its customers often followed a similar course. There would be a pandemic of bleating about a problem, such as a phone that lost calls when touched a certain way, and somehow the strife seemed to further cement customer devotion instead of driving customers away. What other tech company has experienced such a thing? Jobs imported the marketing techniques of India’s gurus to the business of computation.

Another way in which Jobs emulated the practices of gurus is in the psychology of pseudo-asceticism.

Consider the way he used physical spaces. Jobs always created personal and work spaces that were spare like an ashram, but it is the white Apple store interior that most recalls the ashram. White conveys purity, a holy place beyond reproach. At the same time, the white space must be highly structured and formal. There must be a tangible aura of discipline and adherence to the master’s plan.

The glass exteriors and staircases of elite Apple stores go further. They are temples, and I imagine they might someday be repurposed for use along those lines. (Maybe, some decades from now, our home 3D printers will just pop out the latest gadgets, leaving stores empty.)

There is yet another Beatles reference to bring up: It was Yoko Ono who first painted a New York City artist’s loft white. Conceptual avant-garde art invites people to project whatever they will project into it, and yet the artist offering a white space, or the silence of John Cage’s “4'33",” still becomes well known. This is the template followed by Apple marketing.

A dual message is conveyed. The white void is empty, awaiting you and almost anything you project into it. The exception is the surrounding institution (the business), which is not something to be projected away.

While that setup might seem to only benefit the establishment offering the white space, there is actually a benefit to the visitor who projects what they will into it. It’s like a good parent or lover who will listen endlessly without complaint but also sets boundaries. Narcissism can then be indulged without the terror of being out of touch or out of control. This formula is a magnet for human longings.

It’s all about you, iThis and iThat, but we will hold you, so you won’t screw yourself up. Of course, that’s not really a possible bargain. To the degree you buy into the ashram, you do give up a certain degree of yourself. Maybe that’s not a bad thing. It’s like how Apple customers experience culture in general through the lens of Apple curation whenever they use a tablet. Maybe it’s the right mix for some people. But one ought to be aware.

It’s tempting to ridicule this aspect of Jobs’ legacy, but everything people do is infused with some degree of duplicity. This is doubly true of marketing. Putting the duplicity up front might be best. Back to the Beatles: Lennon’s “Sexy Sadie” ridiculed the guru shtick, while McCartney’s “Fool on the Hill” praised it, and they were singing about the same guru. These two songs could well be applied to the appeal of Apple under Jobs. Yes, he manipulated people and was often not a nice guy, and yet he also did either elicit or anticipate the passions of his devotees, over and over. (No one can say what the mix of eliciting versus anticipating really was.)

Jobs didn’t just use pseudo-asceticism for marketing. He wielded purist fanaticism so as to have power in the world of nerds. This is how it came to be that Jobs is so often remembered as an “inventor,” though he rarely was one. His genius was not technical, but he was a genius at manipulating technical minds.

An example is Jobs’ obsession with engineering beautiful fonts into personal computers. While plenty of people wanted this (Don Knuth comes to mind), it wasn’t easy to make such a luxury into a high-priority item in the engineering culture that drove early PC companies. But Jobs often mentioned his pride at having done it.

It is perhaps surprising that so few figures in tech companies have been able to push engineers around enough to enforce principles of elegance and simplicity, as understood by non-engineers. Apple’s commercial success has created a better atmosphere for such things in all the companies. But how did Jobs do it in the first place?

My impression, based on a number of interactions I witnessed over many years, is that Jobs traded one form of obsessive, principled nerdiness against another. It was useless for a typical designer or marketing person to plead with engineers during the early years of personal computers. Engineers had airtight criteria and data, and that trumped mere opinions and intuitions. But Jobs didn’t plead. He declared even more rigid and exacting criteria.

Jobs won the arms race of control freakery. He remains the only figure in a non-engineering role I have ever seen win this race against engineers outright.

Jaron Lanier is Partner Architect, Microsoft Research, and Innovator in Residence, USC Annenberg School.

Illustration by gluekit / Photograph by AP Photo/Paul Sakuma

Copyright held by author.



viewpoints

doi:10.1145/2043174.2043205 Genevieve Bell

Life, Death, and the iPad: Cultural Symbols and Steve Jobs

In 1997, Steve Jobs rejoined the Apple he had left years earlier. For the next 14 years, he led that company to create technology that found a global audience. Indeed, the period of his leadership coincided with astonishing changes in the profile of technology users. In 1998, more than 75% of the world’s Internet users were in the U.S.; today it is less than 15%. The complexion of the Web (its users, their desires, their languages, points of entry, and experiences) has subtly and not so subtly changed over that period. All these new online participants bring with them potentially different conceptual models of information, knowledge, and knowledge systems, with profound consequences for the ideological basis of the Net. These new participants also operate within different regulatory and legislative regimes, which will bring markedly different ideas about how to shape what happens online. And in this same time period, the number and kind of digital devices in people’s lives has grown and changed. Devices have proliferated with ensembles and debris collecting in the bottom of backpacks, on the dashboards of dusty trucks, and in drawers, cabinets, and baskets.

Photo caption: In traditional Chinese culture, people burn paper offerings for their ancestors. In a funerary goods store in Singapore, Genevieve Bell bought the last two paper iPads, complete with paper travel case and finger smudges, one of the most popular items for the hereafter. (Photograph courtesy of Genevieve Bell)

Many of those devices owe their contours, if not their direct production, to Apple and Steve Jobs. In the days that followed Steve Jobs’ death, he was frequently compared to Henry Ford and Thomas Edison, inventors both and men who helped shape the American landscape. But Jobs was an inventor who came of age in a very different world, one where it was possible to make products that would find their place in living rooms from Bakersfield to Beijing and many points in between. Furthermore, Jobs was much clearer than Ford or Edison that he was creating experiences, not technologies or products. He, and Apple, were creating a new symbolic register in which we all might participate, even if we all didn’t purchase.

As an anthropologist, I am always interested in this notion of symbols, and I have collected all manner of things along the way. My fondest collection, by far, is that of paper offerings from Chinese funerary goods stores. In traditional Chinese culture, people burn paper offerings for gods, ghosts, and ancestors. There are paper objects for religious occasions, for festivals, for ceremonial events, part of both public worship and private devotion. In this cosmology, or world view, fire transforms all these paper objects into real things in the other world. At funerals, and during Qingming, a yearly festival at which ancestors and family are honored, you burn paper money, paper gold nuggets, paper clothes, paper cars, paper cigarettes, paper beer, paper pork buns, paper false teeth, and even a range of everyday household items. In Shanghai in the 1930s, wealthy families burned full-sized paper copies of Rolls-Royce and Bentley cars to ensure their ancestors had appropriate transportation. Family obligations, in this world view, do not end with death. Instead, as a good child, you take care of your family and see to their needs in perpetuity. Today, in many Chinese communities around the world, a range of newer information and communication technologies have been added to the pyre of paper goods.

And over the years, in funerary goods stores in Penang, Ipoh, Kuala Lumpur, Singapore, Hong Kong, and even New York City, I have found all manner of paper technologies: a desktop PC with a Windows operating system, USB ports, and a mouse; a flat panel LCD TV screen with a remote control and HDMI outputs; game consoles with all the buttons and hints of small blinking lights; and a branded mobile phone, prepaid phone cards, a charger, and a carrying case. The products are always subtly rebranded (Nakia, Panosonic) and the logos are tweaked, but they are recognizable technology.

On my first field trip years ago, as I bought my first paper Nekia candy-bar phone, I had to ask the question: “Who are the ancestors calling, are they calling you?” This question made one woman laugh out loud. “No,” she said emphatically, “don’t be foolish; they are calling each other.” At least one other family I spoke with told me of upgrading their ancestors’ mobile phones every year during Qingming, burning them a newer model or a better brand, and of providing them with more prepaid minutes. Keeping up and keeping connected are pressures that can, apparently, transcend death. I have learned to see these stores, and the things they sell, as a proxy for what is desirable in the world of the living. I sometimes think you can visit these stores and know all you need to know about people’s daily preoccupations and their aspirations.

I was in Malaysia and Singapore doing fieldwork in the run-up to Qingming this year, and the stores were full again of this year’s aspirations and desires. I happened into one such store in Singapore near Toa Payoh with its famous beef noodles. I poked around the store: paper Hermès suitcases, paper automatic foot-massage machines, paper Mercedes cars with capped driver, full paper suitcases of paper clothes, and paper piles and piles of money were also on display. And then, tucked in a corner, I spied the first paper Apple product I have ever seen: a paper iPad complete with paper leather traveling case and the smudges of fingerprints on the paper screen, the logo tweaked to have two bites out of the apple. There were only two and, by the usual economies of the store, they weren’t cheap. They were more expensive than the phones and the foot-massage machine, but I bought them both. Chatting with the shopkeeper, I asked: “Why so few iPads?” He laughed and said, “Well, they are iPads; they are scarce.” Retelling this story to colleagues in Penang several days later revealed a similar shortage. Laughingly, Penang residents said: “That’s nothing. Here the newspapers carried the addresses of the stores that had iPads and iPhones and people actually queued to get them, and they sold out. We even posted on Facebook looking for extras.” The dead, it appears, also like Jonathan Ive’s sleek lines and Steve Jobs’ extraordinary vision.

It was clear in that moment that Apple products had crossed the line between mere technologies and beloved objects; they had become something you would not live without, or indeed be dead without. They were as valuable a purchase for your ancestors as for you. And queuing up to buy a paper iPad for your dead relatives was a quintessential expression of filial piety, an act of love and care. Those paper iPads and iPhones that were burned this year remind us that Apple under Steve Jobs’ leadership wasn’t just making technology; it was making experiences, and those experiences transcend death.

Genevieve Bell is director of Interaction and Experience Research for Intel Labs.

Copyright held by author.

Calendar of Events

December 17–20
International Conference on Wireless Technologies for Humanitarian Relief, Amritapuri, India, Contact: R. Chidambaram, Email: ammasvcoffice@gmail.com

December 18–20
Seventh Asia Information Retrieval Societies Conference, Dubai, UAE, Contact: Oroumchian Farhad, Email: farhadOroumchian@uowdubai.ac.ae

December 18–21
International Conference on Wireless Technologies for Humanitarian Relief, Amritapuri, India, Contact: R. Chidambaram, Email: ammasvcoffice@gmail.com

January 3–7
Fourth International Conference on Communication Systems and Networks, Bangalore, India, Contact: Don Towsley, Email: towsley@cs.umass.edu

January 6–8
Innovations in Theoretical Computer Science, Cambridge, MA, Sponsored: SIGACT, Contact: April Mosqus, Email: mosques@hq.acm.org

January 10–12
Annual Conference of ACM SIGART on Artificial Intelligence Innovation, San Francisco, CA, Contact: Qiang Yang, Email: qyang@cse.ust.hk, Phone: 852-235-88768

January 17–19
ACM-SIAM Symposium on Discrete Algorithms, Kyoto, Japan, Contact: David S. Johnson, Email: dsj@research.att.com, Phone: 908-582-4742

January 18–20
ACM International Workshop on Timing Issues in the Specification and Synthesis of Digital Systems, Taipei, Taiwan, Sponsored: SIGDA, Contact: Charlie Chung Ping Chen, Email: cchen@cc.ee.ntu.edu.tw



viewpoints

doi:10.1145/2043174.2043184 Michael A. Cusumano

Technology Strategy and Management
The Legacy of Steve Jobs
Reflecting on the career and contributions of the Apple cofounder.

Much has been written about Steve Jobs since his announcement in August 2011 that he was stepping down as CEO of Apple and his death less than two months later in October. In the past, I have been disappointed that Apple did not pursue a more “open” strategy for the Macintosh (1984) as well as early versions of the iPod (2001), iTunes (2003), and the iPhone (2007) (see “The Puzzle of Apple,” Communications, Sept. 2008). I have noted that Apple did become a better platform leader, gradually, and in May 2010 topped Microsoft to become the world’s most valuable technology company (see “The Resurgence of Apple,” Communications, Oct. 2010). Jobs probably did not care much about what professors write or what other companies do; he always followed a unique path in life and in business. Nonetheless, anyone who cares about technology and innovation, or the type of entrepreneurship that Americans should be most proud of, should take the time to reflect on the career and contributions of Steve Jobs.

Products, Not Just Platforms

The point I made about Apple in the past was simple: In platform markets (those defined by a core technology and complementary innovations, driven by “network effects”; see “The Evolution of Platform Thinking,” Communications, Jan. 2010), the best platform will usually win, even if there is a better product that we might benefit more from as consumers.[a] We saw this with the Macintosh computer, which was far superior to the DOS-Windows PCs that won the mass market. Dominant platforms need to be sufficiently open and modular technologically as well as priced right for the mass market but also attractive for other companies to adopt as foundations to produce their own complementary products and services. These outside innovations tend to make the platform increasingly valuable to users and often determine who wins or loses a platform battle.

The problem with many platforms, though, is that they involve design compromises; they need to accommodate the needs of many users and partners, as well as maintain continuity with the past, which constrains innovation. The Macintosh was a breakthrough product, pioneering new ground with its graphical user interface, mouse, language and graphics processing capabilities, among other innovations. Yet it was expensive, was incompatible with DOS, had relatively few business applications, and failed to be adopted by the mass market. The NeXT workstation computer, which Steve Jobs introduced in 1988, was an even more expensive marvel of hardware and software design; it attracted even fewer customers.

Today, Windows running on Intel-compatible chips remains the most common software platform for personal computers (though cellphones far outsell PCs and have become the dominant mode of computing). But Microsoft has introduced only incremental innovations, following the path set by the Macintosh more than 25 years ago. And Android-based smartphones and tablets, which rely on Google’s “free” and “open” operating system, follow the lead of the iPhone and the iPad. My point is that Microsoft, Intel, and Google have taken the usual route to platform leadership, with inexpensive or free products, relatively open interfaces, and extensive efforts to cultivate a broad ecosystem of partners. But Jobs and Apple have shown us another path to platform leadership, and not just for a niche product segment: Design breakthrough products that set new standards for form, function, and aesthetics; market them creatively and aggressively, with some modest reductions in price over time; open them up gradually as industrywide platforms, and let the chips fall where they may. This strategy will be very hard to duplicate without a Steve Jobs at the helm. But it is more of a win-win scenario for the innovator (still lots of money to be made) and the user (we all end up with better products, not just platforms).

Raising the Bar

But beating Microsoft or Google at their own platform game is not what seems to have motivated Steve Jobs. He appears to have cared most about the impact that technology and innovation, delivered in their most cultivated forms, can have on people’s lives. For example, he is famous for a quip about Microsoft back in the mid-1990s that the company “has no taste” and did not bring the best of human culture into its products. Jobs cited the example of proportionally spaced fonts in the Macintosh, an idea he got from looking at beautiful books and the history of printing, and which Windows later copied.[b]

From the beginning, Jobs wanted Apple to create computers that would be as elegant and simple to use as a typewriter or even a toaster. Now, looking back, we can see that every product Jobs championed, whether or not it succeeded commercially, set new standards for aesthetics as well as utility, such as in ease-of-use or handling graphics and multimedia. What stands out most to me are the ultra-simple, intuitive user interfaces of the Macintosh (GUI plus mouse, albeit invented earlier at the Stanford Research Institute and Xerox PARC) and then the iPod’s clickwheel and the iPhone and iPad touchscreens. By contrast, Bill Gates and Microsoft focused on software operating systems that led to cheap and powerful PCs as well as lots of applications but were relatively clumsy and difficult to use for the average consumer. Today’s PCs, digital media players, smartphones, and tablets based on Windows or even Android are as good as they are only because of how much Steve Jobs and Apple raised the bar for everyone.

Charisma and Leadership

In the 1996 PBS documentary, “Triumph of the Nerds,” Larry Tesler, who used to work at Apple, discussed how Steve Jobs was able to inspire people to surpass what even they believed they could accomplish. He would never settle for anything less than someone’s absolutely best effort, and then some. That is how Jobs raised the bar for the Macintosh project (whose competition was the character-based IBM PC and compatibles) and many products since then, most recently the iPad. Tesler recalled: “When I wasn’t sure what the word charisma meant, I met Steve Jobs and then I knew.”[c]

Let’s be sure to give adequate credit to Apple cofounder Steve Wozniak for Apple’s early products as well as to Jonathan Ive for being chief designer of the iPod, iPhone, iPad, and several hit Macintosh desktop computers and laptops, and to Scott Forstall, who headed iOS software development. New CEO Tim Cook, formerly the COO, has also been a highly effective leader of sales and operations since Jobs recruited him from Compaq Computer in 1998. But it has taken extraordinary charisma and leadership skills to bring so many diverse personalities together and channel their considerable talents so productively. This does not happen often or by chance.

It’s in the Details

When Jobs and Wozniak cofounded Apple in 1976, they believed, along with Bill Gates and Paul Allen, who cofounded Microsoft in 1975, that the world would one day be full of personal computers. These entrepreneurs had something else in common: They all had the skills, in varying degrees, to build the products they dreamed of. Jobs needed Wozniak’s technical wizardry to shrink down the number of chips and construct the internals of Apple’s early computers. Gates and Allen were preeminent software programmers, especially Gates. But Jobs stands far above his peers for the degree to which he combined extremely astute technological vision with an ability to dive into the smallest details of his products, including hardware, software, industrial design, and marketing.

We have heard about the care Jobs took to design the Apple II case, with consumer electronics as his model, and how he tackled the many challenges posed by the Macintosh, ranging from reducing the price tag to the time it took to boot up. The users who bought these and other Apple products quickly came to love them, truly love their elegant look and feel, in a way that seems unparalleled in any other competitor’s products.

Until recently, Jobs continued to be deeply involved in the iPhone and iPad designs, both of which have the same look and feel about them that we first marveled at in the original Macintosh. I should not have been surprised to learn from the recent reporting that Jobs is listed as a coinventor of 313 patents, beginning with personal computer cases but extending to internal PC electronics and designs for laptops, multimedia devices (the iPod), smartphones (the iPhone), operating systems (NeXT, iOS), keyboards, mice, and Apple TV.[d]

I am also left with the thought that great entrepreneurs do not really see the future as much as they create the future they envision. Steve Jobs knew how to build and sell game-changing products, down to the smallest details.

Computers and Consumer Electronics

As Steve Jobs moved forward in his career, he also brought related but formerly distinct technologies, and businesses, together. In fact, he felt compelled to shed the historic “Apple Computer” name in 2007 in favor of “Apple, Inc.” to reflect the broader set of aspirations that he and the company had adopted. It is instructive again to compare Jobs and Apple with Gates and Microsoft. Gates’ main entrepreneurial legacy has been to create a mass-market software products company that continues to “print money” and exploit those remarkable gross margins of packaged software (see “The Legacy of Bill Gates,” Communications, Jan. 2009). Indeed, Apple has yet to match Microsoft in profitability. But Microsoft is now a slow-growth gold mine, struggling to make money beyond the personal computer industry and the Windows and Office franchises.

By contrast, Apple has fully integrated computers with consumer electronics, including telephony and the mobile Internet (apps, music, video, text). Many firms played a role in merging computers with personal digital assistants, digital media players, and cellphones. But no one has pushed this convergence as far and as elegantly as Steve Jobs, especially as seen in the iPhone and the iPad. It was also important that Apple combined hardware and software in one company, something Microsoft could never do except in video games because it does not make hardware. And, with iTunes, Jobs solved an extremely vexing problem for the industry and for consumers: how to price digital content in the form of music, video clips, movies, and TV programs. This innovation in digital services is no less profound than Steve Jobs’ innovations in consumer products.

The Master Strategist

Early observers of Jobs and Apple, including myself, underestimated his ability to master the business side of technology. Clearly, over time, Jobs got better at this, much better, perhaps as the world caught up to what he was trying to do. Recall that he resigned under pressure from Apple in 1985, after the Macintosh failed to catch on in the marketplace. He started NeXT Computer the same year, but this company also failed. Jobs recovered by selling NeXT to Apple in 1996, a brilliant stroke because he not only rejoined Apple but was able to use NeXT’s novel software technology to replace the aging Macintosh operating system. Meanwhile, Jobs’ love of graphics, movies, and computers led him to found Pixar in 1986 by acquiring the computer animation department from Lucasfilm (he sold Pixar to Disney in 2006 for a cool $7.4 billion).

Jobs made other moves that showed he could put aside personal pride or biases to do what was necessary to save and grow the company. Two incidents stand out. First, when he rejoined Apple in 1996, the firm was practically bankrupt, with only a few months of cash left. But Jobs got a $150 million investment from archrival Microsoft as well as a commitment from Bill Gates that Microsoft would continue to produce Office for the Mac. This agreement was critical to maintain the Macintosh business, then the only real source of revenue for Apple. Second, in 2005, Jobs abandoned his 20-year commitment to the Motorola microprocessor and adopted archrival Intel’s technology. This move helped bridge the growing cost-performance gap with Windows PCs, and enabled the Macintosh to continue as a second platform that was also much more interoperable with the Windows world. As I look back at this history (disappointment with the original Macintosh, failure at NeXT, success with Pixar, awkward but highly useful alliances with Microsoft and Intel, the two companies Apple customers loved to hate), one must conclude that, in addition to his other extraordinary talents, Steve Jobs truly was a master strategist, second to no one.

The Legacy

And so now Apple grows not at the slow pace of the personal computer business, like Microsoft, or of Internet advertising, like Google, but at the breathtaking speed of exciting new global markets led by smartphones, tablets, app stores, cloud services (iCloud), and digital content distribution (music, video, and TV, as well as books, magazines, and newspapers, what we used to call “print”). Looking at rates of growth in sales and profits, or increases in market value, Apple has left Microsoft and Google, as well as IBM, Hewlett-Packard, Dell, Nokia, Intel, Cisco, EMC, Sony, AOL, and many other prominent technology firms, in the dust. Jobs and Apple defied conventional strategy Microsoft-style and have shown that more substantive innovation (coming up with consumer products that truly do seem new or almost new to the world, and not just to the company) can be exciting and fun, as well as enormously profitable.

Apple should do well for several years to come, as Tim Cook’s team executes on the vision and product portfolio that Jobs has left behind. But most observers worry about what will happen when Apple exhausts the ideas still on the drawing board. Jobs will not be around to champion yet another product that changes the world and fills up yet another sales pipeline.

The end had to come, of course; no one lives forever and no company is immune to competition. We were somewhat prepared: Jobs was seriously ill for several years and had a limited role in Apple for the past couple of years. But words are inadequate to express what Steve Jobs meant to Apple and to the world of technology, innovation, and high-tech entrepreneurship American-style. Surely, as an entrepreneur and innovator, he represented the very best that the U.S., or any country, has to offer. Our thoughts must be with the people closest to him, beginning with his family and intimate friends. They will miss Steve Jobs the most, but they are not alone.

[a] See Annabelle Gawer and Michael A. Cusumano, Platform Leadership (Free Press, 2002) as well as Michael A. Cusumano, Staying Power (Oxford, 2010), among other publications.
[b] Public Broadcasting System, “Triumph of the Nerds,” The Television Program Transcript, Part III; http://www.pbs.org/nerds/part3.html.
[c] Public Broadcasting System, “Triumph of the Nerds,” The Television Program Transcript, Part III; http://www.pbs.org/nerds/part3.html.
[d] Shan Carter, “Steve Jobs’ Patents,” The New York Times (Aug. 25, 2011); http://www.nytimes.com/interactive/2011/08/24/technology/steve-jobs-patents.html?emc=eta1

Michael A. Cusumano (cusumano@mit.edu) is a professor at the MIT Sloan School of Management and School of Engineering and author of Staying Power: Six Enduring Principles for Managing Strategy and Innovation in an Unpredictable World (Oxford University Press, 2010).

Photograph by AP Photo/Paul Sakuma

Copyright held by author.

28 Communications of the ACM | December 2011 | Vol. 54 | No. 12


V
viewpoints

doi:10.1145/2043174.2043185 Kentaro Toyama

Emerging Markets
On Turbocharged,
Heat-Seeking,
Robotic Fishing Poles
Applying a well-known proverb to socio-technical transformation.

In the basement of an office building in Bangalore, India, a housekeeper sat at a PC and painstakingly typed search terms into a browser. The PC was part of an early experiment at Microsoft Research India, which I co-founded in 2005. In the experiment, we were interested in what lower-income adults would do with an Internet-connected PC, if they had unrestricted access to one.

We were part of a larger movement called “information and communication technologies for development” (ICT4D), and at the time interest focused on what PCs and the Internet could do for international development. Digital technologies had transformed the lives of wealthy, educated people in developed countries. Could they help solve the challenges of poverty in the developing world? Proponents argued, for example, that telemedicine would revolutionize health care, that distance learning would close educational gaps, and that village telecenters would double rural incomes in even the poorest countries.

ICT4D has been gaining momentum since the late 1990s: On the one hand are technologists and entrepreneurs looking for ways to contribute to society beyond novel toys for rich folks; on the other hand, there is the international development community hoping to learn from the economic success of the technology sector. The trend has only grown with the advent of the mobile phone, the numbers of which (over five billion accounts worldwide) comfortably exceed the total adult population of the planet.

Photo: OLPC delivered by boat as part of OLPC Mexico Nayarit. Photograph courtesy of OLPC Mexico.

The dominant model of ICT4D is to seek to apply technology innovations for the benefit of very low-income communities. Among the best known examples are One Laptop Per Child (OLPC), initially announced as a specially designed $100 laptop that would fill the hole left by absent or undertrained teachers in developing-country education; and M-PESA, a mobile payment system widely used in Kenya that allows users to send money via SMS text messages and a nationwide network of agents. Related projects have been featured previously in Communications.[1,5]

I have conducted or supervised approximately 50 research projects in ICT4D, but while a few projects demonstrated meaningful impact and continue to do so in some form, the vast majority ended as temporary pilot projects with learning outcomes but
little long-term effect. My experiences taught me a single, simple lesson: Technology is an amplifier of human intent and capacity, and only an amplifier.[4] So, in well-meaning, capable hands, technology can work wonders; but absent good intentions or capability to use and support it fully, technology often ends up having zero or negative impact. Technology never guarantees net positive impact.

Photo: An M-PESA stand in Kenya. Photograph by Andrew Currie.

Technology and Societal Change

To be clear, it is not that technology cannot play a role in positive change. For example, M-PESA increases incomes in some rural areas, as urban migrants send money home with greater frequency. This kind of evidence has enamored the international development community to mobile-payment systems as a way to provide financial services to people who are “unbanked.”

But potential does not always translate to actuality. It is not at all clear that the net effect of systems like M-PESA will be positive overall, especially when one considers that they are two-way pipes between the pockets of poor, less educated people, and powerful corporations with savvy, well-funded marketing departments.

We have seen cycles of hype and disappointment before: In the 1960s, the television was hailed as a revolutionary technology that would replace the need for schools altogether. Today, it is better understood as a means by which millions of people watch reality TV.

This realization is ironic and disheartening for the technologist interested in social causes. Technology is supposed to be a means to scale the ingenuity of a few inventors for the benefit of many. We all grew up inspired by stories of Thomas Edison and Jonas Salk. Yet, with information and communication technologies, it is exactly those communities that most lack information-processing skills, a strong foundation of knowledge, and connections to influential social networks (and that are therefore poor) that are also the least interested or least able to make productive use of the technology.

Returns on Investment

Over the years, I have received many inquiries from computer scientists and engineers who say, “I’ve achieved comfort and security in my own life, but I’d now like to apply my skills for the less privileged people in the world.” Most of them then follow up with the question, “How can I apply my technical skills to the challenges of impoverished people?”

Poverty, though, cannot be engineered away, any more than a failing business’s problems can be. The deeper challenge lies with people and institutions, not technology. Perhaps sensing this issue, a few people ask a broader question: “What is the best way that someone like me can contribute to the lives of the less privileged?”

There is a well-known proverb, “If you give someone a fish, they’ll eat for a day; if you teach them how to fish, they’ll eat for a lifetime.” The main point is obvious enough: Yes, it is great to give someone food when they are starving, but doing so is a short-term, stopgap measure. What is really worthwhile is to teach them how to grow (or catch) their own food, so that they can independently help themselves.

The saying, however, also packs several layers of additional insight that are particularly relevant for the technologist interested in international development. First, it is interesting to note what the saying does not say: It doesn’t say, “If you give someone a fish, they’ll eat for a day; if you give them a turbocharged, heat-seeking, robotic fishing pole, they’ll eat for a lifetime.” That’s because, while such technology might result in more than one day of fish, it still leaves the person no better able to fend for themselves. Now, they need to adapt, maintain, and upgrade a technology, which is likely at least as great a challenge as fishing. The history of international development is full of rusting tractors, broken medical equipment, and increasingly, defunct PCs and mobile apps that worked until well-meaning specialists left when the funding dried up.

Second, if we draw an analogy between fish and technology, the saying suggests that simply providing technology (or selling it at low cost) to poor people is just another kind of charity: great as a stopgap measure, but not a long-term solution. A narrow interpretation of the analogy suggests we should instead teach people how to create technology themselves.

The larger point is that there is a world of difference between consumption of goods and production of goods, whether it be of fish, technology, or anything else. The ability to consume what you want is more a result of productive capacity than vice versa. ICT4D discourse tends to conflate the two, believing that any association with technology is good, but it is those who produce, not consume, technology that are best protected against poverty in the long term. If you had to give up one or the other, which would you rather do without…? All of the electronic devices you currently own (which will break or become obsolete within a few years), or all of your education, professional experience, leadership skills, and social contacts (which will serve you for the rest of your life, and propagate to people you raise, teach, or mentor)?

Finally, the saying implicitly recommends teaching over giving. The most meaningful contribution is to help another person grow, in knowledge, in new skills, and in forward-looking attitudes. Imagine a strange utopia in which technology feeds, heals, and generates income for the poor, so that the appearance of poverty itself is eliminated, but people remain unable to take care of themselves absent the technology. Is that the outcome we’re seeking?

Real-World Applications

Reality, of course, is more complex than the black-and-white alternatives I have articulated in this column. Rarely are real-life choices constrained to two options of pure giving or pure teaching. In any case, we could not teach millions of non-literate people how to become world-class software engineers overnight, even if we wanted. And, just to do productive work often requires consumption of technology.

Nevertheless, the deeper wisdom of the fish proverb remains. Wherever possible, it is more meaningful, and more sustaining, to support the growth of productive capacity within people, than to simply supply technologies for them to consume.

For international development, that means that our skills as engineers, computer scientists, managers, and leaders are better applied to teaching and mentorship than for technological innovation on behalf of poor populations. The greatest contributions we can make are not displays of our own brilliance and heroism, but helping people to help themselves.

What would this mean in practice? One example was set by Patrick Awuah, who left a successful career as a program manager in the U.S. to establish a ground-breaking new private college in his home country of Ghana. Still less than 10 years old, Ashesi University just inaugurated a new campus for over 400 students in business administration, computer science, and management information systems, and it has won awards for raising the bar for tertiary education in West Africa. Many of its graduates now write code for Ghanaian corporations or run start-up companies, thereby supplying the engine of growth for the country.

Or, consider Trish Dziko. After 15 years as a developer, designer, and manager, she founded the Technology Access Foundation (TAF), which runs educational programs that focus on science, technology, engineering, and mathematics for students of color in the Greater Seattle area. (Sometimes, the developing world is in your own backyard.) TAF provides children of low-income households hands-on exposure to robotics, chemistry experiments, and other experiences that are all too often cut from public schools. Then, through supplementary programs like internships and interview training, they prepare students for a strong future. Students who might otherwise fall through the cracks are nurtured through to college and beyond.

We do not have to be as bold as Awuah or Dziko; individuals who are less bold can also make a difference. The reason I know their stories is because I took personal leave from my job to teach calculus at Ashesi in its first year, and I am now considering how best I can volunteer time with TAF. Good organizations often need experienced employees, volunteers, board members, and mentors.

Teaching and mentorship, of course, must be tailored to the individual, and for many people in the developing world, we may have to start with the basics. Budding entrepreneurs might benefit from management advice and introductions to investors, but for illiterate children, we would need to start with simple reading skills. In between, there are rural teenagers who would benefit from exposure to careers in engineering, college students who could use a course on interviewing skills, and inexperienced computer programmers who would benefit from a good code review.

And, that brings us back to the Bangalore basement mentioned earlier. At the lab, we quickly found that free access to the Internet was most often used for entertainment. Understandably after a long day of work, the staff would search for the latest Tamil movies and watch them on YouTube.

Aishwarya Ratan, one of the researchers in my group then, was unsatisfied with this outcome. Though she acknowledged the value the staff got out of watching free movies, she felt that true development ought somehow to contribute to the staff’s capabilities (along the lines argued by Nobel economist Amartya Sen[3]). So, she decided it was important to provide more than just the technology, and she ran a computer literacy course that taught the staff the basics of word processing, spreadsheets, and some educational software.[2]

For some members of the staff, this was all the encouragement they needed. One of the building’s security guards began using the PC in the basement to practice data-entry skills that he learned in an outside evening class. One day, he came in and told us he was moving on. He had been offered a job in computer data entry. Though the job involved an initial cut in pay, his future prospects were much brighter, as he had effectively crossed over from a blue-collar job to a white-collar profession. He told us proudly, “Today I can stand up in front of my father and friends and say that I am no more a watchman, but I am doing a computer job.” What allowed this transformation was less the technology in the basement, but a solid secondary-school education and the inspiration, instruction, and encouragement he received from Ratan and his data-entry teachers.

In short, it was the fishing lessons, not the fish, that made all the difference.

References
1. Dias, M.B., and Brewer, E. How computer science serves the developing world. Commun. ACM 52, 6 (June 2009), 74–80; http://doi.acm.org/10.1145/1516046.1516064.
2. Ratan, A. et al. Kelsa+: Digital literacy for low-income office workers. In Proceedings of the Third International Conference on Information and Communication Technologies and Development (ICTD ’09). IEEE Press, Piscataway, NJ, 150–162.
3. Sen, A. Development as Freedom. Oxford University Press, 2000.
4. Toyama, K. Can technology end poverty? Boston Review 35, 6 (Nov./Dec. 2010).
5. Underwood, S. Challenging poverty. Commun. ACM 51, 8 (Aug. 2008), 15–17; http://doi.acm.org/10.1145/1378704.1378710.

Kentaro Toyama (kentaro_toyama@hotmail.com) is a researcher in the School of Information at the University of California, Berkeley, and the former assistant managing director of Microsoft Research India.

Copyright held by author.


V
viewpoints

doi:10.1145/2043174.2043186 George V. Neville-Neil


Article development led by
queue.acm.org

Kode Vicious
Debugging on
Live Systems
It is more of a social than a technical problem.

Illustration by ABA/Shutterstock.com

Dear KV,
I have been trying to debug a problem on a system at work, but the control freaks that run our production systems don’t want to give me access to the systems on which the bug always occurs. I have not been able to reproduce the problem in the test environment on my desktop, but every day the bug happens on several production systems. I am at the point of thinking about getting a key logger so I can steal the passwords necessary to get onto the production systems and finally see the problem “in the wild.” I have never worked for such a bunch of fascists in my entire career.
Locked Down and Out

Dear Locked,
First of all, while most companies are inherently nondemocratic, few of them are fascist. Fascism went out of style sometime around 1945 and really hasn’t made a comeback since. Secondly, I do sympathize: no one should be prevented from fixing a bug simply because of lack of access to the appropriate systems.

What many programmers and technical people fail to comprehend is that, as a colleague recently put it, “access implies responsibility.” This is why the sudo program has the warning, stolen from the Spider-Man comics: “With great power comes great responsibility.”

Debugging a program or a system can, and often does, have negative side effects, either by slowing down the system or changing the results of some calculation in an unintended fashion. The people who run your production systems are right to be wary of letting any random programmer loose in their domain. If you break something, it is likely to come down on their heads, and they will have to fix it while you stand there glumly repeating, “Well, it wasn’t supposed to do that!”

Your best bet is to try setting up a production system outside of the production environment first, as a test machine. I am surprised by how many companies work without such staging machines, going directly from the developers’ desktops to their production environments. If the bug won’t happen without real workloads, then it is time to get a machine in the production environment sufficiently isolated so that it can be given a workload without destroying the machines that are doing productive work.

By now you might have noticed that this advice is less technical and more about social engineering. Programmers must be willing to work with the people who have to keep systems up 24 hours a day, 7 days a week, if they want to be trusted enough to be able to debug live or near-live systems.

Two final thoughts: using a keyboard logger is not a way to gain trust, and telling someone in a public column that you’re thinking about it is as dumb as tweeting your murder plans.
KV

Dear KV,
A program I have just been handed at work keeps crashing, and each time I look at it in the debugger and examine various bits of memory I see the pattern 0xdeadc0de in different parts of allocated memory. Is this a joke? Do you think that my coworkers are hazing me?
0xDead Tired of this Code

Dear 0xDead,
It is common practice for programmers to set memory to an easily recognizable value when they are trying to debug memory-smash bugs. You might think they would clear all the bytes in the buffer to be 0x00, but that does not help if some piece of code is writing NULL bytes all over your buffers. Using a known pattern such as 0xdeadc0de makes it easier to find these problems in a debugger. As you have seen, you print a buffer and you see the pattern. If instead you saw, say, 0xde00c0de, you would know that someone had written a NULL byte in the middle of your memory. Maybe you wanted that, maybe you didn’t, but now, at least, you can clearly see it. For extra cleverness points you can set a watchpoint (if it is supported by your hardware), which stops the program if some variable or part of memory does not equal 0xdeadc0de. I tend to set buffers I am debugging to be all 0x69, because if I see that number, then I know it is my own personal bit of work.

For programmers who deal with network packets, a known pattern has another advantage. Most people write code on systems based on the Intel x86 architecture, which is known in network parlance as a little-endian system. A little-endian system stores the most significant byte of a multibyte word last. Network protocols are big endian, which is the opposite of how x86 processors store data in memory. All network programmers know the C macros htonl(), ntohl(), htons(), and ntohs(), which do the proper swapping of host-to-network endianness and back. A good way to debug a network protocol is to transmit data such as 0xdeadc0de in the packets and then make sure it does not look like 0xdec0adde when it arrives in your program’s memory. Using this trick makes it easier to figure out where you might have left out a byte-swapping macro.

So, much as I would like to think your coworkers are hazing you, it is far more likely they are trying to be helpful.
KV

Related articles on queue.acm.org

Massively Multiplayer Middleware
Michi Henning
http://queue.acm.org/detail.cfm?id=971591

Cybercrime 2.0: When the Cloud Turns Dark
Niels Provos, Moheeb Abu Rajab, and Panayiotis Mavrommatis
http://queue.acm.org/detail.cfm?id=1517412

Enhanced Debugging with Traces
Peter Phillips
http://queue.acm.org/detail.cfm?id=1753170

George V. Neville-Neil (kv@acm.org) is the proprietor of Neville-Neil Consulting and a member of the ACM Queue editorial board. He works on networking and operating systems code for fun and profit, teaches courses on various programming-related subjects, and encourages your comments, quips, and code snips pertaining to his Communications column.

Copyright held by author.
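
The two uses of a known pattern described in the reply to 0xDead (spotting a stray write in a pre-filled buffer, and spotting a missed byte swap on data that crossed the network) can be sketched in a few lines. This is only an illustration under stated assumptions: Python's struct module stands in for the C a systems programmer would normally write, and the helper names fill, clobbered_words, byte_swapped, and diagnose are hypothetical, not taken from the column.

```python
import struct

PATTERN = 0xDEADC0DE  # the recognizable fill value mentioned in the column


def fill(num_words):
    """Build a buffer of num_words 32-bit words, each set to PATTERN."""
    return bytearray(struct.pack(">I", PATTERN) * num_words)


def clobbered_words(buf):
    """Return (index, value) for every 32-bit word that no longer equals PATTERN."""
    count = len(buf) // 4
    words = struct.unpack(">%dI" % count, bytes(buf[:count * 4]))
    return [(i, w) for i, w in enumerate(words) if w != PATTERN]


def byte_swapped(value):
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes(value.to_bytes(4, "big"), "little")


def diagnose(wire_bytes):
    """Decode 4 bytes as network (big-endian) order and classify the result."""
    (value,) = struct.unpack("!I", wire_bytes)
    if value == PATTERN:
        return "ok"
    if value == byte_swapped(PATTERN):
        return "byte-swapped: a host-to-network conversion was probably missed"
    return "corrupted"


if __name__ == "__main__":
    buf = fill(4)
    buf[5] = 0x00  # simulate a stray NULL byte written into the buffer
    for index, value in clobbered_words(buf):
        print("word %d is 0x%08x" % (index, value))
    # A little-endian sender that skipped htonl() puts the bytes on the wire reversed.
    print(diagnose(struct.pack("<I", PATTERN)))
```

Run as a script, the first loop reports word 1 as 0xde00c0de, the same damaged value used as an example in the reply, and the last line reports the reversed pattern 0xdec0adde as a probable missed byte swap. Formats are spelled out explicitly ("<", ">", "!") so the sketch behaves the same on any host, which is also why it avoids the host-dependent htonl() family.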



V
viewpoints

doi:10.1145/2043174.2043187 Valerie Taylor and Richard Ladner

Broadening Participation
Data Trends on Minorities
and People with
Disabilities in Computing
Seeking a comprehensive view of minority student demographics to
determine what programs and policies are needed to promote diversity.

Increasing diversity in computing is very important for multiple reasons. First, there is the issue of the work force. According to the U.S. Census, Blacks and Hispanics were approximately 12% and 16% of the U.S. residents in 2010, respectively. According to the 2008 Census Bureau projections, Hispanics, African-Americans, and Native Americans/Alaska Natives are projected to account for 47% of the U.S. population by 2050. Second, there is the issue of having diverse perspectives involved in the design of products thereby having more robust end products on the market. Lastly, there is the issue of inclusion: that the field be representative of society.

Given the importance of increasing diversity, it follows that trends about the demographics of students in the computing field are necessary to determine what programs and policies are needed to promote diversity. To this end, we present different sources for data on minorities and discuss the importance of having multiple sources to get a comprehensive view. In addition, we begin a discussion about what the data indicates with respect to minorities and the difficulties in the data collection process for people with disabilities. In particular, the focus is on Blacks/African Americans, Hispanics, Native Americans, and people with disabilities. The graphs shown in the accompanying figures were developed by the Center for Minorities and People with Disabilities in IT (CMD-IT, pronounced “command it”; http://www.cmd-it.org).

For this column, the focus is on two major data sources:
• Computing Research Association (CRA) Taulbee reports (http://www.cra.org/resources/taulbee/) for computer science only. (Data source used for minorities at CRA-affiliated universities, which are primarily Ph.D.-granting institutions.)
• WebCASPAR (https://webcaspar.nsf.gov/), using IPEDS/NCES, the Integrated Postsecondary Education Data System (IPEDS), which is a survey conducted by the U.S. Department of Education’s National Center for Education Statistics (NCES), to obtain data for race and ethnicity. The WebCASPAR database provides easy access to a large body of statistical data resources for science and engineering at U.S. academic institutions. The focus, however, is on the field of computer science. (Data source used for minorities at over 1,000 institutions, including community colleges, for-profit institutions, undergraduate institutions, and Ph.D.-granting institutions.)

The different data sources have different sets of U.S. institutions for which data is obtained. Examining multiple data sources can help find gaps in some data sources and help validate data in other data sources. The union of the data sources helps give a picture of the demographics of the broad computing community. In particular, it is important to include non-Ph.D.-granting institutions, community colleges, for-profit institutions, as well as Ph.D.-granting institutions. For example, in fall 2006, there were approximately 11.2 million students enrolled in four-year institutions and approximately 6.5 million students enrolled in two-year institutions (Digest of Education Statistics 2008, Table 194). It is important to consider all degree levels: associate’s, bachelor’s,
master’s, and doctorates because they represent stages in the pipeline. Further, it is important to have the data broken down by gender and ethnicity to allow analysis of trends related to minority women. It is recognized that surveys regarding ethnicity and gender are usually based upon self-identification, for which people may select the option to not provide the information. The survey results, however, provide the best data available for understanding trends.

Associate’s Degrees

The primary data source for the associate’s degree is WebCASPAR. With respect to number of institutions, for 2009, WebCASPAR included 1,065 institutions for the associate’s degree. The CRA Taulbee data does not report on the number of associate’s degrees; data is given for bachelor’s, master’s, and doctorate degrees. Table 1 provides the number of degrees awarded to students from the different ethnic groups in addition to the total number of degrees awarded for the past five years. With respect to associate’s degrees, the data indicates the number of degrees awarded is along the same order of magnitude as the bachelor’s degree for minorities. This trend, however, does not occur when considering the total number of degrees at the different levels; the total number of bachelor’s degrees far outnumbers the total number of associate’s degrees. This high participation of minorities at the associate’s degree level indicates the importance of encouraging students at the community colleges to complete the bachelor’s degree. Hence, while significant recruiting for minorities occurs at the high school level, significant effort needs to be devoted to recruiting minorities at the associate’s degree level.

Table 1. Number of associate’s and bachelor’s degrees awarded.

Associate’s Degree
Year    Blacks    Hispanics    Native Amer.    Total No. Degrees
2005    5,119     3,888        352             36,140
2006    4,617     3,261        325             31,170
2007    3,988     2,980        291             27,680
2008    4,171     2,897        298             28,327
2009    4,316     2,995        293             30,050

Bachelor’s Degree
Year    Blacks    Hispanics    Native Amer.    Total No. Degrees
2005    5,815     3,529        281             54,588
2006    5,275     3,351        274             48,000
2007    4,588     2,970        249             42,596
2008    4,011     2,923        221             38,922
2009    3,868     2,999        213             38,496

Source: WebCASPAR; https://webcaspar.nsf.gov/

Table 2 identifies the number of minority women for the associate’s degree. The data indicates the number of minority women in all three ethnic groups increased from 2000 to 2005, but then decreased by 2009. This trend, however, is consistent with the trend for total number of associate’s degrees.

Table 2. Number of minority women for the associate’s degree.

Year    Black Women    Hispanic Women    Native Amer. Women    Total No. Degrees
2000    1,711          1,097             153                   23,576
2005    2,239          1,159             156                   36,140
2009    1,567          675               107                   30,050

Source: WebCASPAR; https://webcaspar.nsf.gov/

Table 3. Number of minority women for the bachelor’s degree.

Year    Black Women    Hispanic Women    Native Amer. Women    Total No. Degrees
2000    1,698          693               59                    37,519
2005    2,383          930               86                    54,588
2009    1,330          591               60                    38,496

Source: WebCASPAR; https://webcaspar.nsf.gov/

Bachelor’s Degree

The WebCASPAR and Taulbee data sources have vastly different numbers of institutions for the bachelor’s degree. For example, for 2009, WebCASPAR included at least 1,283 institutions for the bachelor’s degree, 442 institutions for the master’s degree, and 97 institutions for the doctorate degree. In particular, the institution counts correspond to those that reported awarding at least one degree at the given level. By contrast, for the 2009–2010 academic year the CRA Taulbee data is based upon completed surveys from 150 CS programs. It is noted that CRA is focused on research, for which the primary membership of CRA entails Ph.D.-granting institutions.

Figure 1 compares the data from WebCASPAR with that obtained from CRA Taulbee for the bachelor’s degree. We consider the computer science bachelor’s degree only because for the WebCASPAR classification variable of “Academic Discipline, Detailed (standardized)” computer science is given, but not computer engineering. The data indicates a significant difference in the trends reported by the two data sources regarding minorities. For WebCASPAR, the percentage for Blacks is in the 10%–11% range, in contrast to 3%–4% as indicated with Taulbee. In the case of Hispanics, there is some difference in the percentages, with WebCASPAR indicating percentages in the range of 5%–8% with Taulbee indicating percentages in the range of 3%–6%. For the case of American Indian/Alaska Native, the percentages are less than 1% for both data sources. Further, it is noted that the Taulbee data indicates a
recent decline in the percentage of Hispanic bachelor’s degrees in contrast to the WebCASPAR data, which indicates a recent increase in the percentage of Hispanic bachelor’s degrees. Hence, the data indicates a large number of minorities at the bachelor’s level are not at the Ph.D.-granting institutions.

Figure 1. WebCASPAR and Taulbee data for percentage of bachelor’s degrees awarded to minorities. (Series: American Indian or Alaska Native, Hispanic, and Black, Non-Hispanic, each from WebCASPAR and Taulbee; horizontal axis: 2000–2009.)

With respect to the number of minority women at the bachelor’s degree level indicated in Table 3, we see trends similar to those for the associate’s degree. The numbers increased from 2000–2005 and then decreased from 2005–2009. The total number of degrees followed a similar trend. It is noted that the number of minority women at the bachelor’s level is very small in comparison to the total number of degrees. Hence, significant effort is needed to increase the number of minority women.

Figure 2. WebCASPAR and Taulbee data for percentage of master’s degrees awarded to minorities. (Series: American Indian or Alaska Native, Hispanic, and Black, Non-Hispanic, each from WebCASPAR and Taulbee; horizontal axis: 2000–2009.)
Master’s Degrees
Figure 2 provides data on the percentages of the different ethnic groups awarded master’s degrees for Taulbee and WebCASPAR. While the number of WebCASPAR institutions is much smaller for the master’s degrees than the bachelor’s degrees, there is still a significant difference between the percentages for the Black and Hispanic groups for WebCASPAR versus Taulbee. For Blacks, the WebCASPAR source indicates percentages in the range of 4%–5.5% in contrast to the Taulbee source, which indicates percentages in the range of 1%–2%. Both sources indicate a slight decline in the percentages when going from 2008–2009. For Hispanics, the WebCASPAR range is between 2%–3% in contrast to the Taulbee range, which is between 1%–2%. Both data sources provide similar trends. For the case of American Indian/Alaska Native, the percentages from both sources are consistently small, less than 1%. With respect to the percentage of minority women at the master’s degree level, the percentage of women from the three groups remains approximately flat in the range of 2% for Black women, 0.8% for Hispanic women, and 0% for Native American women from 2000–2009.

Figure 3. WebCASPAR and Taulbee data for percentage of doctorate degrees awarded to minorities. (Series: American Indian or Alaska Native, Hispanic, and Black, Non-Hispanic, each from WebCASPAR and Taulbee; horizontal axis: 2000–2009.)

Doctorate Degrees
In Figure 3, which focuses on the doctorate degrees, the numbers are very small from both data sources as the maximum percentage is only 2.80% in 2002. Because the focus is on Ph.D.-granting institutions, the data from the two sources are fairly close. From the NCES data source, the percentage of minority women at this level has remained flat in the range of 0.7% for Black women, 0.3% for Hispanic women, and 0% for Native American women from 2000–2009.

People with Disabilities
People with disabilities are an important group to consider because they are underrepresented in science and engineering fields, and there are a number of programs to increase their number in computing fields. The data collection process, however, is very difficult for a number of reasons. First, institutions differ in how they count students with disabilities. The counts can be based on services or accommodations provided, self-reporting to the disability support service office, verification of disabilities, or external/proxy report to the disability support service office. Further, institutions differ in how they maintain records of students with disabilities. Some institutions include data on students with disabilities in the general student record system from which degree data is reported to the Department of Education. Further, it is noted that one cannot consider the pipeline with people with disabilities, as a person can become or be recognized as disabled at any point in one’s life. For example, a person with the degenerative syndrome retinitis pigmentosa is not born blind, but will become blind gradually over time. Moreover, a student may not be recognized with a learning disability until problems arise when the student attends college.

We are very fortunate that Joan Burrelli, retired Senior Science Resources Analyst from the NSF National Center for Science and Engineering Statistics, was able to provide us with data for this column from the National Center for Education Statistics, 2008 National Postsecondary Student Aid Study (available through their Data Analysis System and the National Science Foundation) and the 2008 Survey of Earned Doctorates. These reports indicate that in terms of enrollment between the years 2004–2008, 12% of undergraduate IT majors and 8% of graduate IT students had a disability (where IT refers to computer science, information science and systems, and computer engineering).

By contrast, only 0.7% (63 Ph.D.’s) received a doctorate in computer science in the same period. The 12% at the undergraduate level is considered close to the percentage of all students who have a disability attending college. The low percentage reported in the Survey of Earned Doctorates may have two contributing factors. First, data from the 2008 National Postsecondary Student Aid Study indicates students with disabilities are significantly less likely to persist to obtain a bachelor’s degree than those without a disability, with about 40% persistence for those with a disability compared with 60% persistence for those without a disability. Second, the data in the Survey of Earned Doctorates is self-reported, and there may be some reluctance for a person with a disability who has achieved such a high level to report their disability.

Summary
The data presented in this column demonstrates the importance of using multiple sources with respect to obtaining the data about minorities and people with disabilities in computer science. To understand the broad trends about minorities in computer science, one must consider two-year institutions, for-profit institutions, non-research institutions, as well as Ph.D.-granting institutions. For example, the number of minority students receiving associate’s degrees is in the same range as the number receiving bachelor’s degrees in computer science.

It is good to find the trends for Blacks and Hispanics at the bachelor’s and master’s levels are not as bleak as portrayed in the Taulbee data. The percentage of Blacks earning bachelor’s degrees at the 1,283 WebCASPAR institutions is about 10%, which approaches the approximately 12% representation in the general population (2010 census). The percentage of Hispanics earning bachelor’s degrees at the WebCASPAR institutions is about 7.8% compared to approximately 16% representation in the general population (2010 census). By contrast, the percentages reported in the most recent Taulbee survey are 3.4% (Blacks) and 5.3% (Hispanics). These large differences reflect the different sets of institutions represented in the two data sources. However, at the doctoral level, the two sources of data show more similar percentages for all groups, Blacks, Hispanics, and Native Americans. This reflects the similarity between 97 WebCASPAR institutions and the 150 Taulbee institutions.

Further, the data indicates minorities at the bachelor’s level are not at the Ph.D.-granting institutions. The number of minorities receiving associate’s degrees is approximately the same as the number receiving bachelor’s degrees; this is not the case when comparing total number of degrees at the two degree levels. These trends raise a number of questions. First, what are the factors influencing the institution choice for minorities majoring in computer science? Second, how much recruiting for minorities at the bachelor’s degree level is targeted to community colleges? Lastly, for the Ph.D.-granting institutions, how much recruiting for minorities for the Ph.D. is done at non-Ph.D.-granting institutions? These are all important questions, whose answers could lead to actions that improve the number of undergraduate and graduate minority students at Ph.D.-granting institutions.

Finally, our comparison of the Taulbee and WebCASPAR data and the results from the recent TauRUs (Taulbee for the Rest of Us) survey1 indicate a need for a more comprehensive annual report of the demographics of computing students than is currently being done. Decision makers at all levels need better data about minority and disabled students on which to base their decisions.

Reference
1. Goldwasser, M. TauRUs: A Taulbee Survey for the Rest of Us. ACM Inroads 2, 2 (2011), 38–42.

Valerie E. Taylor (taylor@cse.tamu.edu) is the Royce E. Wisenbaker Professor in the Department of Computer Science and Engineering at Texas A&M University and Executive Director of CMD-IT.

Richard Ladner (ladner@cs.washington.edu) is the Boeing Professor in Computer Science and Engineering at the University of Washington, Seattle, WA.

Copyright held by author.



doi:10.1145/2043174.2043188 Peter J. Denning

The Profession of IT
The Grounding Practice
The skill of making and recognizing grounded claims is essential for professional practice. Getting objective data to support your conclusions is not enough.

Illustration by simplegraphic / Shutterstock.com

In my work, I constantly have to assess whether claims are valid. This applies not only to my own claims, but to the claims of others that I am considering as evidence to support my claims or to launch actions.

The problem of validating claims seems to be growing in recent years. Google searches yield many exaggerated claims that are not useful as evidence. Many apparently independent news items all derive from a single source, such as a press release, whose accuracy cannot be verified. Even the crowdsourced Wikipedia can be untrustworthy. How do we recognize or generate valid claims in this environment?

Some Web services already offer help with the quality of evidence. Reputation.com, a for-profit, locates derogatory information about its clients and tries to neutralize it or cut off the sources. Snopes.com investigates urban myths and other hot “memes” and rates them according to whether they can be independently verified. TruthSeal.org vets and guarantees claims, and pays bounties to those who successfully refute them. Idoscience.org helps kids doing science experiments obtain data to sustain or refute their science claims.

For our daily work we need not Web services, but practices that enable us to generate valid claims and recognize when others’ claims are valid. One commonly recommended practice toward this goal is to “base decisions on data,” meaning perform experiments to back up your hypotheses before asking others to act on them. Another is the agile developer mantra to “fail fast and often,” meaning organize your project so that you only move forward with components that pass quick field tests. As useful as these practices are, they do not directly address the formation of valid and compelling claims. Bad decisions based on (insufficient) data, and failures that teach us nothing, are all too common.

Let us examine the deeper structure of valid claims. We will see a practice called “grounding claims” that consistently produces them.

The Deep Structure of Many Professional Claims
A claim is a statement that asserts something is true. A grounded claim is a claim accompanied by sufficient, relevant supporting evidence.

If you reflect for a moment on claims you have heard, you will see that most claims are actually subjective. They are hypotheses, judgments, evaluations, or opinions that something is true. That is why supporting evidence is so important. Good evidence makes the claim credible to listeners and makes it easy for them to accept. The evidence can be either facts or opinions, or a mixture of the two:
• Objective evidence consists of facts. Facts are statements generally accepted as true. Facts can be independently re-verified or possibly falsified.
• Subjective evidence consists of opinions. Opinions are evaluations, judgments, or assessments. Whether we accept an opinion as supportive of a claim depends on how much we trust the opinion maker.

Evidence is sufficient if it deals with all the objections listeners are likely to have. Evidence is relevant if it supports the claim and omitting it would weaken the claim. In the next sections, I will give examples of objective grounding from science and subjective grounding from team-member selection.

The preceding structural description is not sufficient to guarantee that listeners will actually accept a claim. Various other factors influence listeners, including:
• Plausibility: does the claim make sense?
• Balance: does the evidence deal with competing or opposing claims?
• Commitment: does the speaker defend the claim and deal with its consequences?

Sometimes, even the speaker’s comportment will affect listeners’ willingness to accept your claim.

Often unconsciously, we rely on these distinctions in our daily work. As professionals, we size up a client’s problem and claim whether or not we can help. As managers, we evaluate alternative means to get projects done and claim the least expensive or fastest one. As leaders, we try to mobilize people to take care of a concern by claiming a path of acceptable risk. Everywhere we turn, we are making or hearing claims, and we base our actions on them.

Note that there are many other forms of argument and rhetoric than the type being considered here. In the sciences and professions, we want to persuade based on evidence. That is the sole focus of this column.

Objective Grounding
Some professions such as science, engineering, and medicine have strong traditions of grounding their claims. In science someone who makes a new claim (hypothesis) is expected to support the claim with data, logic, and other evidence that will allow others to accept the claim. The peer review process for publication tries to evaluate whether claims are well grounded, and seeks to reject papers whose claims are not. In these professions a claim evolves from the status of hypothesis to fact over a period of time. Initially, a hypothesis will have few followers. Over time, it will gain allies as others test and confirm it for themselves. Eventually, when it is universally accepted and no one can find contrary evidence, the hypothesis will be accepted as a fact by the community. Even so, scientific facts are subject to refutation later if new evidence turns up, for example, new data from more precise instruments. This is why science sociologist Bruno Latour says that science is a process of constructing facts.4

In its investigation of the space shuttle Challenger disaster on a cold morning in January 1986, the Rogers Commission debated without resolution the hypothesis that O-ring failure was the cause. Physicist Richard Feynman confirmed it dramatically with a public experiment showing that the O-ring material became hard and brittle in a glass of ice water. His simple demonstration instantly pushed the claim over a threshold of credibility. The commission concluded that although the data was available it had not been presented in a compelling way to NASA managers determined to launch. Although the potential for a well-grounded claim was there, the NASA managers did not “hear” the engineers’ actual claims as well grounded. The commission also concluded the NASA managers were not open that morning to any claim that launching was too dangerous.

In the early 1900s, U.S. Navy Lieutenant William Sims observed that British ships whose gunners used hand cranks to dynamically adjust the angle of cannons had much higher hit rates during battle.1 He measured the British hit rate around 10% and the U.S. rate less than 1%. He advocated to the U.S. Navy that continuous-aim gunnery would turn more battles to U.S. wins. Navy officials ignored Sims’s initial technical reports. He wrote more reports, offering more data; they continued to ignore him. He became very critical of their attitude. They saw him as an egotistical crank. Eventually he decided his career was tanked and wrote a complaint directly to his Commander-in-Chief, President Theodore Roosevelt. Lucky for him, Roosevelt thought his claim had merit and brought him to Washington to oversee Naval Target Practice. This got his innovation adopted and, in the end, won him great respect and honor. But Roosevelt’s intervention was a stroke of luck. Most officers who buck their chains of command so flagrantly are dismissed.

What was Sims’s problem? He made a claim and backed it up with a lot of objective evidence. Historians tell us the reason leadership rejected his claim was they believed his proposal would disrupt the “ship society.” The gunner corps was elite and specially trained. Sims advocated a capability that would allow any sailor to be a good gunner. Sims’s mistake was to assume hit rate was the main criterion of importance to the Navy leadership. They had other standards around the social impact of the new technology. Had Sims included arguments about how the technology would enhance the ship society, he might have gotten a different response.

These examples illustrate a key point about grounded claims. It is wise to learn all the criteria important to the listeners and provide relevant and sufficient evidence for each criterion. Otherwise, the grounding offered will not be compelling.

Subjective Grounding
As a manager or leader, you hire or select people to be on your team. You are very interested in their competence because without it your team cannot perform. When you interview people for a place on your team, you have to evaluate their competence claims. These claims can seldom be objectively grounded, but they can be subjectively grounded.

To be competent means to be able to perform standard actions in a community without supervision and without causing breakdowns for customers. Communities develop criteria for assessing competence and awards for recognizing outstanding examples. In assessing a competence claim, it is very important to learn what the community members say about the prospect in terms of performance tests, recognitions, and testimonials.

The first thing to notice about a competence claim is that it enters your awareness with the status of a hypothesis, and evolves to an acceptable statement as you consider the evidence in light of your acceptance criteria. Unlike a scientific hypothesis, this claim cannot evolve to the status of a fact. The reason is that your “data” is actually the opinions of others, who may not agree on the interpretation of what they have witnessed. Since your data are not all facts, the claim cannot be objectively grounded. However, it can be subjectively grounded, meaning that you are willing to accept both the supporting facts and opinions and bring the candidate onto your team.

In addition to community opinions, you will want to evaluate whether the prospect’s expertise matches what you need for your team and whether their expertise meets all your requirements. For example, you may not want a competent Web programmer when your team needs a game programmer, or you may want someone who is competent both at programming and team management. You may have other criteria to evaluate as well.

Once the person joins your team, you are hardly done in evaluating their claims. For example, in daily work, the person will claim they can deliver a result by a certain time. Can you acceptably ground that their claim is within their competence, that they are sincere, and that they are reliable?

Truthiness
In 2005 Stephen Colbert, the political satirist, proposed the new word “truthiness” to mean the presentation of claims supported only by gut feelings or emotional beliefs. Truthy claims are nothing more than ungrounded assessments. Nevertheless, they can gain allies.

Negative political advertisements, which are common during elections, are effective because so few listeners are skeptical; they are willing to accept claims without evidence. If they actually checked for supporting evidence, they would find the claims unsupportable, and reject them. Rick Hayes-Roth has recently devoted an entire book to this problem and to means for combating it.2,3

It is quite easy to confuse a truthy claim with a truth because the claim is often worded in a manner such as “I claim that…” or “I assert that…” or even “It is true that…” Do not let the choice of words obscure the basic distinctions of claims and grounding. Get into the general habit of noticing whether a statement is a claim, an opinion, or a fact. Then you are on solid ground when it is time to decide whether to act on the claim or not. Make acceptance of a claim a conscious choice.

Conclusion
The grounding practice is the skill of making claims that move people to action. Your goal as a speaker is to give your listeners plenty of well-crafted support for accepting your claim. Your goal as a listener is to decide whether you accept the claim based on the supporting evidence.

The accompanying table summarizes the distinctions for grounding claims. A grounded claim consists of the claim statement and a set of supporting statements. The acceptance criteria can vary according to the listeners and their standards, but will include one or more of the eight criteria listed.

When gathering data, therefore, you need to fully understand the listener’s concerns and interests. You want to supply relevant data (directly supporting the claim and the purpose for which you make the claim) and omit irrelevant data no matter how interesting they may be to you. You want to have a sufficient amount of data to be convincing.

The grounding practice is also helpful for assessing background assumptions to see whether they are well grounded in the current environment. For example, is my self-assessment that “I’m not good at management,” or my group’s assumption that “There’s no way to cut the red tape,” grounded? If an important assumption is not grounded, we would want to stop accepting and acting on it, and replace it with a new, grounded assumption.

Grounding claims is an essential professional skill for practitioners, managers, and leaders. The skill supports many decisions, and will help us design trustworthy systems.

Anatomy of a grounded claim.
Distinction | Explanation
Structure of a supported claim
Claim | Statement that something holds; the something is usually a hypothesis, evaluation, judgment, or opinion.
Fact | Statement that something is verifiably true; can be independently re-verified, and possibly falsified.
Opinion | A judgment, assessment, or estimate of something.
Grounding | Series of statements relevant to and supportive of the claim. For objective grounding, all supporting statements are facts. For subjective grounding some supporting statements may be opinions by trusted speakers.
Acceptance criteria of the claim and its support
Domain | Community, discourse, discussion, situation, event to which the claim applies.
Purpose | The point of making the claim. What are the concerns? Who cares?
Relevance | Are supporting statements connected to the domain and purpose of the claim?
Sufficiency | Are supporting statements sufficient in number to support the claim?
Frame | Is the claim credible to the community to which it is offered?
Balance | Does the speaker deal adequately with opposing arguments?
Commitment | Is speaker committed to defending the claim and taking care of its consequences? Is the speaker’s purpose genuine?
Comportment | Does the speaker display confidence? Authenticity? Centeredness?

References
1. Denning, P.J. and Dunham, R. The Innovator’s Way: Essential Practices for Successful Innovation. MIT Press, 2010.
2. Hayes-Roth, R. (interviewed by Peter Denning). Honesty is the best policy. ACM Ubiquity (July 2011); http://ubiquity.acm.org/article.cfm?id=2002437. DOI: 10.1145/2002436.2002437.
3. Hayes-Roth, R. Truthiness Fever. Booklocker.com, 2011.
4. Latour, B. Science in Action. Harvard University Press, 1987.

Peter J. Denning (pjd@nps.edu) is Distinguished Professor of Computer Science and Director of the Cebrowski Institute for information innovation at the Naval Postgraduate School in Monterey, CA, and is a past president of ACM.

Copyright held by author.



doi:10.1145/2043174.2043203 Andrew Bernat and Eric Grimson

Viewpoint
Doctoral Program Rankings for U.S. Computing Programs: The National Research Council Strikes Out
A proposal for improving doctoral program ranking strategy.

Illustration by Sergej Khakimullin / Shutterstock.com

Why do we care about rankings of graduate programs? Beyond the ability to cheer “We’re Number One!” there are very practical reasons. For example, resource allocation is often based on using rankings as synonyms for quality indicators. An institution recently decided it would become a “top 25 institution” by ensuring that each of its graduate programs was ranked within the top 25% of all the graduate programs in the corresponding fields. And it was going to accomplish this by simply eliminating any program that was not: mission accomplished! Besides resource allocation, prospective graduate students and faculty candidates look to rankings when deciding where to apply, so the rankings for U.S. institutions considered in this Viewpoint are of considerable interest both within the U.S. and internationally. Funders look at rankings when considering ability to perform the proposed research. Alumni look to rankings when making donation decisions. Despite all their acknowledged warts, rankings do matter.

In principle, generating rankings is straightforward mathematically:
1. Develop a list of the n most important metrics (amount of external research funding, number of quality publications, impact of publications, awards, entering student GRE scores, placement success of graduates, plus any other factor deemed important by the community).
2. Plot the value of these n metrics for each institution in an n-dimensional space in which the axes are the metrics.
3. Develop a mapping to turn this n-dimensional space into a one-dimensional ordered set of integers. (That this cannot really be done in a principled or defensible manner is one of the fundamental problems with rankings.)

Of course the practical difficulties are enormous. Among them:
• What metrics should you be including?
• How do you get (accurate) values for these metrics?
• The mapping is going to require weighting of these metrics and how do you determine these weights? Effectively how should one weigh publication counts vs. citation counts vs. external grants vs. faculty awards vs. entering student GRE scores vs. any other factors?

So there are ample reasons why rankings based upon a transparent comprehensive analysis are not done frequently. Nevertheless, the U.S. National Research Council (NRC), through its Committee on an Assessment of Research Doctorate Programs, bravely tackled this thorny problem for U.S. institutions. (The NRC is the operating arm of the U.S. National Academies of Science and Engineering and the Institute of Medicine, honorific academies with a mission to improve government decision making and public policy, increase public education and understanding, and promote the acquisition and dissemination of knowledge in matters involving science, engineering, technology, and health. In many respects, the academies and NRC represent the “gold standard” of technical policy advice in the U.S. Because of the prestige of the academies and the NRC, their methodologies and reports have considerable international impact as well.)

The NRC last ranked doctoral programs in the mid-1990s and these rankings are clearly out of date. Further, the earlier rankings depended heavily on “reputation” as determined by respondents and this is often an inexact and lagging indicator. This time around the NRC sought to focus on a purely quantitative approach.

In this Viewpoint we describe how this process has played out for computing. While these comments clearly apply directly only to the NRC rankings effort, they are relevant to other similar efforts.

The NRC Ranking Process
The specifics of the NRC process were the following. The NRC developed a single set of metrics for all 62 disciplines being analyzed, covering disciplines in science, engineering, humanities, social sciences, and others. It then collected the data for these metrics via questionnaires administered to institutions, programs, faculty, and Ph.D. students plus submitted faculty CVs. Determining the weights was done via two related approaches: ask a set of participants how much various metrics mattered in their perception of department rankings, and a linear regression of a set of rankings vs. these metrics. Because these two approaches yielded substantively different results, the NRC established two sets of rankings, Survey and Regression rankings, and reported these probabilistically. Specifically, they ran a set of samples using weights derived from these acquired distributions, and then reported the range of rankings corresponding to a 90th percentile, meaning that with 95% probability, an institution’s rank would lie within the designated range. In other words, as an example, the NRC states that with 95% probability Georgia Tech ranks somewhere between 14th and 57th using the Survey weights and somewhere between 7th and 28th using the Regression weights.

The first issue is that this range, arising out of the probabilistic analysis, is difficult to reconcile. What does a rank between 14th and 57th mean? How does one reconcile differences between the two ranking systems: between the Survey weights which measure what respondents claim is important and the Regression weights which measure these claims against departmental reputations? Of how much value is a range if a 95th percentile span is being used?

Even if the rankings were not as impactful as in prior NRC studies, a rigorous data collection process could have yielded valuable data, which departments could use to assess their standing relative to peers. Unfortunately, there were a number of issues with the quality of the data:
• Data collection took place in 2006 but the ultimate release of data and rankings was in 2010 (with corrections well into 2011). For some metrics small changes might have large consequences, for example, given our low numbers of female faculty the addition of a single woman would result in a large percentage impact on the diversity metric or the departure/arrival of a single highly productive faculty member would similarly have a large impact on the scholarly productivity metric.
• The metrics to be used are not discipline specific: exactly the same information was collected for physics, for English, for computing, and for every other discipline. But we know that

nals (possibly primarily because they had low-cost access to the ISI database of citations for journal publications). We know that this will not provide accurate results for computer science, as it misses almost all of the conference publications (and corresponding citation data), and the specific choice of this database also means that many journal publications are missed.
• The descriptions of data to be provided were often ambiguous, leading different institutions to respond differently. Thus the data being compared was often not measuring the same parameters across departments; this was especially true when gathering lists of faculty to be included in the data gathering, a factor that has impact on many of the data categories since parameters were often measured per faculty member.
• Measurements of scholarly quality are not equivalent to measurements of scholarly quantity, that is, the most impactful publication is not necessarily the one with the most citations nor is the most impactful professor necessarily the one with the most publications. There is considerable literature on resolving this issue; for example, by measuring publication quality via the impact factor of the journal.
• The NRC did not measure scholarly productivity other than publications and grants. For example, software artifacts and patents were not considered.
• The NRC did not get CVs from all faculty so they simply scaled results by the number of faculty in a given department. This approach is easily gamed by having only the most productive faculty provide CVs.
• The list of faculty awards was seriously incomplete: computer science was not even listed as a distinct category. The ACM A.M. Turing Award was not considered “Highly Prestigious”; no awards from organizations other than ACM and IEEE were included; and many other gaps in awards were apparent.
• The NRC chose to invent data
publication practices in particular vary when they could not obtain it, for ex-
do matter. significantly across disciplines: the hu- ample, for entering student GRE scores
manities rely heavily on book publica- the NRC used the national average for
tions; the computing fields rely heavily these scores when an institution did
on conferences. However, the NRC de- not collect or provide them.
cided that the metric to use for schol- The second issue noted here has
arly publication was going to be jour- gained the most attention from our

42 Communications of the ACM | December 2011 | Vol. 54 | No. 12


viewpoints

community. CRA and ACM provided testimony to the NRC in 2002 when the study was just beginning, pointing out the importance of conferences to our field. Unfortunately, this advice was simply ignored by the NRC, a fact we did not discover until February 2010. We immediately notified the NRC, urging it to include conference publications, both for measuring publication productivity and for measuring citation impact. The NRC ultimately agreed to do so after extensive discussion at various levels. CRA worked with its member societies to provide a list of quality conferences; due to the tight deadline we know that this list is not 100% complete or accurate. The NRC took this list and then searched all vitae provided by CS faculty (which we also know to be incomplete) to generate conference publication counts. Since citations for conference publications were not available via the ISI database used by the NRC, citation data was not used at all for computer science as alternatives were not acceptable to the NRC. Based upon the NRC’s analysis, a typical department had one conference publication per faculty member per year. In our view, this is not credible. Further, the NRC claims that more computing publications appear in journals than in conferences, which is very difficult to reconcile with what we see in practice.

Similarly, CRA worked with its member societies to put together lists of the awards that should be included and to correctly categorize them as “Highly Prestigious” or “Prestigious.” This is not a trivial process; for example, does one include the many SIG awards? Again, the deadline to provide the list was tight and we are unable to verify that our list was applied. Thus, it is not clear that the NRC even now has a meaningful method for measuring faculty awards.

Just as troubling is that various member departments have not been able to verify the data that the NRC presents. That is, using the same vita and publication and awards listings, they simply cannot reproduce the numbers that the NRC provides for their departments. The NRC process used temporary workers trained by the NRC staff. Perhaps they were unable to deal with the multiple possible titles of publications—Commun. ACM = CACM = Communications—self-reported by faculty on their CVs. The conference publication numbers do not provide much confidence that they were.

There are ample reasons why rankings based upon a transparent comprehensive analysis are not done frequently.

One might suggest that the central problem is that computer science is unusual in its practices, and that our field is simply an outlier. This does not appear to be the case. The Council of the American Sociological Association recently passed a resolution condemning the NRC rankings and saying that they should not be used for program evaluation. Input from colleagues suggests that other fields, such as aeronautics/astronautics and chemical engineering are uncomfortable with the NRC process, for many of the reasons we have raised in this Viewpoint.

So we have a situation in which incorrect data are provided for invalid metrics and rankings are calculated using weights that are not readily understood. It would be easy to dismiss the entire process except that institutions are using the results to make programmatic decisions including closing programs. At a recent symposium, many university administrators expressed considerable support for continuing the data collection effort, and generating rankings if it can be accomplished in a meaningful way.

Conclusion
So how should the process work? Here are our suggestions:
• Work with the relevant societies in order to generate metrics that matter to their constituents.
• Realize that reputation does matter and include it in the metrics. There is an interesting feedback loop between rankings and reputation, of course. But this also means reputation has some validity as a measure of rank, so incorporate it.
• Explore making the rankings subdiscipline-dependent. It is clear that different departments have different strengths. Thus, enabling a finer-grained assessment would allow a department with strength in a subfield, but perhaps not the same across-the-board strength, to gain appropriate visibility. This may be particularly valuable for students deciding where to apply.
• Use data mining to generate scholarly productivity data to replace commercially collected citation data that is incomplete and expensive.
• Have institutions collect the remaining data under clear guidelines.
• Provide a time period during which departments can correct errors in the data collected. The NRC did allow institutions to correct some errors of fact, but the allowable corrections did not include publication counts and other information. And the NRC apparently refused to remove data it invented, such as substituting national GRE average scores for institutions that do not record such information.
• Provide sample weights but allow individuals to develop their own weights and apply them to the collected data so that they can generate rankings of interest to them. We realize this does not satisfy the desire for single overarching rankings. However, it does provide a tool of potential value for individual departments seeking to compare themselves against peers.

We do not claim that this strategy will eliminate all of the many issues with rankings, but it will provide a consistent set of fundamental data that administrators, faculty, students and others can use to understand departmental strengths and weaknesses in a way that matters to them.

Andrew Bernat (abernat@cra.org) is the executive director of the Computing Research Association in Washington, D.C.
Eric Grimson (welg@csail.mit.edu) is a professor of computer science and engineering at Massachusetts Institute of Technology in Cambridge, MA.

Copyright held by author.



practice
doi:10.1145/2043174.2043189
Article development led by queue.acm.org

Postmortem Debugging in Dynamic Environments

Many modern dynamic languages lack tools for understanding complex failures.

By David Pacheco

Despite the best efforts of software engineers to produce high-quality software, inevitably some bugs escape even the most rigorous testing process and are first encountered by end users. When this happens, such failures must be understood quickly, the underlying bugs fixed, and deployments patched to avoid another user (or the same one) running into the same problem again. As far back as 1951, the dawn of modern computing, Stanley Gill6 wrote that “some attention has, therefore, been given to the problem of dealing with mistakes after the program has been tried and found to fail.” Gill went on to describe the first use of “the post-mortem technique” in software, whereby the running program was modified to record important system state as it ran so that the programmer could later understand what happened and why the software failed.

Since then, postmortem debugging technology has been developed and used in many different systems, including all major consumer and enterprise operating systems, as well as the native execution environments on those systems. These environments make up much of today’s core infrastructure, from the operating systems that underlie every application to core services such as Domain Name System (DNS), and thus form the building blocks of nearly all larger systems. To achieve the high levels of reliability expected from such software, these systems are designed to restore service quickly after each failure while preserving enough information that the failure itself can later be completely understood.

While such software was historically written in C and other native environments, core infrastructure is increasingly being developed in dynamic languages, from Java over the past two decades to server-side JavaScript over the past 18 months. Dynamic languages are attractive for many reasons, not least of which is that they often accelerate the development of complex software.

Conspicuously absent from many of these environments, however, are facilities for even basic postmortem debugging, which makes understanding production failures extremely difficult. Dynamic languages must bridge this gap and provide rich tools for understanding failures in deployed systems in order to match the reliability demanded from their growing role in the bedrock of software systems.

To understand the real potential for sophisticated postmortem analysis tools, we first review the state of debugging today and the role of postmortem analysis tools in other environments. We then examine the unique challenges around building such tools for dynamic environments and the state of such tools today.

Debugging in the Large
To understand the unique value of postmortem debugging, it is worth



examining the alternative. Both native and dynamic environments today provide facilities for in situ debugging, or debugging faulty programs while they’re still running. This typically involves attaching a separate debugger program to the faulty program and then directing execution of the faulty program interactively, instruction by instruction or using breakpoints. The user thus stops the program at various points to inspect program state in order to figure out where it goes wrong. Often this process is repeated to test successive theories about the problem.

Illustration by Gary Neill

This technique can be very effective for bugs that are readily reproducible, but it has several drawbacks. First, the act of stopping a program often changes its behavior. Bugs resulting from unexpected interactions between parallel operations (such as race conditions) can be especially challenging to analyze this way because the timing of various events is significantly affected by the debugger itself. More importantly, in situ debugging is often untenable on production systems: many debuggers rely on an unoptimized debug build that’s too slow to run in production; engineers often do not have access to the systems where the program is running (as in the case of most mobile and desktop applications and many enterprise systems); and the requisite debugging tools are often not


available on those systems anyway.

Even for cases where engineers can access the buggy software with the tools they need, pausing the program in the debugger usually represents an unacceptable disruption of production service and an unacceptable risk that a fat-fingered debugger command might cause the program to crash. Administrators often cannot take the risk of downtime in order to understand a failure that caused a previous outage. More importantly, they should not have to. Even in 1951 Gill cited the “extravagant waste of machine time involved” in concluding that “single-[step] operation is a useful facility for the maintenance engineer, but the programmer can only regard it as a last resort.”

The most crippling problem with in situ debugging is it can only be used to understand reproducible problems. Many production issues are either very rare or involve complex interactions of many systems, which are often very difficult to replicate in a development environment. The rarity of such issues does not make them unimportant: quite the contrary, an operating system crash that happens only once a week can be extremely costly in terms of downtime, but any bug that can be made to occur only once a week is very difficult to debug live. Similarly, a fatal error that occurs once a week in an application used by thousands of people may result in many users hitting the bug each day, but engineers cannot attach a debugger on every user’s system.

So-called printf debugging is a common technique for dealing with the reproducibility issue. In this approach, engineers modify the software to log bits of relevant program state at key points in the code. This causes data to be collected without human intervention so it can be examined after a problem occurs to understand what happened. By automating the data collection, this technique usually results in significantly less impact to production service because when the program crashes, the system can immediately restart it without waiting for an engineer to log in and debug the problem interactively.

Extracting enough information about fatal failures from a log file is often very difficult, however, and frequently it is necessary to run through several iterations of inserting additional logging, deploying the modified program, and examining the output. This, too, is untenable for production systems since ad hoc code changes are often impractical (in the case of desktop and mobile applications) or prohibited by change control policies (and common sense).

The solution is to build a facility that captures all program state when the program crashes. In 1980 Douglas R. McGregor and Jon R. Malone9 of the University of Strathclyde in Glasgow observed that with this approach “there is virtually no runtime overhead in either space or speed” and “no extra trace routines are necessary,” but the facility “remains effective when a program has passed into production use.” Most importantly, after the system saves all the program state, it can restart the program immediately to restore service quickly. With such systems in place, even rare bugs can often be root-caused and fixed based on the first occurrence, whether in development, test, or production. This enables software vendors to fix bugs before too many users encounter them.

Figure 1. A simple MDB example.

    $ mdb core
    Loading modules: [ ld.so.1 ]
    > ::status
    debugging core file of example1 (32-bit) from solaron
    file: /export/home/dap/tmp/example1
    initial argv: ./example1
    threading model: native threads
    status: process terminated by SIGSEGV (Segmentation Fault), addr=10

    > ::walk thread | ::findstack -v
    stack pointer for thread 1: 8047b98
    [ 08047b98 func+0x20() ]
      08047bbc main+0x21(1, 8047bdc, 8047be4)
      08047bd0 _start+0x80(1, 8047cc4, 0, 8047ccf, 8047cdc, 8047ced)

    > func+0x20::dis
    ...
    func+0x20:      movl   $0x0,(%eax)
    ...

To summarize, in order to root-cause failures that occur anywhere from development to production, a postmortem debugging facility must satisfy several constraints:
• Application software must not require modifications that cannot be used in production in order to support postmortem debugging, such as unoptimized code or additional debug data that would significantly impact performance (or affect correctness at all).
• The facility must be always on: It must not require an administrator to attach a debugger or otherwise enable postmortem support before the problem occurs.
• The facility must be fully automatic: It should detect the crash, save program state, and then immediately allow the system to restart the failed component to restore service as quickly as possible.
• The dump (saved state) must be comprehensive: a stack trace, while probably the single most valuable piece of information, very often does not provide sufficient information to root-cause a problem from a single occurrence. Usually engineers want both global state and each thread’s state (including stack trace and each stack frame’s arguments and variables). Of course, there’s a wide range of possible results in this dimension; the “constraint” (such as it is) is that the facility must provide enough information to be useful for nontrivial problems. The more information that can be included in the dump, the more likely engineers will be able to identify the root cause based on just one occurrence.
• The dump must be transferable to other systems for analysis. This allows engineers to analyze the data using whatever tools they need in a familiar environment and obviates the need for engineers to access production systems in many cases.
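
One way an ordinary native program can approach the "always on" constraint is to make sure, at startup, that the operating system is permitted to write a core file for it. The following is a minimal sketch in C, assuming a POSIX system on which the core-file size is governed by the RLIMIT_CORE resource limit; the function name enable_core_dumps is hypothetical and chosen only for illustration. It is not a complete postmortem facility: where the core file is written, and whether the failed program is restarted, are left to the operating system's own configuration.

```c
#include <stdio.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: raise this process's soft core-file size limit to
 * the hard limit, so the operating system is allowed to save a core dump
 * if the process later crashes. Returns 0 on success and -1 on failure.
 */
int
enable_core_dumps(void)
{
	struct rlimit rl;

	/* Read the current soft and hard limits on core-file size. */
	if (getrlimit(RLIMIT_CORE, &rl) != 0) {
		perror("getrlimit");
		return (-1);
	}

	/* An unprivileged process may raise its soft limit up to the hard limit. */
	rl.rlim_cur = rl.rlim_max;

	if (setrlimit(RLIMIT_CORE, &rl) != 0) {
		perror("setrlimit");
		return (-1);
	}

	return (0);
}
```

A program would call enable_core_dumps() once near the top of main(), before doing any real work, so that no administrator action is needed on its behalf before a crash occurs. If the hard limit itself is zero, the call succeeds but no core file can be written, which is one reason this remains only a sketch.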


Figure 2. Analyzing thread stacks.

    > ::stacks -m zfs
    THREAD           STATE    SOBJ                COUNT
    ffffff0007c0fc60 SLEEP    CV                      2
                     swtch+0x147
                     cv_wait+0x61
                     txg_thread_wait+0x5f
                     txg_quiesce_thread+0x94
                     thread_start+8

    ffffff0007f51c60 FREE     <NONE>                  1
                     cpu_decay+0x2f
                     bitset_atomic_del+0x38
                     apic_setspl+0x5c
                     do_splx+0x50
                     disp_lock_exit+0x55
                     cv_signal+0x96
                     taskq_dispatch+0x351
                     zio_taskq_dispatch+0x6b
                     zio_interrupt+0x1a
                     vdev_disk_io_intr+0x6b
                     biodone+0x84
                     dadk_iodone+0xe7
                     dadk_pktcb+0xc6
                     ata_disk_complete+0x119
                     ata_hba_complete+0x38
                     ghd_doneq_process+0xb3
                     0x16
                     dispatch_softint+0x3f

    ffffff0007b25c60 SLEEP    CV                      1
                     swtch+0x147
                     cv_timedwait+0xba
                     arc_reclaim_thread+0x17b
                     thread_start+8

    ffffff0007b2bc60 SLEEP    CV                      1
                     swtch+0x147
                     cv_timedwait+0xba
                     l2arc_feed_thread+0xa5
                     thread_start+8

    ffffff0009b95c60 SLEEP    CV                      1
                     swtch+0x147
                     cv_timedwait+0xba
                     txg_thread_wait+0x7b
                     txg_sync_thread+0x114
                     thread_start+8

    ffffff01e26d08e0 SLEEP    CV                      1
                     swtch+0x147
                     cv_wait+0x61
                     txg_wait_synced+0x7f
                     spa_sync_allpools+0x76
                     zfs_sync+0xce
                     vfs_sync+0x9c
                     syssync+0xb
                     sys_syscall32+0x101

    ffffff0007c15c60 SLEEP    CV                      1
                     swtch+0x147
                     cv_wait+0x61
                     zio_wait+0x5d
                     dsl_pool_sync+0xe1
                     spa_sync+0x32a
                     txg_sync_thread+0x265
                     thread_start+8

Postmortem Debugging in Native Environments
To understand the potential value of postmortem debugging in dynamic languages, it is also helpful to examine those areas where postmortem analysis techniques are well developed and widely used. The best examples are operating systems and their native execution environments. Historically this software has comprised core infrastructure; failures at this level of the stack are often very costly either because the systems themselves are necessary for business-critical functions (as in the case of operating systems on which business-critical software is running) or because they are relied upon by business-critical systems upstack (as in the case of infrastructure services such as DNS).

Most modern operating systems can be configured so that when they crash, they immediately save a “crash dump” of all of their state and then reboot. Likewise, these systems can be configured so that when a user application crashes, the operating system saves a “core dump” of all program state to a file and then restarts the application. In most cases, these mechanisms allow the operating system or user application to return to service quickly while preserving enough information to root-cause the failure later.

As an example, let’s look at core dumps under Illumos, an open source Solaris-based system. Take the following broken program:

     1  int
     2  main(int argc, char *argv[])
     3  {
     4          func();
     5          return (0);
     6  }
     7
     8  int
     9  func(void)
    10  {
    11          int ii;
    12          int *ptrs[100];
    13
    14          for (ii = -1; ii < 100; ii++)
    15                  *(ptrs[ii]) = 0;
    16
    17          return (0);
    18  }

This simple program has a fatal flaw:


while trying to clear each item in the ptrs array at lines 14–15, it clears an extra element before the array (where ii = -1). When running this program, you see:

    $ gcc -o example1 example1.c
    $ ./example1
    Segmentation Fault (core dumped)

and the system generates a file called core. The Illumos modular debugger (MDB) shown in Figure 1 can help in examining this file.

MDB’s syntax may seem arcane to new users, but this example is rather basic. First the ::status command produces a summary of what happened: the process was terminated as a result of a segmentation fault attempting to access memory address 0x10. Next the ::walk thread | ::findstack -v command is used to examine thread stacks (in this case, just one), and it shows that the program died in function func at offset 0x20 in the program text. Then the func+0x20::dis command dumps out this instruction, showing that the process died on the store of 0 into the address contained in register %eax.

While this example is admittedly contrived, it illustrates the basic method of postmortem debugging. Note that unlike in situ debugging, this method scales well with the complexity of the program being debugged. If instead of one thread in one process there were thousands of threads across dozens of components (as in the case of an operating system), a comprehensive dump would include information about all of them. The next challenge would be making sense of so much information, but root-causing the bug is at least tractable because all the information is available.

In such situations, the next step is to build custom tools for extracting, analyzing, and summarizing specific component state. A comprehensive postmortem facility enables engineers to build such tools. For example, gdb supports user-defined macros. These macros can be distributed with the source code so that all developers can use them both in situ (by attaching gdb to a running process) and postmortem (by opening a core file with gdb). The Python interpreter, for example, provides such macros, allowing both interpreter and native module developers to pick apart the C representations of Python-level objects.

MDB takes this idea to the next level: it was designed specifically around building custom tools for understanding specific components of the system both in situ and postmortem. On Illumos systems, the kernel ships with MDB modules that provide more than 1,000 commands to iterate and inspect various components of the kernel. Among the most frequently used is the ::stacks command, which iterates all kernel threads, optionally filters them based on the presence of a particular kernel module or function in the stack trace, and then dumps out a list of unique thread stacks sorted by frequency. Figure 2 offers an example from a system doing some light I/O. This invocation collapsed the complexity of more than 600 threads on this system to only about seven unique thread stacks that are related to the ZFS file system. You can quickly see the state of the threads in each group (e.g., sleeping on a condition variable) and examine a representative thread for more information. Dozens of other operating-system components deliver their own MDB commands for inspecting specific component state, including the networking stack, the NFS server, DTrace, and ZFS.

Some of these higher-level analysis tools are quite sophisticated. For example, the ::typegraph command3 analyzes an entire production crash dump (without debug data) and constructs a graph of object references and their types. With this graph, users can query the type of an arbitrary memory object. This is useful for understanding memory corruption issues, where the main problem is identifying which component overwrote a particular block of memory. Knowing the type of the corrupting object narrows the investigation from the entire kernel to the component responsible for that type.

Such tools are by no means limited to production environments. On most systems, it is possible to generate a core dump from running processes too, which makes core-dump analysis attractive during development as well. When testers or other engineers file bugs on application crashes, it is often easier to have them include a core dump than to try to guess what steps they took that led to the crash and then reproduce the problem from those steps. Examining the core dump is also the only way to be sure the problem you found is the same one the bug reporter encountered.

Higher-level dump analysis tools can be built explicitly for development as well. Libumem, a drop-in replacement for malloc(3c) and friends, provides (among other features) an MDB module for iterating and inspecting objects related to the allocator. Combined with an optional feature to record stack traces for each allocator operation, the ::findleaks MDB command can be used to identify various types of memory leaks very quickly without having added any explicit support for this in the application itself. The ::findleaks command literally prints out a list of leaked objects and the stack trace from which each one was allocated—pointing directly to the location of each leak. Libumem is based on the kernel memory allocator, which provides many of the same facilities for the kernel.2

Postmortem Debugging in Dynamic Environments
While operating-system and native environments have highly developed facilities for handling crashes, saving dumps, and analyzing them postmortem, the problem of postmortem analysis (and software observability more generally) is far from solved in the realm of dynamic environments such as Java, Python, and JavaScript. In the past postmortem analysis was arguably less critical for these languages because crashes in these environments are less significant: most end-user applications save work frequently anyway, and the operating system or browser will often restart the application after a crash. These crashes still represent disruptions to the user experience, however, and postmortem debugging is the only hope of understanding such failures.

More importantly, dynamic languages such as Node.js are exploding in popularity as building blocks for larger distributed systems, where what might seem like a minor crash can cause cascading failures up the stack. As a result, just as with operating systems and core services, fully understanding each failure is essential to achieving the levels


of reliability expected of such foundational software.

Providing a postmortem facility for dynamic environments, however, is not easy. While native programs can leverage operating-system support for core dumps, dynamic languages must present postmortem state using the same higher-level abstractions with which their developers are familiar. A postmortem environment for C programs can simply present a list of global symbols, pointers to thread stacks, and all of a process’s virtual memory (all of which the operating system has to maintain anyway), but a similar facility for Java must augment (or replace) these with analogous Java abstractions. When Java programs crash, Java developers want to look at Java thread stacks, local variables, and objects, not (necessarily) the threads, variables, and raw memory used by the Java virtual machine (JVM) implementation.

The most crippling problem with in situ debugging is it can only be used to understand reproducible problems.

Also, because programs in dynamic languages run inside an interpreter or VM, when the user program “crashes,” the interpreter or VM itself does not crash. For example, when a Python program uses an undefined variable (the C equivalent of a NULL pointer), the interpreter detects this condition and gracefully exits. Therefore, to support postmortem debugging, the interpreter would need to trigger the core-dump facility explicitly, not rely on the operating system to detect the crash.

In some cases, presenting useful postmortem state requires formalizing abstractions that do not exist explicitly in the language at all. JavaScript presents a particularly interesting challenge in this regard. In addition to the usual global state and stack details, JavaScript maintains a pending event queue, as well as a collection of events that may happen later—both of which exist only as functions with associated context that will be invoked at some later time by the runtime. For example, a Web browser might have many outstanding asynchronous HTTP requests. For each one, there is a function with associated context that may not be reachable from the global scope, and so would not be included in a simple dump of all global state and thread state. Nevertheless, understanding which of these requests are outstanding and what state is associated with them may very well be critical to understanding a fatal failure.

This problem is even more acute with Node.js on the server, which is frequently used to manage thousands of concurrent connections to many different types of components. A single Node program might have hundreds of outstanding HTTP requests, each one waiting on a database query to complete. The program may crash while processing one of the database query results because it encountered an invalid database state resulting from one of the other outstanding queries. Such problems beg for postmortem debugging because each instance is seen relatively rarely; they are essentially impossible to understand from just a stack trace, but they can often be identified from the first occurrence, given enough information from the time of the crash. The challenge is presenting information about outstanding asynchronous events (that is, callbacks that will be invoked at some future time) in a meaningful way to JavaScript developers, who generally do not have direct access to the event queue or the collection of outstanding events; these abstractions are implicit in the underlying APIs, so exposing this requires first figuring out how to express these abstractions.

Finally, user-facing applications have the additional problem of transferring postmortem state from the user’s computer to developers who can root-cause the bug (while preserving user privacy). As Eric Schrock11 details, this problem remains largely unsolved for one of the most significant dynamic environments today: the JavaScript Web application. There is no browser-based facility for automatically uploading postmortem program state back to the server.

Despite these difficulties, some dynamic environments do provide postmortem facilities. For example, the Oracle Java HotSpot VM supports extracting Java-level state from JVM native core dumps. When the JVM crashes, or when a core file is manually created using operating system tools such as gcore(1), you can use the jdb(1) tool to examine the state of the Java program (rather than the JVM itself) when the core file was generated. The core file can also be processed by a tool called jmap(1) to create a Java heap dump that can in turn be analyzed us-

dec e mb e r 2 0 1 1 | vo l. 5 4 | n o. 1 2 | c o m m u n ic atio n s o f the acm 49


practice

ing several different programs. above languages, is widely deployed


Such facilities are only a start, how- under several completely different
ever: setting up an application to trigger runtime environments such as Mozil-
a core dump on crash in the first place la’s SpiderMonkey, Google’s V8 (used
is nontrivial. Additionally, these facili-
ties are very specific to the HotSpot VM. There is no in both Chrome and Node.js) and the
WebKit JavaScript engine. Although
There’s an active Java Community Spec- browser-based JavaScript in situ debugging facilities

facility for
ification proposal for a common API to have improved substantially in recent
access debugging information, but at years in the form of improved browser
the time of this writing this project is
stalled pending clarity about Oracle’s
automatically support for runtime program inspec-
tion, there remains no widely used
commitment to the project.8 uploading postmortem facility for JavaScript.
While the Java facility has several
important limitations, many other dy-
postmortem A Primitive Postmortem
namic environments do not appear to program state Facility for Node.js
have postmortem facilities at all—at
least not any that meet the constraints back to the server. Despite the lack of JavaScript language
support, we have developed a crude
just described. but effective postmortem debugging
Python10 and Ruby4 each has a facil- facility for use in Joyent’s Node.js pro-
ity called a postmortem debugger, but duction deployments. Recall that Node
these refer to starting a program under typically runs on a server rather than a
a debugger and having the program Web browser and is commonly used to
break into an interactive debugger ses- implement services that scale to hun-
sion when the program crashes. This is dreds or thousands of network connec-
not suitable for production for several tions. We use the following primitives
reasons, not least of which is that it is provided by Node and the underlying
not fully automatic. As described earli- V8 virtual machine to construct a sim-
er, it is not tenable to interrupt produc- ple implementation:
tion service while an engineer logs in to ˲˲ An uncaughtException event,
diagnose a problem interactively. which allows a program to register a
Erlang5 provides a rich crash-dump function to be invoked when the pro-
facility for the Erlang runtime itself. It gram throws an exception that bubbles
works much like a native crash dump all the way to the top level (that would
in that on failure it saves a comprehen- normally cause the program to crash).
sive state dump to a file and then exits, ˲˲ Built-in mechanisms for serial-
allowing the operating system to see izing/deserializing simple JavaScript
the program has exited and restart it objects as a text string (JSON.strin-
immediately. The crash dump file can gify() and JSON.parse()).
then be analyzed later. ˲˲ Synchronous functions for writing
The bash shell1 is interesting be- to files.
cause its deployment model is so dif- The first challenge is actually iden-
ferent even from other dynamic envi- tifying which state to dump. JavaScript
ronments. Bash provides a mechanism provides a way to introspect global
called xtrace for producing a compre- state, but Node.js programs that declare
hensive trace file describing nearly ev- variables do not use global state per se.
ery expression that the shell evaluates What looks like the top-level scope is
as part of execution. This is very useful actually contained inside a function
for understanding shell script failures scope, and function scopes cannot be
but can produce a lot of output even for introspected. To work around this, pro-
simple scripts. The output grows un- grams using our postmortem facility
bounded as the program runs, which must explicitly register debugging state
would normally make it untenable for ahead of time. While this solution is
production use in servers or applica- deeply unsatisfying because it is always
tions, but since most bash scripts have difficult to know ahead of time what in-
very finite lifetimes, this mechanism formation would be useful to have when
is an effective postmortem facility as debugging, it has proved effective in
long as the output can be reasonably practice because each of our programs
stored and managed (that is, automati- essentially just instantiates a singleton
cally deleted after successful runs). object representing the program itself
JavaScript, unlike many of the and then registers that with the post-

50 commun ications of t h e acm | dec emb e r 2 0 1 1 | vol . 5 4 | no. 1 2


practice

mortem facility. Most relevant program ments set forth earlier: it is always-on popularity for building critical software
state is referenced by this pseudo-global in production, fully automatic, the components, this gap is becoming in-
object in one way or another. result is transferable to other systems creasingly important. Languages that
The next challenge is serializing cir- for analysis, and it is comprehensive ignore the problems associated with
cular objects. JSON.stringify() does enough to solve complex problems. To debugging production systems will in-
not support this for obvious reasons, address many of the scope, robustness, creasingly be relegated to solving sim-
so our implementation avoids this is- and richness problems described here, pler, well-confined, well-understood
sue by pruning all circular references however, and to provide such a facility problems, while those that provide rich
before serializing the debug object. for all users of a language, the postmor- tools for understanding failure post-
While this makes it harder to find in- tem facility must be incorporated into mortem will form the basis of the next
formation in the dump, we know that the VM itself. Such an implementation generation of software bedrock.
at least one copy of every object will be would work similarly in principle, but
present somewhere in the dump. it could include absolutely all program Acknowledgments
Given all this, the implementation state, be made to work reliably in the Many thanks to Bryan Cantrill, Peter
is straightforward: on the uncaughtEx- face of failure of the program itself, Memishian, and Theo Schlossnagle
ception event, we prune circular refer- stream the output to avoid using much for reviewing earlier drafts of this ar-
ences from the debug state, serialize it additional memory, and use a format ticle and to Adam Cath, Ryan Dahl,
using the built-in JSON.stringify() that preserves the underlying memory Robert Mustacchi, and many others
routine, and save the result to disk in structures to ease understanding of for helpful discussions on this topic.
a file called core. To analyze the core the dump. Most importantly, including
file, we use a tool that reads core using tools for postmortem analysis out of
Related articles
JSON.parse() and presents the seri- the box would go a long way toward the on queue.acm.org
alized state for engineers to examine. adoption of postmortem techniques in
Erlang for Concurrent Programming
The implementation is open source these environments. Jim Larson
and available on GitHub.7 http://queue.acm.org/detail.cfm?id=1454463
In addition to the implementation Conclusion
Orchestrating an Automated Test Lab
challenges just described, this ap- Postmortem debugging facilities have Michael Donat
proach has several significant limita- long enabled operating-system engi- http://queue.acm.org/detail.cfm?id=1046946
tions. First, it can save only state that neers and native-application develop- Scripting Web Services Prototypes
programmers can register ahead of ers to understand complex software Christopher Vincent
time, but as already discussed, there failures from the first occurrence in http://queue.acm.org/detail.cfm?id=640158
is a great deal of other important state deployed systems. Such facilities form
inside a JavaScript program such as the backbone of the support process References
1. Bash Reference Manual (2009); https://www.gnu.
function arguments in the call stack for enterprise systems and are essential org/s/bash/manual/bash.html.
and the contexts associated with pend- for software components at the core of 2. Bonwick, J. The slab allocator: An object-caching
kernel memory allocator. Usenix Summer 1994
ing and future events, none of which is a complex software environment. Even Technical Conference.
reachable from the global scope. simple platforms for recording post- 3. Cantrill, B.M. Postmortem object type identification.
In Proceedings of the 5th International Workshop on
Second, since the entire point of mortem state enable engineers to de- Automated and Algorithmic Debugging. (2003)
this system is to capture program velop sophisticated analysis tools that 4. Debugging with ruby-debug. 2011; http://bashdb.
sourceforge.net/ruby-debug.html#Post_002dMortem-
state in the event of a crash, it must be help them to quickly root-cause many Debugging.
5. Erlang Runtime System Application User’s Guide,
highly reliable. This implementation is types of problems. version 5.8.4. 2011. How to interpret the Erlang crash
robust to most runtime failures, but it Meanwhile, modern dynamic lan- dumps; http://www.erlang.org/doc/apps/erts/crash_
dump.html.
still requires additional memory first guages are growing in popularity be- 6. Gill, S. The diagnosis of mistakes in programmes on
to execute the dump code and to seri- cause they so effectively facilitate rapid the EDSAC. In Proceedings of the Royal Society A 206
(1951), 538–554.
alize the program state. The additional development. Environments such as 7. GitHub Project. 2011; https://github.com/joyent/node-
memory could easily be as large as the Node.js also promote programming panic
8. Incubator Wiki. March 2011 Board reports. Kato
whole heap, which makes it untenable models that scale well, particularly in Project; http://wiki.apache.org/incubator/March2011.
for failures resulting from memory the face of latency bubbles. This is be- 9. McGregor, D.R., Malone, J.R. Stabdump—A dump
interpreter program to assist debugging. Software
pressure—a common cause of failures coming increasingly important in to- Practice and Experience 10, 4 (1980), 329–332.
in dynamic environments. day’s real-time systems. 10. Python Standard Library. Python v2.7.2 documentation
pdb—the Python debugger; 2011; http://docs.python.
Third, because the implementation Postmortem debugging for dynamic org/library/pdb.html.
removes circular references before se- environments is still in its infancy. Most 11. Schrock, E. Debugging AJAX in production. ACM
Queue 7, 1 (2009); http://queue.acm.org/detail.
rializing the program data, the result- such environments, even those consid- cfm?id=1515745.
ing dump is more difficult to browse, ered mature, do not provide any facil-
and the facility cannot support dumps ity for recording postmortem state, let David Pacheco is an engineer at Joyent where he leads
the design and implementation of Cloud Analytics, a real-
that are not intended for postmortem alone tools for higher-level analysis of time Node.js/DTrace-based system for visualizing server
analysis (such as live dumps). such failures. Those tools that do exist and application performance in the cloud. Previously a
member of the Sun Microsystems Fishworks team, he
Despite these deficiencies, this are not first-class tools in their respec- worked on several features of the Sun Storage 7000
appliances.
implementation has proved quite ef- tive environments and so are not widely
fective because it meets the require- used. As dynamic languages grow in © 2011 ACM 0001-0782/11/12 $10.00
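
The Node.js facility described in this article can be sketched in a few dozen lines. What follows is a minimal illustration under stated assumptions, not the code of the open source implementation the article cites: the names registerDebugState, pruneCircular, writeCore, enablePostmortem, and readCore are hypothetical, as is the layout of the dump (time, error, state). The sketch relies only on the primitives the article lists: the uncaughtException event, JSON.stringify() and JSON.parse(), and a synchronous file write.

```javascript
'use strict';
const fs = require('fs');

// Hypothetical registry: programs add the objects they want preserved.
const debugState = {};

function registerDebugState(name, value) {
  debugState[name] = value;
}

// Copy `value`, replacing any reference to an object that is already on the
// current traversal path (an ancestor) with a marker string. Objects that are
// shared but not circular are copied each time they are reached, so at least
// one copy of every reachable plain object ends up in the result.
function pruneCircular(value, ancestors = []) {
  if (value === null || typeof value !== 'object') {
    return value;
  }
  if (ancestors.includes(value)) {
    return '<circular>';
  }
  const path = ancestors.concat([value]);
  if (Array.isArray(value)) {
    return value.map((item) => pruneCircular(item, path));
  }
  const copy = {};
  for (const key of Object.keys(value)) {
    copy[key] = pruneCircular(value[key], path);
  }
  return copy;
}

// Serialize the error and the registered state, then write synchronously so
// the dump is on disk before the process exits.
function writeCore(err, file) {
  const dump = {
    time: new Date().toISOString(),
    error: {
      message: String(err && err.message),
      stack: String(err && err.stack),
    },
    state: pruneCircular(debugState),
  };
  fs.writeFileSync(file, JSON.stringify(dump));
}

// Install the handler once at startup; `file` defaults to "core".
function enablePostmortem(file = 'core') {
  process.on('uncaughtException', (err) => {
    try {
      writeCore(err, file);
    } finally {
      process.exit(1); // exit so a supervisor can restart the service
    }
  });
}

// Minimal analysis step: read the dump back for inspection.
function readCore(file) {
  return JSON.parse(fs.readFileSync(file, 'utf8'));
}
```

A program using this sketch would call enablePostmortem() once at startup and register its singleton object, for example registerDebugState('server', server). The limitations discussed in the article apply equally here: only registered state is saved, the copy made by pruneCircular can need memory comparable to the state itself, and values that are not simple JavaScript objects (such as Date instances or functions) are not preserved faithfully.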



practice
doi:10.1145/2043174.2043190
Article development led by queue.acm.org

How Will Astronomy Archives Survive the Data Tsunami?

Astronomers are collecting more data than ever. What practices can keep them ahead of the flood?

By G. Bruce Berriman and Steven L. Groom

Photograph courtesy of NASA/JPL-Caltech

Astronomy is already awash with data: currently 1PB (petabyte) of public data is electronically accessible, and this volume is growing at 0.5PB per year. The availability of this data has already transformed research in astronomy, and the Space Telescope Science Institute (STScI) now reports that more papers are published with archived data sets than with newly acquired data.¹⁸

This growth in data size and anticipated usage will accelerate in the coming few years as new projects such as the Large Synoptic Survey Telescope (LSST), Atacama Large Millimeter Array (ALMA), and Square Kilometer Array (SKA) move into operation. These new projects will use much larger arrays of telescopes and detectors or much higher data acquisition rates than are now used. Projections indicate that by 2020, more than 60PB of archived data will be accessible to astronomers.¹⁰

The data tsunami is already affecting the performance of astronomy archives and data centers. One example is the NASA Infrared Processing and Analysis Center (IPAC) Infrared Science Archive (IRSA), which archives and serves data sets from NASA’s infrared missions. It is going through a period of exceptional growth in its science holdings, as shown in Figure 1, because it is assuming responsibility for the curation of data sets released by the Spitzer Space Telescope and Wide-field Infrared Survey Explorer (WISE) mission.

The volume of these two data sets alone exceeds the total volume of the 35-plus missions and projects already archived. The availability of the data, together with rapid growth in program-based queries, has driven up usage of the archive, as shown by the annual growth in downloaded data volume and queries in Figure 2. Usage is expected to accelerate as new data sets are released through the archive, yet the response times to queries have already suffered, primarily because of a growth in requests for large volumes of data.

The degradation in performance cannot be corrected simply by adding infrastructure as usage increases, as is common in commercial enterprises, because astronomy archives generally operate on limited budgets that are fixed for several years. Without intervention, the current data-access and computing model used in astronomy, in which data downloaded from archives is analyzed on local machines, will break down rapidly. The very scale of data sets such as those just described will transform the design and operation of archives as places that not only make data accessible to users, but also support in situ processing of these data with the end users’ software: network bandwidth limitations prevent transfer of data on this scale, and users’ desktops in any case generally lack the power to process PB-scale data.

Moreover, data discovery, access, and processing are likely to be distributed across several archives, given that the maximum science return will involve federation of data from several archives, usually over a broad wavelength range, and in some cases will involve confrontation with large and complex simulations. Managing the impact of PB-scale data sets on archives and the community was recognized as an important infrastructure issue in the report of the 2010 Decadal Survey of Astronomy and Astrophysics,⁶ commissioned by the National Academy of Sciences to recommend national priorities in astronomy for the coming decade.

Figure 3 illustrates the impact of the growth of archive holdings. As holdings grow, so does the demand for data, for more sophisticated types of queries, and for new areas of support, such as analysis of massive new data sets to understand how astronomical objects vary with time, described in the 2010 Decadal Survey as the “last frontier in astronomy.” Thus, growth in holdings drives up storage costs, as well as compute and database costs, and the archive must bear all of these costs. Given that archives are likely to operate on shoestring budgets for the foreseeable future, the rest of this article looks at strategies and techniques for managing the data tsunami.

At the Innovations in Data-intensive Astronomy workshop earlier this year (Green Bank, WV, May 2011¹⁵) participants recognized that the problems of managing and serving massive data sets will require a community effort and partnerships with national cyber-infrastructure programs. The solutions will require rigorous investigation of emerging technologies and innovative approaches to discovering and serving, especially as archives are likely to continue to operate on limited budgets. How can archives develop new and efficient ways of discovering data? When should, for example, an archive adopt technologies such as graphics processing units (GPUs) or cloud computing? What kinds of technologies are needed to manage the distribution of data time, compute-intensive data-access jobs, and end-user processing jobs?

This article emphasizes those issues we believe must be addressed by archives to support their end users in the coming decade, as well as those issues that affect end users in their interactions with archives.

Figure 1. Growth in the scientific data holdings of IRSA, projected to 2014. The graphic calls out the dramatic impact of the Spitzer and WISE missions on the volume of the archive’s science data holdings. [Chart: IRSA holdings (TB) by year, 2008 to 2014; series: IRSA General, Spitzer, WISE. Courtesy of IRSA.]

Figure 2. Growth in usage of IRSA from 2005 until the beginning of 2011. WISE data was not available until spring 2011. [Two charts: data downloaded (TB/month), 2005 to 2011, series Other and Spitzer; queries (millions/year), 2006 to 2010, series total server load, image queries, and catalog queries. Courtesy of IRSA.]

Innovations in Serving and Discovering Data

The discipline of astronomy needs new data-discovery techniques that respond to the anticipated growth in the size of data sets and that support efficient discovery of large data sets across distributed archives. These techniques must aim to offer data discovery and access across PB-sized data sets (for example, discovering images over many wavelengths over a large swath of the sky such as the Galactic Plane) while preventing excessive loads on servers.

The Virtual Astronomical Observatory (VAO),¹⁹ part of a worldwide effort to offer seamless international astronomical data-discovery services, is exploring such techniques. It is developing an R-tree-based indexing scheme that supports fast, scalable access to massive databases of astronomical sources and imaging data sets.⁹ (R-trees are tree data structures used for indexing multidimensional information. They are commonly used to index database records and thereby speed up access times.)

In the current implementation, the indices are stored outside the database, in memory-mapped files that reside on a dedicated Linux cluster. It offers speedups of up to 1,000 times over database table scans and has been implemented on databases containing two billion records and TB-scale image sets. It is already in operation in the Spitzer Space Telescope Heritage Archive and the VAO Image and Catalog Discovery Service. Expanding techniques such as this to PB-scale data is an important next step.

Such custom solutions may prove more useful than adapting an expensive geographical information system (GIS) to astronomy. These systems are necessarily more complex than are needed in astronomy, where the celestial sphere is by definition a perfect sphere and the footprints on the sky of
instruments and data sets are generally simple geometric shapes.

Investigations of Emerging Technologies

A growing number of investigators are taking part in a concerted and rigorous effort to understand how archives and data centers can take advantage of new technologies to reduce computational and financial costs.

Benjamin Barsdell et al.¹ and Christopher Fluke et al.⁷ have investigated the applicability of GPUs to astronomy. Developed to accelerate the output of an image on a display device, GPUs consist of many floating-point processors. These authors point out that speedups of more than 100 times promised by manufacturers strictly apply to graphics-like applications; GPUs support single-precision calculations rather than the double precision often needed in astronomy; and their performance is often limited by data transfer to and from the GPUs. The two studies cited here indicate applications that submit to “brute-force parallelization” will give the best performance with minimum development effort; they show that code profiling will likely help optimization and provide a first list of the types of astronomical applications that may benefit from running on GPUs. These applications include fixed-resolution mesh simulations, as well as machine-learning and volume-rendering packages.

Others are investigating how to exploit cloud computing for astronomy. Applications best suited for commercial clouds are those that are processing and memory intensive, which take advantage of the relatively low cost of processing under current fee structures.² Applications that are I/O intensive, which in astronomy often involve processing large quantities of image data, are, however, uneconomical to run because of the high cost of data transfer and storage. They require high-throughput networks and parallel file systems to achieve best performance.

Under current fee structures, renting mass storage space on the Amazon cloud is more expensive than purchasing it. Neither option offers a solution to the fundamental business problem that storage costs scale with volume, while funding does not. Any use of commercial clouds should be made after a thorough cost-benefit study. It may be that commercial clouds are best suited for short-term tasks, such as regression testing of applications and handling excessive server load, or to one-time bulk-processing tasks, as well as supporting end-user processing.

Implementing and managing new technologies always have a business cost, of course. Shane Canon⁴ and others have provided a realistic assessment of the business impact of cloud computing. Studies such as these are needed for all emerging technologies.

Despite the high costs often associated with clouds, the virtualization technologies used in commercial clouds may prove valuable when used within a data center. Indeed, the Canadian Astronomy Data Center (CADC) is moving its entire operation to an academic cloud called Canadian Advanced Network for Astronomical Research (CANFAR), “an operational system for the delivery, processing, storage, analysis, and distribution of very large astronomical datasets. The goal of CANFAR is to support large Canadian astronomy projects.”¹¹ To our knowledge, this is the first astronomy archive that has migrated to cloud technologies.⁸ It can be considered a first model of the archive of the future, and consequently the community should monitor its performance.

The SKA has rejected the use of commercial cloud platforms. Instead, after a successful prototyping experiment, it proposes a design based on the open source Nereus V Cloud²⁰ computing technology, selected because of its Java codebase and security features. The prototype test bed used 200 clients at the University of Western Australia, Curtin University, and iVEC, with two servers deployed through management at a NereusCloud domain. The clients include Mac Minis and Linux-based desktop machines. When complete, “theskynet,” as it has been called, would provide open access to the SKA data sets for professionals and citizen scientists alike.¹² The design offers a cheaper and much greener alternative to earlier designs based exclusively on a centrally based GPU cluster.

Figure 3. Schematic representation of how growth in data holdings drives up demands on the archive’s services and thereby drives up the archive’s costs.

Compute Infrastructure

Astronomy needs to engage and partner with national cyber infrastructure initiatives. Much of the infrastructure to optimize task scheduling and workflow performance and to support distributed processing of data is driven by the needs of science applications. Indeed, the IT community has adopted the Montage image mosaic engine³ to develop infrastructure (for example, task schedulers in distributed environments and workflow optimization techniques). These efforts have not, however, been formally organized, and future efforts may well benefit from such organization.

Cultural changes. There is at present no effective means of disseminating the latest IT knowledge to the astronomical community. Information is scattered across numerous journals and conference proceedings. To rectify this, we propose an interactive online journal dedicated to information technology in astronomy or even physical sciences as a whole.

Even more important is the need to change the reward system in astronomy to offer recognition for computational work. This would help retain quality people in the field.

Finally, astronomers must engage the computer science community to develop science-driven infrastructure. The SciDB database,¹⁶ a PB-scale next-generation database optimized for science applications, is an excellent example of such collaboration.

Educational changes. An archive model that includes processing of data on servers local to the data will have profound implications for end users, who generally lack the skills not only to manage and maintain software, but also to develop software that is environment agnostic and scalable to large data sets. Zeeya Merali¹⁴ and Igor Chilingarian and Ivan Zolotukhin⁵ have made compelling cases that self-teaching of software development is the root cause of this phenomenon. Chilingarian and Zolotukhin in particular present some telling examples of clumsy and inefficient design in astronomy.

One solution would be to make software engineering a mandatory part of graduate education, with a demonstration of competency as part of the formal requirements for graduation. Just as classes in instrumentation prepare students for a career in which they design experiments to obtain new data, so instruction in computer science prepares them for massive data-mining and processing tasks. Software has become, in effect, a scientific instrument.

The software engineering curriculum should include the principles of

quate testing); how a computer works and what limits its performance; at least one low-level language and one scripting language, development of portable code, parallel-processing techniques, principles of databases, and how to use high-performance platforms such as clouds, clusters, and grids. Teaching high-performance computing techniques is particularly important, as the load on servers needs to be kept under control. Such a curriculum would position astronomers to develop their own scalable code and to work with computer scientists in supporting next-generation applications.

Curricula designers can take advantage of existing teaching methods. Software Carpentry¹⁷ is an open source project that provides online classes in the basics of software engineering and encourages contributions from its user community. Frank Loffler et al.¹³ described a graduate class in high-performance computing at Louisiana State University in which they used the TeraGrid to instruct students in high-performance computing techniques that they could then use in day-to-day research. Students were given hands-on experience at running simulation codes on the TeraGrid, including codes to model black holes, predict the effects of hurricanes, and optimize oil and gas production from underground reservoirs.

Conclusion

The field of astronomy is starting to generate more data than can be managed, served, and processed by current techniques. This article has outlined practices for developing next-generation tools and techniques for surviving this data tsunami, including rigorous evaluation of new technologies, partnerships between astronomers and computer scientists, and training of scientists in high-end software engineering skills.

Related articles on queue.acm.org

Why Your Data Won’t Mix
Alon Halevy
http://queue.acm.org/detail.cfm?id=1103836

Information Extraction: Distilling Structured Data from Unstructured Text
Andrew McCallum
http://queue.acm.org/detail.cfm?id=1105679

References
1. Barsdell, B.R., Barnes, D.G. and Fluke, C. J. Analysing astronomy algorithms for graphics processing units and beyond. Monthly Notices of the Royal Astronomical Society 408, 3 (2010), 1936–1944.
2. Berriman, G.B., Deelman, E., Juve, G., Regelson, M. and Plavchan, P. The application of cloud computing to astronomy: A study of cost and performance. Accepted for publication in Proceedings of the e-Science in Astronomy Conference (Brisbane, AU, 2011).
3. Berriman, G.B., Good, J., Deelman, E. and Alexov, A. Ten years of software sustainability at the Infrared Processing and Analysis Center. Phil. Trans. R. Soc. 369 (2011), 3384–3397.
4. Canon, S. Debunking some common misconceptions of science in the cloud. Presented at ScienceCloud2011: 2nd Workshop on Scientific Cloud Computing (San Jose, CA); http://datasys.cs.iit.edu/events/ScienceCloud2011/.
5. Chilingarian, I. and Zolotukhin, I. The true bottleneck of modern scientific computing in astronomy. Astronomical Data Analysis Software and Systems XX. I. Evans et al., Eds. ASP Conference Series 442, 471 (2010).
6. Committee for a Decadal Survey of Astronomy and Astrophysics, National Research Council of the National Academy of Sciences. New Worlds, New Horizons in Astronomy and Astrophysics, 2010.
7. Fluke, C.J., Barnes, D.G., Barsdell, B.R., Hassan, A.H. Astrophysical supercomputing with GPUs: critical decisions for early adopters. Publications of the Astronomical Society of Australia 28, 15 (2011).
8. Gaudet, S. et al. CANFAR: The Canadian Advanced Network for Astronomical Research. Software and Cyber Infrastructure for Astronomy. N. Radziwill and A. Bridger, Eds. SPIE 7740, 1I (2010).
9. Good, J. Private communication, 2011.
10. Hanisch, R.J. Data discovery, access, and management with the virtual observatory. 2011, Paper presented at Innovations in Data-intensive Astronomy, Green Bank, WV, 2011; http://www.nrao.edu/meetings/bigdata/presentations/May5/1-Hanisch/Hanisch VAO Green Bank.ppt.
11. Hemsoth, N. Canada explores new frontiers in astroinformatics. HPC in the Cloud; http://www.hpcinthecloud.com/hpccloud/2011-01-17/canada_explores_new_frontiers_in_astroinformatics.html.
12. Hutchinson, J. SKA bid looks to SkyNet for computing, 2011; http://www.cio.com.au/article/387097/exclusive_ska_bid_looks_skynet_computing/.
13. Loffler, F., Allen, G., Benger, W., Hutanu, A., Jha, S. and Schnetter, E. Using the TeraGrid to teach scientific computing. TeraGrid ’11: Extreme Digital Discovery Conference (Salt Lake City, UT; July 18–21, 2011); https://www.teragrid.org/web/tg11/home.
14. Merali, Z. Why scientific programming does not compute. Nature 467 (2010) 775–777.
15. National Radio Astronomy Observatory. Innovations in Data-intensive Astronomy Workshop (Green Bank, WV, May 3–5, 2011); http://www.nrao.edu/meetings/bigdata/index.shtml.
16. SciDB Open Source Data Management and Analytics Software System. 2011; http://scidb.org.
17. Software Carpentry; http://software-carpentry.org/.
18. Space Telescope Science Institute. Hubble Space Telescope Publication Statistics 1991-2010 (2011); http://archive.stsci.edu/hst/bibliography/pubstat.html.
19. Virtual Astronomical Observatory; http://us-vao.org.
20. Nereus overview. http://www-nereus.physics.ox.ac.uk/about_overview.html

G. Bruce Berriman is a senior scientist at Infrared Processing and Analysis Center (IPAC). He is the program manager for the Virtual Astronomical Observatory and project manager for the W. M. Keck Observatory Archive and was formerly the manager of the NASA/IPAC Infrared Science Archive.

Steven L Groom is a systems engineer at IPAC and is manager of the NASA/IPAC Infrared Science Archive.
If You Have Too Much Data, then He has worked with mass storage, parallel processing,
software requirements, design, and “Good Enough” Is Good Enough and data archiving in the space sciences, as well as
maintenance (version control, docu- commercial applications.
Pat Helland
mentation, basics of design for ade- http://queue.acm.org/detail.cfm?id=1988603 © 2011 ACM 0001-0782/11/12 $10.00

56 Communications of the ACM | December 2011 | Vol. 54 | No. 12


doi:10.1145/2043174.2043191

Article development led by queue.acm.org

Coding Guidelines: Finding the Art in the Science

What separates good code from great code?

By Robert Green and Henry Ledgard

Computer science is both a science and an art. Its scientific aspects range from the theory of computation and algorithmic studies to code design and program architecture. Yet, when it comes time for implementation, there is a combination of artistic flair, nuanced style, and technical prowess that separates good code from great code.

Like art, code is simultaneously subjective and non-subjective. The non-subjective aspects of coding include “hard” ideas that must be followed to create good code: design patterns, project structures, the use of common libraries, and so on. Although these concepts lay the foundation for developing high-quality, maintainable code, it is the nuances of a programmer’s technique and tools (alignment, naming, use of white space, use of context, syntax highlighting, and IDE choice) that truly make code clear, maintainable, and understandable, while also giving code the ability to clearly communicate intent, function, and usage.

This separation between good and great code occurs because every person has an affinity for his or her own particular coding style based on his or her own good (or bad) habits and preferences. Anyone can write code within a design pattern or using certain “hard” techniques, but it takes a great programmer to fill in the details of the code in a way that is clear, concise, and understandable. This is important because just as every person may draw a unique meaning or experience from a single piece of artwork, every developer or reader of code may infer different meanings from the code depending on naming and other conventions, despite the architecture and design of the code.

From another angle, programming may also be seen as a form of “encryption.” In various ways the programmer devises a solution to a problem and then encrypts the solution in terms of a program and its support files. Months or years later, when a change is called for, a new programmer must decrypt the solution. This is usually not an enviable task, which can mainly be blamed on a failure of clear communication during the initial “encryption” of the project. Decrypting information is simple when the necessary key is present. So, too, is understanding old code when special attention has been paid to what the code itself communicates.

To address this issue, some works have defined a single coding standard for an entire programming language,7 while others have acquiesced to accepting naming conventions as long as they are consistent.6 Beautiful code has been defined in general terms as readable, focused, testable, and elegant.1 The more extreme case is the invention of an entire programming language built around a concrete set of ideals, such as Ruby or Python. Ruby emphasizes brevity, simplicity, flexibility, and balance.4 The principles behind Python are clear in the Zen of Python,5 where the focus lies on beauty, simplicity, readability, and reliability.

Our approach to this issue has been to develop a system of coding guidelines (available online3). While these guidelines come from an educational environment, they are designed to be useful to practitioners as well. The guidelines are based on a few broad principles that capture some fundamental principles of communication and elevate the notion of coding conventions to a higher level. The use of these conventions will also improve the sustainability of a code base. This article looks at these underlying principles.

One area not considered here is the use of syntax highlighting or IDEs. While either one may make code more readable (because of syntax highlighting or code folding, among others) and easier to manage (for example, quickly looking up or refactoring functions and/or variables), our guidelines have been developed to be IDE and color neutral. They are meant to reflect foundational principles that are important when writing code in any setting. Also, while IDEs can help improve readability and understanding in some ways, the features found in these tools are not standard (consider the different features found in Visual Studio, Eclipse, and VIM, for example). Likewise, syntax highlighting varies greatly among environments and may easily be changed to match personal preference. The goal of the following principles is to build a foundation for good programming that is independent of the programming IDE.

Consider a Program as a “Table”
In a recent ACM Queue article, Poul-Henning Kamp2 makes the fascinating point that much of the style of programming languages stems from the ASCII character set and typewriter-based terminals. Programming languages make no use of the graphical properties and options of modern devices. While code must be written with the clarity of good English grammar, it is not English text. Instead it is more like math and tables.

This is a far-reaching principle. First, it speaks directly to the use of fonts. Do not use a variable-width (proportional) font for program code, as code is not text. Fixed-width fonts (for example, Courier and Data Gothic) look appealing and allow easy alignment of code. Proportional (variable-width) fonts prevent proper alignment, and even more importantly, do not “look like” code.

Figure 1. Use of vertical alignment to show symmetry.

    char c1;

    c1 = getChoice();
    switch(c1){
        case 'q': case 'Q': quit();                break;
        case 'e': case 'E': enterPerson(content);  break;
        case 'd': case 'D': delPerson(content);    break;
        case 's': case 'S': sortByName();          break;
        case 'l': case 'L': showAll();             break;
        case 'f': case 'F': searchByName(content); break;
        default:            System.out.println("--Invalid Command!!\n");
    }

Figure 2. Example of cluttered presentation.

    private JFrame mainFrame = new JFrame("Wind Power Calculator");
    private JTextArea windVel = new JTextArea(VEL, 2, TEXT_WIDTH);
    private JLabel velTag = new JLabel("Wind Velocity");
    private JTextArea sweptArea = new JTextArea(SWEPT_AREA, 2, TEXT_WIDTH);
    private JLabel sweptAreaTag = new JLabel("Swept Area");
    private JTextArea genSize = new JTextArea(GEN_SIZE, 2, TEXT_WIDTH);
    private JButton calculatePower = new JButton("Calculate Power");

Figure 3. Revision of code in Figure 2 showing tabular structure.

    private JFrame    mainFrame      = new JFrame    ("Wind Power Calculator");
    private JTextArea windVel        = new JTextArea (VEL, 2, TEXT_WIDTH);
    private JLabel    velTag         = new JLabel    ("Wind Velocity");
    private JTextArea sweptArea      = new JTextArea (SWEPT_AREA, 2, TEXT_WIDTH);
    private JLabel    sweptAreaTag   = new JLabel    ("Swept Area");
    private JTextArea genSize        = new JTextArea (GEN_SIZE, 2, TEXT_WIDTH);
    private JButton   calculatePower = new JButton   ("Calculate Power");

While one should continue to think
of a program as a sequence of actions or as an algorithm at a high level, each section of code should also be thought of as a presentation of a chart, table, or menu. In Figures 1, 2, and 3, notice the use of vertical alignment to show symmetry. This is a powerful method of communication.

In the case when a long line of code spills into multiple lines, we suggest breaking and realigning the code.a For example, instead of

    participant newEntry = new participant (id, name, address1, address2, city, state, zip, phone, email);

use

    participant newEntry = new participant (id, name, address1, address2, |
                                            city, state, zip, phone, email);

or

    participant newEntry = new participant (id, name, address1, address2, city,|
                                            state, zip, phone, email);

a Given the limited spacing here, the | denotes a line break.

Let Simple English be Your Guide
A programmer creates a name for something with full knowledge of its use, and often many names make sense when one knows what the name represents. Thus, the programmer has this problem: creating a name based on a concept. The true challenge, however, is precisely the opposite: inferring the concept based on the name! This is the problem that the program reader has.

Consider the simple name

    sputn

taken from the common C++ header file <iostream.h>. An inexperienced or unfamiliar programmer may suddenly be mentally barraged with a bout of questions such as: Is it an integer? A pointer? An array or a structure? A method or a variable? Does sp stand for saved pointer? Is sput an operation to be done n times? Do you pronounce it sputn or s-putn or sput-n or s-put-n?

We advocate basing names on conventional English usage, in particular simple, informal, abbreviated English usage. Consider the following more specific guidelines:
• Variables and classes should be nouns or noun phrases;
• Class names are like collective nouns;
• Variable names are like proper nouns;
• Procedure names should be verbs or verb phrases;
• Methods used to return a value should be nouns or noun phrases;
• Booleans should be adjectives;
• For compound names, retain conventional English syntax; and
• Try to make names pronounceable.

Figure 4. Examples of basing names on conventional English usage.

    Variables                            Class Names
    Not the Right Noun   Better          Not the Right Noun   Better
    Round                Wheel           Accounting           BankAccount
    LoopTimes            NumLoops        SetPoint             Point
    Valid                InputStatus     NodeNetworking       SocketInfo
    Starting             Source
    Ending               Destination
    Rows                 NumRows

    Problematic                          Preferable
    Person personInfo;                   PersonInfo P1, P2;
    Socket socketDesc;                   SocketDescription socket;
    Frame TopFrameSection;               Frame TopFrame;
    Message = EmergencyAlertLabels[i]    AlertText = EmergencyLabel[i]

    Not the Right Verb   More Readable
    NameSet              SetName
    Modified             Modify
    Withdrawal           Withdraw
    Right                MoveRight

    Incorrect Function Names             More Readable
    numFiles = countFiles(directory);    numFiles = fileCount(directory);
    A = computeArea(parcel);             A = Area(parcel);
    x = getImagePos(i).x;                x = Image(i).xCoord;

    Incorrect Boolean Vars   Grammatically Better
    Fill                     Full
    Terminate                Terminated
    Real                     isReal
    Edit                     IsEditable
    Waits                    Waiting
    License                  hasLicense

    Grammatically Incorrect   Better
    IdVehicle                 VehicleID
    NoSectors                 NumSectors
    FormEnable();             EnableForm();

    Unpronounceable   Pronounceable
    Tbl               Table
    GenYmDhMs         GenerateTime
    Cntr              Counter
    Nbr               Num
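
As a minimal sketch of the naming guidelines above, the following C++ fragment uses a noun for a class, verbs for procedures that change state, nouns for functions that only return a value, and adjectives for Booleans. The Account class and every member name in it are hypothetical, chosen only for illustration; they are not taken from the authors' guidelines.

```cpp
#include <cassert>
#include <string>

// Hypothetical example: every name below is an illustration, not the authors' code.
class Account {                                   // class name: a noun
public:
    explicit Account(const std::string& owner)
        : owner_(owner), balance_(0.0), frozen_(false) {}

    // Procedures change state, so they are named with verbs.
    void deposit(double amount)  { if (!frozen_) balance_ += amount; }
    void withdraw(double amount) { if (!frozen_ && amount <= balance_) balance_ -= amount; }
    void freeze()                { frozen_ = true; }

    // Functions without side effects are named with nouns or adjectives.
    double             balance() const { return balance_; }
    const std::string& owner()   const { return owner_; }
    bool               frozen()  const { return frozen_; }
    bool               empty()   const { return balance_ == 0.0; }

private:
    std::string owner_;
    double      balance_;
    bool        frozen_;
};
```

A call site then reads close to plain English: savings.deposit(250.0) is a command, while savings.balance() and savings.frozen() read as a value and a condition.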


Some examples of this broad principle are shown in Figure 4.

There is an interesting but small issue when considering examples such as:

    numFiles = countFiles(directory);

While countFiles is a good name, it is not an optimal name since it is a verb. Verbs should be reserved for procedure calls that have an effect on variables. For functions that have no side effects on variables, use a noun or noun phrase. One does not usually say

    y = computeSine(x);

or

    milesDriven = computeDistance(location1, location2);

but rather

    y = sine(x);

or

    milesDriven = Distance(location1, location2);

We suggest that

    numFiles = fileCount(directory);

is a slight improvement. More importantly, this enforces the general rule that verbs denote procedures, and nouns or adjectives denote functions.

Rely on Context to Simplify Code
All other things being equal, shorter programs are always better. As an example, local variables that are used as index variables may be named i, j, k, and so on. An array index used on every line of a loop need not be named any more elaborately than i. Using index or elementNumber obscures the details of the computation through excessive description. A variable that is rarely used may deserve a long name: for example, MaxPhysicalAddr. When variable names are long, especially if there are many of them, it quickly becomes difficult to see what’s going on. A variable name can often be shortened by relying on the context in which it is used. For example, use the variable Store in a stack implementation rather than StackStore.

Major variables (objects) that are used frequently should be especially short, as seen in the examples in Figure 5. For major variables that are used throughout the program, a single letter may encourage program clarity.

Figure 5. Keeping names short and simple.

    Too Lengthy              Better
    LoopIndex                i, j
    NumberOfTimes            N (or n)
    CheckIfEntryIsCorrect    Validate
    IsARealNumber            IsReal
    Temporary                Temp

    Too Verbose                Preferable
    Stack CurrentStack         Stack S
    Window Window1, Window2    Window W1, W2
    Frame TopFrame             Frame Top
    Counter Cntr               Counter C
    SearchTree Tree            SearchTree T

    Acceptable        Preferable
    TreeNode          Node
    CustomerID        ID
    StackStore        Store
    CarDriver         Driver
    NameStringInfo    NameInfo

Use White Space to Show Structure
While written and spoken communication may reach a high level of clarity, it is often left wanting in meaning if not accompanied by the personal touch of nonverbal cues and tendencies. An individual’s body language helps clarify the spoken word. In a similar sense, the programmer relies on white space (what is not said directly) in the code to communicate logic, intent, and understanding.

An example is the use of blank lines between conceptually different sections of code. Blank lines should improve readability as they separate logically different segments of the code and thus provide the literary equivalent of a section break. Appropriate places to use blank lines include:
• When changing from preprocessor directives to code;
• Around class and structure declarations;
• Around a function definition of some length;
• Around a group of logically connected statements of some length; and
• Between declarations and the executable statements that follow.

Consider the code listing in Figure 6.

Figure 6. Example of code that uses white space well.

    public class SimpleAccount {
        private double balance;

        public double getBalance()           { return balance;}
        public void   setBalance(double b)   { balance = b;}
        public void   deposit(double num)    { balance = balance + num;}
        public void   withdraw(double num)   { balance = balance - num;}

        public static void main(String args[]){
            SimpleAccount my_account;

            my_account = new SimpleAccount();

            my_account.deposit(250);
            System.out.println("Current balance " + my_account.getBalance());
            my_account.withdraw(80.00);
            my_account.withdraw(60.00);
            System.out.println("Remaining balance " + my_account.getBalance());
        }
    }

Individual blank spaces should also be used to show the logical structure within a single statement. Strategic blank spaces within a line simplify the parsing done by the human reader. At a minimum, blank spaces should be included after the commas in argument lists and around the assignment operator “=” and the redirection operators “<<” and “>>”.

On the other hand, blank spaces should not be used for unary operators such as unary minus (-), address of (&), indirection (*), member access (.), increment (++), and decrement (--).

Also, if it makes sense, put two to three statements on one line. This practice has the effect of simplifying the code, but it must be used with discretion and only where it is sensible to do so.

Let Decision Structures Speak for Themselves
The case statement used in Figure 1 brings up a general point: very simple decision statement structures can be tersely presented, showing the alternative code simply, and, if possible, without braces, as in the example in Figure 7.

Figure 7. Decision statement structure, tersely presented.

    if(Card != null) display.setText(Card.getText());
    else             display.setText("No More Cards.");

It is not uncommon for simple conditions to be mutually exclusive, creating a kind of generalized case statement. This, as is common practice, can be printed as a chain, as in Figure 8.

Figure 8. Case statement presented as a chain.

    if (result >= 90)
        cout << "Grade of A!";
    else if (result >= 80)
        cout << "Grade of B";
    else if (result >= 70)
        cout << "Sorry, grade of C";
    else
        cout << "Not very good";

Of course, it may be that the structures are truly nested, and then one must use either nested spacing or functions to indicate the alternatives. Again, the general point is to let the structure drive the layout, not the syntax of the programming language.

In the brace wars, we do not take a strong stand on the various preferences shown in Figure 9, but we do feel strongly that the indent is vital, as it is the indent that shows the structure.

Figure 9. Examples of K&R, ANSI, and Whitesmiths coding styles.

    K&R:
    if (expression) {
        statements
    }

    ANSI:
    if (expression)
    {
        statements
    }

    Whitesmiths:
    if (expression)
        {
        statements
        }

Focus on the Code, Not the Comments
The ability to communicate clearly is an issue that is faced in all facets of the human experience. Programmers must achieve a level of clarity, continuity, and beauty when writing code. This means focusing on the code and its clarity, balance, and symmetry, not on its length or comments. While this concept does not advocate the removal of comments or negate their use and importance in appropriate situations, it does suggest that programmers must use comments wisely and judiciously. The focus should be on developing code that, for the most part, clearly communicates intent and functionality. This practice will automatically reduce the need for many comments.

Figure 10. Example of a systems-programming coding style.

    //Unix Style
    void tokenizeStr(string str, vector<string>& result, const string& delim = " "){
        int pos = 0;
        string strtok;
        for(;;){
            pos = str.find(delim);
            if(pos == (int)string::npos){
                result.push_back(str);
                break;}
            strtok = str.substr(0, pos);
            result.push_back(strtok);
            str = str.substr(pos+1);
        }
    }

Discussion
Although the guidelines presented here are used in an educational setting, they also have merit in industrial environments. Students who are educated using these guidelines will most likely use them (or some variant) as
they enter industry. To demonstrate this, we have developed an example that applies these guidelines to two very different styles. The first is the Unix style. It is terse, often making use of vowel deletion, and is often found in realistic applications such as operating-system code. This is not to imply that all or most system programmers use this style, only that it is not unusual. Figure 10 shows a small example of this style.

We call the second style the textbook style, as illustrated in Figure 11. Again, this in no way means to imply that all or most textbooks use this style, only that the style in the example is not unusual. In this style the focus is on learning. This means that there is frequent commenting, and the code is well spread out. For the purposes of learning and understanding the details of a language, this style can be excellent. From a practical perspective or for any program of some scale, this style does not work well as it can be overwhelming to use or to read. Moreover, this style makes it difficult to see the overall design, as if one is stuck under the trees and cannot see the forest around.

Figure 11. Example of a textbook coding style.

    // TEXTBOOK STYLE
    void tokenizeString(string myString, vector<string>& listOfTokens,
                        const string& tokenDelimiter = " ")
    {
        // Precondition: myString is not null
        //
        // Parses myString into a list of tokens using the given delimiter.
        // If no specific delimiter is given, uses the space as a delimiter
        //
        // Postcondition: listOfTokens contains the individual tokens as values

        int index = 0;
        string nextToken;
        bool loop = true;

        // Obtain tokens and store in vector
        while(loop)
        {
            index = myString.find(tokenDelimiter);
            if(index == (int)string::npos)
            {
                // end of string found
                listOfTokens.push_back(myString);
                loop = false;
            }
            else
            {
                // Append nextToken to vector
                nextToken = myString.substr(0, index);
                listOfTokens.push_back(nextToken);
                myString = myString.substr(index + 1);
            }
        }
    }

Figure 12 is a rework of the function in Figures 10 and 11, using the guidelines discussed here to make a smooth transition between academic and practical code. This figure shows a balance of both styles, relying more directly on the code itself to communicate intent and functionality clearly. Compared with the textbook style, the resultant code is shorter and more compact while still clearly communicating meaning, intent, and functionality. When compared with the Unix style, the code is slightly longer, but the meaning, intent, and functionality are clearer than the original code.

Figure 12. Example of a coding style using the guidelines presented here.

    // OUR STYLE
    void tokenizeString(string S, vector<string>& tokenList,
                        const string& delimiter = " ") {
        // Given a string S, compute the list of its tokens.
        int    position;
        string token;
        bool   moreTokens;

        moreTokens = true;
        while(moreTokens){
            position = S.find(delimiter);
            if(position == (int)string::npos){
                tokenList.push_back(S);
                moreTokens = false;
            }else{
                token = S.substr(0, position);
                tokenList.push_back(token);
                S = S.substr(position + 1);
            }
        }
    }

Figure 13 illustrates the guidelines presented here in another setting. This is a function taken from a complex program (10,000 lines) related to power-system reliability and energy use regarding PHEVs (plug-in hybrid electric vehicles). The program makes numerous calculations related to the effect that such vehicles will have on the current power grid and the effect on generation and transmission systems. This program attempts to evaluate the reliability of power systems by developing a model for reliability evaluation using a Monte Carlo simulation.

While the previous examples show the merit of the guidelines presented here, one argument against such guidelines is that making changes to keep a certain coding style intact is time consuming, particularly when a version-control system is used. In the
face of a time-sensitive project or a project that most likely will not be updated or maintained in the future, the effort may not be worthwhile. Typical cases include class projects, a Ph.D. thesis, or a temporary application. If, however, the codebase in question has a long lifespan or will be updated and maintained by others (for example, an operating system, server, interactive Web site, or other useful application), then almost any changes to improve readability are important, and the time should be taken to ensure the readability and maintainability of the code. This should be a matter of pride, as well as an essential function of one’s job.

Figure 13. Realistic and complex example of code following the guidelines presented here.

    void loadDataFile(double& pLoad, double& qLoad,
            int& numBuses, int& numTransLines, string systemName,
            vector<Generator>& gens, vector<Line>& transLines, vector<Bus>& buses){
        // This function loads the various system parameters from the power system data file.
        // The power system data is encoded as a csv file,

        ifstream       systemData;
        string         dataLine;
        vector<string> dataItem;
        int            numGens;

        systemData.open(("../Data/" + systemName).c_str());
        if (systemData.is_open()) {
            systemData >> numGens;
            systemData >> pLoad;
            systemData >> qLoad;
            systemData >> numBuses;
            systemData >> numTransLines;

            numGens = 0;

            //Clear Vectors
            gens.clear(); transLines.clear(); buses.clear();

            // Set Generators
            for(int i = 0; i<numGens; i++){
                systemData >> dataLine;
                Utils::tokenizeString(dataLine, dataItem,",");

                gens.push_back(Generator(
                    atof(dataItem[3].c_str()), atof(dataItem[4].c_str()),
                    atof(dataItem[5].c_str()), atof(dataItem[6].c_str()),
                    atof(dataItem[7].c_str()), atoi(dataItem[0].c_str()))
                );

                gens[i].setIndex(i);
                dataItem.clear();
            }

            // Set transmission lines
            for(int i = 0; i<numTransLines; i++){
                systemData >> dataLine;
                Utils::tokenizeString(dataLine, dataItem,",");

                transLines.push_back(Line(
                    atoi(dataItem[0].c_str()),  atoi(dataItem[1].c_str()),
                    atoi(dataItem[2].c_str()),  atof(dataItem[3].c_str()),
                    atof(dataItem[4].c_str()),  atof(dataItem[5].c_str()),
                    atof(dataItem[6].c_str()),  atof(dataItem[7].c_str()),
                    atof(dataItem[8].c_str()),  atof(dataItem[9].c_str()),
                    atof(dataItem[10].c_str()), atof(dataItem[11].c_str()),
                    atof(dataItem[12].c_str()), atof(dataItem[13].c_str()))
                );
                dataItem.clear();
            }
            // Set bus loadings
            for(int i=0; i<numBuses; i++){
                systemData >> dataLine;
                Utils::tokenizeString(dataLine, dataItem,",");
                buses.push_back(Bus(
                    atoi(dataItem[0].c_str()),  atoi(dataItem[1].c_str()),
                    atoi(dataItem[6].c_str()),  atoi(dataItem[10].c_str()),
                    atof(dataItem[2].c_str()),  atof(dataItem[3].c_str()),
                    atof(dataItem[4].c_str()),  atof(dataItem[5].c_str()),
                    atof(dataItem[6].c_str()),  atof(dataItem[7].c_str()),
                    atof(dataItem[12].c_str()), atof(dataItem[11].c_str()),
                    atof(dataItem[9].c_str()))
                );
                dataItem.clear();
            }
            systemData.close();
        }
    }

Related articles on queue.acm.org

Beautiful Code Exists, if You Know Where to Look
George Neville-Neil
http://queue.acm.org/detail.cfm?id=1454458

Software Development with Code Maps
Robert DeLine, Gina Venolia, and Kael Rowan
http://queue.acm.org/detail.cfm?id=1831329

Reading, Writing, and Code
Diomidis Spinellis
http://queue.acm.org/detail.cfm?id=957782

References
1. Heusser, M. Beautiful code. Dr. Dobb’s (Aug. 2005); http://www.ddj.com/184407802.
2. Kamp, P-H. Sir, please step away from the ASR-33! ACM Queue 8, 10 (2010); http://queue.acm.org/detail.cfm?id=1871406.
3. Ledgard, H. Professional coding guidelines. Unpublished report, University of Toledo, 2011; http://www.eng.utoledo.edu/eecs/faculty_web/hledgard/softe/upload/.
4. Molina, M. What makes code beautiful. Ruby Hoedown, 2007.
5. Peters, T. The Zen of Python. PEP (Python Enhancement Proposals). Aug. 20, 2004; http://www.python.org/dev/peps/pep-0020/.
6. Reed, D. Sometimes style really does matter. J. Computing Sciences in Colleges 25, 5 (2010), 180-187.
7. Sun Developer Network. Code conventions for the Java programming language, 1999; http://java.sun.com/docs/codeconv/.

Acknowledgments
The authors would like to thank David Marcus and Poul-Henning Kamp for their insightful comments while completing this work, as well as the software engineering students who have contributed to these guidelines over the years.

Robert Green is pursuing his Ph.D. at the University of Toledo. He has multiple years of experience developing software across a variety of industries. His research interests include biologically inspired computing, high-performance computing, and alternative energy.

Henry Ledgard was a member of the design team that created the programming language ADA, a language he believes was a creative, sound design. He is the author of several books on programming, and is a professor at the University of Toledo. His research interests include principles of language design, human engineering and effective ways to teach computer science.

© 2011 ACM 0001-0782/11/12 $10.00
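
For readers who want to try the tokenizer of Figure 12, here is a minimal compilable sketch in the same style. It is an adaptation rather than the authors' listing: it returns the token list instead of filling a reference parameter, compares against std::string::npos without an int cast, and advances by the delimiter length so that multi-character delimiters also work. Those choices, the guard for an empty delimiter, and the function name tokens are assumptions made for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sketch adapted from Figure 12; the name and return-by-value interface are assumptions.
std::vector<std::string> tokens(std::string S, const std::string& delimiter = " ") {
    // Given a string S, compute the list of its tokens.
    std::vector<std::string> tokenList;
    std::string::size_type   position;
    bool                     moreTokens = true;

    // An empty delimiter would never advance, so treat S as a single token.
    if (delimiter.empty()) {
        tokenList.push_back(S);
        return tokenList;
    }

    while (moreTokens) {
        position = S.find(delimiter);
        if (position == std::string::npos) {
            tokenList.push_back(S);          // last (or only) token
            moreTokens = false;
        } else {
            tokenList.push_back(S.substr(0, position));
            S = S.substr(position + delimiter.size());
        }
    }
    return tokenList;
}
```

As in the figures, consecutive delimiters produce empty tokens, which suits comma-separated data where an empty field is still a field.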



contributed articles
doi:10.1145/2043174.2043192
individuals in crowds, and detect ab-
People in high-density crowds appear normal events.
This article explores cutting-edge
to move with the flow of the crowd, like techniques we have used in real-world
particles in a liquid. scenarios to provide solutions to such
problems.1,2,17 We developed them
By Brian E. Moore, Saad Ali, based on the notion that people in
Ramin Mehran, and Mubarak Shah crowds behave, in ways, like particles
in fluids. Hence, we treat crowds as
collections of mutually interacting

Visual Crowd
particles.
Typically, the motion of a high-
density crowd appears to behave like

Surveillance
a liquid, and interaction forces tend
to dominate the motion of the people.
This is in contrast to crowd motion
appearing in states like gases, where

through a
interactions between people are few
but random motions of individuals
tend to dominate the behavior. With
all this in mind, we contemplate visu-

Hydrodynamics
al crowd surveillance using ideas and
techniques based in hydrodynamics.
Hence, we say “fluid” and “liquid” in-
terchangeably, distinguishing our ap-

Lens
proach from aerodynamics, which con-
siders fluids in gaseous states.
Our hydrodynamics point of view is
well suited for analyzing high-density
crowds,9,12 with surveillance the pri-
mary concern. Though the number of
people will never reach the astronomi-
cal numbers of particles in fluids, we
pursue tasks in crowd analysis using a
similar concept of scale. Ranging from
the macroscopic view of all particles to the microscopic view of individual particles, we address the problems of segmentation, abnormal-behavior detection, and tracking.

Video cameras monitoring the activity of people in public settings are commonplace in cities worldwide. At large events, where crowds of hundreds or thousands gather, such monitoring is important for safety and security purposes but is also extremely (technically) challenging. Human operators are generally employed for the task, but even the most vigilant humans miss important information that could ultimately contribute to unfavorable consequences.
Major research efforts are under way to develop systems that cue security personnel to individuals or events of interest in crowded scenes. Essential are methods by which information can be extracted from video data in order to recognize crowd behaviors, track

key insights

Computer algorithms extract information from digital videos of people in crowds as a way to automatically track individuals, detect abnormal behavior, and segment characteristic patterns of flow in crowds.

Individuals in dense crowds, like particles in a fluid, are restricted in their motion by neighboring individuals, reflecting a kind of interdependence that is pivotal for solution development.

The tools of computational and applied mathematics are indispensable for visual analysis of crowds; pixel information is translated into particle trajectories used to understand crowd flow on length scales ranging from the macroscopic to the microscopic.

64 Communications of the ACM | December 2011 | Vol. 54 | No. 12

contributed articles

Figure 1. (a) New York City Marathon; (b) political rally in Los Angeles; (c) pilgrims circling Kabba in Mecca.

Techniques devised by other re-
searchers have been used to consider
similar problems. Here, we briefly
review research in the area of visual
crowd surveillance, referring the read-
er to detailed articles on tracking23 and
crowd-behavior analysis.24
For problems involving crowd seg-
mentation, Chan and Vasconcelos8
used dynamic texture-based represen-
tations of scenes to determine how re-
gions differ, proposing a method7 for
counting pedestrians in high-density
crowds. Sand et al.20 implemented a
particle-based framework for the pur-
pose of estimating the motion in a
scene but did not use it for interpretation of significant segments.

Figure 2. Flow field of a frame in the Kabba video.

For problems involving behavior
analysis, two methodologies are available
for understanding crowd behavior.
The first, advanced by Marques et al.16
and Tu et al.,22 perceived a crowd as
an assembly of individuals, using seg-
mentation or tracking algorithms to
understand their behavior. The other,
promoted by Andrade et al.,3 viewed
a crowd as an organism, such that its
behavior is studied and accepted on a
global level. Reisman et al.19 proposed
that crowd behavior is recognized by
modeling the scene, giving a descrip-
tion of important features within it.
Kratz et al.15 detected anomalies as
statistical deviations from the ordi-
nary motion patterns in space-time
volumes to characterize the scene.
With regard to tracking in crowd-
ed scenes, one of the first important
methods was devised by Zhao et al.25
using ellipsoids to model the human
shape and color differences to mark
appearances. Another framework, by
Brostow et al.,5 assumed that points
appearing to move together are probably part of the same object, tracking individuals based on the probability that points could be clustered together. More recently, Pellegrini et al.18 expanded a social-force model to take into account destinations and desired directions of individuals, making it suitable for tracking individuals in crowded scenes.

Figure 3. Four frames from the video sequence of pilgrims circling Kabba and the FTLE field.

Particles and People
Random actions, relationships between energy and density, and a gas/liquid/solid-state demeanor are all characteristics of particles in a fluid and of people in a crowd. Most important, the motions of particles/people are determined by the external forces exerted on them; for example, both particles and people are affected by boundary forces (such as walls) and feel the forces of neighboring particles/people. One difference is that people are, to some extent, able to determine their own destiny, so the crowd may be viewed as a “thinking fluid,”11 but there are still probabilistic similarities to particle motion regardless of this difference.
When scientists consider hydrodynamics, they often use different scales, depending on the questions being addressed.14 At the microscopic, one may examine the position or velocity of a particular particle among many. On another, the macroscopic is used to scrutinize the nature of enormous collections of particles (such as a tree branch moving in water). Between them is the mesoscopic scale, which is used to analyze the interaction of “small” collections of particles, giving characteristic information (such as temperature and average density).
Considering the behavior of people in a crowd, we take a similar approach, depending on the questions we want to answer. We focus on three generally recognized key problems in visual crowd surveillance (crowd segmentation, behavior analysis, and tracking) corresponding to the scales. To be clear, some situations might necessitate tracking a particular person in a crowd, requiring a microscopic point of view. Others might call for descriptive information on when the behavior of a crowd is abnormal, meaning it is neither necessary nor feasible to track every individual in a crowd but important to understand how groups of individuals interact, for which we employ a mesoscopic point of view. Still, a macroscopic point of view is more appropriate for segmenting global patterns of flow.

Figure 4. Video scene (left) and corresponding segmentation (right).

Figure 5. Our approach for detecting abnormal behavior in crowd videos. (Diagram blocks: Video, Particle Advection, Social Force, Bag of Forces, Anomaly Detection, Abnormal.)

Here, it is pertinent to discuss the types of scenes and spatio-temporal range of crowd behaviors that can be handled through an understanding of the hydrodynamics point of view. To begin, hydrodynamics-based techniques require that a crowd be viewed from above, thereby minimizing artifacts resulting from independent

movement of multiple body parts. Side views of the scene are least preferable within the particle-based framework; Figure 1 includes examples of such scenes and camera setup. Our algorithms next allow each pixel to represent a particle, with a minimum requirement on the spatial scale of at least one pixel per person. If two or more people are matched to a single particle, the methods may encounter problems, but allowing as many particles per person as the scene dictates is certainly permissible.
Another noteworthy requirement is that video scenes exhibit a dominant trend typical of high-density crowds, where the movement of individuals is restricted by other individuals, obligating the group to move as a whole, like a fluid. However, the dominant trend is key to the analysis, while the density of the crowd is allowed to vary. Since crowd behavior is naturally dynamic, and the flow (trend) of a crowd can change with time, any video-based analysis of crowd motion should take a sliding-window approach. It performs the analysis over a particular sequence of frames (a window in time), then “slides” the window to another sequence of frames to repeat the analysis. The size of the window may be adaptable or fixed but depends on the level of activity in the scene. Methods explored here follow this principle.
Macroscopic scale (crowd segmentation). The macroscopic scale suggests a focus on global crowd behavior, requiring a comprehensive point of view; Figure 1 includes examples of the types of scenes we consider, with thousands of people in view. In such settings, we are primarily interested in the overall movement of the crowd, meaning we are able to find segments of common motion within it.1
A key ingredient for our solution to the problem of segmentation is called “particle advection” and used in each of our three problem synopses. The approach itself mimics a common mathematical formulation of fluid mechanics, or Lagrangian specification, characterized by following particular particles as they move with the flow.4 The first step in applying this idea to a video sequence is to compute the optical flow, or apparent visual motion of objects in the scene (see Figure 2). Every pixel has position x = (x, y), and the optical flow provides velocities (u, v) at each position, so objects are related to their velocities by the system of equations

$$\frac{dx}{dt} = u(x, y, t), \qquad \frac{dy}{dt} = v(x, y, t)$$

Particle advection is performed by overlaying the scene with a grid of particles that serves as the initial conditions for this system of equations; particles are then transported to new coordinates in subsequent frames using a time-stepping technique for integrating the system of equations, as in

$$x(t+1) = x(t) + u(x(t), y(t), t), \qquad y(t+1) = y(t) + v(x(t), y(t), t)$$

Thus, the flow of the crowd in the scene is given by particle trajectories.
Important to note is that errors and noise in optical flow are averaged out to some extent as a result of time integration performed to determine particle trajectories. Thus, the particular method used to produce optical flow is not crucial for the three problems we consider, a point we have verified experimentally. Temporal scale for analysis is determined by the integration time t. In practice, t should depend on the rate of change of the flow field, with a higher rate of change of flow field resulting in smaller time scales and vice versa. In our experiments, we fixed t = two seconds or 60 frames for all scenarios.
Particle advection produces a flow map, a function $\phi_{t_0}^{t_0+t}(\mathbf{x}_0) = \mathbf{x}(t_0 + t;\, t_0, \mathbf{x}_0)$ relating the position $\mathbf{x}$ of a particle at time $t_0 + t$ to its original position $\mathbf{x}_0$ at the initial time $t_0$. That is, the flow map fully describes the trajectory of each particle, which does not necessarily correspond to a person in a crowd but to a small region in the scene exhibiting a collective pattern of motion. In sections with coherent motion, the flow maps show qualitatively similar behavior, but trajectories experiencing different behavior are from sections with different coherent motion. These qualitative differences define flow segments. Our primary mathematical tool for finding these qualitative differences is called a “finite time Lyapunov exponent,” or FTLE, which we use to define Lagrangian coherent structures.21 The FTLE is essentially a number that reflects how two neighboring particles separate from one another over time and is computed using the maximum eigenvalue $\lambda_{\max}$ of the Cauchy-Green deformation tensor $\Delta$, obtained from the Jacobian matrix for the flow map, $D\phi_{t_0}^{t_0+t}(\mathbf{x})$. More precisely, the largest FTLE with integration time t is

$$\sigma = \frac{1}{t} \ln \sqrt{\lambda_{\max}(\Delta)}$$

where

$$\Delta = \left(D\phi_{t_0}^{t_0+t}(\mathbf{x})\right)^{T} D\phi_{t_0}^{t_0+t}(\mathbf{x})$$

Computing the FTLE at every point produces the FTLE field, a scalar field that immediately exposes any regions in the scene with differing flow by finding particle trajectories that start close together but end far apart. In practice, the particle advection approach allows implementation of the algorithm in both forward and backward time, meaning the flow segments are the same regardless of which direction the flow is moving. Combining the FTLE fields for both forward and backward motion yields vivid results (see Figure 3). We use a watershed algorithm to segment the FTLE field, making it possible to find the exact number of flow segments. This process is repeated by moving the sliding temporal window to obtain segments for subsequent time steps.
The end result is a net segmentation showing each region exhibiting a single clearly defined characteristic flow pattern. Such a result is not possible through segmentation based solely on optical flow, because optical flow captures only motion between two frames. With particle advection, on the other hand, motion in several frames is integrated over time and nicely captured by the scalar FTLE field. Figure 4 includes several results in which the motion in crowded pedestrian and traffic scenes is properly segmented through our method; each row shows a frame from a different video sequence, along with subsequent segmentation. Regions of different colors signify qualitative changes in the flow, and dark blue represents areas with no coherent flow. A clear example is the traffic scene in the last row, with dark blue representing regions outside the lanes, and red, green and light blue representing movement in each direction.
Mesoscopic scale (behavior detection). Beyond a global understanding of pedestrian flow in crowds, detection of abnormal events or behavior is important, generally for the sake of public safety. We use the local interactions of multiple people to identify regular patterns of motion, in addition to any anomalies.17 A fundamental component of our approach (see Figure 5) in this setting is a social force (fluids-based mathematical) model for describing pedestrian movement, as pioneered by Helbing and Molnar10 almost 20 years ago.
The central idea hinges on Newton’s second law of motion: force equals mass times acceleration, or F = ma. In it, each individual in the scene reacts to forces that produce motion. These forces can be deconstructed into two parts: the personal-desire force (individuals striving to get to their desired destinations) and the interaction force (exerted on individuals by other individuals or things in the scene). Thus, pedestrian i changes velocity according to

$$a = \frac{dv_i}{dt} = F_p + F_{int}$$

where $F_p$ and $F_{int}$ refer to personal and interaction forces, respectively. In a given scene, since individuals are all relatively the same size, the masses are assumed to be one. Quantifying these forces (see Figure 6a for an example) allows our method to establish the ongoing behavior in the crowd, enabling detection of any behavior out of the ordinary (Figure 6b).

Figure 6. (a) Optical flow (yellow) and computed interaction vectors (red) for pedestrians with opposing directions; (b) frames of a sequence where the observed behavior suddenly becomes abnormal (people running in panic) in the last frame. (Panel labels in (b): Normal, Normal, Normal, Abnormal.)

Note that in very dense crowds, pedestrians follow group velocity and goals,12 but as density decreases, personal interest plays a greater role in pedestrian motion. Hence, at the mesoscopic scale, our algorithm may use scenes with mid-to-high crowd density, provided the interaction force is not negligible, meaning behavior is still fluid-like.
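
Both the segmentation described under the macroscopic scale and the force computation at the mesoscopic scale start from particle advection. As a rough sketch of the time-stepping update and the FTLE formula (not the authors' implementation), the steps can be written in plain Python. The analytic velocity field `flow`, the function names, the step size `dt`, and the finite-difference offset `h` are all assumptions made for illustration; in practice (u, v) would come from optical flow computed on the video frames.

```python
import math

def flow(x, y, t):
    """Hypothetical stand-in for optical flow: a steady saddle (u, v) = (x, -y)."""
    return x, -y

def advect(x0, y0, steps, dt=1.0, field=flow):
    """Forward-Euler particle advection: x(t+dt) = x(t) + dt * u(x(t), y(t), t)."""
    x, y = x0, y0
    for k in range(steps):
        u, v = field(x, y, k * dt)
        x, y = x + dt * u, y + dt * v
    return x, y

def ftle(x0, y0, steps, dt=1.0, h=1e-3, field=flow):
    """FTLE at (x0, y0): (1/T) * ln sqrt(lambda_max(J^T J)), J = flow-map Jacobian."""
    # Central differences of the flow map with respect to the initial position.
    xr, yr = advect(x0 + h, y0, steps, dt, field)
    xl, yl = advect(x0 - h, y0, steps, dt, field)
    xu, yu = advect(x0, y0 + h, steps, dt, field)
    xd, yd = advect(x0, y0 - h, steps, dt, field)
    a, b = (xr - xl) / (2 * h), (xu - xd) / (2 * h)  # d(x_T)/dx0, d(x_T)/dy0
    c, d = (yr - yl) / (2 * h), (yu - yd) / (2 * h)  # d(y_T)/dx0, d(y_T)/dy0
    # Cauchy-Green tensor J^T J is a symmetric 2x2 matrix [[p, q], [q, r]].
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d
    # Largest eigenvalue of a symmetric 2x2 matrix in closed form.
    lam_max = 0.5 * (p + r) + math.sqrt(0.25 * (p - r) ** 2 + q * q)
    return math.log(math.sqrt(lam_max)) / (steps * dt)
```

Evaluating `ftle` on a grid of starting points would give a scalar field of the kind segmented in the text; large values mark places where nearby particles separate quickly.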
The algorithm itself starts with particle advection, followed by computation of the forces. Each person in the crowd has a desired direction and speed, but individual direction and speed are limited by the surrounding pedestrians. The actual velocity $v_i$ of a particle at the $(x_i, y_i)$ coordinate is obtained from the spatial-temporal average of optical flow. On the other hand, the desired velocity $v_i^p$ is given by the optical flow for that particle. Hence, the personal-desire force is

$$F_p = \frac{1}{\tau}\left(v_i^p - v_i\right)$$

where $\tau$ is a relaxation parameter. Thus, the interaction force (see Figure 7) is given as

$$F_{int} = \frac{dv_i}{dt} - F_p$$

Figure 7. Scheme for computing interaction force.

These forces together yield a sufficient description of the motion in the scene based on the acting forces.
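
As a minimal sketch of the two force formulas for a single particle, the computation might look as follows. The function name `interaction_force`, the finite-difference estimate of the acceleration between two consecutive frames, and the default values of `tau` and `dt` are illustrative assumptions, not values taken from the article.

```python
def interaction_force(v_prev, v_curr, v_desired, tau=0.5, dt=1.0):
    """Return (F_p, F_int) for one particle with unit mass.

    v_prev, v_curr: actual velocity v_i at two consecutive frames
                    (spatial-temporal average of optical flow).
    v_desired:      desired velocity v_i^p (optical flow at the particle).
    """
    # Personal-desire force: F_p = (v_i^p - v_i) / tau
    f_p = tuple((vd - vc) / tau for vd, vc in zip(v_desired, v_curr))
    # Acceleration dv_i/dt approximated by a difference between frames (assumption).
    acc = tuple((vc - vp) / dt for vc, vp in zip(v_curr, v_prev))
    # Interaction force: F_int = dv_i/dt - F_p
    f_int = tuple(a - fp for a, fp in zip(acc, f_p))
    return f_p, f_int
```

For example, a particle that would like to move faster than it does, yet is not accelerating, yields an interaction force equal and opposite to its personal-desire force, which is the signature of being held back by its neighbors.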
Specification of the forces determining the motion in the scene provides understanding of synergy between interacting particles but does not, by itself, secure evidence of changes in behavior; for example, normal interaction forces on a stock-market trading floor may differ drastically from those of pedestrians on the street. To use this technique to detect and localize any changes in behavior, the computer must first learn the “normal” behavior for the scene, for which our algorithm takes a bag-of-words approach. (In the same way a document can be considered a bag of words, a video can be considered a bag, or collection, of spatial-temporal cuboids, for which the interaction force is computed.) The idea is for the algorithm to use a training set of videos and match the interaction forces with given dynamics. A video in question can then be compared with those from the training set, and changes from the regular behavior in the scene are easily identified by the computer.

Figure 8. Frames from different sequences, showing (left) normal behavior (green) and (right) abnormal escape panic (red), comparing ground truth to abnormal behavior detection. (Each panel shows a Ground Truth bar and a Detection Result bar; frames 186, 216, 216, and 473.)

To improve the fidelity of the results, optical flow is smoothed by a Gaussian filter, where the standard deviation of the Gaussian distribution is empirically set to half the width of the typical person in the crowd. This smoothing compensates for the inaccuracies of optical flow in textureless regions. Moreover, using a bag of video words for several frames could also reduce the effects of inaccurate instantaneous optical flow.
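
To make the bag-of-words idea concrete, here is a deliberately simplified sketch. It quantizes cuboid features against a fixed codebook and scores a clip by how far its word histogram lies from a histogram learned on normal footage. The nearest-codeword quantization, the L1 distance, and every name below are assumptions chosen for brevity; they stand in for, and are not, the model the article trains on interaction forces.

```python
def nearest_word(feature, codebook):
    """Index of the codeword closest (squared Euclidean distance) to a cuboid feature."""
    return min(range(len(codebook)),
               key=lambda k: sum((f - c) ** 2 for f, c in zip(feature, codebook[k])))

def word_histogram(features, codebook):
    """Normalized bag-of-words histogram for the cuboid features of one clip."""
    counts = [0] * len(codebook)
    for feature in features:
        counts[nearest_word(feature, codebook)] += 1
    total = sum(counts) or 1  # avoid division by zero for an empty clip
    return [c / total for c in counts]

def anomaly_score(clip_features, normal_hist, codebook):
    """L1 distance between a clip's histogram and the 'normal' training histogram."""
    hist = word_histogram(clip_features, codebook)
    return sum(abs(h - n) for h, n in zip(hist, normal_hist))
```

In this toy setting each feature could be, say, a vector of interaction-force magnitudes inside one cuboid; a clip would be flagged when its score exceeds a threshold chosen on held-out normal clips.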

Sample results of the algorithm are in Figure 8; the videos for these experiments are from the University of Minnesota and show walking pedestrians as the normal behavior. At the end of each video the pedestrians suddenly run in all directions to escape the scene. The figure shows detection of abnormal behavior by our method (indicated as black triangles) compared with the ground truth. In most cases, panic detection occurs immediately following the change in behavior. The receiver operating characteristic curves in Figure 9 show a clear advantage of our method over simply using the optical flow to detect abnormal behaviors.

Figure 9. ROC curves for detection of abnormal behaviors in the University of Minnesota data set; the area under the social force curve (red) is 0.96, and the area under the optical flow curve (blue) is 0.84. (Plot: True Positive vs. False Positive for Force Flow and Optical Flow.)

Microscopic scale (individual tracking). At the “atomic” level, a surveillance analyst is interested in automatically following a person in a high-density crowd, a very challenging problem, as the object our algorithm is tracking is subject to occlusion, and other nearby objects may lead the tracker away from the original object. Figure 10 shows tracking results using our method in which individuals are correctly tracked in four video sequences involving hundreds of people; each image shows the tracks overlaying a single frame of the video.

Figure 10. Tracking individuals using our method in (a–c) marathon scenes and (d) a crowded train station.

Inspired by research on evacuation dynamics,6,13 our method uses a scene-structure-based force model that likens pedestrians to particles, such that the forces acting on them determine their direction and velocity. The algorithm computes the probability that a particular particle will move from one position to another, building on floor fields that provide information about the scene.2 To make this clear, we make three assumptions about the flow influencing the individual’s behavior: First, the person has a goal (place to get to and clear direction how to get there) and, in the absence of obstacles, will go there directly; this is the effect of what is called the “static floor field.” Second, the person avoids permanent fixtures (such as trash cans and walls) and virtual barriers (such as opposing crowd flow) as a consequence of what is called the “boundary floor field.” And third, the person can move toward the goal only as the flow of the crowd allows; this motion and direction is the influence from the dynamic floor field.
A basic assumption on the static floor field, based on the observation that directions of motion in high-density crowds have dominant trends, is that crowd behavior remains constant during tracking. However, the static floor field can be updated periodically to respond to changes in the dominant trends. To respond to any instantaneous change in crowd flow, the model uses the dynamic floor field, which is representative of instantaneous crowd behavior in the vicinity of the target. The main limitation of the floor-field tracking model is the inability to handle locations with no dominant trend (such as a crowded museum) and locations with more than one dominant trend (such as pedestrian crossings).
We begin our description of the method with the inference that people in crowds are constantly avoiding collisions. Hence, the boundary floor field is repulsive and computed easily through particle advection and the FTLE field, as described earlier in terms of segmentation of crowd flow. The edges of the computed segments give the boundaries of the flow, leading to the resulting boundary floor field (see Figure 11).

Figure 11. For the marathon sequence in Figure 10c: (a) crowd-flow segmentation obtained through particle advection; (b) the corresponding edge map; (c) the boundary floor field; and (d) the static floor field.

Computation of the static floor field (Figure 11d) is performed only once for a given video using a small subset of all video frames. The first step provides a representation of the instantaneous changes in motion, or “point flow,” achieved by calculating the average optical flow for each location over the entire subset of frames. Our algorithm can then place a grid of particles over the scene and determine the preferred direction of each particle based on the motion of neighboring point-flow vectors. If the influence is great enough to move the particle to the next cell, then the algorithm continues the process until the velocities are not significant enough to move it to the next position (see Figure 12). This process is used by the algorithm to find the sinks, defined computationally as the points where particle motion ceases to exist. In terms of crowd behavior, sinks are the desired goals or locations of the individuals in the crowd (such as preferred exits and frequently visited areas dominated by the flow of the crowd). The sinks, as well as the shortest distance needed to reach them, produce the static floor field.
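
The sink-seeking process can be illustrated with a small sketch under simplifying assumptions: the point flow is stored as a grid of (u, v) vectors, a particle hops one cell at a time in the direction of the local vector, and it stops once the vector is weaker than a threshold. The name `seek_sink`, the `min_speed` threshold, and the omission of the neighbor-weighted averaging described in the text are simplifications for illustration only.

```python
def seek_sink(point_flow, start, min_speed=0.5, max_steps=1000):
    """Follow the point flow from `start` until it is too weak to leave the cell.

    point_flow: 2-D list of (u, v) vectors indexed as point_flow[row][col],
                with u along columns (x) and v along rows (y).
    Returns the list of visited (row, col) cells; the last one is the sink.
    """
    rows, cols = len(point_flow), len(point_flow[0])
    r, c = start
    path = [(r, c)]
    for _ in range(max_steps):
        u, v = point_flow[r][c]
        speed = (u * u + v * v) ** 0.5
        if speed < min_speed:
            break  # flow too weak to move the particle: this cell is a sink
        # Step one cell along the flow direction, clamped to the grid.
        nr = min(max(r + round(v / speed), 0), rows - 1)
        nc = min(max(c + round(u / speed), 0), cols - 1)
        if (nr, nc) in path:
            break  # blocked by the border or looping back: stop here
        r, c = nr, nc
        path.append((r, c))
    return path
```

Running this from every grid cell and recording where each path ends would give candidate sinks; a distance-to-sink map built from them plays the role of the static floor field in this simplified picture.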
Figure 12. (a) The sink-seeking process. Red arrows signify the point flow influenced by neighboring points; the yellow curve is the sink path. (b) The sliding window used to find sinks; the solid circle is the point under consideration; hollow circles inside the box are neighboring points and outside the box non-neighboring points.

Computing the dynamic floor field means discovering the behavior of the crowd around the individual. To do this, the algorithm uses the optical flow for a subset of video frames and performs particle advection. If a particle changes its position between frames, then the value of interaction between those frames is increased by one, and zero interaction is assumed at the first frame in this sequence. The individual’s interactions in a local neighborhood are thus captured for that interval of time (see Figure 13).
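
One way to read the counting rule above is as a tally of cell-to-cell moves made by advected particles. The sketch below follows that reading; the grid `cell_size`, the function name, and the choice to key the counts by (source cell, destination cell) are assumptions for illustration, not details given in the article.

```python
from collections import defaultdict

def dynamic_floor_field(trajectories, cell_size=1.0):
    """Count how often advected particles move from one grid cell to another.

    trajectories: list of particle paths, each a list of (x, y) positions,
                  one position per frame.
    Returns a dict mapping (cell_i, cell_j) -> number of observed i -> j moves.
    """
    counts = defaultdict(int)  # every cell pair starts with zero interaction
    for path in trajectories:
        cells = [(int(x // cell_size), int(y // cell_size)) for x, y in path]
        for cell_i, cell_j in zip(cells, cells[1:]):
            if cell_i != cell_j:  # the particle changed cells between frames
                counts[(cell_i, cell_j)] += 1
    return dict(counts)
```

Restricting the trajectories to a neighborhood around the target and to a short window of frames gives a local, time-limited picture of where the surrounding crowd is currently moving.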

To bring the three floor fields together for the purpose of tracking, the algorithm divides image space into cells, so each cell is occupied by one particle. The probability that a particle at cell i will move to neighboring cell j is then computed and combined with appearance information to complete the tracking. This method depends on computation of the influences from the static, dynamic, and boundary floor fields, denoted $S_{ij}$, $D_{ij}$, and $B_{ij}$, respectively, with each needed for accurately modeling the interaction of individuals and their preferred direction of motion. Described precisely, the probability that a particle will move from i to j is

$$p_{ij} = C\, e^{k_D D_{ij}}\, e^{k_S S_{ij}}\, e^{k_B B_{ij}}\, R_{ij}$$

where $k_D$, $k_S$, and $k_B$ are the coupling strengths of the object to the respective fields, C is a normalization constant, and $R_{ij}$ is a similarity measure for the initial and updated appearance templates.

Figure 13. (left) Region for computing dynamic floor field, where green chip is a target individual; (right) dynamic floor field reflects strong relationship between the yellow cell and neighboring cells at the peak.

Experimentally, a target individual is selected by a surveillance analyst by computing the gray-scale appearance template for a rectangular region, called a chip, surrounding the individual, with average chip size 14 × 22 pixels. The algorithm computes the position of the target at the next time instant according to the probable location, as determined by the computed floor fields; the appearance similarity is then computed by normalized cross correlation, and the appearance template is automatically updated. Figure 14 charts results for 50 tracks in a video of a marathon, showing objects are tracked correctly in most cases.

Figure 14. Computed track lengths vs. ground truth for a marathon sequence. (Plot: track length in frames vs. track number, for our method and for ground truth.)

Some tracking methods (though not ours) depend mainly on appearance information, but in crowded scenes appearance is not enough, as neighboring objects may have similar appearance. Figure 15a shows the appearance similarity surface for a marathon scene; the surface is relatively flat, so which runner is being tracked from frame to frame is uncertain. However, by combining the dynamic and static floor fields (Figures 15b and c) with the appearance surface our method obtains a surface (Figure 15d) providing the best match for the tracked individual. Figure 16 also shows that, when using all three floor fields, the tracking error is consistently low, but if using only one floor field, the error increases, often significantly.

Conclusion
We have devised methods for segmenting motion, detecting abnormal behavior, and tracking individuals in video scenes of high-density crowds. Our underlying supposition is that people in crowds appear to move according to the flow, like particles in a liquid. Hence, we gaze through a hydrodynamics lens to analyze video scenes in various scenarios on three different length scales. Each of our methods relies to some extent on the optical flow and associated particle advection adapted from the Lagrangian approach to fluid dynamics.
Our experimental results have been excellent, and we expect the underlying hydrodynamics theme can be taken further to solve other problems in visual surveillance of high-density crowds. Ultimately, we envision the ability to predict potentially hazardous situations in crowded scenes, though it is work for the future. Training a computer to decipher and understand crowd behavior from a video sequence is extremely challenging; aside from having to sort through a plethora of digital information, there are also questions specific to each of the three problems (segmenting motion, detecting abnormal behavior, tracking individuals) discussed here.
For crowd segmentation, our method makes use of flow maps corresponding to each particle, computing maximal Lyapunov exponents to reveal segments of coherent motion in the scene.1 Our method performs well for steady flows, with no changes in geometry, but segmentation for unsteady flows is an open problem with several challenges. Coherent flow segments in crowds can change quickly, and to capture such changes, an algorithm must distinguish changes within segments from changes in segment boundaries. One location in a scene may also exhibit alternating collective patterns of motion, meaning several segmentations are needed to describe different modes within a single region. In addition, modeling abstract human behaviors that help define segments (such as courteous acts, social agreement, and individual intention) is difficult. Moreover, scenes can grow more complex, as moving/cluttered back/foregrounds are important in segmentations more discriminating than ours.

Figure 15. For tracking a runner in a marathon sequence: (a) appearance similarity surface; (b) dynamic floor field; (c) static floor field; and (d) final decision surface.

For detecting abnormal behavior, our method approximates the interaction forces in the crowd to build a model for the motion, detecting anomalies as deviations from the norm.17

72 communicat ions of t he ac m | d ec emb e r 2 0 1 1 | vol . 5 4 | no. 1 2


contributed articles

This approach works well for detecting global changes in regular motion, but detecting smaller (more local) events is more difficult. Our method is also, by design, good at measuring the forces individuals exert on one another but is unable to recognize specific behaviors and distinguish the acceptable from the unacceptable. This limitation stems from an enormous variety in the behaviors observed in crowded scenes, along with the difficulty of distinguishing certain activities from other activities. Some behaviors are easily defined (such as bottlenecks or lanes), but formulating clearly defined behaviors for general crowd motion is difficult, as is categorizing unsteady flows, with the flow constantly changing.

For tracking in high-density crowds, our method exploits the influences of boundaries, neighboring pedestrians, and desired direction, along with appearance information, to identify the position of a target in subsequent frames.2 Our algorithm produces excellent results for extremely crowded scenes, where the tracked individual is highly influenced by the flow of the crowd, but tracking in crowds that are less dense, allowing pedestrians to move against the flow, still involves many research problems; for example, crowd dynamics involve psychological aspects (such as preferences and habits) that influence individual behavior, thereby increasing scene complexity. Aside from the thoughts and intent of individuals, the constant interaction among them makes it difficult to distinguish one from another. In addition, occlusions result in loss of observation of a target object, while the object’s appearance (such as shape and color) varies, not only from one setting to the next, but also as a given setting evolves.

Figure 16. Average tracking error for each object in a marathon sequence using only dynamic (green), only static (maroon), and all three (blue) floor fields. [Plot: tracking error (0 to 60) against track number (0 to 35); series: using SFF, DFF & BFF; using only DFF; using only SFF.]

Acknowledgment
This article summarizes and incorporates three earlier publications: Ali et al.,1 Ali and Shah,2 and Mehran et al.17 The research is partially supported by the U.S. Army Research Office, part of the U.S. Army Research Laboratory, under grant number W911NF-09-1-0255 and by the U.S. Department of Defense.

References
1. Ali, S. and Shah, M. A Lagrangian particle dynamics approach for crowd flow segmentation and stability analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (Minneapolis, June 18–23, 2007), 1–6.
2. Ali, S. and Shah, M. Floor fields for tracking in high-density crowd scenes. In Proceedings of the 10th European Conference on Computer Vision (Marseille, France, Oct. 12–18), Springer, 2008.
3. Andrade, E.L., Blunsden, S., and Fisher, R.B. Modeling crowd scenes for event detection. In Proceedings of the 18th International Conference of Pattern Recognition (Hong Kong, Aug. 20–24, 2006).
4. Bennett, A. Lagrangian Fluid Dynamics. Cambridge University Press, New York, 2006.
5. Brostow, G. and Cipolla, R. Unsupervised Bayesian detection of independent motion in crowds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (New York, June 17–22, 2006).
6. Burstedde, C., Klauck, K., Schadschneider, A., and Zittartz, J. Simulation of pedestrian dynamics using a two-dimensional cellular automaton. Physica A: Statistical Mechanics and its Applications 295, 3–4 (June 2001), 507–525.
7. Chan, A.B. and Vasconcelos, N. Bayesian poisson regression for crowd counting. In Proceedings of 12th IEEE International Conference on Computer Vision (Kyoto, Sept. 27–Oct. 4, 2009), 545–551.
8. Chan, A.B. and Vasconcelos, N. Mixtures of dynamic textures. In Proceedings of the 10th IEEE International Conference on Computer Vision (Beijing, Oct. 17–20, 2005), 641–647.
9. Helbing, D. Traffic and related self-driven many-particle systems. Review of Modern Physics 73, 4 (Dec. 2001), 1067–1141.
10. Helbing, D. and Molnar, P. Social force model for pedestrian dynamics. Physical Review E 51, 5 (May 1995), 4282–4286.
11. Hughes, R.L. The flow of human crowds. Annual Review of Fluid Mechanics 3 (2003), 169–182.
12. Hughes, R.L. A continuum theory for the flow of pedestrians. Transportation Research (Part B: Methodological) 36, 6 (July 2002), 507–535.
13. Kirchner, A. and Schadschneider, A. Simulation of evacuation processes using a bionics-inspired cellular automaton model for pedestrian dynamics. Physica A: Statistical Mechanics and its Applications 312, 1–2 (Sept. 2002), 260–276.
14. Kotelenez, P. Stochastic Ordinary and Stochastic Differential Equations. Springer, New York, 2008.
15. Kratz, L. and Nishino, K. Anomaly detection in extremely crowded scenes using spatio-temporal motion pattern models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (Miami, June 20–26, 2009), 1446–1453.
16. Marques, J.S., Jorge, P.M., Abrantes, A.J., and Lemos, J.M. Tracking groups of pedestrians in video sequences. In Proceedings of the IEEE Computer Vision and Pattern Recognition Workshop (2003), 101.
17. Mehran, R., Oyama, A., and Shah, M. Abnormal behavior detection using social force model. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Miami, June 20–26, 2009), 935–942.
18. Pellegrini, S., Ess, A., Schindler, K., and van Gool, L. You’ll never walk alone: Modeling social behavior for multi-target tracking. In Proceedings of the 12th IEEE International Conference on Computer Vision (Kyoto, Sept. 27–Oct. 4, 2009).
19. Reisman, P., Mano, O., Avidan, S., and Shashua, A. Crowd detection in video sequences. In Proceedings of the IEEE Intelligent Vehicles Symposium (Parma, Italy, June 14–17, 2004), 66–71.
20. Sand, P. and Teller, S. Particle video: Long-range motion estimation using point trajectories. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (New York, June 17–22, 2006), 2195–2202.
21. Shadden, S.C., Lekien, F., and Marsden, J.E. Definition and properties of Lagrangian coherent structures from finite time Lyapunov exponents in two-dimensional aperiodic flows. Physica D: Nonlinear Phenomena 212, 3–4 (Dec. 2005), 271–304.
22. Tu, P., Sebastian, T., Doretto, G., Krahnstoever, N., Rittscher, J., and Yu, T. Unified crowd segmentation. In Proceedings of the 10th European Conference on Computer Vision (Marseille, Oct. 12–18, 2008), 691–704.
23. Yilmaz, A., Javed, O., and Shah, M. Object tracking: A survey. ACM Computing Surveys 38, 4 (2006), 13.1–13.45.
24. Zhan, B., Monekosso, D.N., Remagnino, P., Velastin, S.A., and Xu, L. Crowd analysis: A survey. Machine Vision and Applications 19, 5–6 (2008), 345–357.
25. Zhao, T. and Nevatia, R. Tracking multiple humans in a crowded environment. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (Washington, D.C., June 27–July 2, 2004), II-406–II-413.

Brian E. Moore (bmoore@math.ucf.edu) is an assistant professor of mathematics in the Department of Mathematics at the University of Central Florida, Orlando, FL.

Saad Ali (saad.ali@sri.com) is a computer scientist in the Vision Technologies department at SRI International Sarnoff, Princeton, NJ.

Ramin Mehran (ramin@cs.ucf.edu) is a Ph.D. student in the Computer Vision Lab at the University of Central Florida, Orlando, FL.

Mubarak Shah (shah@cs.ucf.edu) is Agere Chair Professor of Computer Science and founding director of the Computer Vision Lab at the University of Central Florida, Orlando, FL.

© 2011 ACM 0001-0782/11/12 $10.00



contributed articles
doi:10.1145/2043174.2043193

License Risks from Ad Hoc Reuse of Code from the Internet

Software developers’ reuse of code from the Internet bears legal and economic risks for their employers.

by Manuel Sojer and Joachim Henkel

key insights

Professional software developers reuse code freely available on the Internet (such as open source code) in their commercial projects in ad hoc fashion.

Such code often comes with license obligations; noncompliance can mean legal and economic risk, but developers are often not sufficiently knowledgeable in these matters.

Firms should establish clear policies regarding reuse, leveraging reliable information resources on the Internet and complementing them with internal training, lobby universities to include the topic in their curricula, and acknowledge the interdisciplinary nature of the issue.

Reusing existing software artifacts when developing new software is an attractive way to reduce development costs and time to market while improving software quality.4 Code is the artifact most commonly reused in software development.16 Researchers have identified such reuse in commercial software development as a new facet of software reuse.13,22 Here, “Internet code” means code in the form of components (such as a library encapsulating required functionality) and snippets (such as containing a synchronization block) that can be downloaded from the Internet for free and without individual agreement with the originator; an important instance of such code is publicly available open source software (OSS). Internet code generally includes permission to be reused in commercial software development,14 making it highly attractive for firms.2 Therefore, some firms systematically reuse it by including identification, evaluation, and integration of suitable code in their development processes.18 Alternatively, Internet code can also be reused in ad hoc fashion, as described in Umarji et al.,23 with individual professional developers, on their own and typically without telling anybody, searching the Internet for existing code as a shortcut in their work, downloading and integrating it into the software they develop.a

Despite its general suitability for reuse in commercial software, Internet code is rarely in the public domain and usually under licenses that demand compliance with specific conditions as a prerequisite for reuse.8 These conditions vary widely and may, for example, demand attribution of the original creators of the reused code. More critical for firms are the obligations demanded by the GNU General Public License (GPL)b

a Places to search for code include OSS repositories (such as SourceForge.net), code search engines (such as Koders.com), and code bases of related OSS projects; for a detailed overview and quantitative analyses, see Sojer.20
b The GPL is a family of licenses, including versions 1, 2, and 3; since all versions share the “copyleft” obligation, we refer to the whole family as “the GPL” throughout this article.



as the most common license.11 The GPL is an OSS license, requesting that other code tightly integrated with the code it governs is also licensed under its terms.9 These terms allow users of GPL-licensed software to access, modify, and redistribute the source code of the software.19 For firms trying to protect their source code as proprietary intellectual property, complying with this requirement may be difficult. However, firms that integrate code under the GPL into their software without complying with the license terms and are then found out can be legally forced to replace the GPLed code or license the entire program under the GPL. Either option could produce costly legal and economic consequences.19

Other license conditions that can be problematic for firms include reusing the code only in non-commercial settings, only in certain application types, only for a certain period of time, and only when not exporting it to certain geographic locations.17,c Finally, some code available from the Internet does not explicitly spell out license or reuse conditions, though since it is protected by copyright, proper reuse necessitates contacting the creator and other rights holder(s) and asking permission.

When Internet code is reused systematically it seems feasible for firms to weigh the benefits and risks of reusing and manage potential license issues properly. Yet colloquial evidence of the reuse of Internet code in ad hoc fashion (as opposed to systematic reuse) suggests individual professional software developers do not always address the license obligations of the code they reuse.12,15 Thus, while their ad hoc reuse of Internet code might still result in greater effectiveness, efficiency, and quality for their firms, their behavior might also produce legal and economic trouble.

Most previous published research addressing reuse of Internet code is largely theoretical or based on industrial case studies. As an exception, German and various co-authors6–9 quantitatively investigated license issues from OSS code reuse through the analysis of code bases and software distributions.

To complement this work, we employed quantitative data obtained from a global survey we conducted in 2009 involving 869 professional software developers to explore ad hoc reuse of Internet code, with a special focus on license issues. Our findings should provide firms with a starting point for assessing their exposure to license risks from their developers’ ad hoc reuse of Internet code and devising measures to avoid potential related liabilities.

Illustration by Metropolis

Survey
We developed the questionnaire following our literature review and 20 interviews with industry experts.d Before conducting the survey, we enlisted four academic peers and 113 software developers to pre-test the questionnaire. We chose a survey-based research approach over an analysis measuring the share of reused Internet code in commercial software code bases. While this setup did not allow us to calculate a precise percentage of reuse of Internet code in commercial software development, it did allow us to include more professional software developers. Moreover, if deviations between developers’ actual and survey-reported reuse would arise, they would be unlikely to be systematic and thus should not affect the results of our multivariate analyses.

Since we were among the first to investigate ad hoc reuse of Internet code by individual professional software developers, we opted not to use a limited sample of developers from a single firm but rather a broad and heterogeneous group of professional software developers active in Internet newsgroups as our survey population.e We extracted a total of 93,541 unique email addresses from more than one million messages posted over the previous three years in 528 newsgroups dealing with software development.f After cleaning the addresses, we selected a random sample of 14,000 addresses and invited the newsgroup participants via email messages to take our online survey. We received 1,133 fully filled-in responses, yielding a response rate of 9.9% (consistent with other Internet surveys).g Of them, 869 responses were submitted by current or former professional software developers who are the focus of the analyses discussed in the following sections.

The vast majority (98%) of the 869 professional software developers we surveyed were male, with average age 35.6, living in Europe (53%), North America (28%), Asia (12%), and South America (4%); 56% had previously contributed to OSS. At the time of the survey, in 2009, 79% of the developers were employed as professional software developers; the others had been working as professional developers but had quit before 2009.h On average, survey participants had 9.7 years of work experience as professional developers in 2009, most as programmers (51%), others as software architects (28%) and project managers (4%); 23% were employed as freelancers in 2009, and the others worked on permanent contracts.

Also at the time of the survey, 54% of the developers worked for firms for which software development was the main business, with 68% developing software for external customers, the rest for internal use in their firms. Among the 68% writing software for external customers, 62% were creating off-the-shelf software for multiple customers, and the rest developed custom software. These distinctions are important because the license risks resulting from reuse of Internet code are typically more severe for software developed for multiple external customers.

Extent of Code Reuse
To quantitatively assess the extent of ad hoc reuse of Internet code in commercial software development, we asked survey participants to indicate how important reusing Internet code (components and snippets) in an ad hoc fashion was for their work.

Outlining the perceptions of professional software developers active in 2009, Figure 1 reflects that ad hoc reuse was an essential part of the work of many professional developers. More than half of those we surveyed (59%) considered ad hoc reuse of Internet code at least “somewhat important” for their work, while only 12% apparently did not reuse any Internet code in ad hoc fashion. This finding contrasts with the prevailing assumption of many firms that their code base does not or only to a small, controlled degree contain Internet code.15

Figure 1. Extent of ad hoc reuse of Internet code, 2009. [Bar chart: importance of ad hoc reuse of Internet code for professional software developers in 2009, in % of developers surveyed; categories: not important at all, not very important, somewhat important, important, very important. Note: N = 732.]

In addition to analyzing the extent of ad hoc reuse of Internet code, we also investigated the historic development of such reuse. Figure 2 includes the perceptions of professional software developers who quit creating software before 2009. Since we asked survey participants about their last year as active developers, their responses are informative about the respective year. Our survey data shows that starting with 2004 the importance of ad hoc reuse of Internet code for professional software developers had increased, rising from a mean importance value of 1.8 (“not very important”) in 2002 and 2003 to 3.0 (“somewhat important”) in 2008 and 2009.

Figure 2. Evolution of extent of ad hoc reuse of Internet code, through 2009. [Chart: importance of ad hoc reuse of Internet code for professional developers in most recent year as a developer, in average importance perceived by those surveyed.]

                                Before 2002   2002 & 2003   2004 & 2005   2006 & 2007   2008 & 2009
S.D.                            1.2           1.3           1.3           1.2           1.3
Number of developers in class   32            13            17            28            779

Notes: Average values displayed for multiyear groups; S.D. = standard deviation; importance scale: 1 = not important at all; 2 = not very important; 3 = somewhat important; 4 = important; 5 = very important; N = 869.

A possible interpretation is that before 2004, code available from the Internet might have only rarely been suited for reuse in commercial software development because it was not mature enough and covered only a few functional areas. However, resulting from the strong recent growth of OSS,3 both the quality and the fields for which code exists should have increased strongly, thus making Internet code reuse much more attractive to professional developers.

Determinants of Code Reuse
To understand which factors most influence the importance professional software developers attribute to ad hoc reuse of Internet code we conducted an exploratory regression analysis with the data collected in our survey. The model (see Table 1) employs an ordered logistic regression10 and the

Table 1. Multivariate analysis of the importance of ad hoc reuse of Internet code.13

Ordered Logistic Regression                                          Coef.       Std. Err.
License risk level of developer’s work                                0.111      0.085
Developer has never received any form of training
  or information on Internet code reuse (dummy)                      -0.258      0.167
Developer’s self-assessed knowledge about Internet code licenses      0.442***   0.099
Developer’s objectively assessed knowledge about
  Internet code licenses                                             -0.032      0.057
Developer has OSS experience (dummy)                                  0.391***   0.143
Experience as professional software developer (in years)              0.017*     0.009
Last year as professional software developer                          0.197***   0.043
Software development role (dummies, reference group: architect)
  Project manager                                                     0.155      0.358
  Programmer                                                         -0.356**    0.149
  Analyst                                                            -0.943      0.969
  Tester                                                             -1.176      0.717
  Database developer                                                 -0.751**    0.350
  Other                                                              -0.281      0.241
Primary programming language (dummies, reference group: Ruby)
  Python                                                             -0.284      0.276
  Perl                                                               -0.861**    0.435
  Java                                                               -1.015***   0.268
  PHP                                                                -1.533***   0.381
  C                                                                  -1.550***   0.333
  C++                                                                -1.808***   0.269
  Visual Basic                                                       -2.001***   0.516
  C#                                                                 -1.957***   0.315
  Other                                                              -1.842***   0.258
Developer lives in… (dummies, reference group: Europe)
  …North America                                                      0.016      0.164
  …South America                                                      0.727**    0.337
  …Asia or rest of world                                             -0.206      0.210
Developer is working as a freelancer (dummy)                          0.041      0.163
Education (dummies, reference group: engineering)
  Computer science or related subject                                -0.223      0.158
  Mathematics or physics                                             -0.300      0.251
  Business administration                                            -0.222      0.421
  Other subject                                                       0.147      0.258
Developer is working on embedded software projects (dummy)           -0.159      0.185
Developers’ self-assessed software development skill level           -0.048      0.087

Observations   807
Pseudo R²      0.09
Wald test      χ²(32) = 208.94, p < 0.0001
Cuts           393.318, 395.001, 396.161, 397.153

* significant at 10%, ** significant at 5%, *** significant at 1%
Notes: Significant coefficients are in bold type; reported standard errors are robust standard errors. Descriptive statistics and the correlation table of the explanatory variables used are available from the authors on request.

c Such restrictions are not in OSS licenses.
d Full questionnaire available from the authors.
e Potential limitations of our approach are discussed later in the section on threats to validity.
f The 528 newsgroups included all main and high-traffic groups (such as comp.lang.c++ and comp.lang.java.programmer).
g To calculate response rate, we adjusted the number of invitations we sent to potential survey participants by the number of email messages that did not reach their designated recipients.
h In the following sections, we report the characteristics of the last software development activities of developers who quit creating software.
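The response-rate adjustment described in the survey section can be made concrete with a short calculation. The article reports 14,000 invitations, 1,133 complete responses, and a 9.9% response rate, but not the number of undelivered messages; the sketch below backs that number out, so the implied figures are inferences from rounded values, not reported results. The function names are hypothetical.

```python
def response_rate(responses, invitations_sent, undelivered):
    """Response rate relative to invitations that actually reached a recipient."""
    return responses / (invitations_sent - undelivered)

def implied_delivered(responses, reported_rate):
    """Invert the rate to estimate how many invitations were delivered."""
    return responses / reported_rate

RESPONSES = 1133        # fully filled-in responses (from the article)
INVITED = 14000         # randomly sampled addresses (from the article)
REPORTED_RATE = 0.099   # 9.9%, as reported after the adjustment

delivered = implied_delivered(RESPONSES, REPORTED_RATE)
undelivered = INVITED - delivered

print(f"implied delivered invitations: about {delivered:.0f}")
print(f"implied undelivered messages:  about {undelivered:.0f}")
print(f"unadjusted rate for comparison: {RESPONSES / INVITED:.1%}")
```

Because the reported rate is rounded to one decimal place, the implied counts are only accurate to within a few dozen messages.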




perceived importance of ad hoc reuse for the individual work of professional developers measured on a five-point scale as a dependent variable. As independent variables we included multiple characteristics of professional developers, some as dummy variables. Regression coefficients are not standardized, such that the range or standard deviation of a variable must be taken into account when assessing the variable’s effect on the importance professional developers attribute to ad hoc reuse in their work.

First, the model results point out that developers’ ad hoc reuse seemed to be independent of the “license risk level”i; that is, developers creating software to be sold to multiple external customers did not deem ad hoc reuse as less important than developers working on custom software or software for internal firm use. A possible interpretation is that developers, in deciding to reuse Internet code, did not acknowledge the real possibility of negative legal and economic consequences their employers might face due to license violations. However, we can also think of two alternative explanations: One could assume less reusable code was available for internal use or custom software due to its tailored nature; and one could also imagine that while not considering ad hoc reuse less important, professional developers were still more careful when reusing such code in development projects for multiple external customers.

Moreover, developers who never had any training or information on reusing Internet code and thus should be more likely to create license issues did not differ significantlyj in their view of the importance of ad hoc reuse of Internet code from developers who were trained or had received such information. Also, while developers who rated their own knowledge about Internet code licenses more highly also deemed ad hoc reuse of Internet code more important, this relationship does not hold for an objective assessment of developer proficiency regarding licenses for the code.k If we (plausibly) assume that the results of our objective assessment are more informative about developers’ license-related knowledge than their self-assessment, we can also assume that developers, at least as of 2009, on average did not correctly account for their own knowledge about licenses for Internet code when considering ad hoc reuse of Internet code.

The model also indicates that developers who had been active in OSS projects and those with longer experience as professional developers considered ad hoc reuse significantly more important.l A plausible interpretation of this finding, consistent with Sojer and Henkel,21 is that for OSS-savvy developers, the costs of searching, evaluating, and understanding Internet code should be lower than for developers with less OSS experience. Likewise, more senior developers should face lower costs for reuse due to their typically larger personal networks and reuse experience. The

Figure 3. Sources for learning about reuse of Internet code, 2009. [Bar chart: professional software developer sources for learning about reuse of Internet code in 2009, in % of developers surveyed; categories: Internet, friends and colleagues, magazines, firms, education/university, other, no training or information at all. Note: N = 732.]

Table 2. Software developer familiarity with license obligations concerning Internet code, 2009.

                                              Not familiar   Not very   Somewhat              Very
                                              at all         familiar   familiar   Familiar   familiar
Share of developers self-assessing their
  familiarity with Internet code license
  obligations in the respective groups        2%             3%         29%        50%        16%
Developers’ average score in quiz on
  license obligations from Internet code
  reuse (max. score attainable: 5, average
  score across all groups: 2.54)              0.88           1.50       2.08       2.74       3.11

Note: N = 732.

i We set “license risk level” to 1 if developers were working on internal projects, to 2 if they were working on external projects for only one customer, and to 3 if they were working on projects for multiple external customers.
j Throughout this article, “significant” means “statistically significant.”
k This objective assessment of developer knowledge of the obligations of reusing Internet code is based on a five-question quiz in our survey. We developed the quiz following 20 interviews with experts in the reuse of Internet code. The quiz covers five typical scenarios in which professional software developers may violate license obligations when reusing Internet code. One might conjecture that the insignificance of the objectively assessed knowledge is caused by the fact that it is correlated with the self-assessed knowledge. However, this insignificance persists when the self-assessed knowledge level is dropped from the list of explanatory variables.
l Note regression coefficients are not standardized. Since “experience as professional software developer” is measured in years, ranging from 0.5 to 45, its effect is comparable in size to the dummy variable “developer has OSS experience.” The coefficient of the latter variable is much larger (0.39 vs. 0.017), though its range is much smaller (1 vs. 44.5).
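The comparison of unstandardized coefficients across variables with different ranges can be worked through as a small calculation: multiply each coefficient by the span of its variable. The coefficients and ranges below are taken from Table 1 and the footnote on experience; the helper name is hypothetical, and the odds-ratio reading in the last lines is the standard interpretation of ordered logit coefficients under the proportional-odds assumption, not something the article itself computes.

```python
import math

def effect_over_range(coef, low, high):
    """Change in the latent (log-odds) scale when a variable moves across its range."""
    return coef * (high - low)

# Experience in years, ranging from 0.5 to 45, coefficient 0.017.
experience_effect = effect_over_range(0.017, 0.5, 45.0)

# OSS experience is a 0/1 dummy with coefficient 0.391.
oss_effect = effect_over_range(0.391, 0.0, 1.0)

print(f"experience, full range: {experience_effect:.2f}")
print(f"OSS experience dummy:   {oss_effect:.2f}")

# exp(coef) gives the multiplicative change in the odds of reporting a
# higher importance category for a one-unit increase in the variable.
print(f"odds ratio, OSS experience:      {math.exp(0.391):.2f}")
print(f"odds ratio, one year experience: {math.exp(0.017):.3f}")
```

On this scale the two effects are of the same order of magnitude, which is the sense in which the much smaller per-year coefficient is "comparable in size" to the dummy.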




multivariate model also supports the result outlined in Figure 2, showing the perceived importance of ad hoc reuse of Internet code grew significantly from 2004 to 2009.

Moreover, the developers we surveyed had different views of the importance of ad hoc reuse of Internet code depending on their development role. Programmers and database developers attributed significantly less importance to it than the architects we defined as a reference group. For all other roles, the difference with the “architects” was insignificant at a 10% level. The finding that architects deemed ad hoc reuse significantly more important than programmers is startling since architects should be concerned with systematic rather than ad hoc reuse. However, architects, especially in smaller and mid-size firms, might also take on programmer responsibilities and leverage their greater architectural latitude to reuse Internet code in an ad hoc fashion. The architecture of a piece of software influences how easy it should be to reuse external code.5 Shaping architecture, architects might have more control over reusing Internet code than programmers for whom the architecture of the software they develop is often exogenous. Moreover, greater architectural latitude could also allow developers to integrate Internet code in such a way as to avoid license violations,9 assuming developers are aware of the relevant issues in the first place. Supporting this line of thought, our survey found that architects are significantly more knowledgeable regarding licensing topics than other developers, including programmers. Architects should still be able to reuse Internet code properly, while programmers would have to choose between reusing the code in a way that violates the code’s license obligations and not reusing it at all.

The main programming language developers were using influenced how they viewed ad hoc reuse in their work. For example, developers relying mainly on Ruby or Python found ad hoc reuse most important, followed by those working with Perl, Java, PHP, and other such languages. Developers using more traditional programming languages (such as C and C++), less common ones (such as Visual Basic and C#), and various others formed the last group viewing code reuse as least important.

While one could conjecture that diverse legal systems (such as common law vs. civil law), cultural variations, and the availability of Internet code in local language lead to different views of the importance of ad hoc reuse in different geographies, our survey did not find substantial support for such reasoning; Asian, European, and North American developers did not differ significantly in how they perceived the importance of ad hoc reuse; only South American developers deemed such reuse significantly more important. However, since only 33 South American developers participated in the survey, this finding may not be representative.

Finally, our survey did not find significant differences in professional developers’ perception of the importance of ad hoc reuse based on their education and skills and whether they develop embedded or traditional software or were employed, at the time of the survey, in time-limited contracts (such as freelancers) or as permanent employees.

Developer Knowledge and Risks for Firms
How well are professional software developers prepared to deal with the licenses and obligations associated with ad hoc reuse of Internet code?

It seems reasonable to assume that professional developers who are more aware of the particularities of Internet code (such as its licenses) are less likely to ignore license obligations. Thus, we first investigated whether professional software developers had received training or information on reuse at the time of the survey and the sources of such training and information (see Figure 3).

Two rather informal channels, the Internet (65%) and friends and colleagues (46%), were developers’ reported main sources of information about Internet code licenses and their particularities. Comparatively unimportant were firms (21%) and educational institutions, including universities (16%). Meanwhile, 23% of the developers we surveyed had not received any form of training or information on the reuse of Internet code. Overall, these findings suggest that conveying knowledge about reusing Internet code and potential license risks was not high on the agenda of firms and universities, at least until 2009.

Given the high number of developers surveyed who reported never having received training or information on the reuse of Internet code or who relied on information from unofficial channels (such as the Internet and friends), we were compelled to investigate their knowledge of licenses for such code. When self-assessing their knowledge, two-thirds of surveyed developers reported being “familiar” or “very familiar” with nearly all obligations in Internet code licenses (see Table 2). Contrasting this self-assessment with the results of our five-question quiz about license obligations resulting from the reuse of Internet code (discussed earlier) suggests developers overestimated their knowledge. Even those who viewed themselves as “very familiar” with license obligations on average failed on two questions in our quiz, obtaining a mean score of 3.11 out of a maximum of 5.m Moreover, while positive and statistically significant (p<0.001), the correlation between self-assessment and quiz score in the survey was weak, with a correlation coefficient of 0.345.

We also sought to identify the factors that influence developers’ objectively assessed knowledge about Internet code licenses and their obligations. The exploratory Tobit10 regression model (see Table 3) uses developers’ scores in the survey’s license quiz as the dependent variable. The results underscore that developers with OSS experience were significantly more knowledgeable about Internet code licenses than other developers. Furthermore, most forms of training and information about reusing Internet

m We pre-tested the quiz questions to make sure, as much as possible, they were of comparable difficulty and relevance. However, there was some variation among them, and it is possible that respondents who described themselves as “very familiar” with the obligations of licensing Internet code (and who failed on average to answer 1.89 questions out of 5 in the survey), often struggled with questions on license issues that appear less frequently and are thus less critical for firms.

dec e mb e r 2 0 1 1 | vo l. 5 4 | n o. 1 2 | c o m m u n ic atio n s o f the acm 79


contributed articles

code (from firms, friends, colleagues, magazines, and other sources) did not exert significant influence on developer knowledge. Developers who had received training or information in educational institutions were significantly less proficient than other developers. Only information acquired from the Internet had a significant positive effect on developer knowledge.

Along with these factors, the developers from Asia and North America seemed to know less about Internet code licenses than their European and South American counterparts in 2009. Regarding educational backgrounds, developers with academic degrees in computer science and engineering were more proficient regarding Internet code licenses than other developers.

In the situation described earlier in which ad hoc reuse of Internet code seemed prevalent while also exposing firms to risks, it would seem reasonable for firms to introduce explicit policies providing guardrails to developers considering reuse of Internet code.

However, only about one-third of the developers we surveyed worked in firms with policies regulating such reuse. More detailed analysis of this matter emphasizes that firms with more than 5,000 employees were 31% more likely to have such policies, while there was no significant difference among smaller firms of various sizes.n Moreover, firms for which software development was the main business had a 19% greater probability of having such policies, while firm age had no consistently significant effect on whether or not a firm had such policies.

Of the developers working in firms with policies regarding Internet code reuse, nearly one-quarter reported not to have read them. Programmers were less likely to have read policies than architects; also, developers unhappy with their jobs were significantly less likely to have read their employers’ policies.o Additionally, developers who were not involved in development projects for multiple external customers were significantly less likely to have read the policies.

As a consequence of the overall situation regarding the ad hoc reuse of Internet code described here, it is not surprising that our survey found that 21% of the developers creating software in 2009 had at least once not checked thoroughly for Internet code license obligations when reusing snippets; 16% did the same when reusing components; and 14% ignored license obligations they were aware of when reusing snippets.

Threats to Validity
Given the multiple variables in our regression models, the size of our sample, and significance levels reported, our results should reflect statistical validity. However, the threats to internal, construct, and external validity of this work should be addressed in future research.

In terms of internal validity, the explanatory and control variables in our models should ensure no omitted variable biases influence our survey results. However, since our questionnaires were completed anonymously by developers identified through email addresses, we cannot be sure of the accuracy and truthfulness of the answers to our questions.

Regarding construct validity, the main dependent variable of our research is the perceived importance of ad hoc reuse of Internet code for developers’ individual work. While this variable is a suitable proxy for the extent to which professional software developers practice ad hoc reuse, fu-

Table 3. Multivariate analysis of software developer knowledge concerning Internet code licenses.17

Tobit Regression                                                Coef.        Std. Err.
Developer has OSS experience (dummy)                            0.835***     0.098
Developer has received training or information
on Internet code from… (dummies)
  …firms                                                        0.124        0.120
  …educational institutions                                     –0.243*      0.126
  …friends and colleagues                                       0.080        0.112
  …Internet                                                     0.390***     0.122
  …magazines                                                    0.089        0.112
  …other sources                                                –0.091       0.213
Developer lives in… (dummies, reference group: Europe)
  …North America                                                –0.238**     0.117
  …South America                                                –0.119       0.222
  …Asia or rest of world                                        –0.297**     0.142
Education (dummies, reference group: computer science
or related subject)
  Engineering                                                   0.073        0.124
  Mathematics or physics                                        –0.320*      0.170
  Business administration                                       –0.751**     0.354
  Other subject                                                 –0.385**     0.184
Experience as professional software developer (in years)        0.002        0.007
Constant                                                        1.890***     0.141
Observations                                                    869
Pseudo R²                                                       0.04
F test                                                          F(15, 854)=8.62, p<0.0001
σ                                                               1.376

* significant at 10%, ** significant at 5%, *** significant at 1%
Notes: Significant coefficients are in bold type; reported standard errors are robust standard errors. Descriptive statistics and the correlation matrix of the explanatory variables used are available from the authors on request.

n These findings result from exploratory logistic regression analyses and resulting marginal effects not covered here; full regression tables are available from the authors.

o These findings result from exploratory logistic regression analyses and resulting marginal effects not covered here; full regression tables are available from the authors.
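As a rough reading aid for Table 3, the sketch below adds up the reported Tobit coefficients for a hypothetical developer profile. It assumes the usual linear index of a Tobit model and ignores censoring at the quiz bounds (0 and 5), so the result is a latent index rather than a score the authors report; the profile and the helper name `latent_score` are illustrative assumptions.

```python
# Coefficients copied from Table 3 (dependent variable: license quiz score).
COEF = {
    "constant": 1.890,
    "oss_experience": 0.835,
    "info_firms": 0.124,
    "info_education": -0.243,
    "info_friends": 0.080,
    "info_internet": 0.390,
    "info_magazines": 0.089,
    "info_other": -0.091,
    "north_america": -0.238,
    "south_america": -0.119,
    "asia_or_rest": -0.297,
    "edu_engineering": 0.073,
    "edu_math_physics": -0.320,
    "edu_business": -0.751,
    "edu_other": -0.385,
    "years_experience": 0.002,
}


def latent_score(profile):
    """Linear index for a profile (hypothetical helper; ignores censoring)."""
    total = COEF["constant"]
    for name, value in profile.items():
        total += COEF[name] * value
    return total


# Hypothetical profile in the reference groups (Europe, computer science),
# with 10 years of professional experience.
baseline = latent_score({"years_experience": 10})

# Same profile, plus OSS experience and information obtained from the Internet.
with_oss_and_internet = latent_score(
    {"years_experience": 10, "oss_experience": 1, "info_internet": 1}
)

print(round(baseline, 3), round(with_oss_and_internet, 3))
```

Under these assumptions the two indices come out at roughly 1.91 and 3.135, which shows how much of the modeled difference rests on the two significant positive coefficients. In a Tobit model the effect on the observed, censored score is generally smaller than the raw coefficient, so the gap should be read as indicative only.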


ture research might want to take more direct measures to check the robustness of our findings and conclusions. Moreover, despite our extensive pretest with more than 100 developers, it might be possible that some survey participants misunderstood the meaning of some of our survey questions.

Addressing external validity, there is still the risk that our survey population of 869 developers active in Internet newsgroups is not representative of professional developers in general. Since this research is among the first to quantitatively investigate ad hoc reuse of Internet code by individual developers, we deliberately chose developers from newsgroups to ensure broad heterogeneity in our sample. Moreover, the comparison of the demographics of our sample with that of other recent studies among professional developers (such as Alexy1) gives us confidence in the representativeness of our sample. Still, it would be worthwhile to repeat our study in a more homogeneous single-firm setting.

Conclusion
Our analyses of ad hoc reuse of Internet code in commercial software development suggest its importance has increased over time; in 2009 over 50% of the developers we surveyed deemed ad hoc reuse at least “somewhat important” for their own work. This result differs from the prevailing assumption of many firms that their code base does not or only to a small and controlled degree contains Internet code.15

Addressing the knowledge of professional developers about Internet code licenses and their legal obligations, we found about one-quarter of them had never received any form of training or information on the topic. Only a small fraction had received training or information from firms or from educational institutions. Moreover, many existing forms of training and information were apparently not effective.

As a consequence of this lack of useful training and information, many developers, at least in 2009, lacked detailed knowledge about their obligations potentially resulting from the reuse of Internet code. Despite this, only a minority of firms had deployed policies addressing reuse of Internet code in 2009. Consequently, a considerable share of developers (14%–21% of our sample, depending on scenario) had at some point either not checked thoroughly for license obligations or even knowingly ignored them when reusing Internet code in the past.

Firms must recognize and acknowledge the existence of Internet code in their own code bases. Given our findings, they should further consider that some of the Internet code reused in their software might also violate license obligations.

Our study offers multiple levers for firms to mitigate the economic and legal risk from ad hoc reuse of such code. First, the topic itself must be positioned more prominently on their agendas. Firms should actively make developers aware of the potential license issues resulting from their reuse of code. They should leverage reliable information resources on the Internet, complementing them with mandatory internal training and other practical information. Second, they should lobby universities and other educational institutions to include the topic in their curricula. Third, they should establish easy-to-understand policies providing guidance as to how to deal with Internet code. Moreover, they need to ensure that developers are aware of these policies and actually read and understand them. Finally, they need to recognize the interdisciplinary nature of license risks from reuse of Internet code relating to developers and engineers, as well as to lawyers.

They should thus facilitate communication between developers and legal experts such that clearance for specific instances of the reuse of Internet code can be obtained quickly. Otherwise, developers would have to choose between practicing reuse on their own or abandoning it altogether, an option that would ignore a valuable source of efficiency and quality gains.

References
1. Alexy, O. Free Revealing: How Firms Can Profit from Being Open. Gabler, Wiesbaden, 2009.
2. Chen, W., Li, J., Ma, J., Conradi, R., Ji, J., and Liu, C. An empirical study on software development with open source components in the Chinese software industry. Software Process Improvement and Practice 13, 1 (Jan. 2008), 89–100.
3. Deshpande, A. and Riehle, D. The total growth of open source. In Proceedings of the Fourth International Conference on Open Source Systems (Milan, Italy, Sept. 7–10). Springer, Boston, 2008, 197–209.
4. Frakes, W.B. and Kang, K. Software reuse research: Status and future. IEEE Transactions on Software Engineering 31, 7 (July 2005), 529–536.
5. Garlan, D., Allen, R., and Ockerbloom, J. Architectural mismatch: Why reuse is still so hard. IEEE Software 26, 4 (July/Aug. 2009), 66–69.
6. German, D.M., Di Penta, M., and Davies, J. Understanding and auditing the licensing of open source software distributions. In Proceedings of the 18th IEEE International Conference in Program Comprehension (Braga, Portugal, June 30–July 2). IEEE Press, Los Alamitos, CA, 2010, 84–93.
7. German, D.M., Di Penta, M., Guéhéneuc, Y.-G., and Antoniol, G. Code siblings: Technical and legal implications of copying code between applications. In Proceedings of the Sixth IEEE International Workshop on Mining Software Repositories (Vancouver, Canada, May 16–17). IEEE Press, Los Alamitos, CA, 2009, 81–90.
8. German, D.M. and Gonzalez-Barahona, J.M. An empirical study of the reuse of software licensed under the GNU general public license. In Proceedings of the Fifth International Conference on Open Source Systems (Skövde, Sweden, June 3–6). Springer, Boston, 2009, 185–198.
9. German, D.M. and Hassan, A.E. License integration patterns: Dealing with license mismatches in component-based development. In Proceedings of the 31st IEEE International Conference on Software Engineering (Vancouver, Canada, May 16–24). IEEE Press, Los Alamitos, CA, 2009, 188–198.
10. Greene, W.H. Econometric Analysis. Prentice Hall, Upper Saddle River, NJ, 2007.
11. Lerner, J. and Tirole, J. The scope of open source licensing. The Journal of Law, Economics, and Organization 21, 1 (Apr. 2005), 20–56.
12. Levi, S.D. and Woodard, A. Open source software: How to use it and control it in the corporate environment. Computer & Internet Lawyer 21, 8 (Aug. 2004), 8–13.
13. Li, J., Conradi, R., Bunse, C., Torchiano, M., Slyngstad, O.P.N., and Morisio, M. Development with off-the-shelf components: 10 facts. IEEE Software 26, 2 (Mar. 2009), 80–87.
14. Madanmohan, T.R. and De, R. Open source reuse in commercial firms. IEEE Software 21, 6 (Nov. 2004), 62–69.
15. McGhee, D.D. Free and open source software licenses: Benefits, risks, and steps toward ensuring compliance. Intellectual Property & Technology Law Journal 19, 11 (Nov. 2007), 5–9.
16. Morisio, M., Ezran, M., and Tully, C. Success and failure factors in software reuse. IEEE Transactions on Software Engineering 28, 4 (Apr. 2002), 340–357.
17. Murray, G.F. Categorization of open source licenses: More than just semantics. Computer & Internet Lawyer 26, 1 (Jan. 2009), 1–11.
18. Norris, J.S. Mission-critical development with open source software: Lessons learned. IEEE Software 21, 1 (Jan. 2004), 42–49.
19. Rosen, L. Open Source Licensing: Software Freedom and Intellectual Property Law. Prentice-Hall, Englewood Cliffs, NJ, 2004.
20. Sojer, M. Reusing Open Source Code. Gabler, Wiesbaden, 2010.
21. Sojer, M. and Henkel, J. Code reuse in open source software development: Quantitative evidence, drivers, and impediments. Journal of the Association for Information Systems 11, 12 (Dec. 2010), 868–901.
22. Spinellis, D. and Szyperski, C. How is open source affecting software development? IEEE Software 21, 1 (Jan. 2004), 28–33.
23. Umarji, M., Sim, S.E., and Lopes, C. Archetypal Internet-scale source code searching. In Open Source Development, Communities and Quality, B. Russo, E. Damiani, S. Hissam, B. Lundell, and G. Succi, Eds. Springer, Boston, 2008, 257–263.

Manuel Sojer (sojer@wi.tum.de) is a management consultant at Bain & Company, Munich, Germany. This article is part of his Ph.D. work completed in 2010 at Technische Universität München, Munich, Germany.

Joachim Henkel (henkel@wi.tum.de) is a professor of technology and innovation management at Technische Universität München, Munich, Germany.

© 2011 ACM 0001-0782/11/12 $10.00



contributed articles
doi:10.1145/2043174.2043194

Formal Analysis of MPI-based Parallel Programs

The goal is reliable parallel simulations, helping scientists understand nature, from how foams compress to how ribosomes construct proteins.

By Ganesh Gopalakrishnan, Robert M. Kirby, Stephen Siegel, Rajeev Thakur, William Gropp, Ewing Lusk, Bronis R. de Supinski, Martin Schulz, and Greg Bronevetsky

key insights

Addressing the challenges of distributed systems, debugging necessitates collaboration between HPC and formal verification.

Along with HPC, distributed computing based on communication libraries is going mainstream in the commodity world, two communities that must look to learn and benefit from one another.

Catastrophic disruption of programmer productivity can be avoided through formal verification tools that handle problems of scale, enhance coverage by avoiding redundant searches, and decrease false-alarm rates through more precise analysis.

Most parallel computing applications in high-performance computing use the Message Passing Interface (MPI) API. Given the fundamental importance of parallel computing to science and engineering research, application correctness is paramount. MPI was originally developed around 1993 by the MPI Forum, a group of vendors, parallel programming researchers, and computational scientists. However, the document defining the standard is not issued by an official standards organization but has become a de facto standard through near-universal adoption. MPI development continues, with MPI-2.2 released in 2009 and MPI 3.0 expected in 2012. The standard is published on the Web21 and as a book, along with several other books based on it; see, for example, Gropp et al.13 and Pacheco.26 Implementations are available in open source from MPICH222 and Open MPI,25 from software vendors, and from every vendor of HPC systems. MPI is widely cited; Google Scholar recently returned 39,600 hits for the term “+MPI +Message Passing Interface.”

MPI is designed to support highly scalable computing applications using more than 100,000 cores on, say, the IBM Blue Gene/P (see Figure 1) and Cray XT5. Many MPI programs represent dozens, if not hundreds, of person-years of development, including calibration for accuracy and performance tuning. Scientists and engineers worldwide use MPI in thousands of applications, including in investigations of alternate-energy sources and in weather simulation. For HPC computing, MPI is by far the dominant programming model; most (at some centers, all) applications running on supercomputers use MPI. Many application developers for exascale systems15 regard support for MPI as a requirement.

Still, the MPI debugging methods available to these developers are typically wasteful and ultimately unreliable. Existing MPI testing tools seldom provide coverage guarantees, examin-



Figure 1. The Intrepid Blue Gene/P Open Science machine at Argonne National Laboratory, with 163,840 cores and 557TFlops/sec (peak).

Photograph courtesy of Argonne National Laboratory

ing essentially equivalent execution sequences, thus reducing testing efficiency. These methods fare even worse at large problem scales. Consider the costs of HPC bugs. A high-end HPC center costs hundreds of millions of dollars to commission, and the machines become obsolete within six years; in many centers, the annual electricity bill can run more than $3 million, and research teams apply for computer time through competitive proposals, spending years planning their experiments. In addition to these costs, one must add the costs to society of relying on potentially defective software to inform decisions involving issues of great public importance (such as climate change).

Formal methods can play an important role in debugging and verifying MPI applications. Here, we describe existing techniques, including their pros and cons, and why they have value beyond MPI, addressing the general needs of future concurrency application developers who will inevitably use low-level concurrency APIs.

Historically, parallel systems have used either message passing or shared memory for communication. Compared to other message-passing systems noted for their parsimony, MPI supports a large number of cohesively engineered features essential for designing large-scale simulations; for example, MPI-2.221 specifies more than 300 functions, though most developers use only a few dozen in any given application.

MPI programs consist of one or more threads of execution with private memories (called “MPI processes”) and communicate through message exchanges. The two most common are point-to-point messages (such as sends and receives) and collective operations (such as broadcasts and reductions). MPI also supports non-blocking operations that help overlap computation and communication and persistent operations that make repeated sends/receives efficient. In addition, MPI allows processes and communication spaces to be structured using topologies and communicators. MPI’s derived datatypes further enhance the portability and efficiency of MPI codes by enabling the architect to communicate noncontiguous data with a single MPI function call. MPI also supports a limited form of shared-memory communication based on one-sided communication. A majority of MPI programs are still written using the “two-sided” (message-passing-oriented) constructs we focus on through the rest of the article. Finally, MPI-IO addresses portable access to high-performance input/output systems.

MPI applications and libraries are written predominantly in C, C++, and/or Fortran. Languages that use garbage collection or managed runtimes (such as Java and C#) are rarely used in HPC; preexisting libraries, compiler support, and memory locality management drive these choices. Memory is a precious resource in large-scale systems; a rule of thumb is an application cannot afford to consume more than one byte per FLOP. Computer memory is expensive and increases cluster energy consumption. Even when developing traditional shared-memory applications, system architects must work with low amounts of cache-coherent memory


per core and manage data locality, something done routinely by MPI programmers. Computer scientists are also realizing that future uses of MPI will be in conjunction with shared-memory libraries (such as Pthreads7) to reduce message-copy proliferation. While some MPI applications are written from scratch, many are built atop user libraries, including ParMETIS16 for parallel hypergraph partitioning, ScaLAPACK5 for high-performance linear algebra, and PETSc3 for solving partial differential equations.

MPI processes execute in disjoint address spaces, interacting through communication commands involving deterministic, nondeterministic, collective, and non-blocking modes. Existing (shared-memory concurrent program) debugging techniques do not directly carry over to MPI, where operations typically match and complete out-of-program order according to an MPI-specific matches-before order.30,33 The overall behavior of an MPI program is also heavily influenced by how specific MPI library implementations take advantage of the latitude provided by the MPI standard.

An MPI program bug is often introduced when modeling the problem and approximating the numerical methods or while coding, including whole classes of floating-point challenges.11 While lower-level bugs (such as deadlocks and data races) are serious concerns, detecting them requires specialized techniques of the kind described here. Since many MPI programs are poorly parameterized, it is not easy for HPC developers to downscale a program to a smaller instance and locate the bug. For these reasons, HPC developers need a variety of verification methods, each narrowly focused on subsets of correctness issues and making specific trade-offs. Our main focus here is formal analysis methods for smaller-scale MPI applications and semiformal analysis methods for the very large scale. For detecting MPI bugs in practice, formal analysis tools must be coupled with runtime instrumentation methods found in tools like Umpire,32 Marmot,19 and MUST,14 though much more research is needed in tool integration.

Dynamic analysis. MPI provides many nondeterministic constructs that free the runtime system to choose the most efficient way to carry out an operation but also mean a program can exhibit multiple behaviors when run on the same input, posing verification challenges; an example is a communication race arising from a “wildcard” receive, an operation that does not specify the source process of the message to be received, leaving the decision to the runtime system. Many subtle program defects are revealed only for a specific sequence of choices. Though random testing might happen on one such sequence, it is hardly a reliable approach.

In contrast, dynamic verification approaches control the exact choices made by the MPI runtime, using this control to methodically explore a carefully constructed subset of behaviors. For each such behavior, a number of properties may be verified, including absence of deadlocks, assertion violations, incompatible data payloads between senders and receivers, and MPI resource leaks. Using a formal model of the MPI semantics, a dynamic verifier can conclude that if no violations occur on the subset of executions, then there can be no violation on any execution. If even this reduced subset cannot be explored exhaustively, the developer can specify precise coverage criteria and obtain a lesser (but still quantifiable) degree of assurance. This approach was originally demonstrated in the VeriSoft10 tool and has the advantage of not requiring modifications to the program source code, compiler, or libraries.

Full-scale debugging. Traditional “step-by-step” debugging techniques are untenable for traces involving millions of threads. Later, in an expanded description of full-scale debugging, we describe a new debugging approach called Stack Trace Analysis that analyzes an execution trace and partitions the threads into equivalence classes based on their behavior. Experience on real large-scale systems shows that only a small number of classes typically emerge, and the information provided can help a developer isolate defects. While this approach is not comparable to the others covered here, in that the focus is on the analysis of one trace rather than reasoning about all executions, it provides a clear advantage in terms of scalability.

Symbolic analysis. The techniques discussed earlier are only as good as the set of inputs chosen during analysis. Defects revealed for very specific input or parameter values may be difficult to discover with these techniques alone. Symbolic execution18 is a well-known technique for identifying defects, described later in the section on symbolic analysis of MPI, including how it is applied to MPI programs. The TASS toolkit27 uses symbolic execution and state-enumeration techniques to verify properties of MPI programs, not only for all possible behaviors of the runtime system, but for all possible inputs as well. It can even be used to establish that two versions of a program are functionally equivalent, at least within specified bounds. On the other hand, implementing the symbolic ex-

Component of Blue Gene/P supercomputer at Argonne National Laboratory, Argonne, IL.
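The idea of partitioning processes into equivalence classes by behavior can be illustrated with a toy grouping of call stacks. This is only a sketch, assuming each rank's behavior is summarized by a single call stack; the ranks, the frame names, and the helper `group_by_stack` are hypothetical, and the code is not the Stack Trace Analysis tool itself.

```python
from collections import defaultdict


def group_by_stack(stacks):
    """Map each distinct call stack (tuple of frame names) to the ranks showing it."""
    classes = defaultdict(list)
    for rank, frames in stacks.items():
        classes[tuple(frames)].append(rank)
    return {stack: sorted(ranks) for stack, ranks in classes.items()}


# Hypothetical snapshot of a hung 8-rank job: rank 3 is stuck in a receive,
# every other rank is waiting in a barrier.
snapshot = {rank: ["main", "solve", "MPI_Barrier"] for rank in range(8)}
snapshot[3] = ["main", "solve", "exchange_halo", "MPI_Recv"]

classes = group_by_stack(snapshot)

# Print the smallest classes first, since outliers are usually the interesting ones.
for stack, ranks in sorted(classes.items(), key=lambda item: len(item[1])):
    print(len(ranks), "rank(s)", ranks, "at", " > ".join(stack))
```

Eight ranks collapse into two classes here, and the singleton class points at the rank worth inspecting. The same reduction is what makes the approach plausible at scales where stepping through individual processes is not.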


ecution technique requires sophisticated theorem-proving technology and a symbolic interpreter for all program constructs and library functions; for this reason, TASS supports only C and a subset of MPI. Moreover, it generally does not scale beyond a relatively small number of processes, though, as we show, defects that usually appear only in large configurations can often be detected in much smaller configurations through symbolic execution.

Static analysis. Compilers use static analyses to verify a variety of simple safety properties of sequential programs, working on a formal structure that abstractly represents some aspect of the program (such as a control-flow graph, or CFG). Extending these techniques to verify concurrency properties of MPI programs (such as deadlock freedom) requires new abstractions and techniques. Later, in the section on static analysis of MPI, we outline a new analysis framework targeting this problem that introduces the notion of a parallel CFG (pCFG). The framework has the advantage that the pCFG is independent of the number of processes, essentially making it infinitely scalable. However, because automating these analyses is so difficult, they may require user-provided program annotation to guide them.

Dynamic Verification of MPI
Here, we explore two dynamic analysis approaches: The first, implemented by the tool ISP31,35 (see Figure 2), delivers a formal coverage guarantee with respect to deadlocks and local safety assertions30; ISP has been demonstrated on MPI applications of up to 15,000 lines of code. Running on modern laptop computers, ISP can verify such applications for up to 32 MPI processes on mostly deterministic MPI programs.

ISP’s scheduler, as outlined in the figure, exerts centralized control over every MPI action. It limits ISP scalability to at most a few dozen MPI processes and does not help programmers encountering difficulty at higher ends of the scale where user applications and library codes often use different algorithms. What if a designer has optimized an HPC computation to work efficiently on 1,000 processors and suddenly finds an inexplicable bug? Traditional HPC debugging support is severely lacking in terms of ensuring coverage goals. To address this limitation, some of the authors have built a tool called Distributed Analyzer of MPI, or DAMPI,34 which uses a distributed scheduler while still ensuring nondeterminism coverage. DAMPI scales demonstrably far more than ISP.

Dynamic verification using ISP. For programs with nondeterministic MPI calls, simply modulating the absolute times at which MPI calls are issued (such as by inserting nondeterministic sleep durations, as performed by stress-testing tools) is ineffective because most often it does not alter the way racing MPI sends match with MPI nondeterministic receives deep inside the MPI runtime. Also, such delays slow the entire testing process unnecessarily.

ISP’s active testing approach (see Figure 3) means if P2’s MPI_Isend can match P1’s MPI_Irecv, the test encounters a bug. But can such a match occur? Yes, and here’s how: first, let P0 issue its non-blocking MPI_Isend call and P1 its non-blocking MPI_Irecv call; then allow the execution to cross the MPI_Barrier calls; after that, P2 can issue its MPI_Isend. The MPI runtime then faces a nondeterministic choice of matching either MPI_Isend. The system achieves this particular execution sequence only if the MPI_Barrier calls are allowed to match before the MPI_Irecv matches. Existing MPI testing tools cannot exert such fine control over MPI executions. By interposing a scheduler, as outlined in Figure 2, ISP can reorder, at runtime, MPI calls issued by the program. In the example, ISP’s scheduler intercepts all MPI calls coming to it in program order and dynamically reorders the calls going into the MPI runtime (ISP’s scheduler sends Barriers first, an order allowed by the MPI semantics), at which point it discovers the nondeterminism.

When ISP determines two matches could occur, it re-executes (replays from the beginning) the program in Figure 3 twice, once with the Isend from P0 matching the receive and once with the Isend from P2 matching it. To ensure these matches occur, ISP dynamically rewrites Irecv(from:*) into Irecv(from:0) and into Irecv(from:2) in these replays. If the algorithm does not do this but instead issues Irecv(from:*) into the MPI runtime, coverage of both process sends is no longer guaranteed. ISP discovers the maximal extent of nondeterminism through dynamic MPI call reordering and achieves scheduling control of relevant interleavings through dynamic API call rewriting. While pursuing relevant interleavings, ISP additionally detects three basic error conditions: deadlocks, resource leaks (such as MPI object leaks), and violations of C assertions in the code.

Developers should bear in mind that MPI programmers often use non-blocking MPI calls to enhance computation/communication overlap and nondeterministic MPI calls in master/worker patterns to detect which MPI

Figure 2. Overview of ISP. (Diagram labels: MPI Program, Executable, Run, Proc1, Proc2, …, Procn, Scheduler, Interposition Layer, MPI Runtime.)


process finishes first, so more work can be assigned to it. When these operations, together with “collective” operations (such as Barriers), are all employed in the same example, a developer can obtain situations like the one in Figure 3. The safety net provided by ISP and other such tools is therefore essential for efficiency-oriented MPI programming.

Figure 3. Bug manifests on some runtimes.

    P0                    P1                     P2
    Isend(to : 1, 22);    Irecv(from : *, x)     Barrier;
    Barrier;              Barrier;               Isend(to : 1, 33);
                          if (x == 33) bug;

ISP guarantees MPI communication nondeterminism coverage under the given test harness and helps avoid exponential interleaving explosion primarily by avoiding redundantly examining equivalent behaviors (such as by not examining the n! different orders in which an MPI barrier call might be invoked); testing tools typically fall victim to this explosion. ISP also includes execution-space sampling options.

ISP has examined many large MPI programs, including those making millions of MPI calls. Some of the authors have also built the Graphical Explorer of Message passing (GEM) tool,12 which hosts the ISP verification engine. GEM is an official component of the Eclipse Parallel Tools Platform, or PTP,9 (PTP version 3.0 onward), making dynamic formal verification of MPI available seamlessly within a popular integrated development environment. GEM also serves as a formal-methods supplement to a popular MPI textbook26 by providing chapter examples as readily available MPI C projects.

Dynamic verification using DAMPI. A widely used complexity-reduction approach is to debug a given program after first suitably downscaling it. However, a practical difficulty in carrying out such debugging is that many programs are poorly parameterized. For them, if a problem parameter is reduced, it is often unclear whether another parameter should be reduced proportionally, logarithmically, or through some other relationship. A more serious difficulty is that some bugs are manifest only when a problem is run at scale. The algorithms employed by applications and/or the MPI library itself can change depending on problem scale. Also, resource bugs (such as buffer overflows) often show up only at scale.

While user-level dynamic verification supported by ISP resolves significant nondeterminism, testing at larger scales requires a decentralized approach where supercomputing power aids verification, an idea the authors implemented in their tool framework DAMPI34 (see Figure 4).

Figure 4. Distributed MPI analyzer.

The key insight that allowed them to design the decentralized scheduling algorithm of DAMPI is that a nondeterministic operation, as in MPI_Irecv(MPI_ANY_SOURCE) and MPI_Iprobe(MPI_ANY_SOURCE), represents a point on the timeline of the issuing process when the operation commits to a match decision. It is natural for an HPC programmer to view each such event as starting an epoch, an interval stretching from the current nondeterministic event up to (but not including) the next nondeterministic event. All deterministic receives can be assigned the same epoch as the one in which they occur. Even though the epoch is defined by a nondeterministic receive matching another process’s send, how can the tool determine all other sends that match it? The solution is to pick all the sends that are not causally after the nondeterministic receive (and subject to MPI’s “non-overtaking” rules). DAMPI determines these sends through an MPI-specific version of Lamport clocks,20 striking a good compromise between scalability and omissions.

Experimental results show DAMPI effectively tests realistic problems running on more than 1,000 CPUs by exploiting the parallelism and memory capacity of clusters. It has examined all benchmarks from the Fortran NAS Parallel Benchmark suite,24 with instrumentation overhead less than 10% compared to ordinary testing, but able to provide nondeterminism coverage not provided by ordinary testing.
Recent experiments by some of the authors found a surprising fact: None of the MPI programs in the NAS Parallel Benchmarks employing MPI_Irecv(MPI_ANY_SOURCE) calls actually exhibit nondeterminism under DAMPI. This means these benchmarks were “determinized,” perhaps through additional MPI call arguments and is further confirmation of the value of dynamic analysis in providing precise answers.
Full-Scale Debugging
The approach described here targets the large-scale systems that will emerge over the next few years; current estimates anticipate half a billion to four billion threads in exascale systems. With such concurrency, developers of verification tools must target debugging techniques able to handle these scales, as bugs are often not manifest until a program is run at its largest scale. Bugs often depend on input, which can differ significantly across full-scale runs. Furthermore, certain types of errors (such as integer overflows) often depend directly on the number of processors.

However, most debugging techniques do not translate well to full-scale runs. The traditional paradigm of stepping through code has significant performance limitations with large processor counts, as well as being impractical with thousands of processes or threads, let alone billions. Dynamic-verification techniques offer paradigmatic scaling but have even more performance limitations, particularly when the number of interleavings depends on process count.

Faced with scaling requirements, HPC developers require new techniques to limit the scope of their debugging efforts. Some of the authors developed mechanisms for identifying behavioral-equivalence classes based on the observation that when errors occur in large-scale programs, they do not exhibit thousands or millions of different behaviors. Rather, they exhibit a limited set of behaviors in which all processes follow the same erroneous path (a single common behavior) along which one or a few processes follow an erroneous path that can then lead to changes in the behavior of a few related processes (two or three behaviors). While the effect may trickle further out, developers rarely observe more than a half-dozen behaviors, regardless of the total number of processes in an MPI program.

Given the limited behaviors that are exhibited, developers can then focus on only debugging representative processes from each behavioral class, rather than all processes at once, thereby enabling the debugging of problems previously not debuggable.

The Stack Trace Analysis Tool (STAT)2 achieves this debugging goal by attaching to all processes in a large-scale job and gathering stack traces sampled over time in a low overhead and distributed manner. It then merges these stack traces to identify which processes are executing similar code. The tool considers a variety of equivalence relations; for example, for any n ≥ 1, it considers two processes as equivalent if they agree on the first n function calls issued. Increasing n refines this equivalence relation, giving the developer control of the precision-accuracy trade-off.

The resulting tree readily identifies different execution behaviors. For example, Figure 5 shows the top levels of the tree obtained from a run of the Community Climate System Model (CCSM), an application that uses five separate modules to model land (CSM), ice, ocean (POP), and atmosphere (CAM) and couple the four models. In it, the developer can quickly identify that MPI processes 24–39 are executing the land model, 8–23 the ice model, 40–135 the ocean model, and 136–471 the atmospheric model, while 0–7 are executing the coupler. If a problem should be observed in one of them, the developer can then concentrate on this subset of tasks; in the case of a broader error, the developer can pick representatives from the five classes, thereby reducing the initial debugging problem to five processes. The STAT tool has been used to debug several codes with significantly shortened turnaround time, including an Algebraic Multigrid (AMG) package, which is fundamental for many HPC application codes.

Figure 5. STAT process equivalence classes.

    All processes (0–471): _start > _libc_start_main > main

    Processes    Frames below main
    24–39        program_csm > driver > clm_cs…
    8–23         icemodel > ice_co… > …
    40–135       pop > step_mo… > …
    0–7          cpl > …
    136–471      cam > stepon > …

Tools like STAT also detect outliers that can directly point to erroneous behavior without further debugging; for example, the STAT tool was used on the CCSM code when it hung on more than 4,096 processes. The stack trace tree showed one task executing in an abnormally deep stack, and, on closer examination of the stack trace, not only that a mutex lock operation within the MPI implementation was called multiple times, creating the deadlock, but also exactly where in the code the respective erroneous mutex lock call occurred. This led to a quick fix of the MPI implementation.

The STAT developer group’s efforts now include extensions that provide better identification of the behavior equivalence classes, as well as techniques to discern relationships among the classes.1 Additional directions include using the classes to guide dynamic verification techniques.

Symbolic Analysis of MPI
The basic idea of symbolic execution is to execute the program using symbolic expressions in place of the usual (concrete) values held by program variables.18 The inputs and initial values of the program are symbolic constants X0, X1, …, so-called because they represent values that do not change during execution. Numerical operations are replaced by operations on symbolic expressions; for example, if program variables u and v hold values X0 and X1, respectively, then u+v will evaluate to the symbolic expression X0 + X1.

The situation is more complicated at a branch point. Suppose a branch is governed by condition u+v>0. Since the values are symbolic, it is not necessarily possible to say whether the condition evaluates to true or false; both possibilities must be explored. Symbolic execution handles this problem by introducing a hidden Boolean-valued symbolic variable, the path condition pc, to record the choices made at branch points. This variable is initialized to true. At a branch, a nondeterministic choice is made between the two branches, and pc is updated accordingly. To execute the branch on u+v > 0, pc would be assigned the symbolic value pc ∧ u+v > 0 if the true branch is selected; if this is the first branch encountered, pc will now hold the symbolic expression X0 + X1 > 0. If the false branch is chosen instead, pc will hold X0 + X1 ≤ 0. Hence the path condition records the condition the inputs must satisfy for a particular path to be followed. Model-checking techniques can then be used to explore all nondeterministic choices and verify a property holds on all executions17 or generate a test set. An automated theorem prover (such as CVC34) can be used to determine if pc becomes unsatisfiable, in which case the current path is infeasible and pruned from the search.

One advantage of symbolic techniques is they map naturally to message-passing-based parallel programs. The Verified Software Lab’s Toolkit for Accurate Scientific Software (TASS),27 based on CVC3, uses symbolic execution and state-exploration techniques to verify properties of such programs.

Figure 6. Programs that read an array from a file, sum positive elements, and output the result.

    for (i=0; i<n; i++) a[i] = read element i;
    sum = 0.0;
    for (i=0; i<n; i++)
      if (a[i]>0.0) sum += a[i];
    output sum;

    (a) adder_seq: sequential version

    int first = n*rank/nprocs;
    int count = n*(rank+1)/nprocs - first;
    for (i=0; i<count; i++) a[i]=read element first+i;
    sum = 0.0;
    for (i=0; i<count; i++)
      if (a[i]>0.0) sum += a[i];
    if (rank == 0) {
      for (j=1; j<nprocs; j++) {
        recv into buffer from rank j;
        sum += buffer;
      }
      output sum;
    } else { send sum to rank 0; }

    (b) adder_par: parallel version
The TASS verifier takes as input the MPI/C source program and a specified number of processes and instantiates a symbolic model of the program with that process count. TASS maintains a model of the state of the MPI implementation, including that of the message buffers. Like all other program variables, the buffered message data is represented as symbolic expressions. The TASS user may also specify bounds on input variables in order to make the model finite or sufficiently small. An MPI-specific partial-order-reduction scheme restricts the set of states explored while still guaranteeing that if a counterexample to one of the properties exists (within the specified bounds), a violation is reported. Examples are included in the TASS distribution, including where TASS reveals defects in the MPI code (such as a diffusion simulation code from the Functional Equivalence Verification Suite at http://vsl.cis.udel.edu/fevs/).

Figure 7. Excerpts from MPICH2 broadcast code; the fault occurs when the highlighted expression is negative.

    relative_rank = (rank >= root ?
        rank - root : rank - root + comm_size);
    nbytes = type_size * count;
    scatter_size =
        (nbytes + comm_size - 1)/comm_size;
    mask = 0x1; i = 0;
    while (mask < comm_size) {
        relative_dst = relative_rank ^ mask;
        dst_tree_root = relative_dst >> i;
        dst_tree_root <<= i;
        recv_offset = dst_tree_root * scatter_size;
        if (relative_dst < comm_size)
        { ... MPIC_Sendrecv(...,
              nbytes-recv_offset, ...); ... }
        mask <<= 1; i++;
    }

    (a) MPIR_Bcast_scatter_doubling_allgather

    else { /* (nbytes >= MPIR_BCAST_SHORT_MSG)
              && (comm_size >= MPIR_BCAST_MIN_PROCS) */
        if ((nbytes < MPIR_BCAST_LONG_MSG) &&
            (MPIU_is_pof2(comm_size, NULL))) {
            MPIR_Bcast_scatter_doubling_allgather

    (b) invocation context
TASS can verify the standard safety properties, but its most important feature is the ability to verify that two programs are functionally equivalent;


that is, if given the same input, they always return the same output. This is especially useful in scientific computing where developers often begin with a simple sequential version of an algorithm, then gradually add optimizations and parallelism. The production code is typically much more complex but intended to be functionally equivalent to the original. The symbolic technique used to compare two programs for functional equivalence is known as “comparative symbolic execution.”28

To illustrate the comparative symbolic technique, see Figure 6, where the sequential program reads n floating-point numbers from a file, sums the positive elements, and returns the result. A parallel version divides the file into approximately equal-size blocks. Each process reads one block into a local array and sums the positive elements in its block. On all processes other than process 0, this partial sum is sent to process 0, which receives the numbers, adds them to its partial sum, and outputs the final result.

Ignoring round-off error, the two programs are functionally equivalent; given the same file, they output the same result. To see how the comparative symbolic technique establishes equivalence, consider the case n = nprocs = 2 and call the elements of the file X0 and X1. There are four paths through the sequential program, due to the two binary branches if a[i]>0.0. One of these paths, arising when both elements are positive, yields the path condition X0 > 0 ∧ X1 > 0 and output X0 + X1. The comparative technique now explores all possible executions of adder_par in which the initial path condition is X0 > 0 ∧ X1 > 0; there are many such executions due to the various ways the statements from the two processes can be interleaved. In each, the output is X0 + X1. A similar fact can be established for the other three paths through the sequential program. Taken together, these facts imply the programs will produce the same result on any input (for n = nprocs = 2).

The ability to uncover defects at small scales is an important advantage of symbolic approaches. Isolating and repairing a defect that manifests only in tests with thousands of processes and huge inputs is difficult. Several research projects have focused on making traditional debuggers scale to thousands of processes for just this reason. However, it would be more practical to force the same defect to manifest itself at smaller scales and then isolate the defect at those scales.

A real-life example illustrates this point: In 2008, a user reported a failure in the MPICH2 MPI implementation when calling the broadcast function MPI_Bcast, which used 256 processes and a message of just over count = 3,200 integers. Investigation revealed the defect was in a function used to implement broadcasts in specific situations (see Figure 7a). For certain inputs, the “size” argument (nbytes-recv_offset) to an MPI point-to-point operation, an argument that should always be nonnegative, could in fact be negative. For 256 processes and integer data (type_size = 4), this fault occurs if and only if 3,201 ≤ count ≤ 3,251.

The problematic function is guarded by the code in Figure 7b, referring to three compile-time constants (MPIR_BCAST_SHORT_MSG, MPIR_BCAST_LONG_MSG, and MPIR_BCAST_MIN_PROCS), defined elsewhere as 12,288, 524,288, and 8, respectively. Essentially, the function is called for “medium-size” messages only when the number of processes is a power of 2 and above a certain threshold. With these settings, the smallest configuration that would reveal the defect is 128 processes, with count = 3,073.

A symbolic execution technique that checks that the “size” arguments to MPI functions are always non-negative would readily detect the defect. If the tool also treats the three compile-time constants as symbolic constants, the defect can be manifest at the much smaller configuration of eight processes and count = 1 (in which case nbytes-recv_offset = −1). Such an approach would likely have detected this defect earlier and with much less effort.

Arithmetic. In our analysis of the adder example, we interpreted the values manipulated by the program as the mathematical real numbers and the numerical operations as the (infinite precision) real operations. If instead these values are interpreted as (finite-precision) floating-point values and operations, the two programs


are not functionally equivalent, since floating-point addition is not associative.11 Which is right? The answer is it depends on what the user is trying to verify. For functional equivalence, the specification and implementation are rarely expected to be “bit-level” equivalent (recall the adder example), so real equivalence is probably more useful for the task. TASS uses a number of techniques specialized for real arithmetic, as when all real-valued expressions are put into a canonical form that is the quotient of two polynomials to facilitate the matching of expressions. For other tasks (such as detecting the defect in Figure 7), bit-level reasoning is more appropriate. Klee8 is another symbolic execution tool for (sequential) C programs that uses bit-precise reasoning. There is no reason why these techniques could not be extended to parallel MPI-based programs.

Static Analysis of MPI
In the sequential arena, compiler techniques have been successful at analyzing programs and transforming them to improve performance. However, analyzing MPI applications is difficult for four main reasons: the number of MPI processes is both unknown at compile time and unbounded; since MPI processes are identified by numeric ranks, applications use complex arithmetic expressions to define the processes involved in communications; the meaning of ranks depends closely on the MPI communicators used by the MPI calls; and MPI provides several nondeterministic primitives (such as MPI_ANY_SOURCE and MPI_Waitsome). While some prior research (such as Strout et al.29) explored analysis of MPI applications, none successfully addressed this challenge.

Some approaches treat MPI applications as sequential codes, making it possible to determine simple application behaviors (such as the relationship between writing to a buffer and sending the buffer). However, these approaches cannot represent or analyze the application’s communication topology. Other techniques require knowledge of the number of processes to be used at runtime, analyzing one copy of the application for each process. While this analysis can capture the application’s full parallel structure, it is inflexible and non-scalable.

Over the past few years, we have developed a novel compiler-analysis framework that extends traditional dataflow analyses to MPI applications, extracting the application’s communication topology and matching the send and receive operations that may communicate at runtime.6 The framework requires no runtime bound on the number of processes and is formulated as a dataflow analysis over the Cartesian product of control flow graphs (CFGs) from all processes, which we refer to as a parallel CFG, or pCFG. The analysis procedure symbolically represents the execution of multiple sets of processes, keeping track of any send and receive. Process sets are represented through abstractions (such as lower and upper bounds on process ranks) and predicates (such as “ranks divisible by 4”). Sends and receives are periodically matched to each other, establishing the application’s communication topology. Tool users can instantiate the analysis framework through a variety of “client analyses” that leverage the communication-structure information derived by the framework to propagate dataflow information, as they do with sequential applications. Analyses and transformations include optimizations, error detection and verification, and information-flow detection.

Finally, since topological information is key to a variety of compiler transformations and optimizations, our ongoing work focuses on source-code annotations that can be used to describe a given MPI application’s communication topology and other properties. The techniques will then exploit this information to implement novel scalable analyses and transformations to enable valuable optimizations in complex applications.

Conclusion
This article’s main objective is to highlight the fact that both formal and semi-formal methods are crucial for ensuring the reliability of message-passing programs across the vast scale of application sizes. Unfortunately, discussion of these techniques and approaches is rare in the literature. To address this lacuna, we presented the perspectives of academic researchers, as well as HPC researchers in


U.S. national laboratories, engaged in cutting-edge HPC deployment. We propose a continuum of tools based on static analysis, dynamic analysis, symbolic analysis, and full-scale debugging, complemented by more traditional error-checking tools.

Unfortunately, we only barely scratched the surface of a vast problem area. The shortage of formal-methods researchers interested in HPC problems is perhaps the result of the severe historical disconnect between “traditional computer scientists” and HPC researchers. This is especially unfortunate considering the disruptive technologies on the horizon, including many hybrid concurrency models to program many-core systems. There are also emerging message-passing-based standards for embedded multicores (such as MCAPI23), with designs and tool support that would benefit from lessons learned in the MPI arena.

We propose two approaches to accelerate use of formal methods in HPC: First and foremost, researchers in formal methods must develop verification techniques that are applicable to programs employing established APIs. This would help sway today’s HPC practitioners toward being true believers and eventually promoters of formal methods. Moreover, funding agencies must begin tempering the hoopla around performance goals (such as “ExaFLOPs in this decade”) by also setting formal correctness goals that lend essential credibility to the HPC applications on which science and engineering depend.

Acknowledgments
This work is supported in part by Microsoft, National Science Foundation grants CNS-0509379, CCF-0811429, CCF-0903408, CCF-0953210, and CCF-0733035, and Department of Energy grant ASCR DE-AC0206CH11357. Part of this work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344.

2. Arnold, D.C., Ahn, D.H., de Supinski, B.R., Lee, G.L., Miller, B.P., and Schulz, M. Stack trace analysis for large-scale debugging. In Proceedings of the IEEE International Parallel & Distributed Processing Symposium (Long Beach, CA, Mar. 26–30). IEEE Computer Society, 2007, 1–10.
3. Balay, S., Gropp, W.D., McInnes, L.C., and Smith, B.F. Efficient management of parallelism in object-oriented numerical software libraries. In Modern Software Tools in Scientific Computing, E. Arge, A.M. Bruaset, and H.P. Langtangen, Eds. Birkhauser Press, 1997, 163–202.
4. Barrett, C. and Tinelli, C. CVC3. In Proceedings of the 19th International Conference on Computer Aided Verification, Vol. 4590 LNCS (Berlin, July 3–7). Springer, Berlin, 2007, 298–302.
5. Blackford, L. Scalapack User’s Guide. Society for Industrial and Applied Mathematics. Philadelphia, PA, 1997.
6. Bronevetsky, G. Communication-sensitive static dataflow for parallel message passing applications. In Proceedings of the International Symposium on Code Generation and Optimization (Seattle, Mar. 22–25, 2009), 1–12.
7. Butenhof, D.R. Programming with POSIX Threads. Addison-Wesley, Boston, 2006.
8. Cadar, C., Dunbar, D., and Engler, D. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In Proceedings of the Eighth USENIX Symposium on Operating Systems Design and Implementation (San Diego, Dec. 7–10). USENIX Association, 2008, 209–224.
9. Eclipse Foundation, Inc. Parallel Tools Platform. Ottawa, Ontario, Canada; http://www.eclipse.org/ptp
10. Godefroid, P. Model checking for programming languages using Verisoft. In Proceedings of the 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (Paris, Jan. 15–17). ACM Press, New York, 1997, 174–186.
11. Goldberg, D. What every computer scientist should know about floating-point arithmetic. ACM Computing Surveys 23, 1 (Mar. 1991), 5–48.
12. Graphical Explorer of MPI Programs. ISP Eclipse plug-in; University of Utah, School of Computing; http://www.cs.utah.edu/formal_verification/GEM
13. Gropp, W., Lusk, E., and Thakur, R. Using MPI-2: Portable Parallel Programming with the Message-Passing Interface. MIT Press, Cambridge, MA, 1999.
14. Hilbrich, T., Schulz, M., de Supinski, B., and Müller, M.S. MUST: A scalable approach to runtime error detection in MPI programs. In Tools for High Performance Computing. Springer, Berlin, 2009, 53–66.
15. International Exascale Software Project; http://www.exascale.org/iesp/Main_Page
16. Karypis Lab. ParMETIS: Parallel Graph Partitioning and Fill-Reducing Matrix Ordering. Minneapolis, MN; http://glaros.dtc.umn.edu/gkhome/metis/parmetis/overview
17. Khurshid, S., Păsăreanu, C.S., and Visser, W. Generalized symbolic execution for model checking and testing. In Proceedings of the Ninth International Conference on Tools and Algorithms for the Construction and Analysis of Systems, Vol. 2619 LNCS, H. Garavel and J. Hatcliff, Eds. (Warsaw, Apr. 7–11). Springer, 2003, 553–568.
18. King, J.C. Symbolic execution and program testing. Commun. ACM 19, 7 (July 1976), 385–394.
19. Krammer, B., Bidmon, K., Müller, M.S., and Resch, M.M. MARMOT: An MPI analysis and checking tool. In Proceedings of the Parallel Computing Conference (Dresden, Sept. 2–5, 2003), 493–500.
20. Lamport, L. Time, clocks, and the ordering of events in a distributed system. Commun. ACM 21, 7 (July 1978), 558–565.
21. Message Passing Interface Forum. MPI: A Message-Passing Interface Standard, Version 2.2, Sept. 4, 2009; http://www.mpi-forum.org/docs/
22. MPICH2: High performance and widely portable MPI; http://www.mcs.anl.gov/mpi/mpich
23. Multicore Association. Multicore Communications API, El Dorado Hills, CA; http://www.multicore-association.org
Software. Verified Software Laboratory, University of Delaware, 2010; http://vsl.cis.udel.edu/tass
28. Siegel, S.F., Mironova, A., Avrunin, G.S., and Clarke, L.A. Combining symbolic execution with model checking to verify parallel numerical programs. ACM Transactions on Software Engineering and Methodology 17, 2 (Apr. 2008), 1–34.
29. Strout, M.M., Kreaseck, B., and Hovland, P.D. Data-flow analysis for MPI programs. In Proceedings of the 2006 International Conference on Parallel Processing (Columbus, OH, Aug. 14–18). IEEE Computer Society, 2006, 175–184.
30. Vakkalanka, S. Efficient Dynamic Verification Algorithms for MPI Applications. Ph.D. dissertation, University of Utah, 2010; http://www.cs.utah.edu/fv
31. Vakkalanka, S., Vo, A., Gopalakrishnan, G., and Kirby, R.M. Reduced Execution Semantics of MPI: From Theory to Practice. In Proceedings of Formal Methods, Second World Congress Lecture Notes in Computer Science 5850 (Eindhoven, The Netherlands, Nov. 2–6). Springer, 2009, 724–740.
32. Vetter, J.S. and de Supinski, B.R. Dynamic software testing of MPI applications with Umpire. In Proceedings of the 2000 ACM/IEEE Conference on Supercomputing (Dallas, Nov. 4–10). IEEE Computer Society Press, 2000.
33. Vo, A., Gopalakrishnan, G., Kirby, R.M., de Supinski, B.R., Schulz, M., and Bronevetsky, G. Large-scale verification of MPI programs using Lamport clocks with lazy updates. In Proceedings of the 20th International Conference on Parallel Architectures and Compilation Techniques (Galveston, TX, Oct. 10–14). IEEE Computer Society Press, 2011, 329–338.
34. Vo, A., Aananthakrishnan, S., Gopalakrishnan, G., de Supinski, B.R., Schulz, M., and Bronevetsky, G. A scalable and distributed dynamic formal verifier for MPI programs. In Proceedings of the ACM/IEEE Conference on Supercomputing (New Orleans, Nov. 13–19). IEEE Computer Society Press, 2010.
35. Vo, A., Vakkalanka, S., DeLisi, M., Gopalakrishnan, G., Kirby, R.M., and Thakur, R. Formal verification of practical MPI programs. In Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (Raleigh, NC, Feb. 14–18). ACM Press, New York, 2009, 261–269.

Ganesh Gopalakrishnan (ganesh@cs.utah.edu) is a professor in the School of Computing at the University of Utah, Salt Lake City, UT, where he is director of the Center for Parallel Computing at Utah.

Robert M. (Mike) Kirby (kirby@cs.utah.edu) is an associate professor in the School of Computing and Scientific Computing and Imaging Institute at the University of Utah, Salt Lake City, UT.

Stephen F. Siegel (siegel@udel.edu) is an assistant professor in the departments of Computer & Information Sciences and Mathematical Sciences at the University of Delaware, Newark, DE.

Rajeev Thakur (thakur@mcs.anl.gov) is a computer scientist in the Mathematics and Computer Science Division of Argonne National Laboratory, Argonne, IL.

William Gropp (wgropp@illinois.edu) is the Paul and Cynthia Saylor Professor of Computer Science at the University of Illinois in Urbana-Champaign, and a fellow of the ACM, IEEE, and SIAM and a member of the National Academy of Engineering.

Ewing Lusk (lusk@mcs.anl.gov) is an Argonne Distinguished Fellow in the Mathematics and Computer Science Division at Argonne National Laboratory, Argonne, IL.

Bronis R. de Supinski (bronis@llnl.gov) is the principal investigator and leader of the Exascale Computing Technologies project and co-leader of the Advanced Simulation and Computing program’s Application Development Environment and Performance Team at Lawrence Livermore National Laboratory, Livermore, CA.

Martin Schulz (schulzm@llnl.gov) is a computer scientist
24. NASA Advanced Supercomputing Division. Parallel at Lawrence Livermore National Laboratory, Livermore,
Benchmarks; http://www.nas.nasa.gov/Resources/ CA.
References Software/npb.html
1. Ahn, D.H., de Supinski, B.R., Laguna, I., Lee, G.L., Liblit, 25. Open MPI: Open Source High Performance MPI. Greg Bronevetsky (bronevetsky@llnl.gov) is a computer
B., Miller, B.P., and Schulz, M. Scalable temporal order Indiana University, Bloomington, IN; http://www.open- scientist at the Lawrence Livermore National Laboratory,
analysis for large-scale debugging. In Proceedings mpi.org/ Livermore, CA, and a Department of Energy Early Career
of the ACM/IEEE Conference on Supercomputing 26. Pacheco, P. Parallel Programming with MPI. Morgan Investigator.
(Portland, OR, Nov. 14–20). ACM Press, New York, Kaufmann, San Francisco, 1996.
2009. 27. Siegel, S.F. et al. The Toolkit for Accurate Scientific © 2011 ACM 0001-0782/11/12 $10.00

December 2011 | Vol. 54 | No. 12 | Communications of the ACM 91


review articles

doi:10.1145/2043174.2043195

Answer Set Programming at a Glance

The motivation and key concepts behind answer set programming, a promising approach to declarative problem solving.

By Gerhard Brewka, Thomas Eiter, and Mirosław Truszczyński

Illustration by Gwen Vanhee

Can solving hard computational problems be made easy? If we restrict the scope of the question to computational problems that can be stated in terms of constraints over binary domains, and if we understand “easy” as “using a simple and intuitive modeling language that comes with software for processing programs in the language,” then the answer is Yes! Answer Set Programming (ASP, for short) fits the bill.

While already well represented at research conferences and workshops, ASP has been around for barely more than a decade. Its origins, however, go back a long time; it is an outcome of years of research in knowledge representation, logic programming, and constraint satisfaction, areas that sought and studied declarative languages to model domain knowledge, as well as general-purpose computational tools for processing programs and theories that represent problem specifications in these languages. ASP borrows from each of these areas, all the time aiming
to maintain a balance between expressivity, ease of use, and computational effectiveness. To give just a few examples, emerging applications in molecular biology, decision support systems for space shuttle controllers, and team building at Gioia Tauro Seaport (see sidebar here) bear witness to its potential.

Sidebar: ASP-based Team Building at Gioia Tauro Seaport

The seaport of Gioia Tauro, Reggio Calabria, Italy, is the largest transshipment terminal on the Mediterranean coast. A crucial management task for a port of this size is to build teams of employees to handle incoming ships. This is difficult and time consuming, as one must ensure that teams have appropriate skills, the workload is divided fairly, and legal workload regulations are met. Until recently this task was performed manually, which took several hours per day.

In cooperation with Exeura Srl, a University of Calabria (UNICAL) spin-off, and ICO BLG, an Italian logistics company, Nicola Leone’s group at UNICAL has developed an ASP-based system for team building based on the DLV solver [38]. Rules describe the requirements that should be fulfilled regarding: necessary skills of team members; availability of employees; fairness of workload distribution; and distribution of “heavy” or “risky” tasks. Since in practice not all requirements can be satisfied, the system has an implicit conflict handling strategy that gives higher priority to more important criteria. The system, which has been adopted by ICO BLG for work-force management, can generate shift plans for 130 employees within a few minutes. In addition, the plan quality turned out to be considerably better and overtime was decreased by 20%.

Key factors for the success of ASP in this application were its high expressiveness and the possibility to evolve an executable specification in close interaction with domain experts on site who, although not computer experts, could help getting it right in short time.

Programs and Answer Sets

We start our ASP discussion with the propositional setting. The building blocks for programs are atoms, literals, and rules. Atoms are elementary propositions (factual statements) that may be true or false; literals are atoms a and their negations not a. Rules are expressions of the form

    a ← b_1, ..., b_m, not c_1, ..., not c_n    (1)

where a and all b_i’s and c_j’s are atoms. Intuitively, a rule (1) is a justification to “establish” or “derive” that a (the so-called head) is true, if all literals to the right of ← (the so-called body) are true in the following sense: a non-negated literal b_i is true if it has a derivation, a negated one, not c_j, is true if the atom c_j does not have one. For instance, the rule

    light_on ← power_on, not broken

informally means we can assert that the light is on, if we established the power is on and there is no reason to think the lamp is broken. Rules may have no body. For instance, we may have a rule:

    power_on ← .

Such rules are called facts, as the head is unconditionally true, and the arrow ← is typically omitted.

Programs are finite collections of rules. They are thought of as “justifications” for sets of atoms that contain precisely those atoms that can be established. It is important to point out that not is not a standard negation operator. Rather, it is meant to stand for a modality “non-derivable.” Looking at the small program with the two rules mentioned here, power_on should be derived (as it is given as a fact), while intuitively broken should not (the program, which describes what we know, has no rule to derive broken). This in turn allows us to derive light_on.

Formalizing these intuitions posed a challenge to the knowledge representation and logic programming communities for years. Eventually, answer sets provided a solution that gained acceptance.

Answer sets. To trace the key points of answer sets, we consider two further examples. Let P_1 be the program consisting of the following rules:

    high_salary ← employed, educated
    educated ← high_salary
    employed ← motivated
    motivated.

We can regard motivated as established as it is the head of a rule that has no preconditions. Consequently, the third rule allows us to derive employed. Can we obtain anything else? To get high_salary we need to have established educated and, similarly, to get educated we need to have established high_salary. This “vicious cycle” of dependencies cannot be broken as there is no other rule with high_salary or educated in the head. Hence, neither high_salary nor educated can be derived given the information in the program. We conclude the set {motivated, employed} is the only one the program “justifies.”

This bottom-up process can be extended to an arbitrary program without the not operator. In the general case, however, once negation is allowed the situation gets more complicated. For instance, let P_2 consist of two rules:

    open ← not closed
    closed ← not open.

In the first example it was clear how to start and how to proceed. It is not so here. The reason is we do not know which atoms cannot be derived, therefore, we cannot verify the conditions for applying any of the rules.

A way out of the problem is to start by assuming which atoms will not be derived. For instance, let us assume that closed will not be derived. Then, the first rule can be used and we can establish open. Since open is established, the second rule cannot be used and closed indeed will not be established, verifying our assumption. Thus, the set {open} is justified by the program in the following sense. Assuming that atoms not contained in the set cannot be derived, and using program rules (under our intuitive understanding of how they work), we can derive in the bottom-up fashion precisely those atoms that are in the set. Interestingly and importantly, {open} is not the only set justified by the program P_2. Another one is {closed}: if we assume that
open cannot be derived, we can use the second rule to derive closed. Having derived closed, we have that open cannot be derived, confirming the assumption we made.

Key Insights

Answer set programming is an emerging approach to modeling and solving search and optimization problems. It combines an expressive representation language, a model-based problem specification methodology, and efficient solving tools.

The answer set programming language allows domain and problem-specific knowledge, including incomplete knowledge, defaults, and preferences, to be represented in an intuitive and natural way.

Because of its strong declarative aspect, the language of answer set programming supports rapid prototyping and development of software for solving search and optimization problems, and facilitates modifications and refinements leading to better performance.

Our examples suggest the case of programs that contain no rules with not in the body is easier. We do not need to make any assumptions about what cannot be derived, as no rule has negated atoms in its body. Instead, we proceed in an iterative fashion collecting atoms that can be established, in each step using atoms derived already to establish new ones. When no more atoms can be derived, the process terminates. The unique set of atoms derived in this way is justified by the program, and we call it the answer set of the program.

The concept of an answer set for negation-free programs (also called Horn programs) is a springboard to the general definition. The intuitions we discussed earlier in the context of the program P_2 are crucial. We start with a set M of atoms (in our example, with {open}) and make an assumption that no atom outside M can be derived. Given this assumption, rules that contain a negated atom not a, where a is in M, become unusable (as the non-derivability of a is not assumed; in our example, closed ← not open is unusable). These rules are “blocked” by M and can be disregarded. Therefore, we remove them from the program. In every other rule, if an atom is negated, it must have been assumed non-derivable, otherwise, the rule would have been removed. According to our reading of the rules the corresponding literal can be eliminated from the body without affecting the usability of the rule. Once this is done, we are left with a negation-free program, called the reduct of the program with respect to M. If the set of atoms we can derive from that program or, in other words, the answer set of that program, coincides with M, all non-derivability assumptions we made based on M are confirmed, and all atoms in M can be derived. Thus, M is justified by P. We call each such set M an answer set of P.

The definitions of the reduct and an answer set are due to Gelfond and Lifschitz [20]. Originally, they used the term stable model and introduced the term answer set later for a generalization of the concept to a broader class of programs that feature strong negation and disjunction, which we will discuss. The new term eventually took over.

There is some similarity between rules and propositional logic implications. Indeed, the rule (1) looks like the implication

    (b_1 ∧ ... ∧ b_m ∧ ¬c_1 ∧ ... ∧ ¬c_n) → a    (2)

written in a “reversed” fashion. Each answer set of a program is a model of the program viewed as a set of implications (models are truth value assignments to atoms such that each implication evaluates to true). However, not all models are answer sets as not all models satisfy the foundedness requirement that atoms be derivable in the sense described here.

It should be noted that ASP has solid logic foundations, and is closely linked to nonmonotonic reasoning. In fact, programs under answer set semantics can be seen as a fragment of Reiter’s Default Logic and as theories in nonmonotonic modal logics, including Moore’s Autoepistemic Logic and nonmonotonic KD45 [31]. David Pearce showed that the answer set semantics can be elegantly captured by a nonmonotonic variant of the logic of here and there [35], a logic located between intuitionistic and classical logic.

Close connection to nonmonotonic logics provides ASP with the power to model default negation and, more generally, to deal with incomplete information. We illustrate that by continuing our light_on example. The rule

    broken ← lightning, not lightning_rod

specifies that the lamp breaks when lightning strikes, unless a lightning rod was installed. With this rule appended to the program here, we still derive light_on, as we cannot derive broken. However, things change if we further add the fact lightning. As lightning_rod cannot be derived, we can establish broken, and so light_on can no longer be derived. Thus, answer set programs behave nonmonotonically: conclusions may have to be retracted when more rules or facts are added to the theory. Further, if we add one more fact lightning_rod, the situation changes again; we can no longer derive broken, and thus light_on will be derived. What this shows is that ASP provides convenient ways for handling exceptions and nested exceptions.

Shorthands and further connectives. A common and important type of rules has its head atom occur negated in the body:

    a ← B, not a.

If such a rule, let us denote it by r, is added to a program P that has no occurrences of a, then r works as a constraint. Namely, a set M of atoms is an answer set of the program P ∪ {r} if and only if M is an answer set of the program P and does not satisfy (as in propositional logic) the conjunction of literals B. In other words, adding r to P simply eliminates those answer sets of P that satisfy B. As atom a is auxiliary and thus irrelevant (we do not allow it in P), a common way to write a constraint is as a “headless” rule

    ← B

which conveys the intuition of a constraint: satisfying B results in a contradiction.

It is also quite common that programs contain pairs of rules

    a ← B, not ā
    ā ← B, not a,

where neither a nor ā appear as the head of any other rule in the program, and B is a conjunction of literals. This happens, in particular, when the programmer wants to refer in the program both to an atom a and to its (standard) negation. To represent the latter, the programmer introduces a new atom ā and includes in the program the two rules here. Intuitively, the role of these rules is to select, in case B is satisfied, exactly one of a and ā; this is precisely what they do under the answer set semantics. Pairs of such rules are often written in a shorthand notation as a single choice rule

    {a} ← B.

Strong negation, denoted with the standard negation symbol ¬, allows us to distinguish between having no justification for an atom a, expressed by not a, and having one for the negation of a, expressed by ¬a. In program rules, ¬ can only appear in front of atoms. Gelfond and Lifschitz showed that the definition of answer sets extends to programs of this form almost literally [21]. Every program P with strong negation can be reduced to an ordinary program P̄: we simply have to replace each literal ¬a in P by a new atom ā. It can be shown that a consistent set of literals S is a (generalized) answer set of P if and only if the set S̄ obtained from S by the same modification is an answer set of P̄. Thus, strong negation is only a modeling convenience. However, it makes formulating defaults as in Reiter’s Default Logic easier. For example, a rule

    closed_{t+1} ← closed_t, not ¬closed_{t+1}

might be interpreted as saying that by default, the valve remains closed at time t+1 if it was closed at time t (that is, unless there are specific reasons for it not to be). Such default rules, which embody the law of inertia, allow for an elegant solution of the frame problem that arises when one reasons about actions and their effects, for instance when modeling and solving planning problems [1].

Modeling considerations also motivated allowing disjunctions in the heads of rules. Disjunctive rules

    a_1 ∨ ... ∨ a_k ← b_1, ..., b_m, not c_1, ..., not c_n

often make representations more intuitive, for example, in a rule like

    open ∨ closed ← valve.

To eliminate the possibility for a valve to be both, a form of minimality is needed. It is reflected in the answer sets of a disjunctive program [21]. The definition uses the same process as before to “reduce” the program with respect to a candidate atom set M and yields the reduct that is free of (default) negation. However, the reduct may have disjunctions in the heads of its rules and thus, in general, there might be multiple minimal sets of atoms that satisfy all rules (and some are guaranteed to exist). The idea now is to check whether M is one of these minimal sets of the reduct. If this is the case, then M is an answer set. Importantly, unlike strong negation ¬, disjunction in the rule heads does increase the problem-solving capacity of programs, as witnessed by results on complexity and expressive power (see the accompanying sidebar “Complexity of ASP”).

Sidebar: Complexity of ASP

To decide whether a given program has some answer set is NP-complete [29], thus as complex as the classical propositional satisfiability problem (SAT); in the presence of disjunctive rules, the problem is NP^NP-complete [11] (NP^NP are the problems decidable in NP with an oracle for NP problems); roughly speaking, this means NP-completeness even if calls to a subroutine for SAT are for free [9,22]. Predicate programs have exponentially higher complexity (intuitively, this is because the reduction by grounding causes an exponential blow up in general). Regarding search problems, ASP can express all NP-search problems, that is, those solvable using a nondeterministic Turing machine in polynomial time, in such a way that the answer sets encode the solutions. In fact, each such problem (for example, finding some Hamiltonian cycle) is expressible by a fixed predicate program to which logical facts encoding a given problem instance (for example, a graph) are added. Again, additional constructs like disjunctive rules may increase the expressivity.
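
The reduct-based definition of answer sets for programs without disjunction can be illustrated with a small brute-force sketch. The Python below is only an illustration under stated assumptions, not how ASP solvers work: it enumerates every candidate set M, builds the reduct with respect to M, and keeps M when the bottom-up derivation from the reduct reproduces it. The rule encoding (head, positive body, negative body) and the names `least_model`, `reduct`, `answer_sets`, `P1`, `P2`, and `LIGHT` are hypothetical and introduced only for this example.

```python
from itertools import combinations

# A rule is a triple (head, positive_body, negative_body); bodies are sets of atoms.
# This encoding is an assumption made for the sketch.
P1 = [
    ("high_salary", {"employed", "educated"}, set()),
    ("educated", {"high_salary"}, set()),
    ("employed", {"motivated"}, set()),
    ("motivated", set(), set()),
]
P2 = [
    ("open", set(), {"closed"}),
    ("closed", set(), {"open"}),
]
LIGHT = [
    ("light_on", {"power_on"}, {"broken"}),
    ("power_on", set(), set()),
    ("broken", {"lightning"}, {"lightning_rod"}),
]


def least_model(horn_rules):
    """Bottom-up derivation for a negation-free program of (head, body) pairs."""
    derived = set()
    changed = True
    while changed:
        changed = False
        for head, pos in horn_rules:
            if head not in derived and pos <= derived:
                derived.add(head)
                changed = True
    return derived


def reduct(program, m):
    """Remove rules blocked by m, then drop the remaining 'not' literals."""
    return [(head, pos) for head, pos, neg in program if not (neg & m)]


def answer_sets(program):
    """Return every set M whose reduct derives exactly M (exponential search)."""
    atoms = set()
    for head, pos, neg in program:
        atoms |= {head} | pos | neg
    atoms = sorted(atoms)
    found = []
    for size in range(len(atoms) + 1):
        for combo in combinations(atoms, size):
            m = set(combo)
            if least_model(reduct(program, m)) == m:
                found.append(m)
    return found
```

Under this encoding, `answer_sets(P1)` yields only {motivated, employed} and `answer_sets(P2)` yields {closed} and {open}, matching the discussion of P_1 and P_2. Appending the fact `("lightning", set(), set())` to `LIGHT` replaces light_on by broken in the single answer set, which is the nonmonotonic behavior described for the lamp example.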

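
The minimality check for disjunctive programs can be sketched in the same brute-force style. This is a hedged illustration rather than a practical algorithm: a candidate set M is accepted when it satisfies every rule of the reduct and no proper subset of M does. The encoding of a rule as (set of head atoms, positive body, negative body) and the names `subsets`, `is_model`, `disjunctive_answer_sets`, and `VALVE` are assumptions made for this example.

```python
from itertools import combinations

# A disjunctive rule is (heads, positive_body, negative_body), all sets of atoms.
# An empty head set would act as a constraint. The encoding is hypothetical.
VALVE = [
    ({"open", "closed"}, {"valve"}, set()),  # open ∨ closed ← valve
    ({"valve"}, set(), set()),               # valve.
]


def subsets(atoms):
    """Yield all subsets of the given atoms, smallest first."""
    ordered = sorted(atoms)
    for size in range(len(ordered) + 1):
        for combo in combinations(ordered, size):
            yield set(combo)


def is_model(reduct_rules, m):
    """m satisfies a rule if a true body implies at least one true head atom."""
    return all(heads & m for heads, pos in reduct_rules if pos <= m)


def disjunctive_answer_sets(program):
    """Return every M that is a minimal model of the reduct with respect to M."""
    atoms = set()
    for heads, pos, neg in program:
        atoms |= heads | pos | neg
    found = []
    for m in subsets(atoms):
        red = [(heads, pos) for heads, pos, neg in program if not (neg & m)]
        if not is_model(red, m):
            continue
        # Minimality: no proper subset of m may also satisfy the reduct.
        if any(is_model(red, s) for s in subsets(m) if s != m):
            continue
        found.append(m)
    return found
```

With this sketch, `disjunctive_answer_sets(VALVE)` returns {closed, valve} and {open, valve}; the set containing both open and closed is rejected because it is not minimal.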



Table 1. ASP grounders.

    LPARSE    www.tcs.hut.fi/Software/smodels/
    DLV       www.dbai.tuwien.ac.at/proj/dlv/ or www.dlvsystem.com/
    GRINGO    potassco.sourceforge.net/#gringo/

Table 2. Some ASP systems.

    ASSAT     assat.cs.ust.hk/
    CLASP¹    potassco.sourceforge.net/#clasp/
    CMODELS   www.cs.utexas.edu/users/tag/cmodels/
    DLV²      www.dbai.tuwien.ac.at/proj/dlv/ or www.dlvsystem.com/
    GNT       www.tcs.hut.fi/Software/gnt/
    SMODELS   www.tcs.hut.fi/Software/smodels/
    XASP      xsb.sourceforge.net/, distributed with XSB

    ¹ + CLASPD, CLINGO, CLINGCON, among others; http://potassco.sourceforge.net/
    ² + DLVHEX, DLVDB, DLT, DLV-COMPLEX, ONTO-DLV, and others.

Predicate programs. The propositional case is crucial for the definition of answer set semantics. But it is the predicate version of the formalism that facilitates modeling and makes ASP an effective problem-solving technique. The language has relation (or predicate) symbols, constant symbols and variables, as well as the logical connectives we discussed earlier, but no function symbols (we will discuss this restriction later). A rule is an expression of the form

    A ← B_1, ..., B_m, not C_1, ..., not C_n    (3)

where A, B_i, and C_i are atomic formulas in the language. Rules are regarded as being implicitly universally quantified. The concepts of the head and body of the rule are defined as before and we interpret a rule (3) similarly as before, too. That is, we understand it as a device that, under some conditions, allows us to derive its head.

More formally, the semantics of a predicate program P is defined in terms of its ground version grnd(P). The program grnd(P) consists of all ground instantiations of rules in P with respect to constants that appear in P. In case P contains no constants (a situation that does not occur in practice), one is selected arbitrarily and used to produce grnd(P). The program grnd(P) can be regarded as a propositional one over all ground atoms in the language, and the answer sets of P are defined to be those of grnd(P).

The ASP Paradigm

ASP is an approach to solving search problems. The answer set semantics of programs is the foundation of ASP. But equally important is the understanding of how programs encode search problems and their instances. Niemelä [32] and Marek and Truszczyński [30] first formulated explicitly the basic principles of the ASP approach, Lifschitz [26] was the one to propose the term. In our discussion we rely on a rather intuitive understanding of a search problem. Namely, we assume that a search problem Π consists of a set of instances, D_Π, with each instance I assigned a finite set S_Π(I) of solutions. The set S_Π(I) may be empty, that is, problem Π may have no solution for instance I.

To solve a search problem Π, a program P_Π is designed that captures the problem specifications so that when extended with facts D_Π(I), representing an instance I of the problem, the answer sets of P_Π ∪ D_Π(I) describe all solutions of problem Π for the instance I. The upshot of this design is that solving the problem is reduced in a uniform way (the program P_Π is fixed and only the data component changes) to the task of finding answer sets.

We now illustrate how ASP works by analyzing the problem of finding a Hamiltonian cycle in a directed graph. The choice is not arbitrary: this is an important combinatorial problem, arising in several practical situations (for example, as an essential component of the well-known Traveling Salesperson problem). While simple to state, it is still complex enough to allow us to emphasize all key aspects of ASP. In the problem, we are given a directed graph G = (V, E), where V is the set of vertices and E the set of (directed) edges of G. The goal is to find a Hamiltonian cycle in G, that is, a set of edges that induce in G a directed cycle going through each vertex exactly once.

We will use two relation symbols to represent graphs: vtx and edge. Let us consider the graph G shown in the accompanying figure.

[Figure: A graph for the Hamiltonian cycle problem, with vertices a, b, c, and d.]

We represent the graph G as the set of ground atoms

    D_hc(G) = {vtx(a), vtx(b), vtx(c), vtx(d)} ∪
              {edge(a, b), edge(b, c), edge(c, d), edge(d, a), edge(b, d)}.

Next, we need to capture the specification of the problem. A key part is the definition of a Hamiltonian cycle. According to our description, it must be a subset of the edges of the graph. To describe this subset formally, we use a relation symbol in and expressions in(a, b) that informally read: the edge (a, b) is selected for a Hamiltonian cycle. To indicate that any edge (X, Y) can be “selected” to be in a Hamiltonian cycle, we use the choice rule:

    (HC1) {in(X, Y)} ← edge(X, Y).

Next, we stipulate that no two selected
edges start or end in the same vertex. To this end, we use two constraint rules:

    (HC2) ← in(V_2, V_1), in(V_3, V_1), V_2 ≠ V_3
    (HC3) ← in(V_1, V_2), in(V_1, V_3), V_2 ≠ V_3.

We stress the use of the relation symbol ≠ here. In the predicate version of ASP, we assume the set of constants includes integers, and the set of relation symbols includes symbols such as =, ≠, ≤, <, ≥, and >, as well as symbols for (bounded) arithmetic operations such as +. To be consistent with standard notation, we use the infix notation and write X ≤ Y instead of ≤(X, Y). Similarly, we write X + Y = Z instead of +(X, Y, Z). All these symbols are always interpreted in the standard way.

To be a Hamiltonian cycle, the set of edges in(x, y) must determine a single cycle. To enforce this condition, we need a concept of one vertex being reachable from another. To this end, we use an auxiliary relation symbol rchble and the following rules:

    (HC4) rchble(V, V)
    (HC5) rchble(V_1, V_3) ← in(V_1, V_2), rchble(V_2, V_3).

The rules (HC4) and (HC5) define the transitive closure of the relation in;[a] that is, all pairs of vertices (x, y) such that y can be reached from x by following zero or more edges that are “in.” Clearly, the selected edges form a Hamiltonian cycle if and only if every pair of vertices is in the transitive closure. This condition is captured by the following constraint rule:

    (HC6) ← vtx(V_1), vtx(V_2), not rchble(V_1, V_2).

[a] It is well known that this is not expressible in first-order logic.

Let P_hc be the program consisting of the rules (HC1)-(HC6). One can show that a set of edges H is a Hamiltonian cycle in a graph G if and only if H = {(x, y) | in(x, y) ∈ M} for some answer set M of P_hc ∪ D_hc(G).

Finding a Hamiltonian cycle of an arbitrary input G is an NP-hard problem, and under a suitable notion complete for all NP-search problems. In fact, complexity theory (see the sidebar “Complexity of ASP”) tells us that each NP-search problem Π is expressible by a program P_Π as noted earlier.

Sidebar: ASP for Repairing Large-Scale Biological Networks

New high-throughput methods have led to a dramatic increase of measurable data in modern molecular biology, and a number of corresponding knowledge repositories are available on the Web. However, both the data and the available biological networks are highly incomplete and error-prone, and inconsistencies are the rule rather than the exception.

In a joint project by Potsdam University, INRIA, and Institut Cochin, led by Torsten Schaub, an approach for repairing biological networks based on ASP has been developed [19]. It builds on a range of available repair actions inspired by biological use cases. Examples are modifications of the role of a node in a biological influence graph (for example, from inhibitor to activator), additions of missing links between nodes, or modifications of experimental data in cases where it is plausible to assume errors in the measurements.

The program rules encode biological knowledge about the repair actions needed and possible in a particular situation. A possible repair is then achieved by minimizing, according to a variety of strategies, the set of applied repair actions. The system uses not more than 20 rules to encode five types of repair actions with different targets.

Processing Answer Set Programs

Current tools for computing with answer set programs support several basic reasoning tasks, which include computing a single answer set (or determining that none exist), computing a given number of answer sets, and computing all of them. Most tools also support deciding whether an atom is true in every (resp. some) answer set, known as cautious (resp. brave) reasoning. These modalities are important for reasoning applications; for example, when we want to know whether a fact is true in every (resp. some) possible evolution of a system executing a sequence of actions of bounded length.

ASP processing typically works in two stages. First, the predicate program is replaced with an equivalent propositional program by so-called variable replacement or grounding. Second, that program is processed by a propositional ASP solver. Most implemented ASP processing systems make a clear distinction between the two stages and offer separate tools for each; others integrate them.

Grounding. The naive approach to grounding is to replace a program P with grnd(P); but generally this is not efficient. Consider the rule p(X) ← p(X_1), ..., p(X_n) and assume it needs to be grounded for two constants a and b. Then, the naive grounding will produce 2^(n+1) ground instances, as we can choose for X and each X_i either a or b. However, in this case, the full grounding amounts to just two propositional rules p(a) ← p(b) and p(b) ← p(a), as repeated literals in the bodies of rules and tautological rules, where the head atom occurs non-negated in the body, can be eliminated without affecting the answer sets. Intelligent grounding techniques incorporate such equivalences and many further optimizations. They aim to produce, given a predicate program P, a possibly small propositional program, not necessarily a subset of grnd(P), that is equivalent to P, that is, has the same answer sets. Current grounders exploit techniques such as partial evaluation, rewriting, and a great deal of database technology to make grounding efficient. We refer to Table 1 for information on the three grounders most broadly used in ASP. Their input formats serve as de facto specifications of the three most popular ASP dialects. They are quite close to each other. Nevertheless, the need for standardization is recognized by the ASP community. Extensions to the GRINGO grounder are an important step in this direction, making its input language much closer to that of
the DLV grounder.

Propositional solving. Table 2 provides pointers to several current ASP solvers. All of them more or less directly exploit methods developed in the field of satisfiability solving. Some ASP solver algorithms, often referred to as native (to ASP), follow the general backtracking search pattern of SAT solvers but supplement SAT-based propagation techniques with ones implied by an additional foundedness condition that models must satisfy to be answer sets [25]. This condition means that every atom that is true in a model must be derived (in a certain precise sense) by a rule in the program. The search backtracks when either a contradiction is derived, or a complete and consistent assignment is found but some atoms that are true lack a derivation (are not founded). In each case, the need to backtrack indicates that some decisions made earlier in the search are incompatible with any answer set of the program and must be changed. This group of algorithms embodies a perspective on answer sets best captured by the catchphrase

    (propositional) ASP = SAT + foundedness.

The answer set search outlined earlier can be improved by sophisticated search heuristics and techniques like backjumping and clause learning developed in the field of SAT solvers. The current ASP solvers take full advantage of these techniques. The native ASP solver CLASP, dressed as a SAT solver, won two tracks of the 2009 SAT solver competition.

Other successful ASP solver algorithms are based on reductions of answer set solving to satisfiability testing. They modify the formula corresponding to a program so that its models are exactly (or up to trivial projections) the answer sets of the program. One approach is to produce the so-called program completion. It reflects the idea that the program provides all conditions under which atoms are true; that is, it is a definition of the atoms in its rule heads. Accordingly, the completion is the formula containing for each atom a an equivalence saying that a holds if and only if the disjunction of the bodies of all rules with a in the head holds. The completion captures some aspects of the foundedness condition, but not all. To capture it entirely, the completion must be extended by loop formulas that exclude self-supporting derivations [28]. Loosely speaking, this approach could be cast as

    ASP = completion + loop formulas.

Once the completion and loop formulas are built, an off-the-shelf SAT solver is used to find models of the resulting theory and so, answer sets of the original program. In the worst case, there can be exponentially many loop formulas, which complexity theory tells us is in a sense unavoidable. Therefore, some ASP solvers based on this idea, for example, ASSAT, add loop formulas incrementally and test whether models are already answer sets, while others, such as CMODELS2, similarly employ special techniques to select promising loop formulas to add and to “forget” them later.

Other reductions of ASP computation to SAT solving use auxiliary atoms for level rankings to represent founded derivation by keeping track of successive rule applications. Following this direction, translations of ASP to SAT modulo difference logic have been proposed that exploit fast solvers for theories in that formalism [33].

Sidebar: ANTON—An ASP-based Music Composition System

ANTON [4], developed at University of Bath in cooperation with University of Glamorgan, is an automatic system for the composition of Renaissance-style music. It represents musical knowledge in the form of about 500 ASP rules. The rules describe the progression of a melody, both at the local level (the choice of the next note) and at the global level (the overall structure), the harmony that arises from the relationship between the melodic line and the supporting instruments, and also the rhythm, such as the intervals between notes, of a piece.

Given some initial information, for example, fixed notes or number of parts, the program generates answer sets representing musical pieces that satisfy the composition rules. With minor modifications, the system can also be used to detect violations of composition rules in given pieces of music.

ASP Extensions

Motivated by the needs of applications, several extensions of the basic ASP paradigm have been proposed.

Constraints and aggregates. Constraints on sets of atoms are particularly common. For instance, one often needs to say that exactly one out of a given set of atoms is true. In the well-known n-queens problem, we must place n queens on the n × n chessboard so that no two queens attack each other. Here one of the constraints is that exactly one queen is in each row. Even though this can be naturally encoded in the basic ASP language, the grounding will result in a large number of rules. ASP input languages thus provide constructs for constraints on sets of atoms that ASP solvers handle suitably. Basically, there are two approaches.

The first approach, which originated with LPARSE, uses the concept of a cardinality atom. In the propositional case, it has the form

    l {a1, ..., an} k

and reads: at least l and at most k atoms in the set {a1, ..., an} are true (if l or k is missing, there is no restriction from the respective side). In the predicate language, one can be even more concise and write expressions such as

    L {a(X) : p(X, Y)} K,

where L, K, X, and Y are variables. The expression captures a condition that given a value for Y, for at least L and at most K of the values of X such that p(X, Y) holds, a(X) is true. To ensure the grounding process is well defined, syntactic conditions on variables are used.

Let us denote by q(X, Y) that some queen is in row X and column Y. We can state the uniqueness constraint on


queens in each row concisely by the following two constraint rules:

    ← 2 {q(X, Y) : col(Y)}, row(X).
    ← {q(X, Y) : col(Y)} 0, row(X).

The first rule states that for no row X there are distinct Y and Y' such that q(X, Y) and q(X, Y') are true (no row contains two or more queens). The second rule states that for no row X, it holds that all atoms q(X, Y) are false (there is no row without queens).

There is a more general version of cardinality constraints, weight constraints, where each atom is associated with a weight and the bounds constrain the sum of the weights of atoms that have some property.

The second approach to modeling constraints on sets of atoms follows the idea of aggregates familiar from SQL in databases [16]. Those implemented in ASP languages include count, sum, maximum, and minimum and follow closely the database syntax. In the DLV input language, the unique-queen constraint is expressible by

    ← 1 != #count{Y : q(X, Y)}, row(X).

The input language of GRINGO also recognizes aggregates such as count and sum but specifies bounds as in cardinality constraints; this points to the need for standardization of ASP input languages.

Preferences. A basic assumption of the ASP paradigm is that problems are modeled in a way such that answer sets represent their solutions. However, it is impossible to further distinguish between better and poorer solutions. One way to address this problem is to introduce preferences. Simple forms of preferences can be expressed using #minimize and #maximize statements that are supported by several of the existing ASP solvers. They allow us to associate weights with specific literals. The generated answer sets then are those for which the sum of the weights of satisfied literals is minimal/maximal. The DLV system provides so-called “weak constraints,” which carry a weight of importance; they should be satisfied if possible, but their violation does not “kill” answer sets. The answer sets of a program P plus a set W of weak constraints are those answer sets of P that minimize the sum of the weights of violated weak constraints. Other, non-numerical approaches use an external partial preference order on rules or special syntactic constructs in the rules; for example, Brewka et al. [6]. In each case the available preference information induces a corresponding ordering on answer sets, and the best ones are chosen.

Modularity and external data access. Modularity is an important notion in software development. In the context of ASP it is only beginning to receive the attention it deserves, but several key concepts and ideas have already been developed [10, 23]. Modularization is a way to structure and ease the program development process. Modular ASP programs consist of modules that are combined through suitable interfaces. This way parts of a program can be developed and verified independently, and they can be more easily reused. A related issue is to integrate external sources into ASP programs. In a rule one would often like to access a database, an ontology, or some other source of information. To serve this, HEX programs [13] provide a universal interface for arbitrary sources of external computation through the notion of external atom, which is akin to a remote procedure call but facilitates proper recursion.

Applications

The ASP paradigm is rather new but it has already led to many successful applications. We briefly discuss a few examples in different categories. Further examples can be found in the team-building sidebar noted earlier as well as the ones entitled “ASP for Repairing Large-Scale Biological Networks” and “ANTON—An ASP-based Music Composition System.”

Applications in science and humanities. An illustrative example is phylogenetic systematics, the study of evolutionary relations between species based on their shared traits [15]. These relations can form a tree (called a “phylogeny”) where leaves represent the species, internal vertices their ancestors, and edges the genetic relationships between them. The computational task is to construct phylogenies, and researchers demonstrated the applicability and effectiveness of ASP-based methods for these tasks by analysis of natural languages and parasite-host systems species of oak trees.

Industrial applications. An early, almost prototypical industrial application for ASP is product configuration [39]. The general idea is to have rules in a program that generate the space of all combinations of product components. Constraint rules then filter out configurations that are impossible, either due to some given, fixed restrictions on how components can be combined, or due to a violation of specific user requirements. Another early application is a decision support system for the space shuttle [34]. During normal shuttle operations, astronauts follow pre-scripted plans. However, in case of failure different courses of action are needed to ensure safety of the crew and completion of the mission. As exponentially many failures are possible, pre-planning for all exceptional circumstances is infeasible, and decision support is needed. Based on failure information, the ASP system suggests a course of action.

Data management. INFOMIX (www.mat.unical.it/infomix/) is a project on advanced information integration. The main task is to provide a uniform interface to pre-existing data sources, where an information integration system frees the user from finding and accessing relevant data sources, and from cleaning and combining data in them. Here, in particular, proper handling of incomplete and inconsistent data is challenging. The INFOMIX prototype showed that ASP provides effective technology to deal with advanced information integration tasks. ASP also proved to be a valuable host for realizing query engines in the context of the Web. In fact, one of the first SPARQL reasoning engines for querying RDF data sources has been realized via an ASP encoding [37].

Artificial intelligence. Given the fact that ASP has roots in knowledge representation and nonmonotonic reasoning, its usage for problem solving in artificial intelligence (AI) has been investigated early on. Classic AI problems including planning, diagnosis, and agent decision making have been reduced to ASP, resulting


in effective realizations (several are available, for example, as DLV front ends). As it turned out, thanks to its features (high expressiveness, nondeterminism via multiple answer sets, and high declarativity), ASP is a valuable host language for domain-specific AI formalisms, allowing for quick experimental prototyping. A recent example of this is repair of Web-service workflows [18], where these features were fruitfully exploited.

Relation to Other Formalisms

ASP is just one of many ways to solve search problems by means of logic reasoning procedures. We briefly comment on three formalisms for declarative problem solving that are both related and relevant to ASP.

ASP and SAT solving. The key idea of ASP, to encode the solutions of a search problem in the models of a logical theory for declarative problem solving, had been exploited before. In a landmark paper, Kautz and Selman [24] showed that encoding a planning problem as a theory in propositional logic, with plans represented by models, and using SAT solvers to find models and so plans, could outperform specialized planners. Applications of similar nature led to a boom in SAT solver technology.

While both SAT solving and core ASP apply in principle to the same problems, there are differences. First, ASP supports variables that range over finite domains and enable uniform and compact representation of problems independently of data. In the Hamiltonian cycle example, we have a single, fixed program that works uniformly with all input graphs. ASP grounders produce instances for ASP solvers based on the program and the input graph. Having problem specifications separate from data facilitates debugging and testing, and supports optimization and the development of reusable problem modules, all topics currently under research. There is no such separation of problem specification and data in SAT, where the two are hardwired into programs that generate satisfiability instances to be solved. This makes development of software engineering techniques for SAT difficult. Second, any problem that can be modeled in SAT can be modeled equally well in ASP. But there are problems, typically involving concepts defined inductively such as reachability in graphs, that are easy to cast in ASP, but representing them appropriately for SAT solving results in larger instances that slow down solving. In a similar vein, the language of ASP offers constructs such as “minimized” disjunction, aggregates, and priorities that are useful in practical applications, are easy to use, and are supported by most current ASP solvers. These constructs require specialized ad hoc treatment when modeling for SAT solving. For some of them concise representations are not even possible.

ASP and Prolog. Prolog is the most widely known logic programming language. For some time, however, the interest in Prolog has been declining, in part because expectations of ambitious endeavors like the Fifth Generation Project could not be met. Is ASP, which is sometimes called Answer Set Prolog, a better Prolog? The two are similar in syntax and there are semantic connections, too. For a large class of programs, if Prolog returns “yes” (respectively, “no”) to a ground query, then the query belongs (respectively, does not belong) to the unique answer set of the program. But in spite of these similarities, ASP and Prolog are actually quite different. Prolog was designed as a general purpose, Turing-complete programming language. It uses function symbols for nested terms to build potentially infinite data structures, and recursion for unbounded computation; solutions are computed by query answering, which amounts to proof search. In contrast, ASP was not conceived for such generality and works over a finite domain of “flat” data (though work on function symbols in ASP is under way); solutions are encoded in answer sets, that is, in models, and thus model-finding, not proof-finding, methods matter.

To be an effective Prolog programmer one needs to understand how to use terms as data structures, which is not quite intuitive and not part of any standard CS curriculum, and to understand Prolog’s evaluation strategy, SLD resolution with unification, which is arguably quite difficult to master with no adequate logic background. To


complicate matters more, the order of rules in a Prolog program and of subgoals (literals) in rule bodies matters. Changing it may render a working program useless. These features give a programmer control over the execution of search and make Prolog a programming language, a formalism in which one can implement algorithms. In this sense, Prolog misses true declarativity. ASP, on the other hand, offers ways to model specifications yet does not allow the programmer to control the search. Consequently, while less expressive, ASP is “more declarative”: it is intuitive, requires less background in logic, and its semantics is robust to changes in the order of literals in rules and rules in programs.

Still, to solve practical application problems in ASP efficiently some experience is required. Typically, there are alternative ways to model a problem as an answer set program, and the resulting programs may perform quite differently. One of the more obvious and at the same time more important considerations for designing efficient answer set programs is that the size of the ground program be as small as possible.

ASP and constraint programming. Constraint programming is concerned with modeling and solving problems, where solutions are assignments of values from finite domains to decision variables. These assignments are subject to constraints given in the problem statement.

For instance, we can specify the n-queens problem as follows: assign to each of n decision variables x1, ..., xn a value from 1, ..., n so that xi ≠ xj, for i ≠ j, and |xi − xj| ≠ |i − j|. To solve a problem like this in constraint programming, one describes it in some high-level modeling language, such as ESSENCE or ZINC, and then maps the description into a set of constraints in some low-level format or, in other words, into a constraint satisfaction problem (CSP), which is then solved. The similarities with ASP (modeling in a high-level language and compiling to a low-level representation) are evident. But there are differences.

High-level languages including those mentioned here closely follow mathematical notation and, in particular, support using sets, relations, functions, and partitions as possible values of decision variables; modeling requires some mathematical sophistication. Mapping the high-level specification of a problem into constraints that will lend themselves well to processing also requires certain mathematical background, and expertise in constraint modeling and solving. On the other hand, the language of ASP and its extensions were developed with knowledge representation applications in mind and their constructs were designed to capture patterns of natural language statements, definitions, and default negation. The language is simple and intuitive to use. In addition, once a problem is modeled in ASP all subsequent steps are performed automatically. A grounder compiles a program into its propositional form and a solver computes solutions. There are also differences at the solving stage. For constraint programming this step consists of solving a CSP over an arbitrary but finite value domain. For ASP, all domains are binary (the variables are propositional atoms). This restriction opens a way to highly efficient implementations, as witnessed by the recent impressive advances in SAT solving technology.

Ongoing Developments

ASP processing tools are under continuous development and have already achieved levels that make them effective in large-scale practical applications. Efforts to increase efficiency by new grounding technology and solving methods, but also by non-ground evaluation, are under way. To a large degree the advances are the result of a community-wide effort to build benchmarks, collect hard test problems and instances, and organize regular ASP system competitions.

However, the situation is quite different as concerns basic software development support in ASP. Although the first integrated development environment, ASPIDE (www.mat.unical.it/~ricca/aspide/), was recently announced, much remains to be done. One of the areas in need of progress is program debugging. Even if developing answer set programs benefits from the declarative nature of ASP, discovering errors is difficult. There is some research in this direction already [5, 7], but the ideas proposed need to be explored further. Methodologies for development and optimization are also important issues. Much progress was made in understanding the theory behind modularity of answer set programs. We discussed some of that research earlier. Here we mention research on strong equivalence [27] or, to put it informally, equivalence for replacement within larger systems, and further notions of equivalence [40]. Elegant technical results are now available, but their impact on practical developments remains open.

Function symbols often make modeling easier and the resulting encodings more readable and concise. Thus, not allowing them in ASP (except in built-ins for arithmetic) was perceived as a limitation. But allowing uninterpreted function symbols renders most of the ASP program processing techniques useless, as ground programs typically become infinite. A middle ground can be found, though. It requires imposing restrictions on how function symbols can occur in programs. Some globally constrain atom dependency in the grounded program [3, 8], while others locally constrain the rule syntax [14]. The LPARSE grounder was the first to offer (albeit limited) support of function symbols, while GRINGO and the DLV system (latest release) include some of the more recent advances. Recent research indicates that ASP can provide a full first-order language for nonmonotonic reasoning, with the notion of an answer set extended to this setting [17, 36]. Computational support and further research will be required, however, to make this available for practical applications.

Integration of SAT solving with constraint solving techniques known as Satisfiability Modulo Theories has proved successful for SAT. The ASP community has recently taken up this idea, with CLINGCON (see Table 2) being a very promising system combining ASP with specialized constraint solvers.

Quantitative methods turned out to be extremely effective in knowledge representation applications in which uncertainty cannot be avoided. ASP as


it exists now is not designed for such applications. This is a drawback and so there are already research efforts to enhance ASP with means to combine probabilities and utilities with qualitative representations of uncertainty [2]. This research direction has not yet matured, though, and it is too early to say how successful such integration will turn out to be.

Conclusion

The aim of this article was to provide the reader with a basic understanding of the main motivation, the most important concepts, and the relevant techniques underlying ASP, a rather new yet highly promising declarative problem-solving paradigm.

We covered answer set semantics, both for propositional and predicate programs, discussed the ASP paradigm, and related it to some other problem-solving approaches. Moreover, we presented algorithms and solvers, several extensions of the basic approach, and some illustrative applications. This article should not be viewed as a complete overview of the field. It is meant as an appetizer. For a more complete picture we recommend Eiter et al. [12] or Baral [1].

Acknowledgments

The authors are grateful to the reviewers for comments that helped improve the presentation of the material. Brewka’s work was supported by the DFG grant Br1817/3; Eiter’s work was supported by the Austrian Science Fund (FWF) grants P20840 and P20841, Vienna Science and Technology Fund (WWTF) ICT08-020, and the European Commission grant ICT FP7 231875. Truszczyński’s work was supported by NSF grant IIS-0913459.

References

1. Baral, C. Knowledge Representation, Reasoning and Declarative Problem Solving. Cambridge University Press, 2003.
2. Baral, C., Gelfond, M. and Rushton, J.N. Probabilistic reasoning with answer sets. Theory and Practice of Logic Programming 9, 1 (2009), 57–144.
3. Baselice, S., Bonatti, P.A. and Criscuolo, G. On finitely recursive programs. Theory and Practice of Logic Programming 9, 2 (2009), 213–238.
4. Boenn, G., Brain, M., Vos, M.D. and Fitch, J. Automatic music composition using answer set programming. Theory and Practice of Logic Programming 11, 2-3 (2011), 397–427.
5. Brain, M. and Vos, M.D. Debugging logic programs under the answer set semantics. In Proc. 3rd International Workshop on Answer Set Programming, CEUR Workshop Proceedings 142, 2005. M. De Vos and A. Provetti, Eds.
6. Brewka, G., Niemelä, I. and Truszczyński, M. Answer set optimization. In Proc. 18th International Joint Conference on Artificial Intelligence. G. Gottlob and T. Walsh, Eds. Morgan Kaufmann, 2003, 867–872.
7. Brummayer, R. and Järvisalo, M. Testing and debugging techniques for answer set solver development. Theory and Practice of Logic Programming 10, 4-6 (2010), 741–758.
8. Calimeri, F., Cozza, S., Ianni, G. and Leone, N. Computable functions in ASP: Theory and implementation. In Proc. 24th International Conference on Logic Programming, LNCS 5366. M. Garcia de La Banda and E. Pontelli, Eds. Springer, 2008, 407–424.
9. Dantsin, E., Eiter, T., Gottlob, G. and Voronkov, A. Complexity and expressive power of logic programming. ACM Computing Surveys 33, 3 (2001), 374–425.
10. Dao-Tran, M., Eiter, T., Fink, M. and Krennwallner, T. Modular nonmonotonic logic programming revisited. In Proc. 25th International Conference on Logic Programming, LNCS 5649. P.M. Hill and D.S. Warren, Eds. Springer, 2009, 145–159.
11. Eiter, T. and Gottlob, G. On the computational cost of disjunctive logic programming: Propositional case. Annals of Mathematics and Artificial Intelligence 15, 3/4 (1995), 289–323.
12. Eiter, T., Ianni, G. and Krennwallner, T. Answer set programming: A primer. Reasoning Web, LNCS 5689. S. Tessaris, E. Franconi, T. Eiter, C. Gutierrez, S. Handschuh, M.-C. Rousset, and R.A. Schmidt, Eds. Springer, 2009, 40–110.
13. Eiter, T., Ianni, G., Schindlauer, R. and Tompits, H. A uniform integration of higher-order reasoning and external evaluations in answer-set programming. In Proc. 19th International Joint Conference on Artificial Intelligence. L.P. Kaelbling and A. Saffiotti, Eds. 2005, 90–96.
14. Eiter, T. and Simkus, M. FDNC: Decidable nonmonotonic disjunctive logic programs with function symbols. ACM Trans. Computational Logic 11, 2 (2010).
15. Erdem, E. Applications of answer set programming in phylogenetic systematics. Logic Programming, Knowledge Representation, and Nonmonotonic Reasoning: Essays Dedicated to Michael Gelfond on the Occasion of His 65th Birthday, LNCS 6565. M. Balduccini and T.C. Son, Eds. Springer, 2011, 415–431.
16. Faber, W., Pfeifer, G., Leone, N., Dell’Armi, T. and Ielpa, G. Design and implementation of aggregate functions in the DLV system. Theory and Practice of Logic Programming 8, 5-6 (2008), 545–580.
17. Ferraris, P., Lee, J. and Lifschitz, V. Stable models and circumscription. Artificial Intelligence 175, 1 (2011), 236–263.
18. Friedrich, G., Fugini, M., Mussi, E., Pernici, B. and Tagni, G. Exception handling for repair in service-based processes. IEEE Trans. on Software Engineering 36, 2 (2010), 198–215.
19. Gebser, M., Guziolowski, C., Ivanchev, M., Schaub, T., Siegel, A., Thiele, S. and Veber, P. Repair and prediction (under inconsistency) in large biological networks with answer set programming. In Proc. 12th International Conference on Principles of Knowledge Representation and Reasoning. F. Lin, U. Sattler, and M. Truszczyński, Eds. 2010, 497–507.
20. Gelfond, M. and Lifschitz, V. The stable model semantics for logic programming. Logic Programming: The 5th International Conference and Symposium. R.A. Kowalski and K. Bowen, Eds. MIT Press, Cambridge, MA, 1988, 1070–1080.
21. Gelfond, M. and Lifschitz, V. Classical negation in logic programs and disjunctive databases. New Generation Computing 9 (1991), 365–385.
22. Greco, S., Molinaro, C., Trubitsyna, I. and Zumpano, E. NP datalog: A logic language for expressing search and optimization problems. Theory and Practice of Logic Programming 10, 2 (2010), 125–166.
23. Janhunen, T., Oikarinen, E., Tompits, H. and Woltran, S. Modularity aspects of disjunctive stable models. Journal of Artificial Intelligence Research 35 (2009), 813–857.
24. Kautz, H.A. and Selman, B. Planning as satisfiability. In Proc. 10th European Conference on Artificial Intelligence. B. Neumann, Ed. 1992, 359–363.
25. Leone, N., Rullo, P. and Scarcello, F. Disjunctive stable models: Unfounded sets, fixpoint semantics and computation. Information and Computation 135, 2 (June 1997), 69–112.
26. Lifschitz, V. Answer set programming and plan generation. Artificial Intelligence 138 (2002), 39–54.
27. Lifschitz, V., Pearce, D. and Valverde, A. Strongly equivalent logic programs. ACM Trans. Computational Logic 2, 4 (2001), 526–541.
28. Lin, F. and Zhao, Y. ASSAT: Computing answer sets of a logic program by SAT solvers. In Proc. 18th National Conference on Artificial Intelligence and 14th Conference on Innovative Applications of Artificial Intelligence, 2002, 112–117.
29. Marek, V.W. and Truszczyński, M. Autoepistemic logic. J. ACM 38, 3 (1991), 588–619.
30. Marek, V.W. and Truszczyński, M. Stable models and an alternative logic programming paradigm. The Logic Programming Paradigm: A 25-Year Perspective. K. Apt, V.W. Marek, M. Truszczyński and D.S. Warren, Eds. Springer, 1999, 375–398.
31. Marek, V.W. and Truszczyński, M. Nonmonotonic Logics – Context-Dependent Reasoning. Springer, 1993.
32. Niemelä, I. Logic programming with stable model semantics as constraint programming paradigm. Annals of Mathematics and Artificial Intelligence 25, 3–4 (1999), 241–273.
33. Niemelä, I. Stable models and difference logic. Annals of Mathematics and Artificial Intelligence 53, 1 (2008), 313–329.
34. Nogueira, M., Balduccini, M., Gelfond, M., Watson, R. and Barry, M. A Prolog decision support system for the space shuttle. In Proc. 1st International Workshop on Answer Set Programming. A. Provetti and T.C. Son, Eds. 2001.
35. Pearce, D. Equilibrium logic. Annals of Mathematics and Artificial Intelligence 47, 1-2 (2006), 3–41.
36. Pearce, D. and Valverde, A. Towards a first order equilibrium logic for nonmonotonic reasoning. In Proc. 9th European Conference on Logics in Artificial Intelligence, LNCS 3229. Springer, 2004, 147–160.
37. Polleres, A. From SPARQL to rules (and back). In Proc. 16th International Conference on World Wide Web. C.L. Williamson, M.E. Zurko, P.F. Patel-Schneider, and P.J. Shenoy, Eds. ACM, 2007, 787–796.
38. Ricca, F., Grasso, G., Alviano, M., Manna, M., Lio, V., Liritano, S. and Leone, N. Team-building with answer set programming in the Gioia-Tauro seaport. Theory and Practice of Logic Programming, 2011; doi:10.1017/S147106841100007X.
39. Soininen, T. and Niemelä, I. Developing a declarative rule language for applications in product configuration. In Proc. 1st International Workshop on Practical Aspects of Declarative Languages, LNCS 1551. G. Gupta, Ed. Springer, 1999, 305–319.
40. Woltran, S. A common view on strong, uniform, and other notions of equivalence in answer-set programming. Theory and Practice of Logic Programming 8, 2 (2008), 217–234.

Gerhard Brewka (brewka@informatik.uni-leipzig.de) is a professor of computer science at University of Leipzig's Informatics Institute, Leipzig, Germany.

Thomas Eiter (eiter@kr.tuwien.ac.at) is a professor of computer science at Vienna Univ. of Technology’s Institute of Information Systems, Vienna, Austria.

Mirosław Truszczyński (mirek@cs.uky.edu) is a professor at University of Kentucky’s Department of Computer Science, Lexington, KY.

© 2011 ACM 0001-0782/11/12 $10.00



careers

American University
Assistant Professor in Computational Neuroscience

The College of Arts and Sciences at American University (Washington, DC) invites applications for a full-time, tenure-track, Assistant Professor position, beginning in August 2012, in computational neuroscience (broadly defined, including but not limited to neural networks, simulation, image processing, and bio-informatics). The appointee’s tenure home and departmental affiliation will depend on his or her research background. Applicants must have a PhD in a relevant discipline. Teaching and post-doctoral experience are preferred. Responsibilities include: teaching and curriculum development; establishing an internationally recognized research program, preferably one that can involve undergraduate research participation; strengthening connections to neurosciences across campus; and service to the appointee’s home department and the wider university.

American University has made other recent hires in neuroscience, and benefits from proximity to other scientific institutions in the Washington area. (For example, NIH is three metro stops from the AU campus.) The College of Arts and Sciences offers a variety of degrees at the undergraduate, masters, and doctoral levels. For more information about our programs, visit www.american.edu/cas/.

Applicants should submit a cover letter, curriculum vitae, teaching statement, and research statement, and applicants must arrange for three letters of recommendation to be sent directly to the search committee. Materials can be submitted online (highly preferred) at http://academicjobsonline.org/ajo, or via email to CompNeuroSearch@american.edu, or in hard copy to Computational Neuroscience Search Committee, Department of Mathematics and Statistics, American University, Washington, DC 20016-8050. Applications received by December 10, 2011 will receive full consideration. American University is an EEO/AA institution, committed to a diverse faculty, staff, and student body. Women and minority candidates are strongly encouraged to apply. American University offers employee benefits to same-sex domestic partners of employees and prohibits discrimination on the basis of sexual orientation/preference and gender identity/expression.


Baylor University
Assistant, Associate or Full Professor of Computer Science

The Department of Computer Science seeks a productive scholar and dedicated teacher for a tenure-track position beginning August, 2012. The ideal candidate will hold a terminal degree in Computer Science or closely related field, demonstrate scholarly capability and an established and active independent research agenda in one of several core areas of interest, including, but not limited to, game design and development, software engineering, computational biology, machine learning or large-scale data mining. A successful candidate will also exhibit a passion for teaching and mentoring at the graduate and undergraduate level. For position details and application information please visit: http://www.baylor.edu/hr/index.php?id=81302

The Department: The Department offers a CSAB-accredited B.S. in Computer Science degree, a B.A. degree with a major in Computer Science, a B.S. in Informatics with a major in Bioinformatics, and a M.S. degree in Computer Science. The Department has 13 full-time faculty members, over 250 undergraduate majors and approximately 30 master’s students. We are currently seeking approval to offer a dual Ph.D. degree in cooperation with a well-established partner institution. Interested candidates may contact any faculty member to ask questions and/or visit the web site of the School of Engineering and Computer Science at http://www.ecs.baylor.edu.

The University: Chartered in 1845 by the Republic of Texas, Baylor University is the oldest university in Texas and the world’s largest Baptist University. It is situated on a 500-acre campus next to the Brazos River and annually enrolls more than 14,000 students in over 150 baccalaureate and 80 graduate programs. Baylor’s mission is to educate men and women for worldwide leadership and service by integrating academic excellence and Christian commitment within a caring community. Baylor is actively recruiting new faculty with a strong commitment to the classroom and an equally strong commitment to discovering new knowledge as Baylor aspires to become a top tier research university while reaffirming and strengthening its distinctive Christian mission as described in Baylor 2012 (www.baylor.edu/vision/).

Application Procedure: Applications, including detailed curriculum vitae, a statement demonstrating an active Christian faith, and contact information for three references should be sent to: Chair Search Committee, Department of Computer Science, Baylor University, One Bear Place #97356, Waco, TX 76798-7356.

Appointment Date: Fall 2012. For full consideration, applications should be received by January 1, 2012.

Baylor is a Baptist university affiliated with the Baptist General Convention of Texas. As an Affirmative Action/Equal Employment Opportunity employer, Baylor encourages minorities, women, veterans, and persons with disabilities to apply.


Baylor University
Assistant or Associate Professor of Computer Science

Chartered in 1845 by the Republic of Texas, Baylor University is the oldest university in Texas and the world’s largest Baptist University. Baylor’s mission is to educate men and women for worldwide leadership and service by integrating academic excellence and Christian commitment within a caring community. Baylor is actively recruiting new faculty with a strong commitment to the classroom and an equally strong commitment to discovering new knowledge as Baylor aspires to become a top tier research university while reaffirming and strengthening its distinctive Christian mission as described in Baylor 2012 (www.baylor.edu/vision/). The combination of teaching, research and service has made Baylor one of the best universities for faculty, according to the Chronicle of Higher Education http://chronicle.com/article/Great-Colleges-to-Work-For/128312/.

The Department of Computer Science seeks a productive scholar and dedicated teacher for a tenure-track position beginning August, 2012. All specializations will be considered. Game/simulated environments, mobile computing, and graphics are of particular interest. The successful candidate will hold a terminal degree in Computer Science or a closely related field, demonstrate scholarly capability in his or her area of specialization, and exhibit a passion for teaching and mentoring at the graduate and undergraduate level. For position details and application information please visit: http://www.ecs.baylor.edu.

The Department: The Department offers a CSAB-accredited B.S. in Computer Science degree, a B.A. degree with a major in Computer Science, a B.S. in Informatics with a major in Bioinformatics, and a M.S. degree in Computer Science. We are currently seeking approval to offer a dual Ph.D. degree in cooperation with a well-established European institution. The Department has 15 full-time faculty, over 370 undergraduate majors and 30 master’s students. The Department’s greatest strength is the faculty’s dedication to the success of the students and each other. Interested candidates may contact any faculty member to ask questions and/or visit the web site of the School of Engineering and Computer Science at http://www.ecs.baylor.edu.

The University: Baylor University is situated on a 500-acre campus next to the Brazos River. It annually enrolls more than 14,000 students in over 150 baccalaureate and 80 graduate programs through: the College of Arts and Sciences; the Schools of Business, Education, Engineering and Computer Science, Music, Nursing, Law, Social Work, and Graduate Studies; plus Truett Seminary and the Honors College. For more information see http://www.baylor.edu.

Application Procedure: Please submit a letter of application, current curriculum vitae, and transcripts. Include names, addresses, and phone numbers of three individuals from whom you have requested letters of recommendation to: Jeff Donahoo, Ph.D., Search Committee Chair, Baylor University, One Bear Place #97356, Waco, Texas 76798-7356. Materials may be submitted to: Jeff_Donahoo@baylor.edu



Appointment Date: Fall 2012. For full consideration, applications should be received by January 1, 2012. However, applications will be accepted until the position is filled.

Baylor is a Baptist university affiliated with the Baptist General Convention of Texas. As an Affirmative Action/Equal Employment Opportunity employer, Baylor encourages minorities, women, veterans, and persons with disabilities to apply.


Bradley University
Assistant Professor
Department of Computer Science and Information Systems

The CS&IS Department invites applications for a tenure track Assistant Professor position starting in August 2012. The position requires that a PhD in Computer Science or a closely related field is completed prior to the start date. We will consider strong candidates from all areas of computer science but applicants with expertise in mobile computing, architecture, security, and gaming are especially encouraged.

The successful candidate is expected to develop into an outstanding teacher at the undergraduate and graduate levels, establish and maintain an active and rigorous research program, actively seek funding in support of his/her research activities, and involve students in his/her research activities. The teaching load is three courses per semester. Some teaching experience is preferred.

The CS&IS Department at Bradley University currently consists of 9 full time tenure/tenure-track faculty professorial positions and one lecturer position. We currently offer bachelors and masters degrees in both Computer Science and Computer Information Systems.

To apply, send a letter of application, a curriculum vitae, transcripts, and at least three letters of reference to:
Chair Search Committee
Department of CS&IS
Bradley University
Peoria, IL 61625
e-mail: jawalter@bradley.edu

Initial screening will begin on February 1, 2012 and will continue until the position is filled.

Employment with Bradley University is contingent upon satisfactory completion of a criminal background check. Bradley University is an EEO/AA Employer.

Bradley University is an Equal Opportunity Employer. The administration, faculty and staff are committed to attracting qualified candidates from groups currently underrepresented on our campus.


Bucknell University
Assistant Professor of Computer Science

Applications are invited for a tenure track assistant professor position in computer science, preferably at the entry-level (four or fewer years of full-time teaching experience with a recent Ph.D.), beginning mid-August 2012. Outstanding candidates in all areas will be considered. We are particularly interested in candidates whose research area is compatible with teaching a theory of computation course. Background in the area of programming languages would be a plus. Candidates are expected to have completed their Ph.D. requirements in computer science or a closely related field by August 15th, 2012. A strong commitment to excellence in teaching and scholarship is required. The successful candidate must be able to participate in the teaching of required core courses and be able to develop elective courses in the candidate’s area of expertise. Bucknell is a highly selective private university emphasizing quality undergraduate education in engineering and in liberal arts and sciences. The B.S. programs in computer science are ABET accredited. The computing environment is Linux/Unix-based. More information about the department can be found at: http://www.bucknell.edu/ComputerScience/ Applications will be considered as received and recruiting will continue until the position is filled. Candidates are asked to submit a cover letter, CV, a statement of teaching philosophy and research interests, and the contact information for three references. Please submit your application to http://jobs.bucknell.edu/ by searching for the “Computer Science Faculty Position”. Please direct any questions to Professor Stephen Guattery of the Computer Science Department at guattery@bucknell.edu. Bucknell University values a diverse college community and is committed to excellence through diversity in its faculty, staff

CALL FOR PhD STUDENTS


The Graduate School at IST Austria invites applicants from all countries to its PhD program. IST Austria is a new institution located on the outskirts of Vienna dedicated to cutting-edge basic research in the natural sciences and related disciplines. The language at the Institute and the Graduate School is English.

Campus Visit Day: November 26, 2011

The PhD program combines advanced coursework and research, with a focus on Biology, Computer Science, Neuro-
science, and interdisciplinary areas. IST Austria offers internationally competitive PhD salaries supporting 4-5 years of study.
Applicants must hold either a BS or MS degree or equivalent.
The Institute offers PhD students positions with the following faculty:
Nick Barton: Evolutionary and Mathematical Biology
Jonathan P. Bollback: Evolutionary Biology
Sylvia Cremer: Evolutionary and Behavioral Biology
Caroline Uhler: Statistics
Tobias Bollenbach: Biophysics and Systems Biology
Cǎlin C. Guet: Systems and Synthetic Biology
Carl-Philipp Heisenberg: Cell and Developmental Biology
Harald Janovjak: Molecular and Cellular Biophysics
Daria Siekhaus: Cell and Developmental Biology
Michael Sixt: Cell Biology and Immunology
Jozsef Csicsvari: Systems Neuroscience
Peter Jonas: Neuroscience
Gašper Tkačik: Theoretical Biophysics and Neuroscience
Krishnendu Chatterjee: Game Theory and Software Systems Theory
Herbert Edelsbrunner: Algorithms, Geometry, and Topology
Thomas A. Henzinger: Software Systems Theory
Vladimir Kolmogorov: Computer Vision and Graph Algorithms
Christoph Lampert: Computer Vision and Machine Learning
Krzysztof Pietrzak: Cryptography
Chris Wojtan: Computer Graphics
Additional faculty members will be announced on the IST website www.ist.ac.at.

For further information and access to the online application please consult www.ist.ac.at/gradschool.
For inquiries, please contact gradschool@ist.ac.at. For students wishing to enter the program in the
fall of 2012, the deadline for applications is January 15, 2012.
IST Austria values diversity and is committed to equality. Female students are encouraged to apply.




and students. An Equal Opportunity/Affirmative Action Employer, Bucknell University especially welcomes applications from women and minority candidates.


California State Polytechnic University, Pomona
Computer Science Department
http://www.csupomona.edu/~cs/

The Computer Science Department invites applications for a tenure-track position at the rank of Assistant Professor to begin Fall 2012. We are particularly interested in candidates with specialization in Secure Software Engineering, although candidates in all areas of Computer Science will be considered, and are encouraged to apply. Cal Poly Pomona is 30 miles east of L.A. and is one of 23 campuses in the California State University. The department offers an ABET-accredited B.S. program and an M.S. program.

Qualifications: Possess, or complete by September 2012, a Ph.D. in Computer Science or closely related area. Demonstrate strong English communication skills, a commitment to actively engage in the teaching, research, and curricular development activities of the department at both undergraduate and graduate levels, and ability to work with a diverse student body and multicultural constituencies. Ability to teach a broad range of courses, and to articulate complex subject matter to students at all educational levels. First consideration will be given to completed applications received no later than January 9, 2012.

Contact: Faculty Search Committee, Computer Science Department, Cal Poly Pomona, Pomona, CA 91768. Email: cs@csupomona.edu. Cal Poly Pomona is an Equal Opportunity, Affirmative Action Employer.

Position announcement available at: http://academic.csupomona.edu/faculty/positions.aspx

Lawful authorization to work in US required for hiring.


California State University, Chico
Assistant Professor

California State University, Chico, Dept. of Computer Science has two full time tenure track Asst. Prof positions, starting 8/2012. EOE Employer. Please see the full Announcement at: http://csci.ecst.csuchico.edu/jobs


California State University, Fullerton
Assistant Professor

The Department of Computer Science invites applications for a tenure-track position at the Assistant Professor level starting Fall 2012. For a complete description of the department, the position, desired specialization and other qualifications, please visit http://diversity.fullerton.edu/.


Carleton College
Assistant Professor of Computer Science

Carleton College invites applications for a one-year position (potentially renewable for a second year) in computer science, in any area of specialization, beginning September 1, 2012.

Carleton is a highly selective liberal arts college with outstanding, enthusiastic students. We seek an equally enthusiastic computer scientist committed to excellence in teaching, curriculum design, ongoing research, and undergraduate research advising. We are particularly interested in applicants who will strengthen the departmental commitment to students from underrepresented groups. To learn more about the position or to apply, visit jobs.carleton.edu. Applications completed by December 16, 2011 will receive full consideration.

Carleton College does not discriminate in providing employment. Please view the description for this position at jobs.carleton.edu for Carleton’s full anti-discrimination statement.


Carnegie Mellon University, Silicon Valley (CMUSV)
Senior Faculty

We are seeking applicants with both industrial experience and traditional academic credentials to fill a senior position in our growing software engineering (SE) program. The faculty member will play key roles in expanding our software engineering research program, in teaching, in recruiting, and in overall campus leadership. Familiarity with software development practices used in Silicon Valley (SV) is a significant advantage. We place a strong emphasis on written and spoken communication.

Our MS program in SE currently enrolls about 80 students, both full-time and part-time, with many employed at top SV companies. The program is project-based, team-oriented, and follows a learning-by-doing approach, with small seminar-style classes. Faculty work directly as advisors to student teams on their deliverables, teaching knowledge and skills on a just-in-time basis. CMUSV is growing its research activities, emphasizing mobility, networking, and security. We are building up research in agile methods, cloud computing, and mobile and embedded system development. CMUSV also offers PhD degrees through CMU’s ECE Dept.

The ideal candidate for this position will have sufficient experience to justify an appointment to a senior faculty position. We will give strong consideration to candidates who have spent most of their professional career in industry, and are now seeking an academic position. Please provide us with your curriculum vita with publication list, a statement about your practical experience, and letters from five references. Starting date is August, 2012, or sooner.

Apply for this position (#8652) at http://sv.cmu.edu/se-positions

More information on CMUSV may be found at http://sv.cmu.edu. Direct queries to SeniorSEsearch@sv.cmu.edu

Carnegie Mellon University does not discriminate in admission, employment, or administration of its programs or activities on the basis of race, color, national origin, sex, handicap or disability, age, sexual orientation, gender identity, religion, creed, ancestry, belief, veteran status, or genetic information.


Institute of Information and Communication Technology
Ahmedabad University-AU
A State Private University, Gujarat, India
ict.ahduni.edu.in

AU is in the process of establishing a new Institute of ICT by July 2012. Institute of ICT, AU invites applications for faculty positions at the level of Director, Professors, and Associate/Assistant Professors. Academicians committed to teaching and research, excited by institution building are invited to participate in our vision to establish a leading new institute of ICT.

The institute aims to redefine ICT education, where high powered technological innovations will complement sustainable growth in various sectors such as healthcare, energy, finance. We are effectively looking at redefining how ICT education is handled in the country today and bolster our efforts by training the students not only in fundamentals of computing/engineering but also in handling multidisciplinary product development, team work and real-time problem solving. To build a high quality research driven academic program (B Tech, M Tech and PhD), the school will leverage its multi-disciplinary position as one of the four schools planned under the umbrella of Institute of Science and Technology, AU: Engineering, Life sciences, Physical sciences & ICT. AU is also engaged in developing a network cluster with leading institutes in a variety of disciplines.

Candidates from any branch of ICT or related cross disciplinary fields such as computer science, electrical engineering, bio informatics, maths, and physics may apply. For all positions, a PhD in related field and a significant demonstrated research record commensurate with the level of the position being applied for are required. Being a State University (privately funded) we offer more attractive remuneration package as compared to other institutions in the country. Faculty will be encouraged and supported to establish research labs and get involved in institution building, innovate, teach, consult and conduct collaborative research.

Applications should consist of a cover letter, CV, a research statement, names and contact information of at least 3 references, and URLs/PDFs of at least 3-5 papers. Submit CV and queries to: ict@ahduni.edu.in



Carnegie Mellon University
Tepper School of Business
Tenure-track faculty opening in Information Systems, starting in September 2012

We invite applications from individuals who are interested in how information technology will transform businesses, markets, and economic processes. We have a particular interest in applicants who can access and make sense of the large volume of data that is available today and help design new systems and improve the design of existing systems. Applicant should send a current curriculum vita, evidence of research such as publications, working papers, or dissertation proposal to: isgroup@andrew.cmu.edu and three recommendation letters (via the Postal Service) to Mr. Phillip Conley, Information Systems Faculty Recruiting, Carnegie Mellon University, Tepper School of Business, Room 369 Posner Hall, 5000 Forbes Avenue, Pittsburgh, PA 15213-3890 (Phone: 412-268-6212). In order to ensure full consideration, completed applications must be received by December 31, 2011.

Applicants may hold a doctoral degree in any business discipline, Information Systems, Computer Science, Economics or Statistics. We are primarily seeking candidates at the Assistant Professor level. Applicants should have completed or be nearing completion of a Ph.D., and should demonstrate potential of excellence in research and teaching. Teaching assignments encompass BS, Masters, and Ph.D. programs.

Carnegie Mellon is an equal opportunity, affirmative action employer with particular interest in identifying women and minority applicants for faculty positions.


Colgate University
Department of Computer Science

The Department of Computer Science invites applications for a tenure-stream position in computer science at the rank of Assistant Professor, beginning fall semester 2012. Completion of a Ph.D. in computer science is expected prior to or shortly after the date of hire. Preference will be given to applicants with expertise in either Data Mining or Graphics though applicants in other areas will also be considered. A letter of application, curriculum vitae and three letters of recommendation should be submitted through https://academicjobsonline.org/ajo/jobs/1132 where you will also find a full ad description.

Review of applications will begin January 10, 2012, and continue until the position is filled. Colgate University is an EO/AA Employer; women and minorities are especially encouraged to apply.


Colorado School of Mines
Assistant Professor

Applications are invited for an anticipated tenure-track faculty position in data mining, data management, machine learning, or high performance computing.

The successful candidate will be expected to teach at both the undergraduate and graduate levels, direct graduate research, and develop a strong externally funded research program.

An earned Ph.D. in Computer Science or a closely related field by the time of appointment is required. Candidates must provide evidence of teaching competence in computer science and should demonstrate the potential to develop a self-sustaining research program.

For the complete job announcement and directions on how to apply, visit: http://inside.mines.edu/HR-Academic-Faculty


Colorado State University
Department of Computer Science
Assistant Professor

Colorado State University has an opening for one or more tenure-track assistant professor positions in Computer Science, beginning fall 2012. Areas of interest include computational biology, parallel computing, programming languages that focus on parallel programming models, mobile computing and HCI. For more information go to: http://www.cs.colostate.edu. Applications must be received by January 9, 2012 at http://www.natsci.colostate.edu/employment/compsci/. Complete applications of semi-finalists will be available to department faculty for review.

CSU is an EO/EA/AA employer. Colorado State University conducts background checks on all final candidates.

IST Austria Is Looking for Professors and Assistant Professors
IST Austria (Institute of Science and Technology Austria) invites applications for
Professors and Assistant Professors in all fields of the natural and mathematical
sciences and related disciplines. Outstanding scientists in software systems
(operating, distributed, database systems) are especially encouraged to apply.
The Institute, which is situated on the outskirts of Vienna, was established by the Austrian
government with a focus on basic research. The campus opened in 2009 and is expected to
grow to 45 research groups and over 500 employees by 2016. IST Austria is entitled to award
PhD degrees and includes an English-language graduate school. It aims to achieve an inter-
national mix of scientists and chooses them solely on the basis of their individual excellence
and potential contribution to research.
The Institute recruits tenured and tenure-track leaders of independent research groups.
The successful candidates will receive a substantial annual research budget but are expected
to also apply for external research grants.

For further information and access to the online application material, please consult:
www.ist.ac.at/professor-applications
Deadline for receiving Assistant Professor applications: January 15, 2012
IST Austria values diversity and is committed to equality. Female researchers are encouraged to apply.




Lane Department of Computer Science & Electrical Engineering


College of Engineering & Mineral Resources
Tenure-Track Position in Cyber Security Tenure-Track Position in
and Information Assurance Biomedical Imaging
The Lane Department of Computer Science and Electrical Engineering The Lane Department of Computer Science and Electrical Engineering
(LCSEE) invites applications for a tenure track faculty position in (LCSEE) invites applications for a tenure track faculty position in the
cyber security and information assurance. The position is open at area of biomedical imaging. The position is open at the assistant
the assistant or associate professor level. Areas of interest include, or associate professor level. Areas of interest include, but are not
but are not limited to applications, systems, and network security; limited to imaging with application to medical systems, visualization
wireless and mobile security; digital forensics; secure programming; and graphics, medical informatics, data fusion, feature extraction,
and software and information assurance. An earned Ph.D. in computer network biology, and biomedical applications of artificial intelligence.
science or a closely related discipline is required. Successful An earned Ph.D. in computer science or a closely related discipline is
candidates are expected to develop a vigorous extramurally funded required. Successful candidates are expected to develop a vigorous
research program, build effective collaborations, and demonstrate extramurally funded research program, build effective collaborations,
commitment to teaching excellence. The rank of initial appointment and demonstrate commitment to teaching excellence. The successful
will be commensurate with experience and accomplishments. candidate should be qualified to teach core undergraduate courses
in the field of Computer Science such as programming languages,
West Virginia University (www.wvu.edu) is a comprehensive land operating systems, and computer organization, in addition to developing
grant research institution enrolling 30,000 students in 113 degrees courses in their area of expertise. The rank of initial appointment will
programs, including engineering and health sciences. WVU has a be commensurate with experience and accomplishments.
Carnegie classification of Research – High. The Lane Department
(www.csee.wvu.edu) has 35 tenure-track faculty members, 350 West Virginia University (www.wvu.edu) is a comprehensive land grant
undergraduate students, and 250 graduate students. It offers BS research institution enrolling 30,000 students in 113 degrees programs,
degrees in Computer Science, Computer Engineering, Electrical including engineering and health sciences. WVU has a Carnegie
Engineering, and Biometric Systems; MS degrees in Computer classification of Research – High. The Lane Department (www.csee.
Science, Software Engineering, and Electrical Engineering; and Ph.D. wvu.edu) has 35 tenure-track faculty members, 350 undergraduate
degrees in Computer Science, Computer Engineering and Electrical students, and 250 graduate students. It offers BS degrees in Computer
Engineering. It also offers Graduate Certificates in Computer Forensics, Science, Computer Engineering, Electrical Engineering, and Biometric
Biometrics and Information Assurance, and Software Engineering. In Systems; MS degrees in Computer Science, Software Engineering,
addition, WVU is recognized by the NSA/DHS as a Center for Academic and Electrical Engineering; and Ph.D. degrees in Computer Science,
Excellence in Information Assurance Education, as well as a Center Computer Engineering and Electrical Engineering. The Robert C. Byrd
for Academic Excellence in Information Assurance Research. The Health Science Center at WVU has a comprehensive medical school
Department conducts approximately $6 million annually in externally and offers opportunities for collaboration with nationally recognized
sponsored research, with major research activities in the areas of Centers in Neuroscience, Advanced Imaging, Cancer Research, Low
biometric identification, nanotechnology, power systems, software Vision Research, and Rural Medicine. The Department conducts
and systems engineering, and wireless networks. Strong opportunities approximately $6 million annually in externally sponsored research,
exist for building collaborative partnerships with nearby federal with major research activities in the areas of biometric identification,
research facilities, including the Department of Defense, Department nanotechnology, power systems, software and systems engineering,
of Energy, FBI, and NASA. and wireless networks. Strong opportunities exist for building
collaborative partnerships with nearby federal research facilities,
To apply for this position, interested candidates should submit a including the Department of Energy, DOD, FBI, and NASA.
letter of application, curriculum vitae, research statement, statement
of teaching philosophy, and contact information for at least three To apply for this position, interested candidates should submit a
technical references (as a single PDF document) to: CEMR-LCSEE-Search@mail.wvu.edu. letter of application, curriculum vitae, research statement, statement
Please include “Cyber security” as a subject of teaching philosophy, and contact information for at least three
line. Applications will be processed until the position is filled but those technical references (as a single PDF document) to CEMR-LCSEE-Search@mail.wvu.edu.
received by January 10, 2012 will receive full consideration. For Please include “Biomedical Imaging” as a
further information, contact Dr. Katerina Goseva-Popstojanova, Search subject line. Applications will be processed until the position is filled
Chair, at katerina.goseva@mail.wvu.edu (queries only). but those received by Jan. 10, 2012 will receive full consideration. For
further information, contact Dr. Don Adjeroh, Search Chair, at Donald.adjeroh@mail.wvu.edu (queries only).

West Virginia University is an affirmative action, equal opportunity employer dedicated to building a culturally diverse and
pluralistic faculty and staff committed to teaching and working in a multicultural environment. West Virginia University is
the Recipient of an NSF ADVANCE Award for Gender Equity. Applications are strongly encouraged from women, minorities,
individuals with disabilities and covered veterans. Dual career couples are also encouraged to apply.

108 communications of the acm | december 2011 | vol. 54 | no. 12


Columbia University tion, and Animation. Applicants whose work con- Cornell University
Department of Computer Science tributes to CyberBioPhysical Systems, where the Assistant Professor
Open-rank Faculty Position biological, physical, and digital worlds fuse, are
in Computer Science also encouraged to apply. The SIBLEY SCHOOL OF MECHANICAL AND
Candidates must have a Ph.D. or its profes- AEROSPACE ENGINEERING, CORNELL UNIVER-
Columbia Engineering’s Department of Com- sional equivalent by the starting date of the ap- SITY, Ithaca, NY, invites applications for a fac-
puter Science at Columbia University in New York pointment. Applicants for this position at the ulty position in DIGITAL MANUFACTURING and
City invites applications for tenured or tenure- Assistant Professor and Associate Professors DIGITAL DESIGN. Areas of particular interest in-
track faculty positions. One or more appoint- without tenure must have the potential to do clude additive manufacturing processes, models,
ments at the assistant professor, associate professor, pioneering research and to teach effectively. Ap- and materials; on-demand, autonomous, flexible
and full professor levels will be considered. plicants for this position at the tenured level manufacturing; programmable/smart/electronic
Columbia Engineering’s strategic theme ar- (Associate or Full Professor) must have a dem- materials; energy-efficient manufacturing; bio-
eas are Health, Information and Sustainability, onstrated record of outstanding research accom- printing, micro-manufacturing; computational
and the successful candidate should contribute plishments, excellent teaching credentials and design automation and optimization. Details at
to the advancement of the department in these established leadership in the field. http://www.mae.cornell.edu/.
areas by developing an externally funded re- Candidates must hold a doctorate in an ap-
search program, being a thought leader in the Candidates should apply online at: propriate field and are expected to establish an
profession, contributing to the undergraduate academicjobs.columbia.edu/applicants/ outstanding, funded research program as well as
and graduate educational mission of the De- Central?quickFind=55402 contribute fully to both undergraduate and grad-
partment and providing active service to professional uate instruction.
societies. The successful candidate is and should submit electronically the following: Cornell University is an equal opportunity, af-
expected to establish multidisciplinary research curriculum-vitae including a publication list, firmative action educator and employer.
and educational collaborations with academic a description of research accomplishments, a
departments and units across Columbia Uni- statement of research/teaching interests and
versity. The Department is especially interested plans, contact information for three people who Dartmouth College
in qualified candidates who can contribute, can provide letters of recommendation, and up to Department of Computer Science
through their research, teaching, and/or service, three pre/reprints of scholarly work. The position Roth Family Distinguished Professorship
to the diversity and excellence of the academic will close no sooner than 12/31/2011, and will re-
community. main open until filled. The Department of Computer Science at Dart-
Applications are specifically sought in any of Applicants can consult www.cs.columbia.edu mouth College invites applications for the
the areas that fall under the umbrella of Com- for more information about the department. inaugural Roth Family Distinguished Profes-
puter Systems, Software, Artificial Intelligence, sorship. We seek candidates with a strong aca-
Theory, and Computational Biology, with partic- Columbia is an affirmative action/ demic or industry track record in the general
ular emphasis on, but not limited to: Computer equal opportunity employer with a strong area of Digital Arts (including, but not limited
Graphics, Human-Computer Interaction, Simula- commitment to the quality of faculty life. to, Computer Graphics, Computer Vision, Vi-
sualization, Human-Computer Interaction,
Design & Media Arts). Candidates at the level of
full professor or senior-level associate profes-
sor will be considered.
Dartmouth is home to a growing program in
the Digital Arts with affiliated faculty and students
in Computer Science, English, Film & Media Stud-
ies, Mathematics, Music, Psychology, Studio Art,
and Theater. In the coming years we expect con-
tinued investment in the Digital Arts. The Roth
Family Distinguished Professor will help shape
what is expected to be a leading undergraduate
Advertising in Career Opportunities and graduate program in Digital Arts.
The Computer Science department (www.
How to Submit a Classified Line Ad: Send an e-mail to cs.dartmouth.edu) is home to 17 tenured and
acmmediasales@acm.org. Please include text, and indicate the issue or
issues where the ad will appear, and a contact name and number. compasses the areas of digital arts, graphics, vi-
sion, algorithms, theory, systems, security, robot-
Estimates: An insertion order will then be e-mailed back to you. The ad ics, and computational biology. The Computer
will be typeset according to CACM guidelines. NO PROOFS can be sent.
Classified line ads are NOT commissionable. Sciences, and it has strong Ph.D. and M.S. pro-
Rates: $325.00 for six lines of text, 40 characters per line. $32.50 for each grams and outstanding undergraduate majors.
The department is affiliated with Dartmouth’s
additional line after the first six. The MINIMUM is six lines.
M.D.-Ph.D. program and has strong collabora-
Deadlines: 20th of the month/2 months prior to issue date. For latest tions with the Tuck School of Business and Dart-
deadline info, please contact: mouth Medical School.
acmmediasales@acm.org Dartmouth College is located in Hanover, New
Hampshire. Dartmouth has a beautiful, historic
Career Opportunities Online: Classified and recruitment display ads campus, located in a scenic area on the Connecti-
receive a free duplicate listing on our website at: cut River. Recreational opportunities abound in
http://jobs.acm.org all four seasons. Dartmouth hosts an annual film
festival, as well as renowned musical and theatri-
Ads are listed for a period of 30 days. cal performers. Convenient public transportation
For More Information Contact: to Boston and New York is available. Airports with
ACM Media Sales commercial service are located 15 minutes away
at 212-626-0686 or (Lebanon, New Hampshire) and 75 minutes away
acmmediasales@acm.org (Manchester-Boston Regional Airport).
Applicants are invited to send their CV, re-



search statement, teaching statement, and Florida International University (FIU), the mester, Chair Search Committee, recruitment@
names of at least four references. All mate- state university of Florida in Miami, is ranked cs.fsu.edu or to Prof. Robert van Engelen, Depart-
rial and inquiries should be sent to roth2012@ by the Carnegie Foundation as a comprehensive ment Chair, chair@cs.fsu.edu.
cs.dartmouth.edu. Application review will start doctoral research university with high research The Florida State University is a Public Re-
on January 15, 2011. activity. The School of Computing and Informa- cords Agency and an Equal Opportunity/Access/
Applicants should arrange to have letters tion Sciences (SCIS) is a rapidly growing pro- Affirmative Action employer, committed to diver-
of recommendation sent directly by the recom- gram of excellence at the University, with 31 fac- sity in hiring.
mender, either by email or physical mail, to: ulty members and 1,400 students, including 65
Roth Family Professorship Search Ph.D. students. SCIS offers B.S., M.S., and Ph.D.
Department of Computer Science degrees in Computer Science, an M.S. degree in George Mason University
Dartmouth College Telecommunications and Networking, and B.S., Department of Computer Science
6211 Sudikoff Laboratory B.A., and M.S. degrees in Information Technol- Assistant Professor
Hanover, NH 03755 ogy. SCIS has received approximately $12.6M in
roth2012@cs.dartmouth.edu the last three years in external research funding, The Department of Computer Science at George
has six research centers/clusters with first-class Mason University invites applications for a ten-
Dartmouth College is an equal opportunity/ computing infrastructure and support, and en- ure-track faculty position at the rank of Assistant
affirmative action employer and encourages ap- joys broad and dynamic industry and interna- Professor beginning Fall 2012.
plications from women and members of minority tional partnerships. We are seeking a faculty member who can es-
groups. tablish strong research and teaching programs
HOW TO APPLY: in the area of computer game design. Applicants
Applications, including a letter of interest, con- must have a research focus in an area of com-
Dartmouth College tact information, curriculum vitae, and the names puter games technology - for example, in artifi-
Assistant Professor of at least three references, should be submitted cial intelligence, multi-agent systems, computer
directly to the FIU J.O.B.S Link website at https:// graphics, real-time animation, simulation and
We invite applications for tenure-track faculty as- www.fiujobs.org; refer to Position # 33334. The modeling, networked and distributed systems,
sistant professor of Computer Science, in the ar- application review process will begin on January or software engineering, as applied to computer
eas of systems, computer architecture (especially 16, 2012, and will continue until the position is games. Minimum qualifications include a Ph.D.
with applications to computer graphics), network- filled. Further information can be obtained from in Computer Science or a related field, demon-
ing, or human-computer interaction. For detailed the School website http://www.cis.fiu.edu, or by e- strated potential for excellence and productivity
information, see www.cs.dartmouth.edu. mail to recruit@cis.fiu.edu. in research, and a commitment to high quality
teaching.
FIU is a member of the State University System of The department currently offers a graduate
Florida International University (FIU) Florida and is an Equal Opportunity, Equal Access certificate in Computer Games Technology and
School of Computing and Information Sciences Affirmative Action Employer. a concentration in Computer Game Design at
Multiple tenure-track & tenured faculty the undergraduate level. The Computer Game
Positions at all levels Design concentration is offered in collaboration
Florida State University with faculty in the College of Visual and Per-
FIU is a multi-campus public research university Assistant Professor forming Arts at Mason. For more information on
located in Miami, a vibrant, international city. Tenure-Track Assistant Professor Positions these and other programs offered by the depart-
FIU offers more than 180 baccalaureate, masters, ment, visit our Web site: http://cs.gmu.edu/
professional and doctoral degree programs to The Department of Computer Science at the Flor- The department has over 40 faculty members
over 42,000 students. As one of South Florida’s ida State University invites applications for mul- with wide-ranging research interests including
anchor institutions, FIU is worlds ahead in its lo- tiple tenure-track Assistant Professor positions artificial intelligence, algorithms, computer
cal and global engagement and is committed to to begin August 15, 2012. Positions are 9-mo, graphics, computer vision, databases, data min-
finding solutions to the most challenging prob- full-time, tenure-track, and benefits eligible. We ing, distributed virtual environments, expert
lems of our times. encourage strong applicants in all areas of Com- systems, human computer interaction, parallel
The School of Computing and Information puter Science to apply. Preference may be given and distributed systems, real-time systems, ro-
Sciences seeks exceptionally qualified candi- to applicants with research experience in the ar- botics, security, software engineering, and wire-
dates for multiple tenure-track and tenured fac- eas of Databases and Security. Applicants should less and mobile computing.
ulty positions at all levels. Outstanding candi- hold a PhD in Computer Science or closely related George Mason University is located in Fair-
dates are sought in areas of bio/medical/health field, and have excellent research and teaching fax, Virginia, a suburb of Washington, DC, and
informatics, computer architecture, computer accomplishments/potential. The department of- home to one of the highest concentrations of
graphics, large-scale data management, search, fers degrees at the BS, MS, and PhD levels. The high-tech firms in the nation. There are excel-
and visualization, human-computer interaction department is an NSA Center of Academic Excel- lent opportunities for interaction with govern-
(HCI), networking, programming languages, lence in Information Assurance Education (CAE/ ment agencies and industry, including many
robotics and game theory, and telecommunica- IAE) and Research (CAE-R). game and serious game development compa-
tion. Exceptional candidates in other areas will FSU is classified as a Carnegie Research I nies. In particular, the Washington DC region
be considered as well. Preference will be given to university. Its primary role is to serve as a center is fast becoming a hub for the serious games
candidates who will enhance or complement our for advanced graduate and professional studies industry. Fairfax is consistently rated as being
existing research. while emphasizing research and providing excel- among the best places to live in the country, and
Ideal candidates for junior positions should lence in undergraduate education. The depart- has an outstanding local public school system.
have a record of exceptional research in their ear- ment has experienced rapid growth in the major For full consideration please submit applica-
ly careers. Candidates for senior positions must and new degree programs. Further information tion and application materials on-line at http://
have an active and proven record of excellence in can be found at jobs.gmu.edu (position number F9542Z). To
funded research, publications, and professional http://www.cs.fsu.edu apply, you will need a statement of professional
service, as well as a demonstrated ability to devel- goals including your perspective on teaching
op and lead collaborative research projects. In ad- Screening will begin January 1, 2012 and will and research, a complete C.V. with publications,
dition to developing or expanding a high-quality continue until the positions are filled. Please and the names of four references. The review
research program, all successful applicants must apply online with curriculum vitae, statements of applications will begin immediately and will
be committed to excellence in teaching at both of teaching and research philosophy, and the continue until the position is filled. GMU is an
graduate and undergraduate levels. An earned names of five references, at http://www.cs.fsu. equal opportunity/affirmative action employer.
Ph.D. in Computer Science or related disciplines edu/positions/apply.html Women and minorities are strongly encouraged
is required. Questions can be e-mailed to Prof. Mike Bur- to apply.


The Institute for Interdisciplinary field, with strong publication record. pointment will be at the assistant or untenured
Information Sciences (IIIS) KAUST offers: Very attractive salary and ben- associate professor level. In special cases, a se-
Tenure-track Assistant/Associate/Full Professor efits; generous research funding; state-of-the-art nior faculty appointment may be possible. Fac-
research facilities, including one of the fastest ulty duties include teaching at the graduate and
IIIS invites applications from highly-qualified supercomputers in the world; collaboration with undergraduate levels, research, and supervision
candidates in areas including (but not limited top institutions such as Stanford, Texas A&M, of student research. We will consider candidates
to) Computer Systems, Algorithms and Complex- IBM Watson, etc. with backgrounds and interests in any area of
ity, Machine Learning, Multimedia, Databases, KAUST is an international graduate-only re- electrical engineering and computer science. Fac-
Computer Networks, Wireless Sensor Networks, search university located on the coast of the Red ulty appointments will commence after comple-
Information Security, Web Technologies, Energy- Sea, near Jeddah, Saudi Arabia. All activities of the tion of a doctoral degree.
Efficient Computing, Computational Finance, University are conducted on the basis of equality, Candidates must register with the EECS
Quantum Information, Computational Biology. without regard to race, color, religion or gender. search website at https://eecs-search.eecs.mit.
Positions at Assistant/Associate/Full Professor Further information can be found at: http:// edu, and must submit application materials elec-
levels are available. cloud.kaust.edu.sa tronically to this website. Candidate applications
Apply for this job: For enquiries, please contact Dr. Panos Kal- should include a description of professional in-
Email: iiisdean@mail.tsinghua.edu.cn nis: panos.kalnis@kaust.edu.sa terests and goals in both teaching and research.
Tel: +86-01-62789157 Each application should include a curriculum
vita and the names and addresses of three or
Lawrence Technological University more individuals who will provide letters of rec-
Iowa State University Assistant Professor of Computer Science ommendation. Letter writers should submit their
Software Engineering Program letters directly to MIT, preferably on the website
Tenure-track or tenured faculty position For appointment in August 2012. The ideal can- or by mailing to the address below. Please submit
didate will have a Ph.D. degree in computer sci- a complete application by December 15, 2011.
The Software Engineering Program at Iowa State ence, will have experience with intelligent robotic Send all materials not submitted on the web-
University, Ames, IA, has an immediate opening systems, be primarily committed to the develop- site to:
for a tenure-track or tenured faculty position that ment of undergraduate and professional gradu- Professor Anantha Chandrakassan
will commence in August 2012. Appointments ate computer science students through teaching, Department Head, Electrical Engineering
will be considered at all experience levels. applied projects and scholarship, be able to work and Computer Science
Duties for the position will include under- effectively in interdisciplinary teams, and believe Massachusetts Institute of Technology
graduate and graduate education; mentoring and strongly in the value of both theory and applica- Room 38-401
engaging undergraduate as well as prospective tion. Applicants should email a cover letter, cur- 77 Massachusetts Avenue
students; developing and sustaining externally- riculum vitae, statement of teaching philosophy Cambridge, MA 02139
funded research; graduate student supervision and research interests, and three letters of recom-
and mentoring; and professional and institu- mendation. Computer Science Search Commit- M.I.T. is an equal opportunity/affirmative
tional service. tee; cssearch@ltu.edu action employer.
An earned Ph.D. or equivalent in software en-
gineering, computer science, computer engineer-
ing or a closely related field is required. For ap- Marist College Max Planck Institute for Software
pointment at the level of assistant professor, the Lecturer, Assistant or Associate Professor of Systems (MPI-SWS)
successful candidate must have demonstrated Computing Technology Tenure-track openings
potential to establish and maintain a productive
externally funded research program and poten- Marist College’s School of Computer Science and Applications are invited for tenure-track and
tial to excel in the classroom. Commensurate Mathematics invites applications for two faculty tenured faculty positions in all areas related to
experience and a proven track record will be ex- positions. Marist College is a highly selective, the study, design, and engineering of software
pected for appointment at a more senior level. independent, liberal arts institution located in systems. These areas include, but are not limited
The tenure home in either the Department of the historic Hudson River Valley, 60 miles north to, data and information management, program-
Computer Science or the Department of Electri- of New York City. Marist currently enrolls 4,200 ming systems, software verification, parallel, dis-
cal and Computer Engineering will be decided in traditional undergraduate, 950 graduate and 530 tributed and networked systems, and embedded
consultation with the successful candidate, with continuing education students. The College has systems, as well as cross-cutting areas like securi-
joint appointment in both departments. been recognized for excellence by U.S. News & ty, machine learning, usability, and social aspects
Apply for this job: World Report, The Princeton Review, Entrepre- of software systems. A doctoral degree in comput-
Contact Person: Sara K. Harris neur Magazine, and is noted for its leadership er science or related areas and an outstanding re-
Email Address: skharris@iastate.edu in the use of technology to enhance the teaching search record are required. Successful candidates
Phone: 515-294-1097 and learning process. are expected to build a team and pursue a highly
Fax: 515-294-3637 PhD in CS, IT, IS or closely related field pre- visible research agenda, both independently and
ferred; Master’s degrees with significant industry in collaboration with other groups. Senior candi-
Apply URL: http://www.se.iastate.edu/careers/ experience will be considered. Candidates with dates must have demonstrated leadership abili-
faculty-staff-openings/faculty-position expertise in software development, security, ap- ties and recognized international stature.
Candidates are subject to a background plied networking, business analytics, and man- MPI-SWS, founded in 2005, is part of a net-
check. ISU is an EO/AA employer. agement information systems is highly desirable. work of eighty Max Planck Institutes, Germany’s
To learn more or to apply, please visit http://jobs. premier basic research facilities. MPIs have an
marist.edu. Only online applications are accepted. established record of world class, foundational
King Abdullah University of Science research in the fields of medicine, biology, chem-
and Technology AN EQUAL OPPORTUNITY/AFFIRMATIVE istry, physics, technology and humanities. Since
Postdoc, Computer Science ACTION EMPLOYER 1948, MPI researchers have won 17 Nobel prizes.
MPI-SWS aspires to meet the highest standards of
The InfoCloud group at KAUST invites applica- excellence and international recognition with its
tions for PostDoc positions in all areas of Data- Massachusetts Institute of Technology research in software systems.
bases, Data Mining, Cloud Computing, Parallel/ Faculty Positions To this end, the institute offers a unique en-
Distributed Systems and High-performance vironment that combines the best aspects of a
Computing. The positions are for 1 to 3 years. The Department of Electrical Engineering and university department and a research laboratory:
An ideal candidate should have (or be expecting Computer Science (EECS) seeks candidates for a) Faculty receive generous base funding to
soon) a PhD in Computer Science or a related faculty positions starting in September 2012. Ap- build and lead a team of graduate students and



post-docs. They have full academic freedom and for excellence in research, undergraduate and interest are: software systems with an emphasis
publish their research results freely. graduate teaching and service. Effective oral and on operating systems; machine learning with
b) Faculty supervise doctoral theses, and have written communication skills. an emphasis on cross-disciplinary research and
the opportunity to teach graduate and undergrad- applications; and programming languages with
uate courses. Qualifications preferred but not required: an emphasis on parallel languages and parallel
c) Faculty are provided with outstanding tech- Experience or other evidence of potential for ex- processing. For all of these fields there is special
nical and administrative support facilities as well cellence in research in high performance com- interest in interdisciplinary work such as cyber-
as internationally competitive compensation puting, mobile and distributed computing, security, physical systems and predictive computational
packages. networks or a related area. The capability for modeling and simulation.
MPI-SWS currently has 11 tenured and ten- teaching a variety of courses in computer science. The department has 32 tenure-track faculty
ure-track faculty, and is funded to support 17 representing major areas of computer science
faculty and about 100 doctoral and post-doctoral To Apply: see https://jobs.ndsu.edu/ and engineering. Fifteen members of our fac-
positions. Additional growth through outside ulty are recipients of the NSF Career Award. Two
funding is possible. We maintain an open, in- For additional information see http://cs.ndsu. faculty members have received the prestigious
ternational and diverse work environment and edu/positions.htm NSF PECASE Award. In recent years, our faculty
seek applications from outstanding researchers Review of applications begins January 15, received seven NSF ITR Grants, a $35M Network
regardless of national origin or citizenship. The 2012. Applications will be accepted until position Science Center Award, over $4.5M in computing
working language is English; knowledge of the is filled. and research infrastructure and instrumentation
German language is not required for a successful North Dakota State University is an Equal grants from NSF, eleven NSF Cyber Trust and Net-
career at the institute. Opportunity/AA employer. working awards, and several awards from DARPA,
The institute is located in Kaiserslautern and This position is exempt from North Dakota DOE, DTRA and DoD. There are state-of-the-art
Saarbruecken, in the tri-border area of Germany, Veterans’ Preference requirements. research labs for computer systems, computer vi-
France and Luxembourg. The area offers a high sion and robotics, Microsystems design and VLSI,
standard of living, beautiful surroundings and networking and security, high performance com-
easy access to major metropolitan areas in the Northeastern Illinois University puting, bioinformatics and virtual environments.
center of Europe, as well as a stimulating, com- Computer Science Department The department offers a graduate program with
petitive and collaborative work environment. In Assistant Professor over 40 Masters students and 167 Ph.D. students,
immediate proximity are the MPI for Informatics, and undergraduate programs in computer science
Saarland University, the Technical University of The Computer Science Department of North- and computer engineering. The university is com-
Kaiserslautern, the German Center for Artificial eastern Illinois University in Chicago invites in- mitted to growing the faculty ranks over the next
Intelligence (DFKI), and the Fraunhofer Insti- terested individuals to apply for a tenure-track several years and promoting interdisciplinary re-
tutes for Experimental Software Engineering and assistant professor position starting Fall 2012. search toward cyber-enabled discovery and design.
for Industrial Mathematics. PhD in Computer Science or closely related Penn State is a major research university and
Qualified candidates should apply online at field required. Preference will be given to candi- is ranked 3rd in the nation in industry-sponsored
http://www.mpi-sws.org/application. The review dates with interests in bioinformatics, software research. Computer science is ranked 6th in the
of applications will begin on January 3, 2012, and engineering, computer networks and secu- nation in research expenditures. U.S. News and
applicants are strongly encouraged to apply by rity, and emerging technologies. Strong candi- World Report consistently ranks Penn State’s Col-
that date; however, applications will continue to dates in other areas will be considered. AA/EOE lege of Engineering undergraduate and graduate
be accepted through January 2012. View complete job posting at: http://www.neiu. programs in the top 15 of the nation.
The institute is committed to increasing the edu/~compsc/cs_faculty_search.html The university is located in the beautiful col-
representation of minorities, women and individ- lege town of State College in the center of Penn-
uals with physical disabilities in Computer Sci- sylvania. State College has 40,000 inhabitants
ence. We particularly encourage such individuals Peking University and offers a variety of cultural and outdoor recre-
to apply. School of EECS ational activities nearby. The university offers out-
Tenure-track Faculty Positions - Center for standing events from collegiate sporting events
Energy-Efficient Computing and Applications to fine arts productions. Many major population
North Dakota State University centers on the east coast (New York, Philadelphia,
Assistant/Associate Professor The School of EECS at Peking University invites ap- Pittsburgh, Washington D.C., Baltimore) are only
plications for tenure-track positions in the areas a few hours drive away and convenient air services
Department of Computer Science at North Da- of energy efficient computing (including but not to several major hubs are operated by four major
kota State University seeks to fill a tenure-track limited to energy-efficient architectures, compila- airlines out of State College.
Assistant Professor position starting Fall of 2012. tion, communication, and system software) and Applicants should hold a Ph.D. in Computer
PhD required. NDSU offers degrees in Computer applications (such as mobile computing, wireless Science, Computer Engineering, or a closely relat-
Science and Software Engineering. Research and health, and cloud computing). These positions ed field and should be committed to excellence in
teaching excellence is expected. Normal teaching are associated with the Center for Energy-Effi- both research and teaching. Support will be pro-
loads are 3 courses per year. Salary is competitive. cient Computing and Applications (http://ceca. vided to the successful applicants for establish-
The Department has 18 Faculty in diverse ar- pku.edu.cn), which offers a new level of startup ing their research programs. We encourage dual
eas, approximately 180 graduate (PhD and MS) and compensation packages. Applications from career couples to apply. Applications should be
students and 240 BS/BA students. NDSU is a distinguished candidates at senior levels are also received by January 31, 2012 to receive full consid-
Carnegie research extensive class institution. encouraged. To apply, please email the resume, eration. To apply by electronic mail, send your re-
Fargo is a clean, growing metropolitan area statements of research and teaching, and at least sume (including curriculum vitae and the names
of 250,000 that consistently ranks very high in na- three names for references to Dr. Tao Wang (ceca_ and addresses of at least three references) as a pdf
tional quality-of-life surveys. We have low levels recruiting@pku.edu.cn). Applications received by file to recruiting@cse.psu.edu.
of crime and pollution, excellent schools, short January 20, 2012 will be given full consideration. Chair, Faculty Search Committee
commutes and proximity to the Minnesota lakes The Pennsylvania State University
country. The community has a symphony, an Department of Computer Science
opera, a domed stadium, a community theater, The Pennsylvania State University and Engineering
three universities and many other opportunities Faculty Position Box A
for advancement, recreation and entertainment. University Park, PA 16802-6106 USA
Applications are invited for several tenure-track
Minimum Qualifications: faculty positions at all ranks. Outstanding can- For more information about the Department
Ph.D in Computer Science or a closely related didates in all areas of Computer Science and En- of CSE at PSU, see http://www.cse.psu.edu. Click
area. Experience or other evidence of potential gineering will be considered. Areas of particular here to fill out an Affirmative Action Applicant




Data Card. Our search number is 015-92. You Forty-eight faculty members direct research pro- particularly interested in synergy with CBIM and
MUST include this search number in order to grams in analysis of algorithms, bioinformatics, thus we’re excited about receiving applications
submit this form. databases, distributed and parallel computing, primarily in areas related to multimodal sensing,
Penn State is committed to affirmative action, graphics and visualization, information security, decision making under uncertainty, planning,
equal opportunity and the diversity of its workforce. machine learning, networking, programming lan- learning and novel designs for collaborative ro-
guages and compilers, scientific computing, and bots, co-robots, social robots, network-based ro-
software engineering. Information about the de- botics, underwater autonomous robots and het-
Princeton University partment and a detailed description of the open erogeneous swarms of robots. Rutgers University
Computer Science Department position are available at http://www.cs.purdue.edu. offers an exciting and multidisciplinary research
Tenure-Track Positions, Assistant Professor All applicants should hold a PhD in Computer environment and encourages collaborations be-
Science, or a closely related discipline, be commit- tween Computer Science and other disciplines.
The Department of Computer Science at Princeton ted to excellence in teaching, and have demon- Applicants for this research/teaching posi-
University invites applications for faculty positions strated potential for excellence in research. The tion must, at minimum, be in the process of com-
at the Assistant Professor level. We are accepting successful candidate will be expected to teach pleting a dissertation in Computer Science or a
applications in all areas of Computer Science. courses in computer science, conduct research in closely related field, and should show evidence
Applicants must demonstrate superior re- field of expertise and participate in other depart- of exceptional research promise, potential for de-
search and scholarship potential as well as teach- ment and university activities. Salary and benefits veloping an externally funded research program,
ing ability. A PhD in Computer Science or a re- are highly competitive. Applicants are strongly en- and commitment to quality advising and teach-
lated area is required. Successful candidates are couraged to apply online at https://hiring.science. ing at the graduate and undergraduate levels.
expected to pursue an active research program purdue.edu. Hard copy applications can be sent Hired candidates who have not defended their
and to contribute significantly to the teaching to: Faculty Search Chair, Department of Computer Ph.D. by September 2012 will be hired at the rank
programs of the department. Applicants should Science, 305 N. University Street, Purdue Univer- of Instructor, and must complete the Ph.D. by
include a CV and contact information for at least sity, West Lafayette, IN 47907. Review of applica- December 31, 2012 to be eligible for tenure-track
three people who can comment on the appli- tions will begin on November 10, 2011, and will title retroactive to start date. Senior applicants at
cant’s professional qualifications. continue until the position is filled. A background the Associate or Full Professor level will need to
There is no deadline, but review of applica- check will be required for employment in this po- have demonstrated significant funding, scholar-
tions will start in December 2011. Princeton Uni- sition. Purdue University is an Equal Opportunity/ ship, collaborative, and leadership abilities.
versity is an equal opportunity employer and com- Equal Access/Affirmative Action employer fully Applicants should go to http://www.cs.rutgers.
plies with applicable EEO and affirmative action committed to achieving a diverse workforce. edu/employment/ and submit their curriculum
regulations. You may apply online at: http://jobs. vitae, a research statement addressing both past
cs.princeton.edu/. Requisition Number: 0110422 work and future plans and a teaching statement
Rutgers, The State University along with three letters of recommendation. If
of New Jersey electronic submission is not possible, hard cop-
Princeton University Assistant Professor ies of the application materials may be sent to:
Computer Science Department Professor Dimitris Metaxas, Hiring Chair
PostDoctoral Research Associate The Department of Management Science and Computer Science Department
Information Systems of Rutgers Business School- Rutgers University
The Department of Computer Science at Princ- Newark and New Brunswick invites applications 110 Frelinghuysen Road
eton University is seeking applications for post- for a tenure-track position at the Assistant Profes- Piscataway, NJ 08854
doctoral or more senior research positions in sor rank to start in September 2012.
theoretical computer science. Candidates will be This position is focused in the area of infor- Applications should be received by January
affiliated with the Center for Computational In- mation systems and the candidate must be an 31st, 2012 for full consideration.
tractability (CCI) or the Princeton Center for The- active researcher and have a strong record of Rutgers subscribes to the value of academic
oretical Computer Science. Candidates should scholarly excellence. Special consideration will diversity and encourages applications from indi-
have a PhD in Computer Science, a related field, be given to candidates with knowledge in any of viduals with varied experiences, perspectives, and
or on track to finish by August 2012. the areas: data mining, machine learning, secu- backgrounds. Females, minorities, dual-career
Candidates affiliated with the CCI will have rity, data management and analytical methods couples, and persons with disabilities are encour-
visiting privileges at partner institutions NYU, related to business operations. aged to apply.
Rutgers University, and The Institute for Advanced A letter of application articulating the candi- Rutgers is an affirmative action/equal oppor-
Study. Review of candidates will begin on Jan 1, date’s fit (in terms of research and teaching) with tunity employer.
2012, and will continue until positions are filled. the position description, a curriculum vitae, and the
Applicants should submit a CV and research state- names and contact information of three persons
ment, and contact information for three refer- that can provide references should be sent electron- State University of New York
ences. Princeton University is an equal opportu- ically to Luz Kosar at: kosar@business.rutgers.edu. at Binghamton
nity employer and complies with applicable EEO Luz Kosar, Department of Computer Science
and affirmative action regulations. Apply to:http:// MSIS
jobs.princeton.edu/ Requisition# 0110698 Rutgers Business School - Applications are invited for a tenure-track Assis-
Newark and New Brunswick tant Professor starting Fall 2012. Our preferred
1 Washington Park # 1068 specializations are embedded systems, energy-
Purdue University Newark, New Jersey 07102-1895 aware computing and systems development. We
Computer Science Department have well-established BS (accredited), MS and
Faculty Position PhD programs, with over 60 full-time PhD stu-
Rutgers University dents. We offer a significantly reduced teaching
The Department of Computer Science at Purdue Department of Computer Science and the load for junior faculty for at least the first three
University invites applications for tenure-track Center for Computational Biomedicine, years. A new NSF supported industry-university
positions at the assistant professor level begin- Imaging and Modeling (CBIM) collaborative research center on energy-efficient
ning August 2012. Outstanding candidates in all Tenure Track Faculty Position in Robotics electronic systems offers an added venue for re-
areas of Computer Science and with a multi-dis- search and funding. Please submit a resume and
ciplinary focus are encouraged to apply. Specific The Rutgers University Department of Computer the names of three references at: http://bingham-
needs that have been identified include theory Science and the Center for Computational Bio- ton.interviewexchange.com
and software engineering. medicine, Imaging and Modeling (CBIM) seeks First consideration will be given to applica-
The Department of Computer Science offers a applicants at in robotics, for a tenure-track fac- tions received by January 10, 2012.
stimulating and nurturing academic environment. ulty position starting September 2012. We’re We are an EE/AA employer.



Swarthmore College Candidates must have a Ph.D. in Computer Sci- (http://www.uclouvain.be/en-info.html). Courses
Computer Science Department ence or a related area. Our new colleague will teach are taught in French and English.
Tenure Track and Two Year Leave Position introductory and advanced courses in CS, advise The University is located in the new city of
first-year students and majors, and be willing to Louvain-la-Neuve, 25 kms southeast of Brussels,
Applications are invited for a tenure track assis- include undergraduates in research activities. He/ the capital of Belgium, in the heart of Europe.
tant professor position and for a two-year leave she will participate in ongoing development of the Applications must be posted before Decem-
replacement position at the assistant professor CS curriculum and will be active in research and ber 15th, 2011 on http://www.uclouvain.be/en-
level. Both positions begin Fall semester 2012. professional activities. The teaching load is 9 stu- 287770.html
Swarthmore College is a small, selective liberal dent contact hours (3 courses) per semester.
arts college located in a suburb of Philadelphia. Trinity is an independent, coeducational,
The Computer Science Department offers majors primarily undergraduate and residential univer- University of Arkansas
and minors in computer science at the under- sity founded in 1869. Enrollment is approximately Assistant Professor
graduate level. The teaching load is 3 courses and 2400 from all areas of the US and many foreign
3 labs per year. Applicants must have teaching countries. Trinity has highly selective admission The Computer Science and Computer Engineer-
experience and should be comfortable teaching standards and occupies an attractive campus over- ing (CSCE) Department at the University of Arkan-
a wide range of courses at the introductory and looking downtown San Antonio, a city rich in heri- sas seeks outstanding individuals to fill two ten-
intermediate level. tage and ethnic diversity. Trinity’s new Entrepre- ure-track, assistant professor positions. Successful
For the tenure track position, all areas of CS neurship program is housed in our Department. candidates will have an earned doctorate in com-
will be considered, but we are particularly inter- Perhaps unexpectedly, San Antonio has a par- puter science, computer engineering, or a related
ested in areas that complement our current offer- ticularly lively computing community! Indeed, field. We are seeking candidates with expertise in
ings, including databases, networking, and other San Antonio is second in the nation (only to the cyber security, data mining or machine learning,
systems and algorithms areas. Candidates should Washington, D.C. area) in cyber-security. While human-computer interaction, or pervasive/ubiqui-
additionally have a strong commitment to involv- the position is open to all areas, a new colleague tous computing. Outstanding candidates in other
ing undergraduates in their research. A Ph.D. in would find San Antonio to be a particularly good related areas are also encouraged to apply.
CS by or near the time of appointment is required. fit for information assurance & security, bioinfor- The CSCE department offers BS, MS, and PhD
For the leave replacement position, all ar- matics, or cloud computing. degrees in Computer Science and Computer En-
eas of CS will be considered. A Ph.D. in CS by or The Department, 42 years old, is well inte- gineering to approximately 250 undergraduate
near the time of appointment is preferred (ABD grated into campus life. Our students are among and 60 graduate students. The department has
required). the best in the University and secure excellent of- fourteen faculty with an annual external research
See http://cs.swarthmore.edu/jobs for appli- fers for jobs and graduate schools. Late in 2013 funding of $1.8 million and is located in a new
cation submission information and more details the Department is scheduled to move into a new 90,000 sq. ft. building. For more information
about both positions. We expect to begin inter- Center for Science and Innovation, housing all concerning the department see http://www.csce.
viewing by mid January 2012. Applications will be the STEM departments. Additional information uark.edu
accepted until the positions are filled. may be found at Application materials (cover letter, teaching
Swarthmore College has a strong institutional http://web.trinity.edu/x605.xml and research statement, curriculum vitae, and
commitment to excellence through diversity in its http://web.trinity.edu/x13585.xml the names and addresses of three references)
educational program and employment practices http://web.trinity.edu/ should be submitted to the CSCE Search Com-
and actively seeks and welcomes applications mittee electronically at http://www.csce.uark.
from candidates with exceptional qualifications, Please send a letter of application; CV; teach- edu/search or mailed to:
particularly those with demonstrable commit- ing and research interests; and names and con- Search Committee
ments to a more inclusive society and world. tact information of at least three references to CSCE Department, 504 JBHT
Paul Myers, Chair Fayetteville, AR 72701
Department of Computer Science search@csce.uark.edu
Toyota Technological Institute Chicago Trinity University
Computer Science at TTI Chicago 1 Trinity Place Completed applications received by January
Faculty Positions at All Levels San Antonio, Texas 78212-7200 15, 2012 will be assured full consideration. Late
(210) 999-7398 applications will be reviewed as necessary to fill
Toyota Technological Institute at Chicago (TTIC) pmyers@trinity.edu the position.
is a philanthropically endowed degree-granting The University of Arkansas is ranked as a
institute for computer science located on the Early review of applications will begin Decem- Carnegie Foundation RU/VH (research university,
University of Chicago campus. Applications are ber 12, 2011. very high research activity) university and is locat-
being accepted in all areas, but we are particu- Trinity University is an ed in Fayetteville, ranked as the 4th “Best Metro”
larly interested in machine learning, speech pro- Equal Opportunity Employer. in Forbes’ 2009 list of “Best Places for Business
cessing, computational linguistics, Computer and Careers,” and recognized as one of the “Top
vision, computational biology and optimization. 10 Best Cities to Live, Work and Play” by Kiplinger
Positions are available at all ranks, and we have a UCL in 2008.
large number of three year limited term positions Université Catholique de Louvain, Belgium The University of Arkansas is an equal oppor-
currently available. For all positions we require a Professor - Programming Systems tunity, affirmative action employer. Qualified un-
Ph.D. Degree or Ph.D. candidacy, with the degree derrepresented minority and women candidates
conferred prior to date of hire. Submit your appli- UCL invites applications for a full-time faculty are especially invited to apply. All applicants are
cation electronically at: position in Programming Systems. The success- subject to public disclosure under the Arkansas
http://ttic.uchicago.edu/facapp/ ful candidate will carry out research in the field of Freedom of Information Act and persons hired
programming systems, including but not limited must have proof of legal authority to work in the
Toyota Technological Institute at Chicago to programming languages, adaptive software, ar- United States.
is an Equal Opportunity Employer tificial intelligence, algorithmics, software analy-
sis and synthesis. Still, other areas of competence
will also be considered, since qualifications take University of Calgary
Trinity University precedence over specialization. Department of Computer Science
Assistant Professor Responsibilities include research, supervi- Assistant Professor Positions
sion of undergraduate and graduate students, as
Trinity University seeks applications for a tenure- well as PhD theses, submission and management The Department of Computer Science at the
track assistant professor position in the Department of research grants, and undergraduate/graduate University of Calgary seeks outstanding candi-
of Computer Science, to commence August 2012. teaching within the curricula in Computer Science dates for two tenure-track positions at the Assis-




tant Professor level. Applicants from the areas of Professor, but experienced candidates with out- fessor (exceptional candidates at other ranks may
Database Management and Scientific Visualization standing credentials may be considered for Asso- also be considered). Candidates in the following
are of particular interest. Details for each position ciate or Full Professor. areas are especially encouraged to apply: Comput-
appear at: http://www.cpsc.ucalgary.ca/. Applicants Candidates interested in rigorous and in- er Security, Software Engineering, Machine Learn-
must possess a doctorate in Computer Science at novative approaches to the design and analysis ing and Computer Systems (Mobile/Ubiquitous
the time of appointment, and have a strong poten- of complex computing systems (from embed- computing and other experimental subareas).
tial to develop an excellent research record. The De- ded and cyberphysical to large-scale distributed The University of Illinois at Chicago (UIC)
partment is one of Canada’s leaders as evidenced systems) should apply. We seek candidates with ranks among the nation’s top 50 universities in
by our commitment to excellence in research and background in programming languages, con- federal research funding. It is the largest research
teaching. It has large undergraduate and graduate currency, security, formal methods, verification, university in the Chicago area, and is one of the
programs and extensive state-of-the-art comput- or system engineering. Preference will go to re- most diverse universities in the country. The Com-
ing facilities. Calgary is a multicultural city that is searchers whose work spans multiple areas. puter Science department has 27 tenure-track
the fastest growing city in Canada. Calgary enjoys a The positions will help shape the cooperation faculty representing major areas of computer sci-
moderate climate located beside the natural beau- with the Department of Computer Science on ence, and offers BS, MS and PhD degrees. Two of
ty of the Rocky Mountains. Further information computing systems. our faculty members are ACM Fellows and eight
about the Department is available at http://www. Candidates must have a Ph.D. in electrical members are recipients of NSF CAREER awards.
cpsc.ucalgary.ca/. Interested applicants should engineering, computer engineering, computer Our annual research funding has averaged $6.5M
send a CV, a concise description of their research science, or related discipline; they must have the over the last five years and includes major fund-
area and program, a statement of teaching philoso- ability to develop an independent research pro- ing from NSF, DARPA, DoD and NASA, including
phy, and arrange to have at least three reference gram, and enthusiasm for working with under- two NSF IGERT awards, nine Trustworthy Com-
letters sent to: Dr. Carey Williamson, Head, Depart- graduate and graduate students. puting awards and several other research and in-
ment of Computer Science, University of Calgary, The University of Colorado Boulder is com- strumentation grants; awards from state agencies
Calgary, Alberta, Canada, T2N 1N4 or via email to: mitted to diversity and equality in education and such as the Illinois Department of Transporta-
search@cpsc.ucalgary.ca. Completed applications employment. We encourage applications from tion, and from companies such as Google, Yahoo!
received by December 15, 2011 will receive full con- women, minority candidates, people with dis- and Motorola. Our department is home to many
sideration, though the review process will continue abilities, and veterans. pioneering and discipline-defining efforts in the
until the positions are filled. Hiring decisions will Applications will be evaluated starting De- areas of virtual reality (CAVE), software engineer-
be finalized in Spring 2012, with the successful can- cember 6, 2011 and until the positions are filled. ing (Petri Nets, Model Checking), Data Manage-
didates joining the U of C on July 1, 2012. Applications must include a letter of applica- ment and Mining, and Computational Trans-
tion specifying the desired position and area of portation. We have growing research programs
All qualified candidates are encouraged to apply; specialization, complete curriculum vitae, state- in areas such as computational biology, learning
however, Canadians and permanent residents will ments of research and teaching interests, and technologies, mobile and distributed systems,
be given priority. The University of Calgary respects, names and contact information of three refer- and security and privacy. At UIC, there are plenty
appreciates, and encourages diversity. ences. Applications must be submitted on-line at of opportunities for interdisciplinary work—UIC
http://www.jobsatcu.com/ using posting number houses the largest medical school in the country,
#815103 (computer systems). Additional infor- and our faculty are engaged with several cross-
University of California, Los Angeles mation is available at that site. departmental collaborations with faculty from
Computer Science Department health sciences, social sciences and humanities,
Tenure Track Positions, All Areas of Computer urban planning and the business school.
Science & Computer Engineering University of Colorado, Chicago is the third most populous city in the
Tracking #0145-1112-01 Colorado Springs USA. Located by the shore of Lake Michigan, the
Assistant Professor city offers an outstanding array of cultural and cu-
The Computer Science Department of the Henry linary experiences. As the birthplace of the mod-
Samueli School of Engineering and Applied Sci- The University of Colorado, Colorado Springs ern skyscraper, Chicago boasts one of the world’s
ence at the University of California, Los Angeles, invites applications for up to three tenure-track tallest and densest skylines, combined with an
invites applications for tenure-track positions in Assistant Professor positions in all areas of Com- extensive system of parks and public transit. Its
all areas of Computer Science and Computer En- puter Science and Software Engineering. The CS primary airport is the second busiest in the world,
gineering. Applications are also encouraged from Dept offers Bachelor, Master and PhD degrees. with frequent non-stop flights to virtually any-
distinguished candidates at senior levels. Quality See full ad and apply electronically at http://www. where. Yet the cost of living, whether in an 85th
is our key criterion for applicant selection. Appli- JobsatCU.com, refer to posting #815131. Review floor condominium downtown or on a tree-lined
cants should have a strong commitment both to of applications will begin on January 15, 2012 and street in one of the nation’s finest school dis-
research and teaching and an outstanding record continue until the positions are filled. tricts, is surprisingly low.
of research for their level of seniority. Salary is Applications must be submitted at https://
commensurate with education and experience. jobs.uic.edu/. Please include a resume, teaching
UCLA is an Equal Opportunity/Affirmative Ac- University of Houston-Clear Lake and research statements, and names and ad-
tion Employer. The department is committed to Assistant Professor of Computer Science dresses of at least three references in the online
building a more diverse faculty, staff and student application. Applicants needing additional infor-
body as it responds to the changing population The University of Houston-Clear Lake Computer mation may contact the Faculty Search Chair at
and educational needs of California and the na- Science program invites applications for a ten- search@cs.uic.edu.
tion. To apply, please visit http://www.cs.ucla. ure-track Assistant Professor of CS to begin Au- Application processing will commence on Nov
edu/recruit. Faculty candidates are urged to en- gust 2012. A Ph.D. in CS, or closely related field, 15th. We will continue to accept and process ap-
sure that their applications and letters of refer- is required. Applications accepted online only at plications after that date until all the positions are
ence are received by January 1, 2012. https://jobs.uhcl.edu. See http://sce.uhcl.edu/cs. filled. The University of Illinois at Chicago is an
AA/EOE. Affirmative Action/Equal Opportunity Employer.

University of Colorado Boulder
Tenure-track positions in Computer Systems

University of Colorado Boulder: Department of Electrical, Computer, and Energy Engineering (ECEE) seeks outstanding candidates for two tenure-track positions in computer systems. The openings are targeted at the level of Assistant

University of Illinois at Chicago
Faculty Position
Department of Computer Science, UIC

The Computer Science Department at the University of Illinois at Chicago invites applications for tenure-track positions at the rank of Assistant Pro-

University of Massachusetts Amherst
Faculty Positions in Computer Science

We invite applications for tenure-track faculty positions in computer science with a preference for applicants with expertise in software engineering, programming languages, security, graphics,



robotics, and computer vision. Applicants must have completed (or be completing) a Ph.D. in Computer Science, or a related area, and should show evidence of exceptional research promise. Three positions are available at either the Assistant Professor or Associate Professor level.

Our department is highly supportive of junior faculty, providing both formal and informal mentoring. Many of our faculty are involved in interdisciplinary research, working closely with other departments including statistics/mathematics, electrical and industrial engineering, biology, physics, linguistics, and nursing, as well as new “green” initiatives. Amherst, a historic New England town, is the center of a vibrant and culturally rich area that includes four other colleges. For more information about the department, visit http://www.cs.umass.edu/.

To apply, please send a cover letter referencing search R41582 with your vita, a research statement, a teaching statement, and at least three letters of recommendation. Electronic submission of application materials in pdf format is preferred. Send to facrec@cs.umass.edu. Alternatively, paper copies of application materials may be sent to: Search R41582 c/o Chair of Faculty Recruiting, Department of Computer Science, University of Massachusetts, Amherst, MA 01003.

We will begin to review applications on November 15, 2011 and will continue until the positions are filled. Salary will be commensurate with education and experience. Inquiries and requests for more information can be sent to: facrec@cs.umass.edu.

The University of Massachusetts is an Affirmative Action/Equal Opportunity employer. Women and members of minority groups are encouraged to apply.


University of Minnesota – Twin Cities
Department of Computer Science and Engineering
Faculty Position

The Department of Computer Science and Engineering at the University of Minnesota-Twin Cities invites applications from candidates in all areas of Computer Science for two faculty positions. We strongly encourage applications from women and members of minority groups. Candidates should have a PhD in Computer Science or a closely related discipline. The position is open until filled, but for full consideration apply at www.cs.umn.edu/employment/faculty by December 15, 2011. The University of Minnesota is an equal opportunity employer and educator.


The University of Mississippi
Department of Computer & Information Science
Assistant Professor Positions

The Department of Computer and Information Science at the University of Mississippi invites applications for two tenure-track Assistant Professor positions.

An applicant must hold a PhD or equivalent in computer science or a closely related field by August 15, 2012. An applicant must have the ability to teach both graduate and undergraduate students, conduct research in major areas of computer and information science, and supervise MS and PhD students. An applicant must provide evidence of research potential, effective communication skills, and a broad background in computing.

The Department has an ABET/CAC-accredited undergraduate program and MS and PhD programs. See the website http://www.cs.olemiss.edu for more information about the Department.

The University is located in Oxford, one of America’s top-ranked college towns. Oxford has a wonderful small-town atmosphere and excellent schools.

Individuals may apply online at http://jobs.olemiss.edu. The application requires a cover letter, curriculum vitae, research and teaching statements, and contact information for four references. Review of applications will begin immediately and continue until the position is filled or an adequate applicant pool is reached.

The University of Mississippi is an EEO/AA/Title VI/Title IX/Section 504/ADA/ADEA Employer.


University of North Carolina at Charlotte
Chairperson
Department of Software and Information Systems

The SIS Department invites applications for the position of Department Chair. Candidates for the position must have a Ph.D. in Computer Science, Information Technology, or a closely related field, a record of research and publication commensurate with that of a Full Professor, evidence of a commitment to excellence in teaching, and strong administrative skills. SIS has a fast-growing enrollment and expects to grow significantly in faculty and research funding over the next 5 years. The chair is expected to bring dynamic leadership in research, curricular, and faculty development, and to build consensus within the department for strategic initiatives such as large, multidisciplinary research efforts. The university is one of the most rapidly growing in the country, and the goal within the College of Computing and Informatics is that the college and the SIS Department will lead this growth both in enrollment and quality of programs. Applications must be made electronically at https://jobs.uncc.edu (position1770) and must include a CV, references, and a statement on teaching, management, leadership, and research goals. Informal inquiries can be made to the Search Committee Chair, William Ribarsky, ribarsky@uncc.edu. Review of applications will begin as they are received and continue until the position is filled. All inquiries and applications will be treated as confidential. The University of North Carolina at Charlotte is an EOE/AA employer and an NSF ADVANCE Institution. For additional information, please visit our website at http://sis.uncc.edu.

The Department of Software and Information Systems has 15 faculty members, a large Ph.D. program, and over 350 students. The department offers a B.A. degree in Software and Information Systems, an M.S. degree in Information Technology, and a Ph.D. degree in Computing and Information Systems. SIS also plays a key role in the interdisciplinary Professional Science Master’s in Health Informatics. The department has focus areas in Security, Design, and Analytics, with research that includes Complex Adaptive Systems, Information Security and Assurance, Data Privacy, Formal Analytics, Modeling and Simulation and Human Computer Interaction. The Security and Privacy program is recognized by the National Security Agency as a National Center of Academic Excellence in Information Assurance Education and Research. SIS is part of the College of Computing and Informatics along with the Departments of Computer Science and Bioinformatics and Genomics. CCI is the university leader in external funding per faculty member and has the largest Ph.D. program at UNC Charlotte.


The University of North Texas
Department of Computer Science and Engineering
Assistant/Associate/Full Professors

The Department of Computer Science and Engineering at the University of North Texas (UNT) is seeking candidates for multiple tenure-track/tenured faculty positions at the Assistant, Associate or Full Professor level beginning August 15, 2012. The department plans to build on its existing strengths in 3 areas: Computer Systems, including operating systems, runtime systems for cloud and high performance or mobile and handheld devices, software engineering of net-centric, real-time and embedded systems, and energy efficient and low power circuits and systems; Intelligent Systems, including data mining, machine learning, information retrieval, scientific visualization, human-computer interaction, and computational life sciences; and Security, including information assurance, network security and intrusion detection, and secure software systems and vulnerability analysis. Candidates should have demonstrated the potential to excel in research in one or more of these areas and in teaching. A Ph.D. in Computer Science, Computer Engineering or closely related field is required at the time of appointment. At the Assistant Professor level, the applicant’s record must include high quality publications. At the Associate Professor level, the applicant must have at least 5 years of experience beyond an earned doctoral degree with a significant record of publications and extramural funding. A Full Professor would be expected to be a leader in his/her field with a record of building and maintaining a large-scale research program of international renown.

The Computer Science and Engineering department is home to 730 Bachelors students, 136 Masters students and 77 Ph.D. students. Additional information about the department is available at the department’s website at: www.cse.unt.edu.

Application Procedure:
All applicants must apply online to: https://facultyjobs.unt.edu. Submit nominations and questions regarding the position to Dr. Philip Sweany (sweany@cse.unt.edu).

Application Deadline:
The committee will begin its review of applications on December 1, 2011 and continue to accept and review applications until the positions are closed.

The University:
With about 36,000 students, UNT is the nation’s 33rd largest university. As the largest, most comprehensive university in Dallas-Fort Worth, UNT




drives the North Texas region. UNT offers 97 bachelor’s, 88 master’s and 40 doctoral degree programs, many nationally and internationally recognized. A student-focused public research university, UNT is the flagship of the UNT System.

The University of North Texas is an AA/ADA/EOE committed to diversity in its educational programs.


University of Notre Dame
Department of Computer Science and Engineering
Assistant or Associate Professor

The Department of Computer Science and Engineering at the University of Notre Dame invites applications for Assistant or Associate Professor. Excellent candidates in all areas will be considered.

The Department offers the PhD degree and accredited undergraduate Computer Science and Computer Engineering degrees, with currently over 80 PhD students and over 150 undergraduate majors. Faculty are expected to excel in classroom teaching and to build and lead cutting-edge and highly visible research projects that attract substantial external funding.

The University of Notre Dame is a private, Catholic university with a doctoral research extensive Carnegie classification, and consistently ranks in U.S. News & World Report as a top-twenty national university. The South Bend area has a vibrant and diverse economy with affordable housing and excellent school systems, and is within easy driving distance of Chicago and Lake Michigan.

Applicants should send (pdf format preferred) a CV, statement of teaching and research interests, and contact information for three professional references to: facultysearch AT cse.nd.edu

The University of Notre Dame is an Equal Opportunity, Affirmative Action Employer.


University of Oregon
Department of Computer and Information Science
Faculty Position
Assistant Professor

The Department of Computer and Information Science (CIS) seeks applications for a tenure track faculty position at the rank of Assistant Professor, beginning Fall 2012. The University of Oregon is an AAU research university located in Eugene, two hours south of Portland, and within one hour’s drive of both the Pacific Ocean and the snow-capped Cascade Mountains.

The CIS Department is part of the College of Arts and Sciences and is housed within the Lorry Lokey Science Complex. The department offers B.S., M.S. and Ph.D. degrees. More information about the department, its programs and faculty can be found at http://www.cs.uoregon.edu.

We offer a stimulating, friendly environment for collaborative research both within the department and with other departments on campus. Faculty in the department are affiliated with the Cognitive and Decision Sciences Institute, the Computational Science Institute, and the Neuro-Informatics Center.

The department seeks to hire faculty in the general area of systems, with specific specialization that complements existing faculty strengths in systems-related fields. The successful candidate will have a strong record of accomplishments that demonstrate a highly creative approach to systems research, knowledge of state-of-the-art techniques and technology, and significant collaborative project work. In addition, the candidate should have experience creating, evaluating, and applying experimental systems artifacts. Opportunities for interactions with systems-oriented faculty include: parallel and distributed systems, networking, databases, intelligent systems, informatics, and computational science. Applicants must have a Ph.D. in computer science or closely related field, a demonstrated record of excellence in research, and a strong commitment to teaching. A successful candidate will be expected to conduct a vigorous research program and to teach at both the undergraduate and graduate levels.

Applications will be accepted electronically through the department’s web site (only). Application information can be found at http://www.cs.uoregon.edu/Employment/. Review of applications will begin March 01, 2012 and continue until the position is filled. Please address any questions to faculty.search@cs.uoregon.edu.

The University of Oregon is an equal opportunity/affirmative action institution committed to cultural diversity and is compliant with the Americans with Disabilities Act. We are committed to creating a more inclusive and diverse institution and seek candidates with demonstrated potential to contribute positively to its diverse community.


University of Rochester
Tenure Track Position in Computer Science

The University of Rochester Department of Computer Science seeks applicants for a tenure-track faculty position. We are particularly interested in researchers in human-computer interaction and machine learning, but will consider all outstanding candidates. See http://www.cs.rochester.edu/recruit for details. UR is an Equal Opportunity Employer.


University of Science and Technology of China
School of Computer Science and Technology
Faculty Positions

The School of Computer Science and Technology at University of Science and Technology of China (USTC) invites applications for tenure-track or tenured positions at all levels. Research areas of particular interest include programming languages and compilers, formal verification, robotics, computational intelligence, machine learning, data mining, computer architectures, parallel and high-performance computing, network and distributed systems. For more information about the positions, please see http://en.cs.ustc.edu.cn/join_us.


University of South Florida
Assistant Professor Positions
Computer Science and Engineering

Applications are invited for two tenure-track Assistant Professor positions in the Department of Computer Science and Engineering. The Department is hiring in all areas of computer engineering and computer security (cybersecurity). In both areas we seek candidates with a record of outstanding-quality research publications and potential for excellence in teaching.

The Department of Computer Science and Engineering (http://www.cse.usf.edu) has 23 faculty members and offers B.S., M.S., and Ph.D. degrees. The research program is well supported by federal and state agencies and industry. The University of South Florida serves over 47,000 students and is one of the nation’s top public research universities.

For further information and for application instructions, please see our faculty search website: http://www.cse.usf.edu/faculty-search/. For questions please send email to faculty-search@cse.usf.edu. Applications will be considered starting immediately until the positions are filled.

According to Florida law, applications and meetings regarding them are open to the public. The University of South Florida is an Equal Opportunity/Equal Access/Affirmative Action Institution. Women and minorities are strongly encouraged to apply.


University of South Florida
Instructor Position
Computer Science and Engineering

Applications are invited for one Instructor position in the Department of Computer Science and Engineering. We are seeking an instructor who can teach a broad range of core computer science and computer engineering courses – both software and hardware – at the undergraduate level, as well as advise students. Candidates must have completed, or be near completion of, a Ph.D. degree in computer science or a related area. For exceptionally qualified candidates an M.S. degree may be considered.

The Department of Computer Science and Engineering (http://www.cse.usf.edu) has 23 faculty members and offers B.S., M.S., and Ph.D. degrees. The undergraduate program graduates approximately 80 students per year. The University of South Florida is one of the nation’s top public research universities.

For further information and for application instructions, please see our faculty search website: http://www.cse.usf.edu/faculty-search/. For questions please send email to faculty-search@cse.usf.edu. Applications will be considered starting immediately until the position is filled.

According to Florida law, applications and meetings regarding them are open to the public. The University of South Florida is an Equal Opportunity/Equal Access/Affirmative Action Institution. Women and minorities are strongly encouraged to apply.


University of Texas at Austin
Computer Science Department
Tenure Track/Tenured Faculty Positions

The Department of Computer Science of the University of Texas at Austin invites applications for tenure-track positions at all levels. Excellent candidates in all areas will be seriously considered, especially in Computer Architecture and other areas of computer systems research. All tenured and tenure-track positions require a Ph.D. or



equivalent degree in computer science or a related area at the time of employment.

Successful candidates are expected to pursue an active research program, to teach both graduate and undergraduate courses, and to supervise graduate students. The department is ranked among the top ten computer science departments in the country. It has 43 tenured and tenure-track faculty members across all areas of computer science. Many of these faculty participate in interdisciplinary programs and centers in the University, including those in Computational and Applied Mathematics, Computational Biology, and Neuroscience.

Austin, the capital of Texas, is located on the Colorado River, at the edge of the Texas Hill Country, and is famous for its live music and outdoor recreation. Austin is also a center for high-technology industry, including companies such as IBM, Dell, Freescale Semiconductor, Advanced Micro Devices, National Instruments, AT&T, Intel and Samsung. For more information please see the department web page: http://www.cs.utexas.edu/. The department prefers to receive applications online, beginning October 17, 2011.

To submit yours, please visit http://services.cs.utexas.edu/recruit/faculty

If you do not have internet access, please send a curriculum vita, a homepage URL, a description of research interests, and a list of selected publications to the address below. Applicants for an assistant professor position must have at least three (3) referees send letters of reference directly to the address provided. Applicants for a tenured position (associate or full professor) must have at least six (6) referees send letters of reference directly. The address for all hard copy material is: Faculty Search Committee, Department of Computer Science, The University of Texas at Austin, 1616 Guadalupe Street, Suite 2.408, Austin, Texas 78701, USA.

Inquiries about your application may be directed to facultysearch@cs.utexas.edu. For full consideration of your application, please apply by January 31, 2012. Women and minority candidates are especially encouraged to apply. The University of Texas is an Equal Opportunity Employer.


University of Texas at San Antonio
Faculty Positions in Computer Science

The Department of Computer Science at The University of Texas at San Antonio (UTSA) invites applications for tenure/tenure-track positions at the Assistant, Associate or Professor level, starting Fall 2012. All areas of computer science will be considered. We are particularly interested in candidates in theory and algorithms or computer architecture, with a focus on computer and information security as a cross-cutting concern.

The Department of Computer Science currently has 23 faculty members and offers B.S., M.S., and Ph.D. degrees supporting a dynamic and growing program with 560 undergraduates and more than 160 graduate students, including 81 Ph.D. students. See http://www.cs.utsa.edu for application instructions and additional information on the Department of Computer Science. Applications must be submitted by email in PDF format to fsearch@cs.utsa.edu. UTSA is an EO/AA Employer.

Chair of Faculty Search Committee
Department of Computer Science
The University of Texas at San Antonio
One UTSA Circle
San Antonio, TX 78249-0667
Phone: 210-458-4436 Fax: 210-458-4437
fsearch@cs.utsa.edu


University of Toronto Scarborough
Lecturer Computer Science

The Department of Computer & Mathematical Sciences at the University of Toronto Scarborough invites applications for a Lecturer (teaching-stream) in Computer Science. For more information see https://www.mathjobs.org/jobs/jobs/3173


University of Virginia
Senior Director for Enterprise Infrastructure

We invite applications for our Sr. Director position. Responsible for leading all operational aspects of the University’s IT organization. Significant relevant technical expertise, coupled with the ability to lead a large-scale enterprise, and foster a collaborative, high performance environment is required.

Apply for this Job:
Contact Person: Catherine Brand
Email Address: cab3ae@virginia.edu
Website: https://jobs.virginia.edu


Yuba Community College District
Information Systems Financial Aid Liaison

CATEGORICALLY FUNDED – POSITION CONTINGENT UPON CONTINUED FUNDING AND BOARD APPROVAL
COMMENCING: As Soon As Possible
INFORMATION SYSTEMS FINANCIAL AID LIAISON
DEPARTMENT: FINANCIAL AID

FINAL FILING DATE: FRIDAY, SEPTEMBER 30, 2011 BY 12:00 NOON. (POSTMARKS ARE NOT ACCEPTED)

LOCATION: *SCHEDULED TO BE ASSIGNED TO: YUBA COLLEGE

SALARY: $3675.83 - $4039.28/MO. (CSEA Salary Schedule, initial placement will not be higher than the above listed salary; the top step for this position is $5122.60/MO)

BASIC FUNCTION: Under the direction of the Dean of Financial Aid, the Information Systems Financial Aid Liaison will act as liaison between Financial Aid and Information Systems providing: assessment, implementation and testing of updates, new features and new state or federal regulations using the District’s administrative software, Colleague. This position will be physically located in the Financial Aid Office.

DISTINGUISHING CHARACTERISTICS: The Student Services Liaison will assume responsibility for overall report development, testing and implementation of new functionality, and content development that will interface with third party applications.

REPRESENTATIVE DUTIES:
Design, develop and plan efficient means of implementing new or revised system components to improve Colleague use and reliability. (E)
Help develop plans for upgrading of software application systems, lead department teams, and use effective project management tools. (E)
Test and analyze the impact of system upgrades/patches on custom programming. (E)
Conduct and evaluate department needs and provide analysis of computer systems and recommend solutions. (E)
Design & provide specifications for subroutines and extensions to customize the integrated system components. (E)
Provide documentation, train staff, and end users in system features and usage. (E)
Evaluate and implement state and federal reporting requirements for information collection and processing. (E)
Provide analytical and report writing services. (E)
Determine appropriate methods for gathering, presenting and reporting data. (E)
Develop report formats and specifications. (E)
Conduct analysis of large data files; ability to make assessments & recommendations on reporting assessments. (E)
Participate in short- and long-range planning. (E)
Keep current about new developments and technology regarding Colleague information analysis, planning, and reporting systems. (E)
Explore innovations and trends in technology with and for institutional applicability. (E)
Meet with staff in other cross-functional systems to identify and resolve problems. (E)
Participate in internal and external user groups; serve on committees as assigned. (E)
Other related duties as assigned.

KNOWLEDGE OF:
Uses, capabilities, characteristics and limitations of computer systems and related equipment
Principles and techniques of programming, data processing and programming documentation

ABILITY TO:
Analyze, design, write, test and resolve problems with complex and technical computer systems, programs and subsystems
Read, interpret, and apply technical publications, manuals, and other documents
Independently diagnose problems, develop solutions and communicate effectively with users
Interpret and implement requirements based on laws and regulations
Prepare clear and concise reports
Work independently or as a member of a team
Communicate effectively both orally and in writing
Maintain accurate records
Establish and maintain effective working relationships with staff, faculty and vendors

EDUCATION AND EXPERIENCE:
Any combination equivalent to a BA/BS degree and two (2) years of professional experience providing end user support in an integrated database environment; experience programming and developing reports, working knowledge of database interfaces and reporting tools; including community college experience.

ENVIRONMENT:
Office environment; subject to constant interruptions

PHYSICAL ABILITIES:
Sitting and operating a keyboard to enter data into a computer terminal for extended periods of time




Hearing and speaking to exchange information
Moderate lifting up to 25 pounds

REQUIRED DUTIES:
Demonstrate sensitivity to and understanding of the diverse academic, socioeconomic, cultural, disability and ethnic backgrounds of community college students.

* This position is anticipated to be assigned to the Yuba College but may be assigned temporarily or permanently within the District.

WORKING CONDITIONS:
Categorically funded positions are contingent upon funding. Smoking is restricted in many areas of the Yuba Community College District. Woodland Community College is a tobacco free campus.

INTERVIEW:
A candidate selected for interview will be required to visit the Yuba College at his/her own expense upon a date selected by the District. Meeting the minimum qualifications does not guarantee an interview.

FOREIGN TRANSCRIPTS:
Must include a U.S. evaluation and translation. Please contact the Office of Human Resources for a list of agencies providing this service.

PRE-EMPLOYMENT REQUIREMENTS:
All Academic, Classified and Management employees shall be required to provide fingerprints to the District for the purpose of obtaining a criminal history as authorized by the California Education Code and all fees are the responsibility of the selected candidates. All prospective Administrative and Classified employees shall be required to provide verification of freedom from tuberculosis.

EQUAL EMPLOYMENT:
Yuba Community College District is an Equal Employment Opportunity Employer and guarantees equal opportunity regardless of race, color, creed, national origin, ancestry, gender, marital status, disability, religious or political affiliation, age or sexual orientation and does not discriminate in its educational programs, in employment nor in any other of its activities.

PART-TIME (less than .60 FTE): Part-time positions less than .60 FTE are not entitled to any District paid fringe benefits. The District does, however, provide the employee prorated leaves including vacation, sick leave and paid holidays. Employees less than .50 FTE contribute to an Alternative Retirement System (Apple). Employees whose FTE is between .50 and .60 contribute to the California Public Employees Retirement System (CalPERS).

BENEFITS/SALARY:
The District offers a comprehensive benefit package for employees and dependents for positions whose FTE is .60 or higher, valued at over $16,000 annually with a $14.50 monthly out of pocket expense to employees or dependents for monthly premiums. The package includes health, dental, vision, one (1) life insurance policy and an Employee Assistance program. Additional benefits include contributions to the Public Employee’s Retirement System (PERS) which is integrated with Social Security, 457/403b options, Vacation days - 7.33 hrs per month for the first year, 96 hrs per years, 1-5, 12 sick days and 20 holidays. INITIAL PLACEMENT WILL NOT BE HIGHER THAN THE LISTED SALARY, STEP 3 OF THE CSEA SALARY SCHEDULE, ACCORDING TO THE CLASSIFIED POLICY AND PROCEDURES HANDBOOK. THIS DOES NOT APPLY TO INTERNAL CANDIDATES; PLEASE REVIEW THE TOP OF THE FLYER (SALARY).

APPLICATION PROCEDURE & DEADLINE:
A District Classified application and the Diversity Statement are required. The application is available at the Human Resources Office, 2088 North Beale Road, Building 100A, Room 21, Marysville, CA 95901. Or you may call our TTY line at (530) 634-7760 OR visit our Web Site at http://www.yccd.edu/

It is the sole responsibility of the applicant to ensure that all application materials are received by the final filing date in the Human Resources Office by FRIDAY, SEPTEMBER 30, 2011 BY 12:00 NOON.

All submitted materials become District property, will not be returned, will not be copied and will be considered for this opening only. Faxed, emailed, incomplete and/or late applications will not be forwarded for further consideration.


Central Washington University
Assistant/Associate Professor

Central Washington University, Computer Science Dept - accepting applications for Ass’t/Assoc Prof. Applicants with research potential in computational science areas are encouraged to apply. To apply online, visit: https://jobs.cwu.edu. AA/EEO/Title IX Institution.

You’ve come a long way.


Share what you’ve learned.

ACM has partnered with MentorNet, the award-winning nonprofit e-mentoring network in engineering,
science and mathematics. MentorNet’s award-winning One-on-One Mentoring Programs pair ACM
student members with mentors from industry, government, higher education, and other sectors.
• Communicate by email about career goals, course work, and many other topics.
• Spend just 20 minutes a week - and make a huge difference in a student’s life.
• Take part in a lively online community of professionals and students all over the world.

Make a difference to a student in your field.


Sign up today at: www.mentornet.net
Find out more at: www.acm.org/mentornet
MentorNet’s sponsors include 3M Foundation, ACM, Alcoa Foundation, Agilent Technologies, Amylin Pharmaceuticals, Bechtel Group Foundation, Cisco
Systems, Hewlett-Packard Company, IBM Corporation, Intel Foundation, Lockheed Martin Space Systems, National Science Foundation, Naval Research
Laboratory, NVIDIA, Sandia National Laboratories, Schlumberger, S.D. Bechtel, Jr. Foundation, Texas Instruments, and The Henry Luce Foundation.



research highlights
p. 122
Technical Perspective
Safety First!
By Xavier Leroy

p. 123
Safe to the Last Instruction: Automated Verification of a Type-Safe Operating System
By Jean Yang and Chris Hawblitzel

p. 132
Technical Perspective
Anonymity Is Not Privacy
By Vitaly Shmatikov

p. 133
Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography
By Lars Backstrom, Cynthia Dwork, and Jon Kleinberg



doi:10.1145/2043174.2043196

Technical Perspective
Safety First!
By Xavier Leroy

Software misbehaves all too often. This is a truism, but also the driving force behind many computing techniques intended to increase software reliability, safety, and security, ranging from basic testing to full formal verification.

In this wide spectrum of approaches, a sweet spot is type and memory safety. Rather than attempting to rule out all bugs, type and memory safety focuses on strict enforcement of a few basic safety properties: a character string is not a code pointer; arrays are always accessed within bounds; memory blocks are not accessed after deallocation; pointers or object references cannot be forged from integers; and so on. Such properties are enforced through a combination of static (compile-time) type-checking, dynamic (runtime) checks such as array bound checks, and automatic memory management. These humble safety properties not only catch a number of common programming errors, but are also surprisingly effective at thwarting many security attacks such as buffer overrun attacks. Moreover, they can be leveraged to build software-enforced access control and isolation architectures such as the Java and .NET security managers; for if object references can be forged from integers, any software-only security infrastructure can be circumvented.

In the mid-1990s came the realization that type and memory safety is not just for high-level programming languages. Java and its bytecode verifier popularized the idea that the bytecode of a virtual machine can be made type-safe through a combination of load-time type-checking (bytecode verification) and runtime checks in the virtual machine. Going one step further “down,” Morrisett, Walker, Crary and Glew introduced Typed Assembly Language (TAL), which guarantees type and memory safety directly at the level of assembly language for the ubiquitous x86 processor architecture. There are several benefits to enforcing type and memory safety at the level of bytecode or assembly language: the compilers no longer need to be trusted to preserve safety and therefore are no longer part of the trusted computing base. Moreover, type-safe interoperability between different source languages can be guaranteed.

The following work by Yang and Hawblitzel is a major milestone in an ambitious research project: that of guaranteeing end-to-end type and memory safety for a complete software stack. Leveraging the Bartok .NET-to-typed-x86 compiler and the corresponding TAL checker, it is possible to automatically obtain safety guarantees for most of the software stack written in C#: not just application code, but also large chunks of systems code such as network protocols. In particular, the paper shows that the major part of a safe, preemptive scheduler for multitasking can be developed this way, which may come as a surprise to many readers.

However, not all parts of an operating system and runtime system can be shown memory-safe using only TAL. A major offender is the memory manager (allocator, garbage collector, among others), which has to treat memory in an essentially untyped way. Similar issues occur in the lowest layers of operating systems (context switching, interrupt handling, among others). The standard approach at this point is to leave these components in the trusted computing base and validate them only by testing. Instead, Yang and Hawblitzel succeeded in formally verifying these components (which they call the “Nucleus” of their Verve operating system) against mathematical specifications (pre- and post-conditions), using the Boogie deductive program verifier.

The minimalistic design of the Nucleus is elegant, and the interplay between its specifications and the generic safety guarantees of the TAL code is subtle. Perhaps the most impressive aspect of this work, however, is the remarkable economy of means by which it achieves end-to-end type and memory safety. The high degree of automation offered by the Boogie verifier and Z3 automatic theorem prover does wonders here, resulting in an overall verification effort that is remarkably low by today’s standards.

The formal verification of high-assurance software is making great progress lately. Yang and Hawblitzel’s work, along with other recent breakthroughs in software verification such as the seL4 verified microkernel of Klein et al. (see Communications, June 2010, p. 107), were unthinkable 10 years ago. Little by little, one point at a time, these results sketch a promised land where, with mathematical certainty, software does behave properly after all.

Xavier Leroy (xavier.leroy@inria.fr) is a senior research scientist at INRIA Paris-Rocquencourt, France.

© 2011 ACM 0001-0782/11/12 $10.00

doi:10.1145/2043174.2043197

Safe to the Last Instruction: Automated Verification of a Type-Safe Operating System
By Jean Yang and Chris Hawblitzel

The original version of this paper was published in Programming Language Design and Implementation (PLDI), 2010, ACM.

Abstract
Typed assembly language (TAL) and Hoare logic can be used to verify the absence of many kinds of errors in low-level code. We use TAL and Hoare logic to achieve highly automated, static verification of the safety of a new operating system called Verve. We have developed techniques and tools to mechanically verify the safety of every assembly-language instruction in the operating system, runtime system, drivers, and applications (in fact, every part of the system software except the boot loader). Verve consists of a “Nucleus” that provides primitive access to hardware and memory, a kernel that builds services on top of the Nucleus, and applications that run on top of the kernel. The Nucleus, written in verified assembly language, implements allocation, garbage collection, multiple stacks, interrupt handling, and device access. The kernel, written in C# and compiled to TAL, builds higher-level services, such as preemptive threads, on top of the Nucleus. A TAL checker verifies the safety of the kernel and applications. A Hoare-style verifier with an automated theorem prover verifies both the safety and correctness of the Nucleus. Verve is, to the best of our knowledge, the first operating system mechanically verified to guarantee both type and memory safety. More generally, Verve’s approach demonstrates a practical way to mix high-level typed code with low-level untyped code in a verifiably safe manner.

1. INTRODUCTION
High-level computer applications build on services provided by lower-level software layers, such as operating systems and language runtime systems. These lower-level software layers should be reliable and secure. Without reliability, users endure frustration and potential data loss when the system software crashes. Without security, users are vulnerable to attacks from the network, which often exploit low-level bugs such as buffer overflows to take over a user’s computer. Unfortunately, today’s low-level software still suffers from a steady stream of bugs, often leaving computers vulnerable to attack until the bugs are patched.

Many projects have proposed using safe languages to increase the reliability and security of low-level systems. Safe languages ensure type safety and memory safety: accesses to data are guaranteed to be well-typed and guaranteed not to overflow memory boundaries or dereference dangling pointers. This safety rules out many common bugs, such as buffer overflow vulnerabilities. Unfortunately, it is difficult to express a complete computer system entirely in a safe language, because safe languages deliberately omit unsafe, low-level features, such as explicit memory deallocation. To perform low-level tasks like memory management, a safe language usually relies on a runtime system written in an unsafe language (e.g., C), and any bugs in this runtime system can undermine the safety of the entire language. For example, such bugs have left popular Web browsers, including Mozilla and Internet Explorer, open to attack [10].

This paper presents Verve, an operating system and runtime system that we have verified to ensure type and memory safety. Verve has a simple mantra: every assembly-language instruction in the software stack must be mechanically verified for safety. This includes every instruction of every piece of software except the boot loader: applications, device drivers, thread scheduler, interrupt handler, allocator, garbage collector, etc. Because of this, Verve does not have to trust a high-level language compiler to enforce safety, nor does it have to rely on unverified library code.

The goal of formally verifying low-level OS and runtime system code is not new. Nevertheless, very little mechanically verified low-level OS and runtime system code exists, and that code still requires man-years of effort to verify [5, 8]. This paper argues that recent programming language and theorem-proving technologies reduce this effort substantially, making it practical to verify strong properties throughout a complex system. The key idea is to split a traditional OS kernel into two layers: a critical low-level “Nucleus,” which exports essential runtime abstractions of the underlying hardware and memory, and a higher-level kernel, which provides more fully fledged services. Because of these two distinct layers, we can leverage two distinct automated technologies to verify Verve: TAL (typed assembly language [11]) and automated theorem provers. Specifically, we verify the Nucleus using automated theorem proving (based on Hoare logic) and we ensure the safety of the kernel using TAL (generated from C#).

A complete Verve system consists of a Nucleus, a kernel, and one or more applications. We wrote the kernel and applications in safe C#, which is automatically compiled to TAL. An existing TAL checker [3] verifies this TAL code (again, automatically). We wrote the Nucleus directly in assembly language, hand-annotating it with assertions (preconditions, postconditions, and loop invariants). An existing Hoare-style program verifier called Boogie [1] verifies the assembly language against a specification of safety and correctness. This ensures the safety and correctness of the Nucleus’s implementation, including safe interaction with the TAL code and safe interaction with hardware (including memory, interrupts, timer, keyboard, and screen). Boogie relies on Z3 [4], an automated theorem prover, to
check that the assertions are satisfied. Writing the assertions requires human effort, but once they are written, we can use Boogie and Z3 to verify them completely automatically. As a result, the Verve Nucleus requires only 2–3 lines of proof annotation per executable statement. Although it is difficult to compare annotation burdens across systems that use different proof environments and programming languages, similar projects based on interactive theorem provers [5, 8] have required more than 10 lines of proof annotation per line of code.

Verve boots and runs on real, off-the-shelf x86 hardware, and provides efficient support for realistic language features, including classes, virtual methods, arrays, and preemptive threads. Nevertheless, the current Verve system is still small compared to commodity operating systems and has many limitations. It lacks support for many C# features: exception handling, for example, is implemented by killing a thread entirely, rather than with try/catch. It lacks dynamic loading of code. It runs on only a single processor. Although it protects applications from each other using type safety, it lacks a more comprehensive isolation mechanism between applications such as Java Isolates or C# AppDomains. The verification does not guarantee termination. Finally, Verve uses verified garbage collectors [7] that are stop-the-world rather than incremental or real time, and Verve keeps interrupts disabled throughout the collection. Except for multiprocessor support, none of the limitations in Verve’s present implementation are fundamental.

We expect that with more time, the high degree of automation in Verve’s verification will allow Verve to scale to a more realistic feature set, such as a large library of safe code and a verified incremental garbage collector. Indeed, we have already ported about 35,000 lines of safe C# code to run on top of Verve, including standard C# libraries, device drivers, and implementations of several Internet protocols.

In this paper, we describe our verification tools (Section 2), the interface that our Nucleus exports to the rest of the kernel (Section 3), the verification of the Nucleus (Section 4), the kernel (Section 5), the time it takes to verify Verve (Section 6), and related work in systems verification (Section 7).

2. TOOLS FOR BUILDING A SAFE OS
Two verification technologies, TAL and automated theorem proving, drive Verve’s design. TAL is assembly language annotated with type information at the level of the assembly-language registers. We can use a simple type checker to verify that each assembly-language instruction in a TAL program respects the types of the instruction’s operands. For instance, the checker would reject the use of an integer as a memory address.

While TAL reasons about types, theorem provers reason about logical formulas, attempting to prove formulas valid or invalid. Automated theorem provers run with little or no human assistance, in contrast to interactive theorem provers, which can prove a wider variety of formulas but often require considerable human assistance. Modern automated theorem provers can reason about various theories, such as integer arithmetic, bitwise arithmetic, and arrays. Based on this reasoning, automated theorem provers can soundly prove deep properties about computer programs.

TAL and automated theorem proving are complementary technologies. On one hand, TAL is relatively easy to generate: because of the similarity between TAL types and high-level language types, a compiler can automatically turn high-level language code into TAL code, relying on the type annotations already present in the high-level language code. This enables TAL to scale easily to large amounts of code. Verve uses the Bartok compiler [3], which automatically generates TAL code from type-safe C# code.

On the other hand, we can use automated theorem provers to verify deeper logical properties about the code than a typical TAL type system can express, using a methodology discussed by Turing [12] and described by Floyd and Hoare in the 1960s, now commonly known as “Hoare logic.” In this methodology, a programmer annotates various points in the program, such as procedure entry points and loop entry points, with annotations describing the state of the machine. Such annotations are similar to type annotations, but specify properties of variables in much greater detail than usually found in type annotations. For example, a type annotation might merely say that registers eax and ebx both have type int, while a Hoare-style annotation might specify a precise formula about the values in eax and ebx, such as “eax >= 10 && eax + ebx < 20.” Because of this level of detail, writing these annotations requires substantial programmer effort.

To exploit the tradeoff between TAL and automated theorem proving, we decided to split the Verve operating system code into two parts, shown in Figure 1: a Nucleus, verified with Hoare logic and automated theorem proving, and a kernel, verified with TAL. The relative difficulty of Hoare logic motivated the balance between the two parts: only the functionality that we could not use TAL to verify as safe went into the Nucleus; all other code went into the kernel.

Figure 1. Verve structure, showing all 20 functions exported by the Nucleus.
[Diagram labels, top to bottom: Main Application (TAL) and Kernel (TAL), with KernelEntryPoint and NewThread, Yield, ...; Nucleus (BoogiePL) and Boot Loader, with NucleusEntryPoint, GarbageCollect, GetStackState, ResetStack, YieldTo, VgaTextWrite, TryReadKeyboard, StartTimer, SendEoi, AllocObject, AllocVector, Throw, readField, writeField, readStack, writeStack, FaultHandler, ErrorHandler, InterruptHandler, FatalHandler; x86 Hardware.]

The Nucleus’s source code is not expressed in TAL, but
rather in Boogie’s programming language, called BoogiePL (or just Boogie), so that the Boogie verifier can check it. Since the Nucleus code consists of assembly-language instructions, these assembly-language instructions must appear in a form that the Boogie verifier can understand. As described in detail below, we decided to encode assembly-language instructions as BoogiePL statements, so that the Boogie verifier can check that each instruction’s requirements are satisfied. After Boogie verification, a separate tool called “BoogieAsm,” developed for an earlier project [7], extracts standard assembly-language instructions from the BoogiePL code. A standard assembler then turns these instructions into an object file.

A bug in the Boogie checker or TAL checker could allow unsafe code to pass into the Verve system. Therefore, we currently must trust that these checkers are correct; they are part of Verve’s “trusted computing base.” Figure 2 shows the various components of Verve’s trusted computing base. The only trusted components are the tools used to verify, assemble, link, and boot the verified Nucleus and kernel. Note that although we rely on various compilers (the Bartok compiler generates TAL code, while a separate compiler called “Beat” generates much of the BoogiePL code), none of our compilers are part of our trusted computing base: we do not need to trust the compilers to ensure the correctness of the Nucleus and safety of the Verve system as a whole.

The trusted computing base includes the specification of correctness for the Nucleus’s BoogiePL code. This includes specifications of the behavior of functions exported by the Nucleus, shown in Figure 1. (For example, the specification of “YieldTo” ensures that the Nucleus sets the stack pointer to the top of the correct stack during a yield.) It also includes specifications for assembly-language instructions and for interaction with hardware devices and memory; we took some of these specifications from existing work [7], and wrote some of them from scratch. All Boogie specifications are written as logical formulas in the BoogiePL language. By expressing and checking properties at a low level (assembly language), we can ensure nontrivial properties with high confidence. The bulk of this paper focuses on these properties, with an emphasis on the specification and verification of the Nucleus’s correctness properties. The next two sections discuss the Nucleus’s design and verification.

Figure 2. Building the Verve system: trusted, untrusted components.
[Diagram labels: untrusted sources Nucleus.beat, Kernel.cs, and App.cs; untrusted Beat compiler, C# compiler, and Bartok compiler; verified Nucleus.bpl (x86) and Kernel.obj (x86); trusted Spec.bpl, Boogie/Z3, TAL checker, BoogieAsm, BootLdr.exe, Assembler, Linker, and ISO generator; output SafeOS.iso (bootable CD-ROM image).]

3. THE NUCLEUS INTERFACE
The core of our verification is the Nucleus, which provides a verified interface to the low-level functionality of the operating system. We verify the Nucleus using Hoare logic in Boogie, based on a trusted specification for x86 assembly-language instructions. In Verve, all access to low-level functionality must occur through the Nucleus: the kernel’s TAL code and application’s TAL code can access low-level functionality only indirectly, through the Nucleus. For example, TAL code cannot directly access devices. Furthermore, even though TAL code can directly read and write words of memory, it can read and write only words designated as safe-for-TAL by the Nucleus’s garbage collector.

The Nucleus consists of a minimal set of functions necessary to support the TAL code that runs above it. We wanted a minimal set because even with an automated theorem prover, Hoare-style verification is still hard work; less code in the Nucleus means less code to verify. At the same time, the set has to guarantee safety in the presence of arbitrary TAL code; it can assume that the TAL code is well typed, but can make no further assumptions about the behavior of the TAL code. For example, when an interrupt occurs, the Nucleus tries to transfer control to a designated TAL interrupt handler. The Nucleus cannot assume that this handler is in the correct state to handle the interrupt, and must therefore check the handler’s state at runtime.

One design decision greatly simplified the Nucleus: following a design used in recent microkernels [6, 8], no Nucleus function ever blocks. In other words, every Nucleus function performs a finite (and usually small) amount of work and then returns. The Nucleus may, however, return to a different thread than the thread that invoked the function. This allows the kernel built on top of the Nucleus to implement blocking thread operations, such as waiting on a semaphore.

Another design decision simplified reasoning about the Nucleus: following the approach taken by the recent verified L4 microkernel, seL4 [8], Verve keeps interrupts disabled throughout the execution of any single Nucleus function. (On the other hand, interrupts may be enabled during the TAL kernel’s execution, with no loss of safety.) Since Nucleus functions do not block, Verve still guarantees that eventually, interrupts will always be re-enabled, and usually will be re-enabled very quickly. However, Verve’s current implementation sacrifices real-time interrupt handling because of one particularly long function: “GarbageCollect,” which performs an entire stop-the-world garbage collection. In the future, we hope to improve real-time behavior by using a verified incremental collector [10].

Such design decisions led us to a small Nucleus API
consisting of just 20 functions, all shown in Figure 1. These 20 functions, implemented with a total of about 1500 x86 instructions, include memory management (AllocObject, AllocVector, GarbageCollect, readField, writeField, readStack, and writeStack), stack management (GetStackState, ResetStack, YieldTo, Throw), device access (VgaTextWrite, TryReadKeyboard, StartTimer, SendEoi), fault handling (FaultHandler, ErrorHandler, FatalHandler, and InterruptHandler), and startup (NucleusEntryPoint). Most of the functions are intended for use by only the kernel, not by applications. However, applications may call AllocObject, AllocVector, and Throw directly.

4. THE VERIFIED NUCLEUS
To verify that the Nucleus behaves correctly, we have to specify what correct behavior is. Formally, this specification consists of preconditions and postconditions for each of the 20 functions exported by the Nucleus (Figure 1). The preconditions reflect the guarantees made by other components of the system when calling the Nucleus. For example, the precondition to NucleusEntryPoint describes the state of memory when the Nucleus begins execution; the (trusted) boot loader is responsible for establishing this precondition. The preconditions for functions exported to the kernel and applications describe the state of registers and the current stack when making a call to the Nucleus; the (trusted) TAL checker is responsible for guaranteeing that these preconditions hold when the (untrusted) kernel and applications transfer control to the Nucleus. Nucleus postconditions describe changes to the Nucleus state and reflect memory and guarantees the Nucleus makes to the rest of the kernel. Because preconditions are relatively weak for certain functions, the Nucleus must occasionally perform runtime checks to validate the values passed from the kernel and applications.

The Nucleus specification describes what the Nucleus must do, but does not specify exactly how the Nucleus must be implemented. For example, the Verve specification of garbage collection does not specify which algorithm the garbage collector should implement. Instead, following the approach of McCreight et al. [10], the specification just says that the garbage collector must ensure that the stack frames and heap objects contain the correct data, with no dangling pointers. We have built a verified mark-sweep and a verified copying collector for Verve, both obeying this same specification.

The Nucleus interacts with five components: memory, hardware devices, the boot loader, interrupt handling, and TAL code (kernel and application code). Memory and hardware devices export functionality to the Nucleus, such as the ability to read memory locations and write to hardware devices. The verification process ensures that the Nucleus satisfies the preconditions to each operation on memory and hardware. In turn, the Nucleus exports functionality to the boot loader (the Nucleus entry point), the interrupt handling (the Nucleus’s interrupt handlers), and the TAL code (AllocObject, YieldTo, etc.).

4.1. Specification logistics
We express Verve’s specification as first-order logical formulas in BoogiePL [1]. These formulas follow C/Java/C# syntax and consist of:

•  Arithmetic operators: +, −, *, >, ==, !=, ...
•  Boolean operators: !, &&, ||, ==>, ...
•  Variables: foo, Bar, old(foo), ...
•  Boolean constants: true, false
•  Integer constants: 5, ...
•  Bit vector constants: 5bv16, 5bv32, ...
•  Function application: Factorial(5), Max(3,7), IsOdd(9), ...
•  Array indexing: foo[3], ...
•  Array update: foo[3 := Bar], ...
•  Quantified formulas: (∀i:int::foo[i]==Factorial(i)), ...

BoogiePL bit vectors correspond to integers in C/Java/C#, which are limited to numbers that fit inside a fixed number of bits. BoogiePL integers, on the other hand, are unbounded mathematical integers. BoogiePL arrays are unbounded mathematical maps from some type (usually integers) to some other type. Unlike arrays in C/Java/C#, BoogiePL arrays are immutable values (there are no references to arrays and arrays are not updated in place). An array update expression a[x := y] creates a new array, which is equal to the old array a at all locations except x, where it contains a new value, y. For example, (a[x := y])[x] == y and (a[x := y])[x + 1] == a[x + 1].

BoogiePL procedures have preconditions and postconditions, written as BoogiePL logical formulas:

var a:int, b:int;
procedure P(x:int, y:int)
  requires a < b && x < y;
  modifies a, b;
  ensures a < b && a == x + old(a);
{
  a := a + x;
  b := b + y;
}

In this example, the procedure P can be called only in a state where global variable a is less than global variable b, and the parameter x is less than the parameter y. Upon exit, the procedure’s postconditions ensure that a is still less than b, and that a is equal to x plus the old version of a (before P executed). Note that the procedure must explicitly reveal all the global variables that it modifies (“modifies a, b;” in this example), so that callers to the procedure will be aware of the modifications.

The Boogie tool relies on the Z3 [4] automated theorem prover. Z3 automatically checks logical formulas involving linear integer arithmetic (addition, subtraction, comparison), arrays, bit vectors, and functions. Z3 checks integer formulas faster than it checks bit-vector formulas, so we
chose integers over bit vectors where possible. Z3 also checks quantified formulas (formulas with forall and exists); however, Z3 relies on programmer-supplied hints in the form of triggers to help with quantifiers, since checking quantified formulas with arithmetic and arrays is undecidable in general.

For each function exported to the boot loader, interrupt handling, and TAL code, the Nucleus implements a BoogiePL procedure whose specification is given in terms of “requires,” “ensures,” and “modifies” clauses. The Nucleus implements these procedures in terms of more primitive hardware procedures: each BoogiePL procedure exported from the memory and hardware devices to the Nucleus corresponds to exactly one assembly-language instruction, such as an instruction to read a single memory location or write to a single hardware register. The rest of this section presents specifications for various BoogiePL procedures in Verve, and describes how the Nucleus implementation is verified against these specifications.

4.2. Memory
Verve’s initial memory layout is set by the boot loader. Verve uses an off-the-shelf boot loader, which sets up an initial virtual-memory address space (i.e., it sets up a page table), loads the executable image into memory, and jumps to the executable’s entry point, passing detailed information about the memory layout to the entry point. The boot-loader-supplied address space simply maps virtual memory directly to physical memory, except for a small range of low addresses that are left unmapped (to catch null pointer dereferences). A traditional operating system would create new virtual memory address spaces to protect applications from each other. Because Verve guarantees type safety, however, it can rely on type safety for protection and keep the initial boot-loader-supplied address space.

Verve’s mapped address space consists of three parts. First is the memory occupied by the executable image, including code, static fields, method tables, and memory layout information for the garbage collector. Verve may read this memory, but may write to only the static fields, not the code, method tables, or layout information. Second, the Verve specification reserves the memory just above the executable image for the interrupt table. Verve may write to the table, but it can write only values that obey the specification for interrupt handlers. Third, the remaining memory above the interrupt handler is general-purpose memory, free for arbitrary use. The specification describes the state of general-purpose memory using a global variable Mem, which is an array that maps integer byte addresses to integer values. For any 4-byte-aligned address i in general-purpose memory, Mem[i] contains the 32-bit memory contents stored at address i, represented as an integer in the range 0 ... 2^32 − 1.

Each part of memory exports its own access functions to the Nucleus. The general memory exports two operations to the Nucleus, Load and Store:

procedure Load(ptr:int) returns (val:int);
  requires memAddr(ptr);
  requires Aligned(ptr);
  modifies Eip;
  ensures word(val);
  ensures val == Mem[ptr];

procedure Store(ptr:int, val:int);
  requires memAddr(ptr);
  requires Aligned(ptr);
  requires word(val);
  modifies Eip, Mem;
  ensures Mem == old(Mem)[ptr := val];

Each of these two operations requires a 4-byte-aligned pointer (“Aligned(...)”) to memory inside the general-purpose memory region (“memAddr(...)”). The loaded or stored value must be in the range 0 ... 2^32 − 1 (“word(...)”). Any Store operation updates the contents of Mem, so that subsequent Load operations are guaranteed to see the updated value. Loads and stores have an additional side effect, noted in the modifies clause: they modify the current instruction pointer (program counter), “Eip.”

4.3. Hardware devices
Verve supports four basic hardware devices: a programmable interrupt controller (PIC), a programmable interval timer (PIT), a VGA text screen, and a keyboard. Verve specifies the interaction with this hardware using unbounded streams of events. The Nucleus delivers events to the PIC, PIT, and screen, and it receives events from the keyboard. For the screen, the events are commands to draw a character at a particular position on the screen. For the keyboard, events are keystrokes received from the keyboard. For the PIC and PIT, particular sequences of events initialize interrupt handling and start timers.

We present the keyboard specification as an example. Verve represents the stream of events from the keyboard as an immutable array KbdEvents mapping event sequence numbers (represented as integers, starting from 0) to events (also represented as integers). As the Nucleus queries the keyboard, it discovers more and more events from the stream. Two indices into the array, KbdAvailable and KbdDone, indicate the state of the Nucleus’s interaction with the keyboard. Events 0...KbdDone-1 have already been read by the Nucleus, while events KbdDone...KbdAvailable-1 are available to read but have not yet been read.

Two operations, KbdStatusIn8 and KbdDataIn8, query the keyboard. Each of these procedures represents a single x86 assembly-language 8-bit I/O instruction, and BoogieAsm translates each call to these procedures into a single x86 “in” instruction. By invoking KbdStatusIn8, the Nucleus discovers the current state of KbdAvailable and KbdDone. If this operation places a 0 in the eax register’s lowest bit, then no events are available; if the operation places a 1 in eax’s lowest bit, then at least one event is available. If the Nucleus can prove that at least one event
dec e mb e r 2 0 1 1 | vo l . 5 4 | n o. 1 2 | c o m m u n icat i o ns o f th e acm 127


research highlights

is available (KbdAvailable > KbdDone), it may call BoogieAsm checks that each statement in the veri-
KbdDataIn8 to receive the first available event. fied BoogiePL code corresponds to a simple, prede-
termined sequence of 0, 1, or 2 assembly-language
instructions (e.g., “call  eax := And(eax,1)” corre-
var KbdEvents:[int]int; sponds to “and  eax,  1”), and then transforms the BoogiePL
var KbdAvailable:int, KbdDone:int; code into valid assembly code:
procedure KbdStatusIn8();
  modifies Eip, eax, KbdAvailable;
  ensures and(eax,1)==0 ==> KbdAvailable==KbdDone; _?TryReadKeyboard proc
  ensures and(eax,1)!=0 ==> KbdAvailable> KbdDone;    in al, 064h
procedure KbdDataIn8();    and eax, 1
  requires KbdAvailable > KbdDone;    cmp eax, 0
  modifies Eip, eax, KbdDone;    jne TryReadKeyboard$skip
  ensures KbdDone == old(KbdDone) + 1;    mov eax, 256
  ensures and(eax,255) == KbdEvents[old(KbdDone)];     ret
  TryReadKeyboard$skip:
   in al, 060h
Given primitive x86 operations like Load, Store,    and eax, 255
KbdStatusIn8, and KbdDataIn8, we can implement    ret
and  verify the procedures that make up the Nucleus.
We illustrate this process with a small, but complete,
example—the verified source code implementing Note that some variables in the BoogiePL code, like
TryReadKeyboard from Figure 1, along with a portion of KbdEvents,KbdAvailable, and KbdDone, are “specifi-
its specification: cation variables” that exist only during verification, and do
not exist in the generated assembly code.

procedure TryReadKeyboard(); 4.4. Stacks and threads


  ... Verve supports preemptive multithreading, which
   ensures KbdAvailable==old(KbdDone) ==> eax==256; requires periodic switching between stacks. This stack
    ensures KbdAvailable> old(KbdDone) ==> switching requires the Nucleus specification to be explicit
eax==KbdEvents[old(KbdDone)]; about the flow of control between the kernel and the
Nucleus. Most Nucleus procedures have a simple control
implementation TryReadKeyboard() { flow—the Nucleus procedure performs some work, and then
  call KeyboardStatusIn8(); uses the x86 ret instruction to return directly to the caller
  call eax := And(eax, 1); that called the Nucleus procedure (see TryReadKeyboard
  call Go(); if (eax != 0) {goto skip;} in the previous section, e.g.). However, a few Nucleus proce-
   call eax:=Mov(256); dures return in more complicated ways. Interrupt handlers,
   call Ret(old(RET)); return; for example, use a special x86 “return-from-interrupt”
  skip: instruction, rather than the standard x86 ret instruction.
  call KeyboardDataIn8(); More interestingly, several procedures, including YieldTo
  call eax := And(eax, 255); and interrupt handlers, may return to a caller in a different
  call Ret(old(RET)); return; stack.
} Verve uses a specification variable “RET” to specify
how each procedure must return. RET equals one of two
values: ReturnToAddr(i), which specifies that the pro-
TryReadKeyboard’s specification requires that the cedure must perform a normal return (the x86 ret instruc-
Nucleus return a keystroke (in the range 0–255) if one is tion) to return address i, or ReturnToInterrupted(i,
available, and otherwise return the value 256. The imple- cs, eflags), which specifies that the procedure must
mentation of the TryReadKeyboard function calls perform an interrupt return (the x86 iretd instruction)
KeyboardStatusIn8, performs a bitwise AND operation to return address i, restoring code segment cs and status
to discover the status, and branches based on the status. flags eflags. Most Nucleus procedures are required to
To verify that the implementation meets the specification, return to the return address pushed on the stack, pointed
we simply run the Boogie tool on TryReadKeyboard’s to by the stack pointer esp:
implementation. Boogie queries the Z3 theorem prover
to check that the procedure satisfies its postconditions,
and that all calls inside the procedure satisfy the necessary procedure TryReadKeyboard(...);
pre­conditions. Given the BoogiePL source code, this pro-   requires RET == ReturnToAddr(Mem[esp]);
cess  is entirely automatic, requiring no scripts or human    ...
interactive assistance to guide the theorem prover.
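As an informal cross-check of the keyboard contract shown earlier, the following Python sketch models the specification variables as ordinary fields and turns the pre- and postconditions into runtime assertions. It is only an illustrative model, not part of Verve: the class name KeyboardModel, the method names, and the choice that a status query reveals every pending event are all assumptions made for this example.

```python
class KeyboardModel:
    """Toy model of the keyboard specification (all names are illustrative)."""

    def __init__(self, events):
        # KbdEvents: the fixed stream of events, each assumed to fit in 8 bits.
        assert all(0 <= e <= 255 for e in events)
        self.kbd_events = tuple(events)
        self.kbd_available = 0  # KbdAvailable
        self.kbd_done = 0       # KbdDone
        self.eax = 0

    def kbd_status_in8(self):
        # Assumption of this model: a status query reveals all pending events.
        self.kbd_available = len(self.kbd_events)
        # Lowest bit of eax is 1 exactly when KbdAvailable > KbdDone.
        self.eax = 1 if self.kbd_available > self.kbd_done else 0

    def kbd_data_in8(self):
        # requires KbdAvailable > KbdDone
        assert self.kbd_available > self.kbd_done, "precondition violated"
        old_done = self.kbd_done
        self.eax = self.kbd_events[old_done]
        self.kbd_done = old_done + 1  # ensures KbdDone == old(KbdDone) + 1

    def try_read_keyboard(self):
        old_done = self.kbd_done
        self.kbd_status_in8()
        self.eax &= 1
        if self.eax == 0:
            self.eax = 256            # no event available
        else:
            self.kbd_data_in8()
            self.eax &= 255
        # Postconditions of TryReadKeyboard, checked at run time.
        if self.kbd_available == old_done:
            assert self.eax == 256
        else:
            assert self.eax == self.kbd_events[old_done]
        return self.eax


kb = KeyboardModel([65, 66])
results = [kb.try_read_keyboard() for _ in range(3)]
print(results)
```

Unlike the Boogie/Z3 check, which proves the postconditions for every possible event stream, this model only tests the streams it is given.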



The Nucleus entry point, on the other hand, does not return to its caller (the boot loader), but instead returns to the TAL code located at address KernelEntryPoint, so that the kernel begins execution after the Nucleus has finished initialization:

    procedure NucleusEntryPoint(...);
      requires RET == ReturnToAddr(KernelEntryPoint);
      ...

The kernel scheduler calls the YieldTo procedure to switch stacks. For example, the kernel’s timer interrupt handler calls YieldTo to preempt one thread and switch to another thread. The exact behavior of YieldTo depends on the state of the target stack that is being yielded to. Each stack may be in one of four states at any time: empty, interrupted, yielded, or running. When switching to an empty stack (a stack with no stack frames), YieldTo is required to switch the stack pointer to the empty stack and then return to KernelEntryPoint, which starts a new thread running in the empty stack. When switching to a stack that had been interrupted earlier, YieldTo is required to use return-from-interrupt to return control to the interrupted instruction running in the interrupted stack. To switch to a stack that had earlier voluntarily yielded, YieldTo uses an ordinary return instruction. Finally, switching to the currently running thread has no effect; in this case, YieldTo simply returns:

    procedure YieldTo(...);
      requires
           (StackState[s]==StackRunning && s==S
              && RET==ReturnToAddr(Mem[esp]))
        || (StackState[s]==StackYielded(...)
              && RET==ReturnToAddr(...))
        || (StackState[s]==StackInterrupted(...)
              && RET==ReturnToInterrupted(...))
        || (StackState[s]==StackEmpty
              && RET==ReturnToAddr(KernelEntryPoint)
              && ...);
      ...

In the specification above, stacks are numbered starting from 0, where S contains the current running stack and s contains the target stack to which YieldTo is switching. The array StackState maps stacks to their states (empty, interrupted, yielded, or running). The Nucleus’s interrupt handler “InterruptHandler” has a specification similar to YieldTo’s specification, except that its target stack is always stack number 0, which runs the kernel’s TAL code for scheduling and interrupt processing.

5. KERNEL

The Verve Nucleus implements low-level primitives to manage memory, switch stacks, and access hardware, but by itself the Nucleus does not provide the high-level abstractions that applications expect from an operating system. Therefore Verve also includes a simple kernel on top of the Nucleus, written in C# and compiled to TAL. This kernel includes standard libraries for thread synchronization and scheduling. It also schedules garbage collections, bringing each thread to a safe point before invoking the Nucleus garbage collector.

The key novelty of Verve’s kernel is that it is written entirely in safe code, relying on the Nucleus to perform operations that require unsafe code in traditional operating systems, such as memory deallocation and context switching. In general, the kernel decides policies, such as when a particular thread should run or when a garbage collection should take place, while the Nucleus implements mechanisms like context switching and garbage collection. An incorrect policy decision might cause incorrect behavior; for example, an incorrect scheduler implementation might fail to schedule a thread. However, an incorrect kernel implementation cannot violate the system’s type safety.

As an example of a kernel policy, the current kernel implements round-robin preemptive threading, allowing threads to block on semaphores. The kernel manages the threads by keeping each thread in a queue, where one queue holds threads ready to run, one queue holds threads awaiting garbage collection, each semaphore contains a queue of blocked threads, and so on. Each queue is implemented as a linked list using standard C# objects, but queues could be implemented in other ways (e.g., using a tree or array to implement priority queues) without requiring changes to the Nucleus. Furthermore, other policies, such as priority-based scheduling or deadline-based scheduling, could be implemented with no changes to the Nucleus.

6. MEASUREMENTS

This section summarizes Verve’s performance, quantifies the size of the Nucleus implementation (including annotations), and describes the time taken to verify Verve, for mechanical verification and in terms of person-months.

First, we wrote two simple micro-benchmarks to exercise the Nucleus’s stack management. We wrote the benchmarks in C#, compiled them to TAL, verified the TAL code, and linked them with the kernel and Nucleus. We then ran them on a 1.8 GHz AMD Athlon 64 3000+ with 1GB RAM, using the processor’s cycle counters to measure time and averaging over multiple iterations, after warming the caches. We measured for two configurations: Verve built with a copying collector and Verve built with a mark-sweep (MS) collector:

                         Copying (cycles)   MS (cycles)
    2*YieldTo                   98               98
    2*Wait + 2*Signal          216              216

The YieldTo benchmark shows that the Verve Nucleus requires 98 cycles to switch from one stack to another and back (49 cycles per invocation of YieldTo). The kernel builds thread scheduling and semaphores on top of the raw Nucleus YieldTo operation. Using semaphore wait and signal operations, it takes 216 cycles to switch from one thread to another and back (108 cycles per thread switch). The wait/signal performance is comparable to the round-trip IPC performance of fast microkernels such as L4 (242 cycles on a 166 MHz Pentium [9]) and seL4 (448 cycles on an ARM processor [8]), although in fairness, IPC involves an address space switch as well as a thread switch.

We next present the size of various parts of the Nucleus specification and implementation. All measurements are lines of BoogiePL code, after removing blank lines and comment-only lines. The following table shows the size of various portions of the trusted specification:

    Basic definitions                     61
    Memory and stacks                    116
    Interrupts and devices               111
    x86 instructions                     126
    GC tables and layouts                317
    Nucleus GC, allocation functions     239
    Nucleus other functions              215
    Total BoogiePL lines                1185

Overall, 1185 lines of BoogiePL is fairly large, but most of this is devoted to definitions about the hardware platform and memory layout. The GC table and layout information, originally defined by the Bartok compiler, occupies a substantial fraction of the specification. The specifications for all the functions exported by the Nucleus total 239 + 215 = 454 lines.

We measured the size of the Nucleus implementation for two configurations of Verve, one with the copying collector and one with the mark-sweep collector (note that the trusted specifications are the same for both collectors); 1610 lines of BoogiePL are shared between the two configurations:

                                     Copying     MS
    Shared BoogiePL lines              1610     1610
    Private BoogiePL lines             2699     3243
    Total BoogiePL lines               4309     4854
    Specification BoogiePL lines       1185     1185
    Total BoogiePL lines w/spec        5494     6039
    x86 instructions                   1377     1489
    BoogiePL/x86 ratio                  3.1      3.3
    BoogiePL + spec/x86 ratio           4.0      4.1

In total, each configuration contains about 4500 lines of BoogiePL. From these, BoogieAsm extracts about 1400 x86 instructions. This corresponds roughly to a 3-to-1 ratio (or 4-to-1 ratio, if the specification is included) of BoogiePL to x86 instructions (or, roughly, 2-to-1 or 3-to-1 ratio of nonexecutable annotation to executable code). This is about an order of magnitude fewer lines of annotation and script than related projects [5, 8]. The choice of using Boogie/Z3 and first-order logic, rather than a less-automated interactive proof system designed for non-first-order logics, is the key reason for this relatively small amount of annotation: since Verve’s annotations are written in BoogiePL’s first-order logic, the Z3 first-order logic theorem prover is able to automatically prove properties that require thousands of lines of manual scripts in interactive proof systems.

It takes 272 s for the Boogie/Z3 tools to verify all the Nucleus components, including both the mark-sweep and copying collectors, on a 2.4 GHz Intel Core2 with 4GB of memory. The vast majority of this time is spent on verifying the collectors; only 33 s were required to verify the system’s other components.

This small verification time gave us the freedom to experiment with different designs. For example, mid-way through the project, we switched from an implementation based on blocking Nucleus calls to an implementation based on non-blocking Nucleus calls. We were able to make such changes in days rather than months, because we could make minor changes to large, Nucleus-wide invariants and then run the automated theorem prover to quickly re-verify the entire Nucleus. In the end, the Verve design, implementation, and verification described in this paper took just 9 person-months, spread between two people.

7. RELATED WORK

The Verve project follows in a long line of operating system and runtime system verification efforts. More than 20 years ago, the Boyer–Moore mechanical theorem prover was used to verify a small operating system (Kit) and a small high-level language implementation [2], although the Kit OS was too limited to run on commodity hardware and to support standard programming languages.

More recently, the seL4 project verified all of the C code for an entire microkernel [8]. The seL4 microkernel contains 8700 lines of C code, substantially larger than earlier verified operating systems like Kit. This allows seL4 to implement realistic primitives for page table management, multithreading, capabilities, and message passing, so that it can securely run realistic user-mode applications, written in standard languages like C, on real hardware. The features supported by seL4 are comparable, though not identical, to those supported by Verve: seL4 pages are analogous to Verve objects, seL4 capabilities are analogous to Verve object references, seL4 messages are analogous to Verve method invocations, and seL4 threads are similar to Verve threads. The verified seL4 microkernel is substantially larger than the verified Verve Nucleus (8700 lines of C vs. 1400 x86 instructions). On the other hand, the verification effort required by seL4 was larger than the effort required by Verve: they report 20 person-years of research devoted to developing their proofs, including 11 person-years specifically for the seL4 code base. The proof required 200,000 lines of Isabelle scripts, a 20-to-1 script-to-code ratio. We hope that while seL4 demonstrates that realistic microkernels are within the reach of interactive theorem proving, Verve demonstrates that automated theorem proving can provide a less time-consuming alternative to interactive theorem proving for realistic systems software verification.
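The size ratios quoted in Sections 6 and 7 can be recomputed from the reported counts. The short Python check below is only a convenience for the reader: the dictionary layout and the function name ratios are illustrative, and the numbers are the totals reported in the measurement tables and in the seL4 comparison.

```python
# Reported totals: lines of BoogiePL (implementation and trusted specification)
# and extracted x86 instructions for each Verve configuration.
verve = {
    "copying":    {"boogiepl": 4309, "spec": 1185, "x86": 1377},
    "mark-sweep": {"boogiepl": 4854, "spec": 1185, "x86": 1489},
}


def ratios(cfg):
    """Return (BoogiePL/x86, (BoogiePL + spec)/x86), rounded to one decimal."""
    impl_only = cfg["boogiepl"] / cfg["x86"]
    with_spec = (cfg["boogiepl"] + cfg["spec"]) / cfg["x86"]
    return round(impl_only, 1), round(with_spec, 1)


for name, cfg in verve.items():
    print(name, ratios(cfg))

# seL4: 200,000 lines of Isabelle scripts for 8700 lines of C.
sel4_ratio = 200_000 / 8700
print("seL4 script-to-code ratio:", round(sel4_ratio))
```

The computed seL4 figure is about 23, which the text rounds to 20-to-1; the Verve figures match the 3-to-1 and 4-to-1 ratios given in Section 6.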



The FLINT project sets an ambitious goal to build foundational certified machine code, where certification produces a proof about an executable program, and a very small proof checker can be used to verify the proof [5]. Such a system would have a much smaller trusted computing base than Verve. So far, such a foundational approach has been labor-intensive, limiting the scope of the current implementation, which currently does not support standard programming languages and does not have more than a few hundred lines of application code. We hope that future advances in theorem proving technology will combine foundational proofs with more automated verification.

8. CONCLUSION AND FUTURE WORK

Using a combination of TAL and automated theorem proving, we have completely verified the safety of Verve at the assembly-language level, and completely verified the correctness (excepting termination) of Verve’s Nucleus at the assembly-language level. So what happens when we boot and run Verve? Since it is verified, did it run perfectly (or at least safely?) every time we ran it? Almost: the good news is that as far as we know, every execution of Verve has run in accordance with its specification. The bad news is that the specification itself may contain bugs, or at least may contain unwarranted assumptions about the world that Verve interacts with. In fact, we did encounter two violations of type safety, due to violations of the assumptions made by the specification. First, Verve initially included an off-the-shelf, unverified debugger stub, written in C++. The presence of the C++ code undermined Verve’s assumptions about memory, so we eventually decided to banish the debugger stub from Verve. Second, a linking issue caused Bartok’s relocation information for GC tables to get dropped, resulting in incorrect return addresses in the GC tables at run time. After these two issues were resolved, Verve ran without incident.

Compared to traditional operating system development, where vast numbers of bugs are discovered only at run time, the number of bugs encountered at run time in Verve has been extremely small. Nevertheless, the fact that any bugs existed is a good motivation for trying to reduce the size and complexity of the specification, for reducing the number of trusted components (such as the debugger stub and the linker), and for more systematically testing the specification [7, 8]. It also motivates the idea of verifying the verification tools themselves: verifying the TAL checker (which currently contains over 12,000 lines of trusted code [3]) against a small specification would particularly help to reduce Verve’s trusted computing base.

Based on the core Verve features described in this paper, we have implemented or ported approximately 35,000 lines of safe C# code to run on top of the Verve kernel and Nucleus. This C# code, which Bartok compiles to verifiable TAL code, includes various System libraries from .NET and support for networking protocols like ARP, IP, and UDP. To enable running networking protocols safely on real networking hardware, we recently extended Verve’s Nucleus to support a large class of PCI devices, using an IO memory management unit (IO/MMU) to protect Verve’s memory from errant devices. These extensions added fewer than 400 assembly-language instructions to the Nucleus, and required fewer than 1000 lines of BoogiePL code and just a couple of person-months to implement. While many challenges remain for future work (such as verified incremental garbage collection [10] and support for multicore machines), our experience so far gives us confidence that Verve’s approach will scale to larger systems and more features without requiring excessive human effort and without sacrificing type safety.

Acknowledgments

We would like to thank Jeremy Condit, Galen Hunt, Ed Nightingale, Don Porter, Shaz Qadeer, Rustan Leino, Juan Chen, Gregory Malecha, and David Tarditi for their suggestions and assistance.

References

1. Barnett, M., Chang, B.-Y.E., DeLine, R., Jacobs, B., Leino, K.R.M. Boogie: A modular reusable verifier for object-oriented programs. In Formal Methods for Components and Objects (FMCO) (Amsterdam, the Netherlands, 2006), volume 4111.
2. Bevier, W.R., Hunt Jr., W.A., Moore, J.S., Young, W.D. An approach to systems verification. J. Autom. Reason. 5, 4 (1989), 411–428.
3. Chen, J., Hawblitzel, C., Perry, F., Emmi, M., Condit, J., Coetzee, D., Pratikakis, P. Type-preserving compilation for large-scale optimizing object-oriented compilers. SIGPLAN Not. 43, 6 (2008), 183–192.
4. de Moura, L.M., Bjørner, N. Z3: An efficient SMT solver. In TACAS (2008), 337–340.
5. Feng, X., Shao, Z., Dong, Y., Guo, Y. Certifying low-level programs with hardware interrupts and preemptive threads. In PLDI (2008), 170–182.
6. Ford, B., Hibler, M., Lepreau, J., McGrath, R., Tullmann, P. Interface and execution models in the Fluke kernel. In OSDI (1999), 101–115.
7. Hawblitzel, C., Petrank, E. Automated verification of practical garbage collectors. In POPL (2009), 441–453.
8. Klein, G., Elphinstone, K., Heiser, G., Andronick, J., Cock, D., Derrin, P., Elkaduwe et al. seL4: Formal verification of an OS kernel. In Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP) (Big Sky, MT, Oct. 2009), ACM, 207–220.
9. Liedtke, J., Elphinstone, K., Schönberg, S., Härtig, H., Heiser, G., Islam, N., Jaeger, T. Achieved ipc performance (still the foundation for extensibility). In Proceedings of the 6th Workshop on Hot Topics in Operating Systems (HotOS-VI) (Cape Cod, MA, May 5–6, 1997).
10. McCreight, A., Shao, Z., Lin, C., Li, L. A general framework for certifying garbage collectors and their mutators. In PLDI (2007), 468–479.
11. Morrisett, G., Walker, D., Crary, K., Glew, N. From System F to typed assembly language. In POPL '98: 25th ACM Symposium on Principles of Programming Languages (Jan. 1998), 85–97.
12. Turing, A. Checking a large routine. In The Early British Computer Conferences, MIT Press, Cambridge, MA, 1989, 70–72.

Jean Yang (jeanyang@csail.mit.edu), Massachusetts Institute of Technology, Cambridge, MA.
Chris Hawblitzel (chris.hawblitzel@microsoft.com), Microsoft Research, Redmond, WA.

© 2011 ACM 0001-0782/11/12 $10.00



research highlights
doi:10.1145/2043174.2043198

Technical Perspective
Anonymity Is Not Privacy
By Vitaly Shmatikov

We live in an era of data abundance. Every aspect of our online and offline behavior (every click on a Web site, every relationship in online social networks, every bit of information we disclose about ourselves) is captured and analyzed by multiple entities, ranging from Internet service providers to advertising agencies to credit bureaus. With this dramatic increase in data collection, the companies holding our data face the responsibility for protecting our privacy, especially as they sell and exchange information about us.

Existing privacy protection technologies are overwhelmingly based on anonymization. They remove a few data attributes (such as names) that could be used to identify individuals and consider the resulting anonymized datasets safe from privacy violations. This approach is pervasive in academic literature, as well as industry practices. Whether it’s the chief privacy officer of a major online social network testifying to a U.S. Senate committee that there is a critical distinction between the use of information in “personally identifiable form” and the use, sharing, and dissemination of information in “non-personally identifiable form,” or a popular Web site informing its customers that it shares non-personally identifiable information about them with hundreds of advertisers, the safe-keepers of our data act as if anonymity were equivalent to privacy. But is it, really?

To build meaningful protections for sensitive individual data, we must ask the right questions. What does it mean to compromise privacy? How can a potential adversary access and/or influence the data, both before and after anonymization? What are the adversary’s capabilities, and what information might she employ to reverse anonymity? Unfortunately, many existing privacy technologies suffer from a certain poverty of imagination. For example, they assume the only way to reidentify anonymized records is to link them with an external dataset by matching common demographic attributes. As a consequence, anonymization is easily broken by creative adversaries who use a different attack model.

The following paper by Lars Backstrom, Cynthia Dwork, and Jon Kleinberg is a landmark in privacy research because it asks all of the above questions and gives unexpected answers. The authors demonstrate the fragility of data anonymization, invent several new techniques for reidentifying anonymized nodes in social networks, and radically change our understanding of what constitutes personally identifiable information.

Their first contribution is to investigate the meaning of anonymity in graph-structured data, which are very different from relational datasets traditionally considered in privacy research. They focus on online social networks, but their results apply broadly to telephone call graphs, survey data, and, in general, almost any dataset containing information about relationships between people.

Their second contribution is the insight that the basic topological structure of the social graph can act as an identifier. They show that patterns of social links, whether arising naturally or artificially introduced into the social network by the adversary, tend to be unique and efficiently recognizable even in a completely anonymized graph, yet without knowing the pattern, it is difficult to determine whether the graph contains such a structure. This idea is very powerful because graph structure is not a “personally identifiable” attribute by any meaning of the term. Nevertheless, Backstrom et al. show how it can be used to reidentify (sub)graphs that have been anonymized according to the best legal standards and satisfy the strongest anonymity properties.

Their third contribution is a new class of attacks on anonymity; in particular, an active attack in which the adversary deliberately introduces random links into the social network so that the resulting subgraph can be recognized even after all information about identities has been erased from the network. Existing privacy technologies fail to account for the possibility that the adversary may influence the data prior to anonymization and thus do not provide a defense against this threat.

The era of data abundance is bringing new kinds of sensitive data about individuals, new understanding of privacy risks, new attacks, and new defenses. This work provides us with valuable insights in all of these areas. By showing that the basic structure of our social relationships can be as identifying as a name, they debunk the naive belief that simple removal of identifiers renders the data non-personally identifiable. The authors carry out a rigorous theoretical analysis of anonymity in social networks (including interesting connections to graph theory) and accompany it by the empirical evaluation of their reidentification techniques on a large, real-world social network of the LiveJournal blogging service. Their paper is an object lesson in how to do data privacy research. It should be required reading for anyone interested in this area.

Vitaly Shmatikov (shmat@cs.utexas.edu) is an associate professor at the University of Texas at Austin.

© 2011 ACM 0001-0782/11/12 $10.00



doi:10.1145/2043174.2043199

Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography
By Lars Backstrom, Cynthia Dwork, and Jon Kleinberg

Abstract

In a social network, nodes correspond to people or other social entities, and edges correspond to social links between them. In an effort to preserve privacy, the practice of anonymization replaces names with meaningless unique identifiers. We describe a family of attacks such that even from a single anonymized copy of a social network, it is possible for an adversary to learn whether edges exist or not between specific targeted pairs of nodes.

1. INTRODUCTION

1.1. Anonymized social networks

Digital traces of human social interactions can now be found in a wide variety of online settings, and this has made them rich sources of data for large-scale studies of social networks. While a number of these online data sources are based on publicly crawlable blogging and social networking sites, where users have explicitly chosen to publish their links to others, many of the most promising opportunities for the study of social networks are emerging from data on domains where users have strong expectations of privacy; these include email, phone, and messaging networks, as well as the link structure of closed (i.e., “members-only”) online communities. As a useful working example, consider a “communication graph,” in which nodes are email addresses, and there is a directed edge (u, v) if u has sent at least a certain number of email messages or instant messages to v, or if v is included in u’s address book. Here, we will be considering the “purest” form of social network data, in which there are simply nodes corresponding to individuals and edges indicating social interaction, without any further annotation such as timestamps or textual data.

In designing studies of such systems, one needs to set up the data to protect the privacy of individual users while preserving the global network properties. This is typically done through anonymization, a simple procedure in which each individual’s “name” (for example, email address, phone number, or actual name) is replaced by a random user ID, but the connections between the (now anonymized) people (encoding who spoke together on the phone, who corresponded with whom, or who instant-messaged whom) are revealed. The motivation behind anonymizing is roughly as follows: while the social network labeled with actual names is sensitive and cannot be released, there may be considerable value in allowing researchers to study its structure. For such studies, researchers are not specifically interested in “who” corresponds to each node, but in the properties of the graph, such as its connectivity, node-to-node distances, frequencies of small subgraphs, or the extent to which it can be clustered. Anonymization is thus intended to exactly preserve the pure unannotated structure of the graph while suppressing the “who” information.

Can this work? The hope is that being handed an anonymized picture of a social network (just a graph with a random identifier attached to each node) is roughly akin to being given the complete social network of Mars, with the true Martian names attached to the nodes. Intuitively, the names are meaningless to earth-dwellers: we do not “know” the Martians, and it is completely irrelevant to us whether a given node in the graph is labeled “Groark” or “Zoark.” The difficulty with this metaphor, of course, is that anonymous social network data almost never exists in the absence of outside context, and an adversary can potentially combine this knowledge with the observed structure to begin compromising privacy, de-anonymizing nodes, and even learning the edge relations between explicitly named (de-anonymized) individuals in the system. Moreover, such an adversary may in fact be a user (or set of users) of the system that is being anonymized.

For distinguishing among ways in which an adversary might take advantage of context, it is useful to consider an analogy to the distinction between passive attacks and active attacks in cryptanalysis, that is, between attacks in which an adversary simply observes data as it is presented, and those in which the adversary actively tries to affect the data to make it easier to decipher. In the case of anonymized social networks, passive attacks are carried out by individuals who try to learn the identities of nodes only after the anonymized network has been released.

A previous version of this paper was published in the Proceedings of the 16th International Conference on World Wide Web: WWW2007 (Banff, Alberta, Canada, May 8–12, 2007).

December 2011 | Vol. 54 | No. 12 | Communications of the ACM 133



In contrast, an adversary in an active attack tries to compromise privacy by strategically creating new user accounts and links before the anonymized network is released so that these new nodes and edges will then be present in the anonymized network.

1.2. The present work: Attacks on anonymized social networks
In this paper, we present both active and passive attacks on anonymized social networks, showing that both types of attacks can be used to reveal the true identities of targeted users, even from just a single anonymized copy of the network, and with a surprisingly small investment of effort by the attacker.

We describe active attacks in which an adversary chooses an arbitrary set of users whose privacy it wishes to violate, creates a small number of new user accounts with edges to these targeted users, and creates a pattern of links among the new accounts with the goal of making it stand out in the anonymized graph structure. The adversary then efficiently finds these new accounts together with the targeted users in the anonymized network that is released. At a theoretical level, the creation of $O(\sqrt{\log n})$ nodes by the attacker in an n-node network can begin compromising the privacy of arbitrary targeted nodes, with high probability for any network; in experiments, we find that on a 4.4-million-node social network, the creation of 7 nodes by an attacker (with degrees comparable to those of typical nodes in the network) can compromise the privacy of roughly 2400 edge relations on average. Moreover, experimental evidence suggests that it may be very difficult to determine whether a social network has been compromised by such an active attack.

We also consider passive attacks, in which users of the system do not create any new nodes or edges; they simply try to find themselves in the released network and from this to discover the existence of edges among users to whom they are linked. In the same 4.4-million-node social network dataset, we find that for the vast majority of users, it is possible for them to exchange structural information with a small coalition of their friends and subsequently uniquely identify the subgraph on this coalition in the ambient network. Using this, the coalition can then compromise the privacy of edges among pairs of neighboring nodes.

There are some obvious trade-offs between the active and passive attacks. The active attacks have more potent effects, in that they are guaranteed to work with high probability in any network (they do not force users to rely on the chance that they can uniquely find themselves after the network is released), and the attacker can choose any users it wants to target. On the other hand, while the passive attack can only compromise the privacy of users linked to the attacker, it has the striking feature that this attacker can simply be a user of the system who indulges his or her curiosity; there is no observable “wrongdoing” to be detected. Moreover, since we find in practice that the passive attack will succeed for the majority of the population, it says in effect that most people in a large social network have laid the groundwork for a privacy-breaching attack simply through their everyday actions, without even realizing it.

These trade-offs naturally suggest the design of hybrid “semi-passive” attacks, in which a user of the system creates no new accounts but simply creates a few additional out-links to targeted users before the anonymized network is released. As we show later, this can lead to privacy breaches on a scale approaching that of the active attack, without requiring the creation of new nodes.

In the next section, we provide some background and context for our work in terms of the broader area of data privacy. We then present our two main classes of active attacks on anonymized social networks; we refer to them as walk-based attacks and cut-based attacks, with the names reflecting the underlying techniques being used. We then describe the use of passive attacks and conclude with a general discussion.

2. RELATED WORK
This work fits within a growing literature that has considered ways in which private online data can be divulged against users’ wishes, via carefully devised privacy-breaching attacks. Such attacks have been based on a variety of features in the data; for example, the queries entered by users into search engines can be used to uniquely identify them [17], and the writing styles of users in online discussion can likewise be used to find the same person writing under different pseudonyms [22]. Temporal data can also be an effective feature in privacy-breaching attacks: since it is unlikely for two users to perform a nontrivial set of actions at almost exactly the same sets of times, the sequence of times at which a user performs these actions becomes a type of identifying signature [20].

We note that in our case, both the passive and active attackers do not have access to highly resolved data like timestamps or other textual or numerical attributes; they can only use the binary information about who links to whom, without other node attributes, and this makes their task more challenging. Indeed, the secret subgraph H constructed as part of our attacks can be thought of as a kind of structural steganography, hiding secret messages for later recovery using just the social structure of G.

In this way, our approach can be seen as a step toward understanding how fundamental techniques of data privacy (see, e.g., Dwork [9] and the references therein) can inform how we think about the protection of even the most skeletal social network data. We discuss this further in the final section.

In the time since the conference proceedings version of our work appeared, there has been continued research exploring mechanisms by which private data can be revealed online. Concurrent with our work, Hay et al. [15] considered a set of methods for identifying nodes in anonymized social networks by looking at successively larger neighborhoods of a node. More recently, Narayanan and Shmatikov [21] have shown how access to multiple networks containing overlapping sets of people can enable approaches to de-anonymization based on approximately aligning the portions of the networks that overlap.

In a related but different direction, several lines of recent work have shown how the principle of homophily (that neighbors in social networks have similar characteristics) can be used to discover private information: even if a user



keeps profile information private, it can often be estimated from attributes in public profiles maintained by his or her friends [16, 19, 25]. Finally, recent work shows how social links themselves can be discovered from information about the times and places at which individuals perform activities; using data from a photo-sharing site, Crandall et al. [6] showed that if two users uploaded photos from approximately the same location at approximately the same time on multiple occasions, there was a sharply increased probability that they were linked on the site’s social network.

3. THE WALK-BASED ATTACK
To set the stage for our first active attack, we begin with some definitions and notation. We assume the social network is an n-node graph G = (V, E), representing interactions in an online system. Nodes correspond to user accounts and an edge (u, v) indicates that u has communicated with v (again, consider the example of an email or instant messaging network). The attacks become easier to carry out if the released graph data is directed; for most of the paper, we will therefore consider the harder case of undirected graphs, in which we assume that the curator of the data (the agent that releases the anonymized network) eliminates the directions on the edges.

3.1. Description of the attack
Let us consider the problem from the perspective of the attacker. For ease of presentation, we begin with a slightly simplified version of the attack and then show how to extend it to the attack we really use. Recall that as an attacker, our basic approach is to create a set of new user accounts with links among them that will “stand out” when the anonymized graph is released. Thus, we first choose a set of k = Θ(log n) named users, W = {w_1, ..., w_k}, that we wish to target in the network; we want to learn all the pairs (w_i, w_j) for which there are edges in G. We create a set of k new user accounts, X = {x_1, ..., x_k}, which will appear as nodes in the system. We include each undirected edge (x_i, x_j) independently with probability 1/2. This produces a random graph H on X.

We also create an edge (x_i, w_i) for each i. (In terms of the underlying social network, this involves having x_i send w_i a message, or include w_i in an address book, or some other activity depending on the nature of the network.) For describing the basic version of the attack, we also assume that, because the account x_i corresponds to a fake identity, it will not receive messages from any node in G – H other than potentially w_i, and thus will have no link to any other node in G – H. We will see later that the attack can be made to work even when this latter assumption does not hold.

When the anonymized graph G is released, we need to find our copy of H, and to correctly label its nodes as x_1, ..., x_k. Having found these nodes, we then find w_i as the unique node in G – H that is linked to x_i. We thus identify the full labeled set W in G, and we can simply read off the edges between its elements by consulting G.

It is worth noting that this type of attack only involves the use of completely innocuous operations in the context of the system being compromised: the creation of new accounts and the creation of links to existing accounts. In this sense, while the active attacker’s aims are nefarious (and, in almost any imaginable scenario, prohibited either by research ethics guidelines or the terms of service of the system, or both), none of the individual steps from which the attack is constructed could be viewed at a syntactic level as “breaking into” parts of the system where it is not allowed.

A number of technical ingredients are needed in order to make this attack work, based on whether certain subgraphs have the same structure as each other and whether they have any internal symmetries. To express such questions, we use the following terminology. For a set of nodes S, we let G[S] denote the subgraph of G induced by the nodes in S. An isomorphism between two sets of nodes S and S′ in G is a one-to-one correspondence f: S → S′ that maps edges to edges and non-edges to non-edges: (u, v) is an edge of G[S] if and only if (f(u), f(v)) is an edge of G[S′]. In this case, G[S] and G[S′] are isomorphic: they are the same graph up to relabeling. An automorphism is an isomorphism from a set S to itself, that is, a relabeling of the nodes f: S → S that preserves the graph’s structure. An automorphism f is nontrivial if it is not the identity function.

Thus, the construction of H succeeds if

  (i) There is no S ≠ X such that G[S] and G[X] = H are isomorphic.
 (ii) The subgraph H can be efficiently found, given G.
(iii) The subgraph H has no nontrivial automorphisms.

If (i) holds, then any copy of H we find in G must in fact be the one we constructed; if (ii) holds, then we can in fact find the copy of H quickly; and if (iii) holds, then once we find H, we can correctly label its nodes as x_1, ..., x_k, and hence find w_1, ..., w_k.

The full construction is almost as described above, with the following three additions. First, the size of the targeted set W can be larger than k. The idea is that rather than connecting each w_i with just a single x_i, we can connect it to a subset N_i ⊆ X, as long as w_i is the only node in G – H that is attached to precisely the nodes in N_i; this way w_i will still be uniquely identifiable once H is found. Second, we will explicitly randomize the number of links from each x_i to G – H, to help in finding H. And third, to recover H, it is helpful to be able to traverse its nodes in order x_1, x_2, ..., x_k. Thus, we deterministically include all edges of the form (x_i, x_{i+1}) and randomly construct all other edges.

The Construction of H. With this informal discussion in mind, we now give the full specification of the attack.

(1) We choose $k = (2 + \delta) \log_2 n$, for a small constant δ > 0, to be the size of X. We choose two constants d_0 ≤ d_1 = O(log n), and for each i = 1, 2, ..., k, we choose an external degree Δ_i ∈ [d_0, d_1] specifying the number of edges x_i will have to nodes in G – H. Each Δ_i can be chosen arbitrarily, but in our experiments with the algorithm, it works well simply to choose each Δ_i independently and uniformly at random from the interval [d_0, d_1].
(2) Let W = {w_1, w_2, ..., w_b} be the users we wish to target,




for a value $b = O(\log^2 n)$. We also choose a small integer constant c (c = 3 will suffice in what follows). For each targeted node w_j, we choose a set N_j ⊆ {x_1, ..., x_k} such that all N_j are distinct, each N_j has size at most c, and each x_i appears in at most Δ_i of the sets N_j. (This gives the true constraint on how large $b = O(\log^2 n)$ can be.) We construct links to w_j from each x_i ∈ N_j.
(3) Before generating the random internal edges of H, we add arbitrary further edges from H to G – H so that each node x_i has exactly Δ_i edges to G – H. We construct these edges subject only to the following condition: for each j = 1, 2, ..., b, there should be no node in G – H other than w_j that is connected to precisely the nodes in N_j.
(4) Finally, we generate the edges inside H. We include each edge (x_i, x_{i+1}), for i = 1, ..., k − 1, and we include each other edge (x_i, x_j) independently with probability 1/2. Let Δ′_i be the degree of x_i in the full graph G (this is Δ_i plus its number of edges to other nodes in X).

This concludes the construction. As a first fact, we note that standard results in random graph theory (see, e.g., Bollobás [5]) imply that with high probability, the graph H has no nontrivial automorphisms. We will assume henceforth that this event occurs, that is, that H has no nontrivial automorphisms.

We also note that the attack will work even if multiple copies of the construction are carried out simultaneously. That is, we can choose different sets of nodes to attack, W_1, W_2, ..., W_t, each of size Θ(log n); for each W_i, we add a distinct set of new nodes X_i to the graph G, building a graph H_i on each X_i with the different random constructions performed independently.

Efficiently Recovering H Given G. When the graph G is released, we want to identify H: that is, we want to find the subset of nodes of G that correspond to the set of nodes x_1, x_2, ..., x_k of H. Since we have constructed H to contain a path through the nodes x_1, x_2, ..., x_k, we will search along k-node paths in G, looking for a k-node path P for which the edges induced among the nodes of P have precisely the structure of H.

At a high level (ignoring issues of efficiency, which we discuss next), our algorithm works simply as follows. For every k-node path P = {y_1, y_2, ..., y_k} in G, we visit the nodes of P in order, declaring P to have failed in the comparison to H as soon as we reach a node y_i that fails one of the following two tests.

 (i) A degree test: The degree of node y_i should be equal to the value Δ′_i, which we know to be the degree of node x_i in G.
(ii) An internal structure test: For each j < i, there should be an edge (y_j, y_i) in G if and only if (x_j, x_i) is an edge of H.

Finally, if we reach the end of the path P without any of its nodes having failed either of these tests, then by definition we have found a copy of H in G. (As we note later, the degree test is not necessary either for the correctness of the algorithm or the bound on the worst-case running time, but it is extremely useful in practice.)

There will typically be an extremely large number of distinct k-node paths in G, so we need to organize the computation carefully in order for the search algorithm to run efficiently. We do this as follows:

• First, we loop over all nodes v of G, trying each as the candidate starting point y_1 for the path P (the node that will correspond to x_1 in H). If the degree of v is not equal to Δ′_1, then we skip v in this process, since it cannot correspond to the node x_1 in H.
• For each node v of degree Δ′_1 in G, we will organize all paths originating at y_1 = v into a search tree τ_v in the natural way: each node α in τ_v, at depth ℓ, will correspond to an ℓ-node path in G, starting at y_1 = v, that has not yet failed any of the degree or internal structure tests.
• We grow τ_v one level at a time. For each node α of τ_v, at depth ℓ, corresponding to an ℓ-node path P = {v = y_1, y_2, ..., y_ℓ} in G, we first check whether y_ℓ passes the degree and internal structure tests. If it does not, we declare α to be a leaf of τ_v. If it does pass, then we create a new child α′ of α in τ_v for each way of extending P by adjoining a neighbor of y_ℓ that does not already appear on P.

If τ_v ever acquires a node at depth k, then this corresponds to a k-node path in G that has passed all of our tests, and hence is a copy of H. Conversely, if there is such a path P originating at v, then our tree-growing procedure will continue adding nodes to τ_v until it produces a node at depth k corresponding to P.

Note that the total running time of this algorithm is only a small factor larger than the total number of nodes in all search trees τ_v (summed over all nodes v in G), and so a key issue in the analysis is to show that with high probability, the total number of nodes in all τ_v is not too large.

3.2. Analysis
To prove the correctness and efficiency of the attack, we show two things: with high probability, the construction produces a unique copy of H in G, and with high probability, the total number of nodes in all search trees τ_v in the recovery algorithm does not grow too large.

The formal statements of these two claims are as follows.

• Uniqueness. Let $k \ge (2 + \delta) \log_2 n$ for an arbitrary positive constant δ > 0, and suppose we use the following process to construct an n-node graph G:

   (i) We start with an arbitrary graph G′ on n – k nodes, and we attach new nodes X = {x_1, ..., x_k} arbitrarily to nodes in G′.
  (ii) We build a random subgraph H on X by including each edge (x_i, x_{i+1}) for i = 1, ..., k − 1, and including each other edge (x_i, x_j) independently with probability 1/2.

  Then with high probability, there is no subset of nodes S ≠ X in G such that G[S] is isomorphic to H = G[X].

• Efficiency. For every ε > 0, with high probability, the total number of nodes appearing in all the search trees τ_v (over all v in G) is $O(n^{1+\varepsilon})$.
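The four construction steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is a plain dict mapping each node to a set of neighbours, the function name `build_walk_attack`, the `("x", i)` labels, and the default `delta=0.5` are hypothetical, and the sketch simplifies step (3) by giving every set N_j at least two elements and giving every padding node exactly one neighbour in X, which is one easy way to guarantee that no other node shares a target's neighbourhood in X.

```python
import itertools
import math
import random


def build_walk_attack(graph, targets, d0, d1, c=3, delta=0.5, seed=0):
    """Attach attack nodes X (subgraph H) to `graph`, a dict node -> set of neighbours.

    Mutates `graph`; returns (xs, h_edges, total_deg, n_sets).
    """
    rng = random.Random(seed)
    n = len(graph)
    k = math.ceil((2 + delta) * math.log2(n))        # step (1): size of X
    xs = [("x", i) for i in range(k)]                # hypothetical node labels
    xset = set(xs)
    ext = [rng.randint(d0, d1) for _ in range(k)]    # external degrees Delta_i

    # Step (2): distinct sets N_j with 2 <= |N_j| <= c; x_i is in at most Delta_i sets.
    remaining = ext[:]
    candidates = (s for size in range(2, c + 1)
                  for s in itertools.combinations(range(k), size))
    n_sets = {}
    for w in targets:
        for s in candidates:                         # each subset is offered only once
            if all(remaining[i] > 0 for i in s):
                n_sets[w] = s
                for i in s:
                    remaining[i] -= 1
                break
        else:
            raise ValueError("too many targets for these external degrees")

    for x in xs:
        graph[x] = set()
    for w, s in n_sets.items():
        for i in s:
            graph[xs[i]].add(w)
            graph[w].add(xs[i])

    # Step (3): pad x_i up to exactly Delta_i external edges. Each padding node
    # gets a single neighbour in X, so it can never match an N_j of size >= 2.
    pool = [v for v in graph if v not in n_sets and v not in xset]
    rng.shuffle(pool)
    if len(pool) < sum(remaining):
        raise ValueError("graph too small for the requested external degrees")
    for i, x in enumerate(xs):
        for _ in range(remaining[i]):
            v = pool.pop()
            graph[x].add(v)
            graph[v].add(x)

    # Step (4): path edges (x_i, x_{i+1}) always; every other pair with probability 1/2.
    h_edges = set()
    for i, j in itertools.combinations(range(k), 2):
        if j == i + 1 or rng.random() < 0.5:
            h_edges.add((i, j))
            graph[xs[i]].add(xs[j])
            graph[xs[j]].add(xs[i])

    total_deg = [len(graph[x]) for x in xs]          # Delta'_i: degree in the full graph
    return xs, h_edges, total_deg, n_sets


# Toy host graph: a ring on 200 nodes (purely illustrative data).
g = {v: {(v - 1) % 200, (v + 1) % 200} for v in range(200)}
xs, h_edges, total_deg, n_sets = build_walk_attack(g, targets=[3, 50, 120], d0=2, d1=4)
```

The attacker keeps `h_edges` and `total_deg` secret; they are exactly the information the recovery search needs once the anonymized graph is released.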

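The recovery search with its two tests can be sketched in the same style. Again this is a hedged illustration rather than the authors' code: `recover_h` is a hypothetical name, indices are 0-based, the graph is a dict of neighbour sets, and the sketch explores each search tree depth-first instead of one level at a time, which visits the same tree nodes. The toy graph at the end is hand-built for illustration.

```python
def recover_h(graph, k, h_edges, total_deg):
    """Return every k-node path in `graph` that passes the degree test and the
    internal structure test. `h_edges` holds pairs (i, j) with i < j."""
    h = set(h_edges)
    matches = []

    def passes(path):
        i = len(path) - 1
        y = path[i]
        if len(graph[y]) != total_deg[i]:              # degree test
            return False
        # internal structure test: (y_j, y_i) is an edge iff (x_j, x_i) is in H
        return all(((j, i) in h) == (path[j] in graph[y]) for j in range(i))

    def grow(path):
        if not passes(path):
            return                                     # leaf of the search tree
        if len(path) == k:
            matches.append(list(path))                 # a copy of H
            return
        for nxt in graph[path[-1]]:
            if nxt not in path:                        # extend by an unused neighbour
                path.append(nxt)
                grow(path)
                path.pop()

    for v in graph:
        if len(graph[v]) == total_deg[0]:              # only plausible starting points
            grow([v])
    return matches


# Hand-built example: an 8-node ring plus a 4-node H on a, b, c, d.
edges = [(v, (v + 1) % 8) for v in range(8)]
edges += [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]            # H: path plus (a, c)
edges += [("a", 0), ("b", 1), ("b", 2), ("c", 3), ("d", 4), ("d", 5), ("d", 6)]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

matches = recover_h(graph, 4, {(0, 1), (1, 2), (2, 3), (0, 2)}, [3, 4, 4, 4])
# matches == [["a", "b", "c", "d"]]
```

Once the path is recovered and labeled, each targeted node is read off as the node outside the path whose neighbours on the path are exactly its set N_j; in the toy graph, for instance, node 0 is the only outside neighbour of "a".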


While the proofs of these claims are somewhat involved, the basic idea underlying them is rooted in an argument formulated by Paul Erdős to prove a result in Ramsey Theory [5, 11]. In the simplest form of the argument, let us suppose we have an n-node graph G and a random k-node graph H on nodes {x_1, x_2, ..., x_k}, with each edge (x_i, x_j) present independently with probability 1/2. What is the probability that G contains a k-node subgraph that is isomorphic to H? For any k-tuple of nodes v_1, v_2, ..., v_k in G, the probability that the subgraph of G on this k-tuple is isomorphic to H, under the mapping sending v_i to x_i, is precisely $2^{-k(k-1)/2}$, since the presence or absence of the random edge (x_i, x_j) has to match the presence or absence of (v_i, v_j) for each (i, j) pair. But there are fewer than $n^k$ such k-tuples of nodes in G, and so the probability that any of them yields such an isomorphism is less than $n^k \cdot 2^{-k(k-1)/2}$. Now a direct calculation shows that once k exceeds $2 \log_2 n$, this probability shrinks rapidly to 0, and hence it is likely that there is no isomorphic copy of H in G.

This gives the central idea of the proofs, but the details become more complicated because the graph H in the active attack is necessarily being attached by edges to the graph G, and this creates the possibility of isomorphisms that create a second copy of H out of parts of the original H together with parts of the rest of G. Showing that this is unlikely to happen requires a more intricate argument.

It is important to stress, however, that the intricacy of the proofs is an aspect of the analysis, not of the algorithms themselves. The construction of H and the recovery algorithm have already been fully specified in the previous subsection, and they are quite simple to implement.

We conclude with some comments on the tests used in the recovery algorithm. Recall that as we build τ_v, we eliminate paths based on an internal structure check (do the edges among path nodes match those in H?) and a degree check (do the nodes on the path have the same degree sequence as H?). The proofs of our two main claims require just the internal structure check to prove uniqueness and to bound the size of τ_v, respectively, but it is important in practice that the algorithm use both checks: as the experiments in the next subsection will show, one can get unique subgraphs at smaller values of k, and with much smaller search trees τ_v, by including the degree tests. But it is interesting to note that since these theorems can be proved using only internal structure tests, the attack is robust at a theoretical level provided only that the attacker has control over the internal structure of X, even in scenarios where nodes elsewhere in the graph may link to nodes in X without the knowledge of the attacker. (In this case, we still require that the targeted nodes w_j ∈ W are uniquely identifiable via the sets N_j and that all degrees in X remain logarithmic.)

3.3. Computational experiments
Social Network Data. We now describe computational experiments with the algorithm on real social network data drawn from an online setting. We find that the algorithm scales easily to several million nodes and produces efficiently findable unique subgraphs for values of k significantly smaller than the upper bounds in the previous subsections.

As data, we use the network of friendship links on the blogging site LiveJournal, constructed from a crawl of this site performed in February 2006. Each node in LiveJournal corresponds to a user who has made his or her blog public through the site; each user can also declare friendship links to other users. These links provide the edges of the social network we construct; they are directed, but we follow the principle of the previous subsections and convert them to undirected edges for purposes of the experiments. The LiveJournal data thus works well as a testbed; it has 4.4 million nodes and 77 million edges in the giant component of its undirected social network, and it exhibits many of the global structural features of other large online social networks. Finally, we emphasize that while LiveJournal has the right structure for our tests, it is not in reality an anonymous network: all the nodes in the network represent users who have chosen to publish their information on the Web.

We simulate anonymization by removing all the user names from the nodes; we then run our attack and investigate the ranges of parameters in which it successfully identifies targeted nodes. As a first question, we examine how often H can be found uniquely for specific choices of d_0, d_1, and k. In our construction, we generate a random external degree Δ_i for each node x_i uniformly from [d_0, d_1]. We then create links to targeted nodes sequentially. Specifically, in iteration i we choose a new user w_i in G – H to target; we then pick a minimal subset X′ ⊆ X that has not been used for any w_j for j < i, and where the degrees of nodes in X′ are less than their randomly selected target degrees. We add an edge between w_i and each user in X′. We repeat this process until no such X′ can be found. If, at the end of the process, some nodes in X have not yet reached their target degrees, we add edges to random nodes in G (and remove nodes from W so that no two nodes are connected to the same subset of X).

Figure 1. For two different choices of d_0 and d_1, the value k = 7 gives the attack on the LiveJournal graph a high probability of success. Both of these choices for d_0 and d_1 fall well within the degrees typically found in G. (Plot: probability of successful attack versus k, for k from 0 to 12; series d_0 = 20, d_1 = 60 and d_0 = 10, d_1 = 20.)

Uniqueness. We say the construction succeeds if H can be recovered uniquely. Figure 1 shows the success frequency




for two different choices of d_0 and d_1 (the intervals [10, 20] and [20, 60]), and varying values of k. We see that the success frequency is not significantly different for our two choices. In both cases the number of nodes we need to add to achieve a high success rate is very small: only 7. With 7 nodes, we can attack an average of 34 and 70 nodes for the smaller and larger degree choices, respectively.

We also note that the degree tests are essential for producing unique identifiability of H at such a small value of k. In fact, each of the 734 possible Hamiltonian graphs on 7 nodes actually occurs in the LiveJournal social network, so it is only because of its degree sequence in G that our constructed subgraph H is unique. (Our Uniqueness result does guarantee that a large enough H will be unique purely based on its internal structure; this is compatible with our findings since the analyzed bound of $(2 + \delta) \log_2 n$ is larger than the value k = 7 with which we are succeeding in the experiments.)

Efficient Recovery. In addition to being able to find H reliably, we must be able to find H quickly. We argued above that the total number of nodes in all search trees τ_v would be sufficiently small that our search algorithm would be near-linear. In our experiments on the LiveJournal friendship graph, we find that, in practice, the total number of nodes in all τ_v is not much larger than the number of nodes v whose degree in G is equal to Δ′_1. (Recall that we only build search trees for those v that have this degree.) For instance, when d_0 = 10 and d_1 = 20, there are an average of 70,000 nodes that have degree Δ′_1, while the total number of nodes in all search trees τ_v is typically about 90,000.

Detectability. Our simple attack shows that simple anonymization does not preserve privacy of links. One might wonder about the detectability of the attack: can the curator of the data, who is releasing the anonymized version, not be able to discover and remove H? The curator does not have access to the secret degree sequence or the edges within H and so cannot employ the same algorithm the attacker uses to discover H. However, if H were to stand out significantly in some other way, there might be an alternate means for finding it.

This subtle issue is worthy of more rigorous treatment; here, we provide the following indications that the subgraph H may be hard to discover. First is the simple fact that H has only 7 nodes, so it is difficult for any of its graph-theoretic properties to stand out with much statistical significance. Second, we describe some particular ways in which H does not stand out. To begin with, the internal structure of H is consistent with what is present in the network. For example, we have already mentioned that every 7-node Hamiltonian graph already occurs in LiveJournal, so this means that there are already subgraphs that exactly match the internal structure of H as an induced 7-node subgraph. (We are still able to find H because of the pattern of edges that connect nodes of H to nodes of G – H.) More generally, almost all nodes in LiveJournal are part of a very dense 7-node subgraph: if we look at all the nodes with degree at least 7, and consider the subgraph formed by those nodes and their 6 highest-degree neighbors, over 90% of such subgraphs have at least $11 > \tfrac{1}{2}\binom{7}{2}$ edges. These subgraphs are also almost all comparably well connected to the rest of G.

4. THE CUT-BASED ATTACK
In the walk-based attack just presented, one needs to construct a logarithmic number of nodes in order to begin compromising privacy. On the other hand, we can show that at least $\Omega(\sqrt{\log n})$ nodes are needed in any active attack that requires a subgraph H to be uniquely identifiable with high probability, independent of both the structure of G – H and the choice of which users to target.

It is therefore natural to try closing this gap between the O(log n) number of nodes used by the first attack and the $\Omega(\sqrt{\log n})$ lower bound required in any attack. With this in mind, we now describe our second active attack, the cut-based attack; it matches the lower bound by compromising privacy using a subgraph H constructed on only $O(\sqrt{\log n})$ nodes. While the bound for the cut-based attack is appealing from a theoretical perspective, there are several important respects in which the walk-based attack that we saw earlier is likely to be more effective in practice. First, the walk-based attack comes with a much more efficient recovery algorithm; and second, the walk-based attack appears to be harder for the curator of the data to detect (as the cut-based attack produces a densely connected component attached weakly to the rest of the graph, which is uncommon in many settings).

The Construction of H. We begin the description of the cut-based attack with the construction of the subgraph H.

(1) Let b, the number of users we wish to target, be $\Theta(\sqrt{\log n})$, and let w_1, w_2, ..., w_b be these users. First, for k = 3b + 3, we construct a set X of k new user accounts, creating an (undirected) edge between each pair with probability 1/2. This defines a subgraph H that will be in G.
(2) Let δ(H) denote the minimum degree in H, and let γ(H) denote the value of the minimum cut in H (i.e., the minimum number of edges whose deletion disconnects H). It is known that for a random graph H such as we have constructed, the following properties hold with probability going to 1 exponentially quickly in k: first, that γ(H) = δ(H); second, that δ(H) ≥ (1/2 − ε)k for any constant ε > 0; and third, that H has no nontrivial automorphisms [5]. In what follows, we will assume that all these properties hold: γ(H) = δ(H) ≥ k/3 > b, and H has no nontrivial automorphisms.
(3) We choose b nodes x_1, ..., x_b in H arbitrarily. We create a link from x_i to w_i so that the edge (x_i, w_i) will appear in the anonymized graph G. Thus, b of the nodes of H each have a single edge to a node of G – H, while the other k − b nodes of H have no edges to nodes of G – H.

A crucial property of H that we will use is the following: there are b edges in total that have one end in H and the other end in G – H; on the other hand, each node in H has more than b edges to other nodes of H.

Finally, we note that as with the walk-based attack in the



previous section, we can also carry out multiple copies of the present construction simultaneously, if desired, so as to attack multiple sets of targeted users W1, W2, ..., Wt.

Efficiently Recovering H Given G. Now, when G is released, we identify the subgraph H and the targeted users w1, ..., wb using the following recovery algorithm.

(1) We first compute the Gomory–Hu tree of G: an edge-weighted tree T on the node set V of G such that, for any v, w ∈ V, the value of the minimum v–w cut in G is equal to the minimum edge weight on the v–w path in T [13].
    Computing T is the most expensive step of the recovery algorithm. The best running time known for constructing a Gomory–Hu tree in a graph with n nodes and m edges is O(mn) times a factor that is polynomial in log(m + n) [3]. This is a much larger worst-case bound than we have for the walk-based attack. On the other hand, computational experiments in Web graph analysis indicate that Gomory–Hu tree computations can in fact be made to scale to very large graphs in practice [12].
(2) We delete all edges of weight at most b from T, producing a forest T′. To find the set of nodes X we constructed, we iterate through all components of T′ of size exactly k (let them consist of node sets S1, S2, ..., Sr), and for each such Si we test whether G[Si] is isomorphic to H. These isomorphism tests can be done efficiently, even by brute force, since k! = o(n). By adapting our proof of Uniqueness from the walk-based attack, we can show a form of uniqueness for H here too:
   • With high probability, there will be a single i such that G[Si] is isomorphic to H, and that Si is equal to our set X of new nodes.
(3) Since H has no nontrivial automorphisms, from knowledge of Si we can identify the nodes x1, ..., xb that we linked to the targeted users w1, ..., wb, respectively. Hence we can identify the targeted users as well, which was the goal.

Some Specific Numbers for the Cut-Based Attack. It is useful to supplement the asymptotic results for the cut-based attack with some specific numbers. If the network G has 100 million nodes, then by creating 12 new user accounts we can succeed in identifying 3 chosen users in the system with probability at least 0.99. Creating 15 new user accounts leads to a microscopically small failure probability.

The calculation is as follows. We first generate 100 random 12-node graphs H1, ..., H100, and see if any of them lacks nontrivial automorphisms and has a minimum cut of size at least 4. If any of them does, we choose one as our 12-node subgraph H. Computational experiments show that a random 12-node graph will have no nontrivial automorphism and γ(H) ≥ 4 with probability roughly 0.25. Thus, with probability well over 0.999, one of the 100 graphs Hi will have this pair of properties. Now, if we use the ith of these random graphs in the construction, for a fixed i, then, applying the notation from the description of the attack above, there are at most 8333333 possible components Sj of size 12 in the forest T′, one of which, say S*, is our subgraph H. The probability that there even exists an Sj ≠ S* that is isomorphic to H is bounded by 8333333 · 12! · 2^(−66) < 6 · 10^(−5). Hence the probability that any Hi will lead to non-uniqueness when attached to G is at most 0.006, and so in particular this holds for the Hi that we choose as H.

By way of comparison, the provable bounds for the walk-based attack require a number of new user accounts that is at least 2 log₂ n, which is approximately 53 when n is 100 million. On the other hand, as we have seen in our computational experiments, the walk-based attack appears to require fewer nodes in practice than the provable guarantees indicate, so further empirical comparison of these two attacks would be an interesting open question.

5. PASSIVE ATTACKS
In a passive attack, regular users are able to discover their locations in G using their knowledge of the local structure of the network around them. While there are a number of different types of passive attacks that could be implemented, here we imagine that a small coalition of passive attackers collude to discover their location. By doing so, they compromise the privacy of some of their neighbors: those connected to a unique subset of the coalition, and hence unambiguously recognizable once the coalition is found.

Here, we imagine that a coalition X of size k is initiated by one user who recruits k − 1 of his or her neighbors to join the coalition. (Other structures could lead to analogous attacks.) We assume that the users in the coalition know the edges among themselves, that is, the internal structure of H = G[X], using the terminology from the active attacks. We also assume that they know the names of their neighbors outside X. This latter assumption is reasonable in many cases: for example, if G is an undirected graph built from messages sent and received, then each user in X knows its incident edges.

The attack itself is analogous to the walk-based attack, except that the structure of H arises organically from the behavior of individuals using the system. A user x1 selects k − 1 neighbors to form a coalition X = {x1, x2, ..., xk}. The coalition knows which edges (xi, xj) are in G and also the neighbors of each xi in G – X. Once G is released, the coalition runs the search algorithm from the walk-based attack, with a minor modification due to the fact that H need not have a Hamiltonian path but instead has a single node connected to all others.

To help the passive attack succeed, we can incorporate a further optimization that was not explicitly used for the walk-based active experiments. For each nonempty set S ⊆ {1, 2, ..., k}, let g(S) denote the number of nodes in G that have edges to all the elements of {xi: i ∈ S} and none of the elements of {xi: i ∉ S}. (In some places, we will abuse the notation for g(·) as follows: if U is a set of nodes in X rather than a set of indices, we will use g(U) to denote the number of nodes in G that have edges to all elements of U and no elements of X − U.) Now, suppose we have a node α in a search tree τv, corresponding to a path y1, y2, ..., yℓ in G. For each S ⊆ {1, 2, ..., ℓ}, it should be the case that exactly g(S)




nodes of G are connected to all members of {yi: i ∈ S} and none of {yi: i ∉ S}; otherwise, {y1, ..., yℓ} cannot be the first ℓ nodes of the copy of H in G.

Finally, once the coalition of users X finds itself, it can determine the identity of any user w ∉ X whose neighbor set S in X satisfies g(S) = 1. (In this case, w is uniquely identified by the identities of its neighbors in X.)

Since the structure of H is not randomly generated, there is no a priori reason to believe that it will be uniquely findable or that the above algorithm will run efficiently. Indeed, for pathological cases of G and H, the problem is NP-hard. However, we find on real social network data that the instances are not pathological and that subgraphs on small coalitions tend to be unique and efficiently findable.

The primary disadvantage of this attack in practice, as compared to the active attack, is that it does not allow one to compromise the privacy of arbitrary users. However, a natural extension is a semi-passive attack whereby a coalition of existing users colludes to attack specific users. To do this, the coalition X forms as described above with x1 recruiting k − 1 neighbors. Next, the coalition compares neighbor sets to find some set S ⊆ X such that g(S) = 0. Then, to attack a specific user w, each user in {xi: i ∈ S} adds an edge to w. Then, assuming that the coalition can uniquely find H, they will certainly find w as well.

Computational Experiments. Here, we consider the passive attack on the undirected version of the LiveJournal graph. For varying k, we consider a coalition of a user x1 and his or her k − 1 highest-degree neighbors. (We also consider the case where x1 selects k − 1 neighbors at random; the success rate here is similar.) We analyze the attack described above for a randomly chosen sample of users x1 whose degree is at least k − 1.

We find that even coalitions as small as three or four users can often find themselves uniquely, particularly when using the refined version of the algorithm. Figure 2 summarizes the success rates for different-sized coalitions based on both the “simple” algorithm using the internal structure of H and the degree sequence, as well as the “refined” algorithm that incorporates the function g(S). With minimal preprocessing, G can be searched for a particular coalition almost immediately: On a standard desktop, it takes less than a tenth of a second, on average, to find a coalition of size 6.

Figure 2. Probability of success for different coalition sizes in the LiveJournal graph, comparing a simple algorithm using only the degrees and internal structure of the coalition, and a more refined algorithm using the edges connecting H to G – H. (Plot: probability of successful attack, 0 to 1, versus coalition size, 2 to 8. Series: simple algorithm with high-degree friends; refined algorithm with high-degree friends; refined algorithm with random friends.)

At first glance, these results seem at odds with the results for the active attack in Figure 1, as the passive attack is producing a higher chance of success with fewer nodes. However, in the active attack, we limited the degrees of the users created so that H would be inconspicuous. In the passive attack, there is no such limit, and many users’ highest-degree neighbor has degree well over the limit of 60 that we imposed on the active attack; this makes it easier to find the resulting subgraph H. When we consider only those coalitions whose members all have degrees analogous to those in the active attack, the results are similar to the active attack.

As Figure 3 shows, the passive attack identifies relatively few nodes outside the coalition, compared to the active attack. However, with a semi-passive attack, we can greatly increase the number of users compromised, as indicated by Figure 3 (and recall that these users can be chosen arbitrarily by the coalition). Moreover, when the coalition is compromising as many users as possible, the semi-passive attack tends to have a higher success rate.

Figure 3. As the size of the coalition increases, the number of users in the LiveJournal graph compromised under the passive attack when the coalition successfully finds itself increases superlinearly. The number of users the semi-passive attack compromises increases exponentially. (Plot: average number of users compromised, 0 to 50, versus coalition size, 2 to 8. Series: passive; semi-passive.)

6. DISCUSSION
It is natural to ask what conclusions about private analysis of social network data should be drawn from this work. As noted at the outset, our work is not directly relevant to all settings in which social network data is used. For example, much of the research into online social networks is conducted on data collected from Web crawls, where users have chosen to make their network links public. There are also natural scenarios in which individuals work with



social network data under safeguards that are primarily legal or contractual, rather than computational, in nature, although even in such cases, there are compelling reasons why researchers covered by contractual relationships with a curator of sensitive data should still only publicly release the results of analyses carried out through a privacy mechanism to prevent the information in these analyses from implicitly compromising privacy. In cases such as these, where computational safeguards are not the primary focus, important questions of data utility versus privacy still arise, but these questions are not something our results directly address.

What our results do show is that one cannot rely on anonymization to ensure individual privacy in social network data, in the presence of parties who may be trying to compromise this privacy. And while one natural reaction to these results is to try inventing methods of thwarting the particular attacks we describe, we think this misses the broader point of our work: true safeguarding of privacy requires mathematical rigor, beginning with a clear description of what it means to compromise privacy, what are the computational and behavioral capabilities of the adversary, and to what information might it have access, now or in the future.

There is a growing literature to which we can turn for thinking about ensuring privacy in settings such as these. There has been extensive recent work on privacy-preserving data mining, beginning with Agrawal et al., Samarati, and Sweeney [1, 2, 23, 24], which rekindled interest in a field quiescent since the 1980s, and increasingly incorporating approaches from modern cryptography for describing and reasoning about information leakage [4, 7, 10, 18]. The notion of ε-differential privacy gives very strong guarantees, independent of the auxiliary information and computational powers of the adversary (see Dwork et al. [8, 9, 10]). This notion departs from previous ones by shifting away from comparing what can be learned about an individual with versus without the database, instead concentrating on how the database behaves with versus without the data of an individual.

A simple and general interactive mechanism for ensuring differential privacy is given in Dwork et al. [10]. In this mechanism, a question is posed, the exact answer is computed by the curator, and then a noisy version of the true answer is returned to the user. The advantage of interaction lies in the fact that accuracy must deteriorate with the number and complexity of questions asked (see Dinur and Nissim [7] et sequelae). In a noninteractive solution, the curator must produce an object that answers all potential future questions; interactive approaches answer only those questions actually asked.

A lively literature (see, e.g., Hardt and Rothblum [14] and the references therein) explores the tradeoffs between accuracy, computation, and degree of differential privacy in answering very large numbers of counting queries, that is, questions of the form “How many people in the database satisfy property P?” In the context of a social network in which the goal is to protect the privacy of individual friendships, this captures questions of the form “How many edges (friendships) connect people with property P to people with property Q?” such as, “How many friendships are there between people who went to Princeton High School and Cornell graduates?”

The only privacy definition of which we are aware that protects against arbitrary auxiliary information is differential privacy. Further progress on differentially private analysis of social networks awaits compelling and precise analytical goals.

Acknowledgments
This work has been supported in part by NSF grants CCF-0325453, IIS-0329064, CNS-0403340, and BCS-0537606, by the Institute for the Social Sciences at Cornell, and by the John D. and Catherine T. MacArthur Foundation.

References
1. Agrawal, D., Aggarwal, C. On the design and quantification of privacy preserving data mining algorithms. In ACM Symposium on Principles of Database System (2001).
2. Agrawal, R., Srikant, R. Privacy-preserving data mining. In Proceedings of the ACM SIGMOD (2000).
3. Bhalgat, A., Hariharan, R., Kavitha, T., Panigrahi, D. An Õ(mn) Gomory-Hu tree construction algorithm for unweighted graphs. In Proceedings of ACM Symposium on Theory of Computing (2007).
4. Blum, A., Dwork, C., McSherry, F., Nissim, K. Practical privacy: the SuLQ framework. In ACM PODS (2005).
5. Bollobás, B. Random Graphs. Cambridge University Press, Cambridge, U.K., 2001.
6. Crandall, D., Backstrom, L., Cosley, D., Suri, S., Huttenlocher, D., Kleinberg, J. Inferring social ties from geographic coincidences. Proc. Natl. Acad. Sci., 107 (2010).
7. Dinur, I., Nissim, K. Revealing information while preserving privacy. In Symposium on Principles of Database System (2003).
8. Dwork, C. Differential privacy. Proceedings of International Colloquium on Automata, Languages and Programming (2006).
9. Dwork, C. A Firm Foundation for Private Data Analysis. CACM 54, 1 (2011).
10. Dwork, C., McSherry, F., Nissim, K., Smith, A. Calibrating noise to sensitivity in private data analysis. In Proceedings of Theory of Cryptography Conference (2006).
11. Erdös, P. Some remarks on the theory of graphs. Bull. AMS 53 (1947).
12. Flake, G., Tarjan, R., Tsioutsiouliklis, K. Graph clustering and min cut trees. Internet Math. 1 (2003).
13. Gomory, R., Hu, T.C. Multi-terminal network flows. J. Soc. Ind. Appl. Math. 9 (1961).
14. Hardt, M., Rothblum, G. A multiplicative weights mechanism for privacy-preserving data analysis. In Proceedings of FOCS (2010).
15. Hay, M., Miklau, G., Jensen, D., Towsley, D., Weis, P. Resisting structural re-identification in anonymized social networks. In Proceedings of the VLDB Endowment, 1 (2008).
16. Jernigan, C., Mistree, B. Gaydar: Facebook friendships expose sexual orientation. First Monday 14 (2009).
17. Kumar, R., Novak, J., Pang, B., Tomkins, A. On anonymizing query logs via token-based hashing. In Proceedings of the 16th International World Wide Web Conference (2007).
18. Mishra, N., Sandler, M. Privacy via pseudorandom sketches. In ACM Symposium on Principles of Database System (2006).
19. Mislove, A., Viswanath, B., Gummadi, P.K., Druschel, P. You are who you know: inferring user profiles in online social networks. In ACM WSDM (2010).
20. Narayanan, A., Shmatikov, V. Robust de-anonymization of large sparse datasets (How to break anonymity of the Netflix prize dataset). In Proceedings of the IEEE Symposium on Security and Privacy (2008).
21. Narayanan, A., Shmatikov, V. De-anonymizing social networks. In Proceedings of the IEEE Symposium on Security and Privacy (2009).
22. Novak, J., Raghavan, P., Tomkins, A. Anti-aliasing on the web. Proceedings of the 13th International World Wide Web Conference (2004).
23. Samarati, P. Protecting respondents’ identities in microdata release. IEEE TKDE 13 (2001).
24. Sweeney, L. k-anonymity: a model for protecting privacy. Intl. J. Uncertainty Fuzziness Knowledge-Based Systems 10 (2002).
25. Zheleva, E., Getoor, L. The illusion of privacy in social networks with mixed public and private user profiles. Proceedings of the 18th International World Wide Web Conference (2009).

Lars Backstrom (lars@fb.com), Facebook, Palo Alto, CA.
Jon Kleinberg (kleinber@cs.cornell.edu), Cornell University, Ithaca, NY.
Cynthia Dwork (dwork@microsoft.com), Microsoft Research, Silicon Valley Campus, Mountain View, CA.

© 2011 ACM 0001-0782/11/12 $10.00
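
The arithmetic behind "Some Specific Numbers for the Cut-Based Attack" can be rechecked with a few lines of Python. This is only a sketch that replays the figures quoted in the text (100 million nodes, 12 new accounts, 100 candidate graphs, success rate 0.25); the variable names are illustrative, and the 2^(−66) factor assumes that each of the C(12, 2) = 66 possible edges of the random graph H is present independently with probability 1/2.

```python
import math

n = 100_000_000   # nodes in G (figure quoted in the text)
k = 12            # new accounts, i.e., nodes of H
trials = 100      # candidate random graphs H_1, ..., H_100
p_good = 0.25     # quoted chance that a candidate has both properties

# Chance that at least one of the 100 candidates is usable.
p_some_good = 1 - (1 - p_good) ** trials

# A forest on n nodes has at most floor(n / k) components of size exactly k.
max_components = n // k

# Union bound over components and over the k! bijections onto H.
# Assumption: a fixed bijection matches all C(k, 2) edge slots
# with probability 2^-C(k, 2).
edge_slots = math.comb(k, 2)
p_collision = max_components * math.factorial(k) * 2.0 ** (-edge_slots)

# Union bound over all candidate graphs.
p_any_collision = trials * p_collision

# Provable requirement of the walk-based attack, for comparison.
walk_based_accounts = 2 * math.log2(n)

print(p_some_good, max_components, p_collision, p_any_collision, walk_based_accounts)
```

Under these assumptions the collision bound comes out below 6 · 10^(−5) per candidate and below 0.006 over all 100 candidates, which is consistent with the figures stated above.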

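
The g(S) bookkeeping used by the passive attack in Section 5 can be illustrated on a toy graph. The sketch below is not the authors' implementation: the function names, the adjacency-dictionary representation, and the eight-node example graph are all hypothetical, and it assumes g(S) counts only nodes outside the coalition X.

```python
from collections import Counter

def neighbor_signatures(adj, coalition):
    """Map each node outside the coalition to the set of coalition members it touches."""
    members = set(coalition)
    signatures = {}
    for node, neighbors in adj.items():
        if node in members:
            continue
        signature = frozenset(neighbors & members)
        if signature:  # only nonempty sets S are considered
            signatures[node] = signature
    return signatures

def g_counts(adj, coalition):
    """g(S): number of outside nodes adjacent to exactly the members in S."""
    return Counter(neighbor_signatures(adj, coalition).values())

def identifiable_users(adj, coalition):
    """Outside users whose neighbor set S in the coalition satisfies g(S) = 1."""
    signatures = neighbor_signatures(adj, coalition)
    g = Counter(signatures.values())
    return {node for node, sig in signatures.items() if g[sig] == 1}

# Hypothetical undirected toy graph: x1 recruited its neighbors x2 and x3.
adj = {
    "x1": {"x2", "x3", "a", "b", "c"},
    "x2": {"x1", "c"},
    "x3": {"x1", "d"},
    "a": {"x1"},
    "b": {"x1"},
    "c": {"x1", "x2"},
    "d": {"x3"},
    "e": set(),
}
coalition = ["x1", "x2", "x3"]
g = g_counts(adj, coalition)
compromised = identifiable_users(adj, coalition)
print(dict(g), compromised)
```

In this toy graph, a and b share the signature {x1}, so neither is identified, while c and d each have a unique signature. A set S with g(S) = 0, such as {x2, x3} here, is the kind of unused signature the semi-passive attack would assign to a targeted user.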


last byte

DOI:10.1145/2043174.2043200    Peter Winkler

Puzzled
Solutions and Sources
Last month (Nov. 2011, p. 120) we posted a trio of brainteasers, including one as yet famously unsolved, concerning distances between points on the plane. Here, we offer solutions to two of them. How did you do?

1. Cities of gold.
Solution. You were asked to determine whether it is possible to place seven points (cities of gold) on the plane in such a way that among any three, at least two are a specified distance (10 leagues) apart. It turns out that it is.
We can assume that the specified distance is 1. Two unit-side equilateral triangles sharing a side make what we call a “lozenge” with two sharp endpoints. Take two lozenges with a common sharp endpoint P, and swing them with P fixed in such a way that their other endpoints are unit distance apart (see the figure here). Together, the two lozenges have seven vertices.
To see that they satisfy the condition, suppose there were three points among the seven that do not include a pair at distance 1. This threesome cannot contain the “fulcrum” point P, because all but two of the remaining six points are at unit distance from the fulcrum, and these two (the other sharp lozenge endpoints) are unit distance from each other. So forget the fulcrum; the other six points lie on two equilateral triangles, and any three must include at least two vertices of one of the triangles.
This cute problem was passed to me (without the spurious history) by mathematical wizard Frank Morgan of Williams College.

Figure. Locations of the seven cities of gold, with each line representing a distance of 10 leagues; the top point is the fulcrum P.

2. Frisbee players.
Solution. If the Frisbee players are arranged in a regular nonagon with longest diagonals of length 100 yards, then nine pairs of players will be at this distance, with none farther.
In fact, for any n, you cannot get more than n “maxpairs,” or pairs of points at maximum distance, among n points in the plane. To prove this, assume it is false, and let the points A, B, C, … constitute a counterexample of the smallest possible size n.
Note first that any two maxpairs AB, CD must “cross”; that is, the line segment between A and B crosses the segment between C and D; otherwise one of the diagonals of the quadrilateral ABDC would exceed the supposed maximum length.
Now if, in our purported counterexample, no point was involved in more than two maxpairs, then there would be at most n maxpairs in total. So there must be some point P that is in three maxpairs, say, PA, PB, and PC, in that order clockwise around P. Note that the angle between PA and PC cannot be more than 60 degrees or else the third side AC of the triangle would be too long.
Observe now that the point B cannot be involved in any other maxpairs, because such a pair would cross both PA and PC, an impossibility. Dropping B out of our configuration yields a smaller configuration with one fewer point and one fewer maxpair, reaching a contradiction.
This puzzle appeared in 1957 on the William Lowell Putnam Exam, an annual contest for college students (http://math.scu.edu/putnam/), which is a great source for challenging mathematical puzzles.

3. Three colors, seven points.
Solution. To see how the layout of the seven points in the first puzzle gives us information about painting the plane, consider the colors these seven points would have to be if you could paint with only three colors. By the pigeonhole principle (used several times already in this column) at least three of the seven points must then get the same color, but we know these three contain two points at unit distance, and points at unit distance are not allowed to have the same color. Voila!

Peter Winkler (puzzled@cacm.acm.org) is Professor of Mathematics and of Computer Science, and Albert Bradley Third Century Professor in the Sciences, at Dartmouth College, Hanover, NH.

All readers are encouraged to submit prospective puzzles for future columns to puzzled@cacm.acm.org.
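
Readers who want to check the first two solutions numerically can do so with a short Python sketch. The coordinates below are one possible placement chosen for illustration (fulcrum P at the origin, unit distance 1 instead of 10 leagues, nonagon inscribed in the unit circle), and the helper names are hypothetical; floating-point comparisons use a small tolerance.

```python
import math
from itertools import combinations

TOL = 1e-9

def rotate(point, angle):
    """Rotate a point about the origin by the given angle in radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * point[0] - s * point[1], s * point[0] + c * point[1])

def is_unit(p, q):
    return abs(math.dist(p, q) - 1.0) < TOL

# Puzzle 1: a lozenge with sharp endpoint P at the origin.
P = (0.0, 0.0)
A = (math.cos(-math.pi / 6), math.sin(-math.pi / 6))  # blunt vertex
B = (math.cos(math.pi / 6), math.sin(math.pi / 6))    # blunt vertex
C = (A[0] + B[0], A[1] + B[1])                        # far sharp endpoint, |PC| = sqrt(3)

# Swing a copy about P until the two far sharp endpoints are 1 apart:
# 2 * sqrt(3) * sin(theta / 2) = 1.
theta = 2 * math.asin(1 / (2 * math.sqrt(3)))
cities = [P, A, B, C] + [rotate(v, theta) for v in (A, B, C)]

every_triple_ok = all(
    any(is_unit(p, q) for p, q in combinations(triple, 2))
    for triple in combinations(cities, 3)
)
unit_pairs = sum(is_unit(p, q) for p, q in combinations(cities, 2))

# Puzzle 2: count pairs at maximum distance in a regular nonagon.
nonagon = [(math.cos(2 * math.pi * i / 9), math.sin(2 * math.pi * i / 9)) for i in range(9)]
longest = max(math.dist(p, q) for p, q in combinations(nonagon, 2))
maxpairs = sum(abs(math.dist(p, q) - longest) < TOL for p, q in combinations(nonagon, 2))

print(every_triple_ok, unit_pairs, maxpairs)
```

The check confirms that every triple of the seven cities contains a unit-distance pair, and that the nonagon has exactly nine maxpairs, matching the bound of n maxpairs for n = 9.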




[continued from p. 144] an operating system from scratch to figure it out, which we did. Then we looked at our findings and realized they should be applicable to any standard operating system. So, with a few of my colleagues and students, we did a study to see how much work would be necessary to scale the Linux kernel to a large number of cores. If you have enough manpower, it’s certainly doable.

This is the system you built in which eight six-core chips were used to simulate the performance of a 48-core chip.
Yes, indeed. There are a lot of interesting problems to be solved, but my general sense is that things are going to evolve in the right direction, and that there won’t be a point in time where we have to throw everything away and start over again.

Another insight to come out of that work was that it can be difficult to identify the root cause of performance issues. Is that what inspired your work on MOSBENCH, a set of application benchmarks designed to measure the scalability of operating systems?
Yes, MOSBENCH came out of that project. Typical benchmarks are just application benchmarks, where all the action is in the application itself. But we needed a benchmark that included a lot of system-intensive applications. Otherwise, you don’t stress the operating system, and if you don’t stress the operating system, it isn’t scalable by default. So we collected several applications to stress different parts of the operating system; essentially, it’s a workload generator.

What conclusions has it led to so far?
The Linux kernel scales pretty well. But there might be interesting future problems. One direction is having the operating system give you more control over the caches in which the data lives. The traditional view is that the cache is hidden from the operating system and the hardware just does its job of caching. In multicore, caches are spread all around the chip, some close by and others that are far away. There are cases where you want control over where the data is placed so you can get better performance. Something else we’re looking at are abstractions that allow you to build operating systems that are scalable by design, as opposed to scaling every subsystem one by one. New concurrent data structures that exploit weak consistency semantics are another direction.

You have also done work on systems security, using information flow control to prevent the unauthorized disclosure of data.
The idea is simple. Typically when you build an application, and you want to make it secure, you put a check before every operation that might be sensitive. The risk is that you can easily forget a check, which can then be exploited as a security vulnerability. We tried to structure the operating system in such a way that even if you forget some of these checks, security is not immediately compromised. The way we do it is to draw a box around the operating system and label all data. Then we have a guard that checks whenever data is being sent across the border to make sure it’s going to the right place, based on the data’s label.

Some of your other security research focuses on making it easier to restore system integrity after an intrusion. So-called “undo computing,” for instance, seeks to undo any changes made by an adversary during the attack while preserving legitimate user actions.
Let’s say you have a desktop, and you discover it was compromised a couple weeks after an attack. Then the question is, How do you restore its integrity? You could go back to a backup from three weeks ago, when you know it’s clean, and reinstall some pieces. But that’s clearly a labor-intensive project. Or you try to find all the bad code and files, and remove them, which of course is also labor intensive. There are some automatic virus removers, but they’re very specific to a particular virus.

What is your approach?
Here’s one direction my colleague Nickolai Zeldovich and our students are exploring: Once you’ve determined that an adversary sent bad packets to your Web server, you know everything that could be influenced by those packets is suspicious, and all the influenced actions must be undone. We roll the system back to before the attack happened, and roll forward all the actions that were not influenced by the adversary’s actions. If everything works out correctly, you will end up in a clean state, but you will still have all the work that you did in the last three weeks.

What if the actions of the adversary are intermingled with the actions of the user?
Undoing that intermingling and keeping track of the dependencies requires some reasonably sophisticated techniques. Another aspect of the problem is that you really don’t want to replay or redo every operation. So we have a bunch of clever observations saying, well, this work or this operation could never have been influenced by the attacker’s actions, so therefore we don’t have to redo them. We have some encouraging results, but we’re still trying to figure out whether we can make this work in practice for heavily used complex systems.

Do you have plans to do another startup?
I’m going to wait and see. It’s not until the later stages of a project that I think about whether it solves a real problem that people have and, if so, would it be worthwhile to start a company around it. One of the big advantages of academia is that if you decide the problem’s not interesting, you can change. That’s a hard thing to do in a startup.

Leah Hoffmann is a technology writer based in Brooklyn, NY.

© 2011 ACM 0001-0782/11/12 $10.00
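
The roll-back and roll-forward idea described in the interview can be pictured with a very small Python sketch. It is not the system Kaashoek and Zeldovich built; the `Action` record, the `partition_log` function, and the example log are hypothetical, and the sketch assumes each logged action lists the objects it read and wrote and that a clean write fully overwrites its target.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()
    from_attacker: bool = False

def partition_log(log):
    """Split a time-ordered log into (actions to replay, actions to undo)."""
    tainted = set()  # objects currently carrying attacker influence
    replay, undo = [], []
    for action in log:
        if action.from_attacker or (action.reads & tainted):
            # Influenced by the attack: its outputs become suspicious too.
            tainted |= action.writes
            undo.append(action)
        else:
            # Assumption: a clean write fully replaces the object's contents.
            tainted -= action.writes
            replay.append(action)
    return replay, undo

log = [
    Action("user edits report", writes=frozenset({"report.txt"})),
    Action("bad packet rewrites page", writes=frozenset({"index.php"}), from_attacker=True),
    Action("server renders page", reads=frozenset({"index.php"}), writes=frozenset({"cache"})),
    Action("user backs up report", reads=frozenset({"report.txt"}), writes=frozenset({"backup"})),
    Action("job summarizes cache", reads=frozenset({"cache"}), writes=frozenset({"stats"})),
]
replay, undo = partition_log(log)
print([a.name for a in replay], [a.name for a in undo])
```

After rolling back to a snapshot taken before the first attacker action, only the actions in `replay` would be rolled forward; tracking dependencies at a finer grain than whole objects is the hard part the interview alludes to.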




DOI:10.1145/2043174.2043201    Leah Hoffmann

Q&A
Scaling Up
M. Frans Kaashoek talks about multicore computing, security, and operating system design.

M. Frans Kaashoek’s interest in computing was sparked, like many others in the field, by an early love for programming. At Vrije Universiteit, he discovered he could turn his hobby into a career, and studied with MINIX creator Andrew S. Tanenbaum before accepting a professorship at Massachusetts Institute of Technology’s Department of Electrical Engineering and Computer Science. Kaashoek has since conducted wide-ranging research in computer systems, including operating system design, software-based network routing, and distributed hash tables, which revolutionized the storage and retrieval of data in decentralized information systems. He also helped found two startups: Sightpath, a video broadcast software provider that was acquired by Cisco Systems in 2000, and Mazu Networks, which was acquired by Riverbed Technology in 2009. Kaashoek was named an ACM Fellow in 2004 and elected to the National Academy of Engineering in 2006. Last year his work was recognized with an ACM-Infosys Foundation Award (see “Unlimited Possibilities” in the June 2011 issue of Communications).

Photograph by Dominic Casserly

You have said that your work on the exokernel operating system, which enables application developers to specify how the hardware should execute their code, was driven by intellectual curiosity. Can you elaborate?
We wanted to explore whether we could build a kernel interface that defines no abstractions other than what the hardware already provides, and that exports the hardware abstractions directly to applications. Traditionally, the kernel provides a fixed set of unchangeable abstractions. For example, you have a very complex, unchangeable kernel interface like traditional Unix systems, or you have a small, unchangeable microkernel interface, which defines a few carefully chosen abstractions. An exokernel design allows the programmer to define its own operating system abstractions.

For its minimalism, it sounds almost like an extreme version of microkernel design.
The main goal with a microkernel is to make the kernel small. That was not necessarily our goal. So, for example, we would have been perfectly happy to put a device driver inside the kernel if we thought it was the right thing to do.

How did the project evolve?
We were able to build a prototype that demonstrated the approach could work in practice. But I don’t think there’s any direct technology transfer from our ideas into products although there was one startup that used our code. The impact has been more indirect. Academically, it influenced other systems that were built afterward. On the more commercial side, it also has been credited in work on machine monitors for handheld devices.

Operating systems design has become such a partisan issue. What is your take on it?
I have a pragmatic view. In research, taking an extreme position is interesting because it forces you to clarify your thinking and solve the hard case. In practice, I think people are going to do whatever helps solve the particular problems they have. If you look at a monolithic kernel like Linux, I know you can’t call it a microkernel system, but some of the servers run as applications in user space and some run in the kernel, and it really becomes shades of gray. And some people draw this line slightly differently than others. But if the kernel is already working fine, why change it?

Since your work on exokernels, you have done several other projects on operating systems design, in particular as it relates to multicore computing.
You might say that multicore has nothing to do with the operating system because it is, in many ways, already inherently parallel; it provides processes that can run on different cores in parallel. But many applications rely heavily on operating system services, particularly systems applications like email and Web servers. So if the operating system services don’t scale well, those applications can’t scale well, either.

So your work is focused on building scalable operating systems.
Originally, we thought we would have to write [continued on p. 143]



Distinguished Speakers Program
talks by and with technology leaders and innovators
Chapters • Colleges & Universities • Corporations • Agencies • Event Planners
Need a Great Technical Speaker for Your Next Event?
The Association for Computing Machinery (ACM), the world’s largest educational and scientific computing society, now provides colleges
and universities, corporations, event and conference planners, and agencies – in addition to ACM local Chapters – with direct access to top
technology leaders and innovators from nearly every sector of the computing industry.

Book the speaker for your next event through the ACM Distinguished Speaker Program (DSP) and deliver compelling and insightful content to your
audience at a remarkably reasonable price. Our program features renowned thought leaders in academia, industry and government, speaking about
the topics that matter most in the computing and IT world today. Our booking process is simple and convenient. Please visit us at: www.dsp.acm.org.

The ACM Distinguished Speaker Program is an excellent solution for:


Corporations: Educate your technical staff, ramp up the knowledge of your team, and give your employees the opportunity to have their questions answered by experts in their field.

Colleges & Universities: Expand the knowledge base of your students with exciting lectures and the chance to engage with a computing professional in their desired field of expertise.

Event & Conference Planners: Use the ACM DSP to help find compelling speakers for your next conference and reduce your costs in the process.

ACM Local Chapters: Boost attendance at your meetings with live talks by DSP speakers and keep your chapter members informed of the latest industry findings.

Captivating Speakers from Exceptional Companies, Colleges & Universities


DSP speakers represent a broad range of companies, colleges and universities, including: IBM, Microsoft, BBN Technologies, Raytheon, Sony
Pictures Imageworks, National Institute of Standards and Technology, Lawrence Livermore National Laboratory, Siemens Information Systems
Bangalore, Stanford University, University of Pennsylvania, University of British Columbia, Georgia Tech, Carnegie Mellon, UCLA, McGill
University, Tsinghua University and many more.
Topics for Every Interest
Over 250 lectures are available from nearly 100 different speakers with topics covering Software Engineering, High Performance Computing,
Human Computer Interaction, Artificial Intelligence, Gaming, Mobile Computing, and dozens more. Some of our most popular lectures include:
• Electronic Voting in the 21st Century
• Software Engineering Best Practices
• Software Under Siege: Viruses and Worms
• Spatial Databases and Geographic Information Systems
• Careers in Computing – How to Prepare and What to Expect

Quality is Our Standard


The same ACM you know from our world-class digital library, magazines and journals is now putting the affordable and flexible Distinguished
Speaker Program within reach of the computing community.
To select a speaker for your next event, get started today at www.dsp.acm.org.
If you have questions, please send them to acmdsp@acm.org.

Association for Computing Machinery
Advancing Computing as a Science & Profession

The DSP is sponsored, in part, by Microsoft Europe
© Jason Ku

February 19-22, 2012 Queen’s Human Media Lab Kingston, ON Canada
