Teaching Data and Computational Journalism PDF
DATA AND
COMPUTATIONAL
JOURNALISM
Charles Berret & Cheryl Phillips
Copyright © 2016 Columbia Journalism School
ISBN: 978-0-692-63745-6
This book was set in Adobe Jenson Pro and Arno Pro,
both by Robert Slimbach.
PREFACE
EXECUTIVE SUMMARY
INTRODUCTION
CHAPTER 1:
DEFINING THE FIELD OF STUDY
What’s in a Name
Four Key Areas of Data Journalism
A Brief History of Computers and Journalists
The Task at Hand: Causes for Concern and Reasons for Hope
CHAPTER 2:
STATE OF THE FIELD:
OUR QUANTITATIVE DATA
The Scope of Our Study
Our Findings
Teaching Data Fundamentals: Rows and Columns
Teaching Advanced Data Skills: Visualization and Programming
Alternative Data Journalism Instruction: The State of Online Courses
Textbooks: Little Consensus
CHAPTER 3:
QUALITATIVE FINDINGS:
INTERVIEWS AND OBSERVATIONS
Identifying What to Teach
The Coding Issue
Institutional Challenges: Resources
Institutional Challenges: Faculty Expertise
Institutional Challenges: Student Engagement
CHAPTER 4:
MODEL CURRICULA IN
DATA AND COMPUTATION
Introduction and Summary of Curricular Recommendations
Model 1. Integrating Data as a Core Class: Foundations of Data Journalism
Model 2. Integrating Data and Computation into Existing Courses and Concentrations: General Guidelines for the Undergraduate and Graduate Levels
Model 3. Concentration in Data and Computation
Model 4. Advanced Graduate Degree: Expertise-Driven Reporting on Data & Computation
Model 5. Advanced Graduate Degree: Emerging Journalistic Techniques and Technologies
CHAPTER 5:
INSTITUTIONAL
RECOMMENDATIONS
Faculty Development and Recruitment
Training or Modules
Incoming Skills, Technical Literacies, and Boot Camps
Technology Infrastructure
Benefits of Distance or Online Learning
Fostering Collaboration
APPENDIX
WORKS CITED
ACKNOWLEDGMENTS
PREFACE
The digital revolution ushered in fundamental changes in how information
is structured. It also brought changes in how governments and corporations
use information to exercise power. Governments now influence communities
through the management of large data sets, such as in the allocation of services
through predictive policing. They hold exclusive access to data that would
help us to understand which policies are working, or how vulnerable popula-
tions are affected by the exercise of public policy. Corporations write opaque
algorithms to determine who gets insurance at what price. These developments
challenge journalism to move well beyond adaptation to social media or the
adoption of new technologies for visualization. They implicate journalism’s
public purpose. Encouragingly, a new facet of journalistic practice is emerging,
adapting technology to reporting in the public interest.
This is an important reason why we must teach journalists to work with
data: There are vital questions to be asked that require numeracy, and there are
big stories to find and tell in new ways. The intellectual history of journalism
reveals a continuous interrogation of emerging technologies for their relevance
to the profession’s public purpose and concerns. We need journalists to be
positioned to assess techniques like natural language processing and facial
recognition for their relevance and promise as tools of reporting, as well as for
their ethical dangers.
This is where journalism education may play a leadership role. Integrating
computation, data science and other emerging technologies into public-spirited
reporting is an ideal mission for journalism schools. These schools can access the
full resources of a university. The mission also relieves journalism educators
of the risk of teaching perishable digital skills, and perishable platforms. Data
journalism curricula respond to objective change in the sheer amount of infor-
mation that is stored digitally today – information that requires computation to
access and interrogate. Teaching journalists to be literate about these changes
and some to be specialists requires committing ourselves to using data, compu-
tation, and emerging technologies as essential tools of our profession.
Steve Coll
Dean & Henry R. Luce Professor
Columbia Journalism School
EXECUTIVE SUMMARY
Over the past century, journalism schools have developed solid foundations
for teaching shoe-leather reporting techniques. Hundreds of universities teach
how to interview, how to develop sources, how to cover a beat, and how to
write a breaking news story, a feature, a sports dispatch, or an investigative
piece.
But the practice of data journalism has been largely left out of the main-
stream of journalism education, even as the field’s relatively small core of
devotees has honed it into a powerful and dynamic area of practice. For
decades, data journalists have competed for the profession’s highest prizes and
secured positions of distinction within the most competitive news organiza-
tions, yet our research has found that relatively few journalism schools offer
courses in this area, let alone a concentration, even as these schools have
expanded instruction in presentation-focused digital skills.
The authors of this report believe that all journalism schools must broaden
their curricula to emphasize data and computational practices as foundational
skills. To place data journalism in the core of journalism education will mark a
crucial advance in what schools can offer their students. Journalists who under-
stand data and computation can more effectively do their job in a world ever
more reliant on complicated streams of information.
Beyond teaching, too few journalism schools support faculty research
into tools and techniques of data-driven reporting, despite rich opportunities
for developing theories and applications that may change journalistic practice.
Journalism schools that embrace research in their missions can transform
themselves into innovation hubs, introducing new tools and techniques to the
profession and across their universities, instead of merely preparing students to
enter the field.
This report offers a snapshot of the state of data journalism education
in the United States and outlines models for both integrating the use of data
journalism into existing academic programs and establishing new degrees that
specialize in data-driven and computational reporting practices. While we
focus on the state of education in one country, we hope that the results may
also be useful internationally.
But first, a definition. When we say “data journalism,” we mean using
data for the journalistic purpose of finding and telling stories in the public
interest. This may take many forms: to analyze data and convey that analysis
in written form, to verify data found in reports, to visualize data, or to build
news apps that help readers to explore data themselves. This field also encom-
passes the use of computation—algorithms, machine learning, and emerging
technologies—to more effectively mine both structured and unstructured
information to find and tell stories. The ability to use, understand, and critique
data amounts to a crucial literacy that may be applied in nearly every area of
journalistic practice.
We interviewed more than 50 journalists, educators, and students, and we
evaluated more than 100 journalism programs across the nation. This report
features a chapter detailing quantitative findings, such as the number of U.S.
journalism programs offering classes in data, computation, and related tech
skills. We also include a chapter of qualitative findings in which our interviews
and classroom observations offer some color and texture to this picture of the
present state of data journalism education and its potential.
WHAT’S IN A NAME
In our view, data journalism as a field encompasses a suite of practices for
collecting, analyzing, visualizing, and publishing data for journalistic purposes.
This definition may well be debated. The history of data journalism is full of
arguments about what it should be called and what it includes.
In fact, data journalism has been evolving ever since CBS used a computer
to successfully predict the outcome of the presidential election in 1952. As
technology has advanced, so has the ability of journalists to tap that technology
and use it for important storytelling.
One key definition of data journalism can be found in a 2014 report by
Alexander Howard for the Tow Center for Digital Journalism and Knight
Foundation. Data journalism is “gathering, cleaning, organizing, analyzing,
visualizing, and publishing data to support the creation of acts of journalism,”
Howard wrote. “A more succinct definition might be simply the application
of data science to journalism, where data science is defined as the study of the
extraction of knowledge from data.”2
But news games, drone journalism, and virtual reality—approaches that
some may not consider mainstream data journalism today—may represent a
much more dominant presence tomorrow. Or data journalism may evolve in
yet another direction, perhaps into common applications for machine learning
and algorithms. Data journalists are already working more with unstructured
information (text, video, audio) as opposed to the historical elements of data
journalism (spreadsheets and databases full of rows and columns of numbers).
DATA REPORTING
definition: Obtaining, cleaning, and analyzing data for use in telling journal-
istic stories.
includes:
» Deploying computer-assisted reporting or analysis for writing
journalistic stories
» Practicing precision journalism, as introduced by Philip Meyer, including
the use of social science research methods in the interest of journalism
» Visualizing data—mapping and charting—for use in exploration and
analysis
» Programming to obtain and analyze data for writing journalistic stories
techniques and technologies:
» Invoking public records law to negotiate for data
» Using web scraping tools and techniques (can range from ready-made tools
to writing code in the Python programming language)
» Using relational database software (can range from Microsoft Access
to MySQL)
» Understanding statistical concepts and software or programming
languages with statistical packages (SPSS or R among others)
» Using mapping and visualization tools and software (Tableau, Esri
mapping software, QGIS, Google Fusion Tables)
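To make the relational database technique in this list concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for software such as Access or MySQL. The table names, column names, and records are all invented for illustration. It shows one way a reporter might match two sets of records on a shared name and date of birth.

```python
import sqlite3


def find_matches(drivers, violations):
    """Join two record sets on name and date of birth; return matched rows."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE drivers (name TEXT, dob TEXT, employer TEXT)")
    conn.execute("CREATE TABLE violations (name TEXT, dob TEXT, offense TEXT)")
    conn.executemany("INSERT INTO drivers VALUES (?, ?, ?)", drivers)
    conn.executemany("INSERT INTO violations VALUES (?, ?, ?)", violations)
    rows = conn.execute(
        """
        SELECT d.name, d.employer, v.offense
        FROM drivers AS d
        JOIN violations AS v
          ON d.name = v.name AND d.dob = v.dob
        ORDER BY d.name, v.offense
        """
    ).fetchall()
    conn.close()
    return rows


# Hypothetical sample records, invented for this sketch.
DRIVERS = [
    ("Pat Doe", "1970-01-15", "District A"),
    ("Sam Roe", "1982-06-30", "District B"),
]
VIOLATIONS = [
    ("Pat Doe", "1970-01-15", "speeding"),
    ("Pat Doe", "1965-03-02", "reckless driving"),  # same name, different person
]

print(find_matches(DRIVERS, VIOLATIONS))
```

Matching on both name and date of birth, rather than name alone, guards against treating two people who share a name as one person. Real records would still need manual verification before publication.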
COMPUTATIONAL JOURNALISM
definition: The use of algorithms, machine learning, and other new
methods to accomplish journalistic goals. This area overlaps with data
reporting and emerging technologies.
includes:
» Algorithms that help journalists mine unstructured data in new ways
» New digital platforms to better manage documents and data
technologies:
» Programming languages like Python, Ruby, and R
» Frameworks and applications like Jupyter that enable journalists to mix
code and prose as they perform analysis and show the steps in their work
» Platforms like Overview that facilitate the use of complicated
computational processes like natural language processing and topic
modeling
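As a rough illustration of what mining unstructured text can mean at its simplest, the sketch below counts the most frequent terms across a handful of documents using only the Python standard library. It is far simpler than the natural language processing or topic modeling mentioned above, and it is not how Overview works. The documents and the stop-word list are invented for the example.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; real projects use much longer ones.
STOP_WORDS = {"the", "a", "of", "to", "and", "in", "for", "on", "was", "were"}


def top_terms(documents, n=3):
    """Return the n most common non-stop-word terms across all documents."""
    counts = Counter()
    for text in documents:
        # Lowercase the text and pull out runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


# Invented sample documents.
DOCS = [
    "The inspection of the bridge was delayed.",
    "Bridge repairs were approved; the inspection found cracks.",
    "Budget for bridge repairs was cut.",
]

print(top_terms(DOCS))
```

Even a crude count like this can suggest which subjects recur in a pile of documents and so where a reporter might start reading.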
A BRIEF HISTORY OF COMPUTERS
AND JOURNALISTS
In 1967, Philip Meyer had just returned to Knight Ridder’s Washington Bureau
from a Nieman Fellowship at Harvard University, where he had delved into a
different area of computational methods: social science. Social science meth-
odologies, including statistical tests and surveys, had recently been used by
academics to detail the reasons behind the 1965 Watts riots in Los Angeles.
Meyer believed similar methodologies could have great impact in journalism.
He wasn’t back at work for long when he was able to put that belief into prac-
tice.
In July 1967, an early morning raid of an unlicensed bar in Detroit resulted
in rioting. Crowds of people ran through the streets, burning, looting, and
shooting. Theories abounded as to why the rioting had occurred. Some experts
thought it was done by those “on the bottom rung of society” with no money
or education. A second theory was that it was caused by transplanted and unas-
similated Southerners.
Meyer, on loan to Knight Ridder’s Detroit Free Press, reached out to friends
who were social scientists to devise a survey, cobble together funding, and
train interviewers. In the survey, respondents, who were guaranteed anonymity,
were asked to assess their own level of participation in the riots. They were
also asked to indicate whether they considered rioting a crime, whether they
supported fines or jail for the looters, and whether they considered African
Americans in Detroit to be better off than those elsewhere.
The survey results contradicted the earlier theories and pointed to a
different explanation—that the relative good fortune of many African Ameri-
cans highlighted more deeply the gap felt by those who were left behind.
The Free Press’s coverage of the rioting, including Meyer’s “swift and
accurate investigation into the underlying causes,” won the Pulitzer Prize for
Local General Reporting in 1968 and launched a new era in the use of computational
methods in the service of journalism. Meyer’s seminal book, Precision
Journalism: A Reporter’s Introduction to Social Science Methods, was published in
1973. It argued that journalists trained in social science methods would be
better equipped for journalistic work, and it provided guidelines for journalists
to understand those methods.3 “The tools of sampling, computer analysis, and
statistical inference increased the traditional power of the reporter without
changing the nature of his or her mission,” Meyer wrote, “to find the facts, to
understand them, and to explain them without wasting time.”4
That pioneering work by Meyer is commonly thought to be the beginning
of what has been termed either precision journalism or computer-assisted
reporting. His approach inspired other journalists. Their work in turn inspired
a movement and the creation of a training ground. Two academic institutions
in particular, Indiana University and the University of Missouri, supported the
development of that training ground.
3 In later editions, the name changed to The New Precision Journalism (2013).
4 Meyer, Precision Journalism, p. 3.
But in the wider academic world, computational methods applied to
reporting largely did not have an impact on other university programs or how
journalism was taught. Instead, professional journalists taught other profes-
sional journalists the new techniques, and only as those data journalists began
to enter academia did data journalism education begin to take a wider hold in
that setting.
By the 1980s, as desktop personal computers took the place of typewriters,
and editing terminals were used with digital publishing systems, reporters
began to use software on PCs to great effect. In 1986, Elliot Jaspin, a reporter at
the Providence Journal-Bulletin, used databases to match felons and bad driving
records to school bus drivers.
In 1988, Bill Dedman, a reporter for the Atlanta Journal-Constitution,
using data from a 9-track tape and with analysis by Dwight Morris and input
from the Hubert H. Humphrey School of Public Affairs at the University of
Minnesota, showed that banks were redlining African Americans on loans
throughout Atlanta, and eventually the country, while providing services in
even the poorest white neighborhoods. That series, “The Color of Money,” won
a Pulitzer Prize in Investigative Reporting.
In 1989, Jaspin launched the Missouri Institute for Computer-Assisted
Reporting (MICAR) at the University of Missouri. Soon, he was teaching
computer-assisted reporting to students at the university and holding boot
camps for professional journalists. In 1994, a Freedom Forum
grant would help the institute boost its presence and become a part of IRE as
NICAR—the National Institute for Computer-Assisted Reporting.
In 1990, at Indiana University, journalist turned professor James
Brown worked with IRE, which sponsored the event, to organize the first
computer-assisted reporting conference. He created a fledgling group called
the National Institute for Advanced Reporting (NIAR).
“Andy Schneider, a two-time Pulitzer winner, had just joined our faculty as
the first Riley Chair professor. One day we were talking about how so few jour-
nalists used computers in their reporting,” Brown recalled in an email. “In 1990,
I don’t know of any schools that had such skills integrated into the curriculum.
At that time, any undergraduate in even the smallest school of business knew
how to use a spreadsheet. We decided to do something about it and that was
how NIAR started.”
NIAR would host six conferences before deciding to fold to avoid dupli-
cating efforts by IRE and MICAR, Brown said. Still, the Indiana conferences
trained more than 1,000 journalists and were a precursor to a new era. In
1993, IRE and MICAR (which would later be renamed NICAR) held a
computer-assisted reporting conference in Raleigh, North Carolina, that drew
several hundred attendees. That marked the beginning of an annual event that
continues today, where new generations of reporters and editors learn to use
spreadsheets or query data and to use maps and statistics to arrive at
newsworthy findings.
In 1993, the same year as the Raleigh computer-assisted reporting confer-
ence, the Miami Herald received the Pulitzer Prize for Public Service after
reporter Steve Doig used data analysis and mapping to show that weakened
building requirements were the reason Hurricane Andrew had so devastated
certain parts of Miami.
Much of this new computer-assisted reporting came about because, as
the Internet emerged and became more accessible, so too did the concept of
using a computer in reporting. But NICAR and the University of Missouri in
particular had a broad and deep impact. A good number of the most prominent
practitioners of data journalism learned their skills from NICAR and from
other journalists trying to solve similar data challenges.
This pattern is perhaps most visible through tracking the careers of the
NICAR trainers themselves. Sarah Cohen was part of a Washington Post team
that received the 2002 Pulitzer Prize in investigative reporting for detailing
the District of Columbia’s role in the neglect and death of 229 children in
protective care, and Jennifer LaFleur has won multiple national awards for the
coverage of disability, legal, and open government issues. Both were NICAR
trainers.
Another NICAR trainer was Tom McGinty, now a reporter at the Wall
Street Journal and the data journalist for “Medicare Unmasked,” which received
the 2015 Pulitzer Prize in Investigative Reporting. Jo Craven McGinty was also
a NICAR trainer and later worked as a database specialist at the Washington
Post and at the New York Times; she now writes a data-centric column for the
Wall Street Journal. Her analysis about the use of lethal force by Washington
police was part of a Post series that received the Pulitzer Prize for Public
Service and the Selden Ring Award for Investigative Reporting in 1999.
[Figure: NICAR conferences over time. The conference was not held in
2001 because of 9/11. Source: IRE]
Journalist David Donald moved on from his NICAR training role to head
data efforts at the Center for Public Integrity and is now data editor at Amer-
ican University’s Investigative Reporting Workshop.
Aron Pilhofer was an IRE/NICAR trainer and led IRE’s campaign finance
information center. He went on to work at the Center for Public Integrity and
the New York Times, where he founded the paper’s first interactives team. Today,
Pilhofer is digital executive editor at the Guardian.
Justin Mayo, a data journalist at the Seattle Times, graduated from the
University of Missouri and worked in the NICAR database library and as a
NICAR trainer. He has paired with reporters on work that has opened sealed
court cases and changed state laws governing logging permits. Mayo was
involved in data analysis and reporting for an investigative project on problems
with prescription methadone policies in the state of Washington, which
received a Pulitzer Prize for Investigative Reporting in 2012, and for coverage
of a mudslide that received a Pulitzer Prize for Breaking News Reporting in 2015.
Clearly, working at NICAR has meant building powerful skills. So, too,
has attending conferences and boot camps. The students who attended early
NICAR boot camps were “missionaries” who returned to their newsrooms to
teach computational journalism skills to their colleagues, recalled Brant
Houston, an early NICAR director.
For years, the conferences and boot camps were “the only place where people
have had an extensive amount of time to try out new techniques.”5
By the late 1990s, as the increasing prominence of the Internet led more
news organizations to post stories online, journalism education offered even
more digitally focused instruction: multimedia, online video skills, and HTML
coding, among others.
Two strands, data and digital, represent distinct uses of computers
within journalism. Early calls for journalism schools to adapt to changing
technological conditions were answered mainly with the addition of digital
classes—learning how to build a web page, create multimedia, and curate
content.
Many of the early digitally focused journalism instructors faced a battle in
trying to introduce new concepts into print journalism traditions. Data jour-
nalism instructors—focusing more on data analysis for use in stories—have
faced similar challenges.
Meanwhile, by the 1990s, a few universities had begun teaching data
analysis for storytelling. Meyer, who in 1981 became Knight Chair at the
University of North Carolina, was teaching statistical analysis as a reporting
method. Indiana University, with Brown, the professor who launched the first
computer-assisted reporting (CAR) conference, began incorporating the methods
into classes. And Missouri
offered computer-assisted reporting instruction, thanks to Jaspin; Brant
Houston, an early NICAR director who later became IRE’s executive director;
and others. Other universities began to introduce basic classes or incorporate
spreadsheets into existing classes.
5 For a more complete look at the long and storied history of computer-assisted reporting, the
spring/summer 2015 edition of the IRE Journal provides a detailed and engaging recounting by Jennifer
LaFleur, NICAR’s first training director in 1994 and now the senior data editor at the Center for
Investigative Reporting/Reveal. Brant Houston details that history in “Fifty Years of Journalism and Data: A Brief
History,” Global Investigative Journalism Network, November 12, 2015.
Houston’s Computer-Assisted Reporting: A Practical Guide became one of
the few foundational texts available on the subject. His book, now in its fourth
edition, lays out the basics of computer-assisted reporting: working with
spreadsheets and database managers as well as finding data that can be used
for journalism, such as local budgets and bridge inspection information. What
Houston detailed in that first edition became essentially a core curriculum for
data journalism from 1995 through the present day. Houston’s work codified
the principles and practices of computer-assisted reporting from the perspec-
tive of its burgeoning community.
But throughout those two decades, journalists still learned these skills
primarily through the NICAR conferences or from other journalists. For many
years, for example, Meyer and Cohen taught a NICAR stats and maps boot
camp at the University of North Carolina geared toward teaching professional
journalists.
Since then, boot camps have become a popular model, used by universities
and other journalism training organizations, often in coordination with IRE/
NICAR. A key tenet of the boot camp is practical, hands-on training, using
data sets that journalists routinely report on, such as school test scores. To sum
up this model, Houston said it’s all about “learning by doing.”
Many boot camp graduates have gone on to robust data journalism careers
and have also moved into teaching in journalism programs, both as adjuncts
and full-time faculty, where they have integrated those teaching techniques into
their classes. These journalists essentially took the curriculum from NICAR
and introduced it into the wider academic world.
In 1996, Arizona State University lured Doig from the Miami Herald to
academic life, where he has taught data journalism ever since as
the Knight Chair in Journalism. The stats
and maps boot camp eventually migrated to ASU as well.
As journalism programs began to offer these classes, they focused on the
basics covered in Houston’s book: negotiating for data, cleaning it, and using
spreadsheets and relational databases, mapping, and statistics to find stories.
In 2005, ASU benefited from a push by the Carnegie Corporation of New
York and the John S. and James L. Knight Foundation to revamp journalism
education. The school expanded its focus on all things data and multimedia
with the founding of News21. That program has focused heavily on using data
to tell important and far-reaching stories while teaching hundreds of students
journalism at the same time.
At Columbia, the first course on computer-assisted reporting was offered
in 2003, when Tom Torok, then data editor at the New York Times, taught a
one-credit elective. With the founding of the Stabile Center for Investigative
Journalism in 2006, some data-driven reporting methods were integrated into
the coursework for the small group of students selected for the program. The
number of offerings in data and computation at Columbia has risen steadily
since the founding of the Tow Center for Digital Journalism in 2010 and the
Brown Institute for Media Innovation in 2012. In addition to research and
technology development projects, these centers brought full-time faculty and
fellows to teach data and computation, as well as supplied grants to support the
creation of new journalistic platforms and modes of storytelling.
Columbia has also launched several new programs in recent years that
situate data and computational skills within journalistic practice. One is a
dual-degree program in which students simultaneously pursue M.S. degrees
in both Journalism and Computer Science — and those students must be
admitted to both programs independently. In 2014, the Columbia Journalism
School established a second data program, The Lede, in part to aid students in
developing the broad skill set they would need to be competitive applicants to
both Journalism and CS. The Lede is a non-degree program that provides an
intensive introduction to data and computation over the course of one or two
semesters. Most students arrive with little or no experience with programming
or data analysis, but after three to six months they emerge with a working
knowledge of how databases, algorithms, and visualization can be put to
narrative use. After completing The Lede, many students are competitive applicants for the dual
degree, but others go directly into the field as reporters.
The emergence of these initiatives in journalism schools reflects the extent
to which data-driven reporting practices have broadened in the last decade. In
the 2000s, journalists began to move well beyond CAR, trying out advanced
statistical analysis techniques, crowdsourcing in ways that ensured data accu-
racy and verification, web scraping, programming, and app development.
In 2009, IRE began working to attract programmers and journalists
specializing in data visualization, said executive director Mark Horvit. IRE had always
offered hands-on sessions in analyzing data, mapping, and statistical methods.
Added to that now are sessions on web scraping, multiple programming
languages, web frameworks, and data visualization, among other topics. The
sessions have even included drone demonstrations. The challenge has become
balancing the panels so that there is enough of each type of data journalism. As
a result, the annual conferences have grown tremendously, from around 400
attendees each year at the CAR conference in the early 2000s to between 900 and
1,000 today.
Other groups began addressing data journalism as well as pushing for new
methods of digital journalism. The Society of Professional Journalists wanted
to teach its members about data and joined with IRE to do so, sponsoring
regional two- or three-day Better Watchdog Workshops. Minority journalism
associations began to provide data journalism training, often in collaboration
with IRE or its members or under the Better Watchdog theme.
The Online News Association’s annual conference focuses on the larger
world of digital journalism. Many of its panels feature coding for presentation,
cutting-edge developments in digital web-based products, audience development,
and mobile. It also offers panels on data journalism and programming.
Still, a gap has persisted. At times, new organizations formed to fill some
of the needs. In 2009, Pilhofer, then at the New York Times, Rich Gordon from
Northwestern University, and Associated Press correspondent Burt Herman,
who was just finishing a Knight Fellowship at Stanford, created a loosely knit
organization that brings together journalists and technologists, hence the name
Hacks/Hackers. Its mission is to create a network of people who “rethink the
future of news and information.” Even as some groups have tried to fill gaps in
data journalism instruction, what exactly counts as data journalism remains a
rough boundary, with few distinctions between data journalism and digital/
web skills. In this paper, we continue to sharpen the focus on what will improve
the level of data journalism education, not overall digital instruction.
In 2013, a group of journalists used Kickstarter to raise $34,000 and create
ForJournalism.com, a teaching platform to provide tutorials on spreadsheets,
scraping, building apps, and visualizations. Founder Dave Stanton said the
group wanted to focus on teaching programmatic journalism concepts and
skills and offer subjects that weren’t being taught. “You didn’t really even have
these online code school things,” he said. “There were a few. The problem was
there was no context for journalism.”
6 Franco Moretti, Graphs, Maps, Trees: Abstract Models for Literary History (London: Verso,
2007) and Dennis Tenen, “Blunt Instrumentalism,” in Debates in the Digital Humanities, forthcoming in
2016, University of Minnesota Press.
centers, research institutes, and degree programs (such as data science and
computational media). It is not the purpose of a program in data journalism
to compete with these other disciplines, but to develop a curriculum that is
intrinsically journalistic—one that reflects a mission to find and tell stories in
the public interest—as well as develop partnerships and collaborations with
other disciplines.
One example of unexpected interdepartmental collaboration at Columbia
has been with the Earth Institute, which has curated a massive database of
climate data and offers courses in Python programming in which several Jour-
nalism students have enrolled. This course focuses on large time-series data sets,
which enables data journalists to put the climate into context in their stories.
In 2013, Jean Folkerts, John Maxwell Hamilton, and Nicholas Lemann—
all journalism school deans, and two of them longtime professional
journalists—published “Educating Journalists: A New Plea for the University
Tradition.” The paper focused on “universities’ role in journalism as a profes-
sion” but it also discussed how this transformation in journalism could be a
boon for the schools that educate journalists. The authors wrote:
That journalism is going through profound changes does not vitiate—in
fact, it enhances—the importance of journalism schools’ becoming
more fully participant in the university project. Done properly, that will
produce many benefits for the profession at a critical time. Journalism
schools should be oriented toward the future of the profession as well as
the present, and they should not be content merely to train their students
in prevailing entry-level newsroom practices.7
Key among their recommendations was this: “We see all three of these early
strains in journalism education—practice-oriented, subject matter-oriented,
and research-oriented—as essential. And all of them can and should be applied,
with potentially rich results, to the digital revolution. Journalism schools
should embrace all three, not choose one and reject the others.”8
Journalism programs, with their ability to communicate to a general
audience and their potential to analyze and visualize data for story, are a perfect
partner for other departments. For example, at Stanford’s new Computa-
tional Journalism Lab (co-founded by one of this report’s authors), faculty are
working on several projects with professors from other academic disciplines
whose research mission touches on the same data. One goal is that data sets
can be collected, analyzed, and used in academic research as well as for journal-
istic storytelling. In some instances, new methods of analysis can be developed
in concert with important public accountability journalism projects.
OUR FINDINGS
A little more than half of the universities we reviewed—59 of the 113 schools—
offer one or more data journalism courses. We defined a data journalism class
as being focused on the intersection of data and journalism, and using spread-
sheets, statistical software, relational databases, or programming toward that
end. We included in the data journalism category only those programming
classes that went beyond basic HTML and CSS. For the purposes of this report,
we considered classes on HTML, CSS, and JavaScript to be focused on digital/
design journalism, not data journalism. We also excluded courses in numeracy
and communications research methodologies and statistics unless the course
offerings explicitly included a journalism focus. The appendix includes tables
detailing the full results of our analysis.
For Aaron Williams, who is four years out of college, it was not surprising
to hear that our analysis showed 54 of the 113 programs don’t offer a stand-
alone class on data journalism. Williams has worked in data journalism at
the Los Angeles Times, the Center for Investigative Reporting, and now as
interactive editor at the San Francisco Chronicle. Almost everything he knows
he learned from colleagues at NICAR, he said. “I didn’t even really know about
data journalism as a discipline, nor did my instructors . . . until basically I was a
senior,” Williams recalled.
Of the 59 programs we identified that teach at least one data journalism
class, 27 of the schools offer just one course, usually foundational. Fourteen
offer two classes. Just 18 of the 59 schools teaching data journalism offer three
or more classes in this subject.
At a minimum, these programs offer courses that teach students to use
spreadsheets to analyze data for journalistic purposes. At the other end of
the spectrum, some schools provide far more, teaching multiple classes in
programming skills, such as scraping the Web, building news apps, or creating
advanced data visualizations. But programs with multiple classes are rare.
1 It should be noted that information on a degree program’s website does not necessarily reflect
the present state of their curriculum. We reached out to professors and administrative staff in order to
confirm our data, but this was not always possible.
A significant number of programs offer some instruction in data journalism,
even if they don’t provide a standalone class. Of the 113 ACEJMC-accredited
programs, 69 integrate some data journalism into other reporting and writing
courses, our analysis showed. In most cases, this entails introducing the
concepts of using spreadsheets or basic analysis as part of reporting and writing
classes or certain topic classes, such as business journalism.
Again, tables summarizing these findings can be found in the appendix,
while the remainder of this chapter will dig deeper into our analysis of syllabi
and course offerings in data journalism.
TEACHING DATA FUNDAMENTALS:
ROWS AND COLUMNS
Data journalism professors say that the foundational data class is the most
important because it lays down key mindsets and skills that are a prerequisite
for more advanced learning. Steve Doig of ASU believes the core data syllabus
should consist of negotiating for data, thinking critically about data, and using
spreadsheets to analyze data.
It is difficult to overstate the value of spreadsheets for managing infor-
mation. When we asked former CUNY professor Amanda Hickman, now
an Open Lab senior fellow at BuzzFeed, how she defines data, she replied,
“anything tabular.”
For the foundational computer-assisted reporting classes, the syllabus anal-
ysis and interviews indicate that the coursework is comprehensive, providing a
strong base in critical thinking and basic concepts surrounding the use of data
to find and tell stories. Students are taught similar concepts: critical thinking
and developing a “data frame of mind”—in other words, being able to question
data in a disciplined way, make sense of discrepancies, and find the underlying
patterns and outliers that are important to the analysis.
Most of the classes include some type of hands-on learning. Many of them
focus first on spreadsheets, then SQL, followed by mapping and statistical
concepts. Others include basic data visualization, using Tableau or Google
Fusion as a way into the subject. Multiple professors said the hands-on
approach reinforces the critical thinking concepts, including helping students
to understand what structured data look like and how information of any kind
can be structured for better understanding.
Another key feature of the 63 syllabi we reviewed was an exercise in
requesting and negotiating for data from a governmental body. Dan Keating,
who works at the Washington Post and teaches a long-standing class in comput-
er-assisted reporting at the University of Maryland, said that finding what “no
one has ever known before” is a defining part of his class.
Many CAR courses break down this way:
Hard Skills
» Searching for and finding documents and data that enable the journalist
to make statements of fact, including public requests, deep research, and
scraping skills
» Understanding data structures and how to clean and standardize data into
a form that is useful
» Analyzing data using spreadsheets, databases, mapping, and visualization
» Learning advanced statistical methods that illuminate data
Guiding Concepts
» Finding what “no one has known before”
» Developing data-driven storytelling techniques, including how to use
numbers effectively in prose and how to tell a story visually
» Thinking of data as an asset in the reporting process
Whether following the guiding concepts or applying the hard skills, journalism
students today must be well grounded in both the importance of data and
the tools to use data in storytelling. “If you don’t deal with data as a journalist,
you’re shutting yourself down,” said McGinty of the Wall Street Journal.
[Figure: Analysis from our collection of syllabi. Advanced classes included
data visualization, programming languages, and other emerging methods such as
machine learning. Foundational classes included spreadsheets, basic relational
database understanding, and descriptive statistics.]
However, teaching the basic CAR curriculum is not enough, argued Kevin
Quealy, a graphics editor at the New York Times and adjunct professor of
journalism at New York University. “To do data work at a high level, one or two
semesters of courses is very inadequate,” he said.
Many journalism programs offer design classes, but often those classes
focus on basic design tenets, overall web design, or static infographics.
Teaching students the concepts and skills needed to visualize data in an
interactive way or to build a web application is rarer.
Not all data journalism educators are convinced that data visualization for
news presentation should even be considered part of a data journalism curric-
ulum. However, most agree that it is vital to teach visualization for the purpose
of analysis. Alberto Cairo, who is leading an effort to fill a data visualization gap
in his role as Knight Chair in Visual Journalism at the University of Miami, believes
that even basic visualization instruction goes a long way toward literacy.
First, data journalists need to know how to do basic exploratory visual
analysis, Cairo said. And second, even journalists who practice data visualiza-
tion need to start with the exploratory analysis. They need to know—just like
the CAR specialists—how to “interview” the data, he said.
One challenge for traditional journalism schools, which may lack a strong
journalism design component and may already have difficulty teaching a
CAR or data analysis class, is whether they should tap professionals or recruit
or train faculty to incorporate data visualization. To that, Cairo and other
academics and professionals we interviewed suggest that such schools collabo-
rate with other parts of a university to fill the gap.
For our analysis, we differentiated between web and digital technologies
aimed at presentation and the data skills needed to tell a story. This can be a
difficult boundary line. News applications, for example, are focused on design,
but, based on our interviews, there is a key difference between building a new website
or a multimedia presentation and building something like ProPublica’s “Dollars
for Docs,” which enabled readers to drill into the story of pharmaceutical
industry payments to doctors and also made it possible for other journalists
to find and tell other stories. Meanwhile, “Snow Fall,” the New York Times’s
much-touted (and Pulitzer Prize-winning) interactive story of skiers caught in
a Washington state avalanche, wasn’t about data and it wasn’t about furthering
the use of the data; it used design skills to make the story an immersive multi-
media experience for the reader.
Just 14 journalism schools in our data set teach programming beyond
HTML and CSS, based on their course descriptions. At present, the program-
ming languages most often used in classes on data-driven reporting are SQL,
Python, and R. Instructors focusing on data analysis often incorporate SQL,
and some will introduce R. Some instructors also teach web frameworks, such
as Django and Ruby on Rails, and some visualization professors teach JavaS-
cript and other skills, though fewer go into the D3 library developed by Mike
Bostock, a former New York Times graphics editor.
Deen Freelon, a communication studies professor at American University,
takes a different approach, teaching “code for the purposes of analysis” in a
course open to both communications and journalism students. “I just got back
from my last class where I was teaching students how to analyze Twitter data,”
he said.
While advanced classes are rare, there is a clear demand for this knowl-
edge. In the tech world, short programs designed to train web developers have
emerged as financially viable businesses. These code schools have shown that
some of these skills can be taught in considerably less time than a four-year
degree. The Lede Program at Columbia, which offers a summer boot camp as
well as an intensive two-semester certification program in computational skills
for journalists, has drawn students interested in gaining key data skills in a
short period of time.
Maggie Mulvihill, a clinical professor of journalism at Boston University,
is raising revenue for computational journalism efforts there by holding
week-long camps on storytelling with data for non-journalism professionals.
Integrating data journalism exposes students to the field, highlighting
this as an area that they might choose to practice, but it is also an important
step for students developing a foundation of journalistic skills. As noted, 69 of
the 113 ACEJMC-accredited programs already integrate some data journalism
into reporting and writing courses, and on this front there is some good news:
several schools expressed interest in adding data journalism in a systematic
way to their programs. When, in order to verify our data, we contacted each
of the programs that had listed either no data journalism class or just one,
11 responded that they are actively working to add data journalism to their
curricula. At the University of Alabama in Tuscaloosa, for example, the school
does not offer a standalone CAR class, but it now includes components of data
analysis instruction in three separate journalism classes.
INSTITUTIONAL CHALLENGES:
RESOURCES
Depending on the university, some students need more support with tech-
nology. Some students still do not own personal laptops and rely on school
computer labs for their assignments, for example. Other students may be using
personal computing devices that are not equipped with what they need to do
data journalism. Students who use a tablet (such as an iPad) as their primary
tool will face barriers.
Meredith Broussard, who taught data journalism at Temple University
until 2015, said that ensuring that her students had the equipment they needed
for her class was a major priority. Many of her students relied on a tablet, which
meant equipping computer labs with the necessary equipment and platforms—
or even lending laptops to students for the term.
Brant Houston of the University of Illinois Urbana-Champaign also
pointed to the availability of resources as an important issue—especially for
universities that draw students from economically disadvantaged populations.
Journalism schools can help these students by investing in up-to-date lab
equipment and by working to create an environment that makes it easy for
students to access needed software and to install it on their own devices. Jour-
nalism school administrators should consider more frequent audits and surveys
of professors to identify which software will be most useful for their students.
And for students who are working on their own personal laptops, some
professors hold provisioning sessions to help students install the needed soft-
ware at the beginning of the term.
INSTITUTIONAL CHALLENGES:
FACULTY EXPERTISE
It is no secret that a divide exists between the professional journalism
world and the academic world. This divide extends to faculty when
it comes to who teaches data journalism and the impact it will have on the
department.
Of course, each brand of data journalism instructor may have his or her
own biases. Those who started as professional journalists, or who still work in
a newsroom and teach as an adjunct, believe that they can convey the critical
thinking skills needed to succeed in a newsroom environment more effectively
than a professor whose experience is in research.
On the other hand, Diakopoulos of the University of Maryland believes
that faculty should hold PhDs, and that while it would be good to be able to
hire someone with 25 years of experience in data journalism, it’s an unrealistic
expectation at this stage of the field’s development. His goal, he said, is to teach
thinking through research. Still, he admitted that this is a struggle.
Some data journalists and journalism professors take issue with Diako-
poulos, suggesting that such a model of data journalism professors with PhDs
is unrealistic in a world where data journalism emerged from the professional
practice, not academia.
Wherever journalism schools find the necessary faculty, just hiring a new
professor to specialize in data journalism will not solve the problem, said Doig
from ASU. “One difficulty with having somebody like me is everybody else can
say, ‘Ah, we don’t have to worry about data journalism now.’ In reality, I teach
maybe two sections of 20 students each semester. That’s a fraction of our total
student load,” Doig said. “So believing that it is somehow being taken care of by
one specialist like that, that isn’t the case.”
To help solve at least some of the issues, Doig has provided short video
tutorials to other professors for basic government reporting classes.
While Doig believes it would be good to have a required data journalism
course, he also questions whether that is possible. “How are you going to find
the faculty to teach that?” he asked. “There’s not enough people in town who
could teach that.”
Professional track and academic track faculty members agree that for
now, pulling in professional journalists to serve as adjuncts will continue to be
necessary and that relying on professional journalists alone will not solve the
problem.
For Dustin Harp at the University of Texas, Arlington, this conundrum
was solved through her own initiative. She had never taught data journalism
but decided the students needed the class, so she did some research and
created one. Some colleagues asked her why. She has tenure and no one asked
her to take on the extra work. But the students needed the class, Harp said. She
used lynda.com for tutorials and learned the same information before teaching
her students.
“The thing is I’m a qualitative researcher, I’m not a numbers person, I’m not
a numbers cruncher, so it was very crazy and daunting. After I said I was going
to do it, it was on the schedule, I was like what have I gotten myself into?” Harp
said. “But I follow the field. . . . I’m aware that data journalism is, it’s a tool our
students need to be more competitive to get jobs.”
INSTITUTIONAL CHALLENGES:
STUDENT ENGAGEMENT
Journalism programs need to do a better job of persuading or even requiring
students to take a data journalism class. Students may shy away because they
believe they aren’t any good at math. “A lot of students are scared of ‘that math
thing,’” said one journalism student at Northwestern University.
Resistance to math is an issue far broader than the field of journalism, but
it will need to be addressed if teaching data journalism is to be taken seriously.
This applies to both teachers and students, some of whom may have chosen to
pursue journalism in part because they thought that it would require little or
no math.
The problems go deeper than just convincing people they can handle math.
Even in universities with entire programs focused on teaching programming,
data journalism, and even data visualization, some students have reported that
it wasn’t easy to find out about these opportunities. Some of the reasons have
to do with silos within schools and departments for specific programs as well
as specific tracks with emphasis on specific types of journalistic practice.
Rich Gordon, co-founder of the Knight Lab and director of digital inno-
vation at the Medill School of Journalism at Northwestern University, agrees
that a gap exists between the basic CAR course and the much more advanced
program through Knight Lab, which brings in technologists and works with
the technologists to develop new applications for journalism. Journalism
students going through a normal degree plan may have the opportunity to take
a basic CAR class, but most won’t ever be exposed to the work at the lab, he
said.
In general, data journalism courses are electives and draw only a few
students out of the total enrolled in each journalism program. Some of that
has to do with capacity, but another issue is the lack of visibility. Often, other
professors don’t treat the classes as vital to a journalism career.
“There’s some student interest in CAR,” said one University of Missouri
journalism graduate. “But there would be more if it were expressed as an
option for students early on.”
Universities can address this issue, said Mike Reilley, professor of practice
at Arizona State University, who regards universities as “too siloed.” Reilley
advocates team-taught courses and cooperation between departments.
Some students work their way through to find what they need. For
instance, one student who took Temple University’s undergraduate class in
data journalism had taken classes in programming with Python through the
university’s business school. She told our researchers she planned on learning
more data journalism as she wanted to continue doing this type of journalism
when she graduated. But institutional changes could make data journalism
much more accessible.
Several students suggested that schools should offer a track that could
include a journalistically focused statistics class, a class that focuses on data-
bases, a class with a focus on reporting with data, and others that delve into
more in-depth data reporting and data visualization.
One of the authors of this report co-taught a spring 2015 watchdog
reporting class with an engineering professor. Five computer science students
embedded into project teams of journalism students. The journalism students
learned new data skills, and the computer science students learned techniques
in reporting and writing. The class offered challenges, too. Next time around,
there may be a more defined way for the journalism students to take on data
challenges of their own and continued emphasis on having the computer
science students learn skills such as interviewing.
CHAPTER 4:
MODEL CURRICULA IN
DATA AND COMPUTATION
MODEL 1.
INTEGRATING DATA AS A CORE
CLASS: Foundations of Data Journalism
This is a model for a required introductory course at the graduate or undergrad-
uate level. What follows is a narrative account of how such a class may proceed
in developing data literacy among beginning journalists. We realize that this
course may need to fit the unique contours of different journalism programs,
some of which contain boot camps and other introductory programs with idio-
syncratic durations and varying levels of intensity and focus on different skills.
The point is for this course to be given equal footing with other skills or subject
matters that are currently treated as essential in a journalism education.
course description: This course is an introduction to the collection,
analysis, presentation, and critique of structured information by journalists. As
students are introduced to the basics of reporting and the range of journalistic
methods that they may pursue in later coursework, an introduction to data and
computation is an essential component of their journalism education.
Over the course of a term, students should begin to develop a frame of
mind in which they approach every story looking for data possibilities. They
should understand how to apply basic methods in spreadsheets and relational
databases. They should get a primer on using and understanding statistical
concepts. They should learn how to take their data findings and locate the
people who illustrate those findings for their stories. They will learn how to
convert their data analysis into a pitch for a journalistic story.
Students will learn how to find data online, how to maintain personal
records as they report stories, and how to use simple visualization methods to
find new information: how keeping a timeline can help reveal discrepancies
and how cross-checking sources of information may lead to new avenues of
inquiry. The use of data in these contexts will benefit students no matter which
area of journalism they choose to practice. Just like interviewing, which is a
ubiquitous journalistic skill, the art of gathering and understanding data should
extend widely across the field of journalism.
The trouble with data is that it so often appears clinical or detached from
the richness of people’s lives. To reduce things to abstractions may seem
limiting to some students. Early exercises may help to counter this presup-
position. If you ask the class to gather information about each other such as
their birth dates, blood types, eye color, and birthplace, they may see within a
20-minute exercise how interesting data can be when we learn something from
data that we care to know.
From this point, the class may move to more journalistic exercises with
spreadsheets. As students become more comfortable with spreadsheets, the
class may turn to methods of data analysis such as pivot tables and plotting
methods.
skills: This class should prepare students to use spreadsheets and databases
to find and tell stories.
Central concerns include: spreadsheet training, how to find data, clean it,
look for patterns and outliers, and question the biases and omissions in how
it was gathered. Instructors may choose to use an introductory data set with a
good story for beginners to find (examples are listed in the appendix).
Students should also learn how to critically assess claims surrounding
data. Reporting on data may go astray when it presumes this information is
complete and accurate. Reporters should be trained to look for problems in
data. It is necessary to question every source of information.
Another foundational aspect of data-driven reporting is to recognize
patterns and anomalies in data. Two skills that should emerge from this class
are to look for trends and to identify outliers. Every data point is a possible
source or anecdote.
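As an illustration of outlier spotting, here is a minimal Python sketch. It assumes the common interquartile-range rule of thumb, which this report does not prescribe, and the overtime figures and the function name `find_outliers` are hypothetical.

```python
import statistics

def find_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the middle half of the data."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical overtime payments (in dollars) for ten city employees.
overtime = [1200, 1500, 1350, 1100, 1425, 1280, 1390, 1310, 1250, 48000]
outliers = find_outliers(overtime)
print(outliers)
```

A flagged value is a lead, not a finding: it may point to a data-entry error as easily as to a story, so it should be checked against the source.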
This class will introduce data visualization, but mainly as a means to
explore a data set. Using an approachable program such as Excel, Fusion Tables,
or Tableau, students will learn to display data in graphic form as an inroad to
asking journalistic questions. The goal is not to design a graphic for publication,
but to graph for the sake of understanding the data. Students may think of this
as a research method or a sketchpad for further reporting. Instruction should
include discussion of the ways that different visualization methods can be
misleading.
Along the way, this class can cover basic numeracy and descriptive statis-
tics—skills that every journalist needs to know. This may include reminders
about how to calculate percentage and percent change, working with units and
measures, and even identifying large numbers like billions and trillions. Once
those are covered, the material could move to statistics principles and methods
such as standard deviation and regression analysis.
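The numeracy reminders above can be made concrete with a short Python sketch. The function names are illustrative only. The 59-of-113 figure comes from this report's own findings, while the budget and response-time numbers are invented for the example.

```python
import statistics

def percent_of_total(part, total):
    """Share of a whole, expressed as a percentage."""
    return part / total * 100

def percent_change(old, new):
    """Change from old to new as a percentage of old (old must be nonzero)."""
    return (new - old) / old * 100

# From this report: 59 of 113 programs offer a data journalism class.
share = percent_of_total(59, 113)

# Hypothetical city budget growing from $2.0 million to $2.3 million.
change = percent_change(2_000_000, 2_300_000)

# Hypothetical emergency response times, in minutes.
response_times = [2, 4, 4, 4, 5, 5, 7, 9]
spread = statistics.pstdev(response_times)  # population standard deviation

print(round(share, 1), round(change, 1), spread)
```

The sketch uses `pstdev` because the list is treated as the complete population. For a sample drawn from a larger group, `statistics.stdev` would be the usual choice.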
topics: Data sources, importing data, negotiating for data, checking the
veracity of data, data cleaning, using formulas in spreadsheets, querying
databases, finding social significance in the data, writing a data story, visualizing a
data story.
course structure: Mix of hands-on practice and lectures, primarily using
spreadsheet tools and perhaps relational database software; some limited expo-
sure to data visualization for story exploration.
example assignments
» homework: Bring a piece of data journalism to critique in class.
» homework: Find a data set and explain why it’s interesting and what it
might reveal.
» classwork: Discuss basic data analysis and cleaning on prepared
example data.
» spreadsheet assignment: Analyze a public data set in a spreadsheet,
such as a government’s payroll (including overtime), a city or county
budget, or bridge inspection data.
» final: Produce a data story in three assignments: pitch, draft, final
submission.
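For instructors who want to show the payroll exercise outside a spreadsheet, the sketch below does the equivalent of a pivot table in Python. Everything in it is hypothetical: the column names, the five payroll rows, and the function name `overtime_by_department`. A real data set would come from a records request.

```python
import csv
import io
from collections import defaultdict

# Hypothetical payroll extract, embedded here so the example is self-contained.
PAYROLL_CSV = """name,department,base_pay,overtime
A. Rivera,Police,62000,31000
B. Chen,Police,58000,4500
C. Okafor,Fire,60000,12000
D. Patel,Parks,41000,0
E. Novak,Fire,57000,9000
"""

def overtime_by_department(csv_text):
    """Return overtime as a percentage of base pay for each department."""
    totals = defaultdict(lambda: {"base_pay": 0.0, "overtime": 0.0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        dept = totals[row["department"]]
        dept["base_pay"] += float(row["base_pay"])
        dept["overtime"] += float(row["overtime"])
    return {
        name: round(t["overtime"] / t["base_pay"] * 100, 1)
        for name, t in totals.items()
    }

summary = overtime_by_department(PAYROLL_CSV)
for department, pct in summary.items():
    print(f"{department}: overtime is {pct}% of base pay")
```

Grouping by department and comparing overtime with base pay shows where to ask questions. The individual rows behind a high ratio are the possible sources and anecdotes.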
CLASSROOM SUPPORT
Since many students will enter college with little or no prior training in data
and computation—and worse still, a bevy of uncertainties about their abili-
ties—we recommend a range of support resources including open lab sessions,
teaching assistants, and online resources to review tools and methods.
Open lab sessions give students extra assistance. Matt Waite runs “maker
hours” that are well attended by his students at Nebraska. At Northwestern’s
Knight Lab, students have weekly open hours in which to build and discuss
new digital tools.
In our interviews, observations, and personal experience, a teaching assis-
tant (TA) is a considerable asset to classes in data and computation. TAs can
offer in-class help when students encounter minor bumps. Here’s a common
scenario: a student forgets to type a single character while learning a program-
ming language and cannot understand the error message or parse what’s
missing from that line. Confronted with that situation, many students may not
want to interrupt the instructor and as a result could be left behind. The TA can
quickly assist with a problem like that. TAs also can be on hand during open
lab sessions so that multiple students can get help at once instead of waiting for
the instructor.
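The missing-character scenario can be reproduced in a few lines of Python. This is an illustrative sketch only: the student code and the helper `first_syntax_error` are made up, and the wording and line number of the error depend on the Python version.

```python
# A student's two-line script with one character missing:
# the closing parenthesis on the first line.
student_code = "total = sum([120, 85, 43]\nprint(total)\n"

def first_syntax_error(source):
    """Return (line number, message) for the first syntax error, or None."""
    try:
        compile(source, "<student>", "exec")
    except SyntaxError as err:
        return err.lineno, err.msg
    return None

problem = first_syntax_error(student_code)
fixed = first_syntax_error(student_code.replace("43]", "43])"))

print("before the fix:", problem)
print("after the fix:", fixed)
```

Some Python versions report the error on the line after the one with the missing character, which is the kind of confusion a TA can clear up in seconds.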
MODEL 2.
INTEGRATING DATA AND
COMPUTATION INTO EXISTING
COURSES AND CONCENTRATIONS:
General Guidelines for the Undergraduate
and Graduate Levels
The basic principles of data journalism should be as familiar to students as
writing a lede, shooting b-roll, or tweeting updates to a developing story. To
integrate data skills into journalism instruction means introducing these
concerns across the curriculum.
Our central recommendation is for journalism schools to treat data and
computation as core skills for all students. Data journalism must be taught as a
foundational method in introductory classes, a distinct theme in media law and
ethics, a reporting method suitable to any specialized reporting course, and
a subject in which interested students can pursue advanced coursework or a
concentration.
Moreover, because data and algorithms are increasingly important topics
to understand in order to report on issues in business, politics, technology, and
health, among others, subject area reporting classes should include material
that prepares students to approach these information sources with proper
skepticism and to explain them clearly in writing. In the models that follow, we
point to a few ways that data journalism can be integrated into courses that are
commonly offered in journalism schools.
One notable difference between graduate and undergraduate programs
is that a master’s program often begins with a boot camp in which students
are quickly brought up to speed on a wide range of skills. For the majority of
students, who enter without a declared concentration, a boot camp may point
toward areas of unexpected interest. To integrate data and computational jour-
nalism into graduate programs, it must be given equal footing alongside other
areas where students may choose to specialize. An introductory module on
data journalism will benefit students as much as learning the basics of photo-
journalism. Moreover, thematic elective coursework such as environmental and
political reporting should integrate data instruction to the same degree that it
would emphasize such distinct approaches as photojournalism, broadcast, and
long-form journalism.
Introductory journalism classes are necessarily broad. Some classes are
thematic, covering material from the basic history and general practices of jour-
nalism to the range of technologies and reporting techniques that constitute
the modern media. Others focus entirely on the practice of journalism. Either
way, data and computation must have a place in foundational courses.
At the undergraduate level, this should apply to students pursuing either
a major or a minor in journalism. Coursework toward the minor also should
integrate some measure of data and computational instruction.
Schools may also consider working more coursework in data and computation
into other programs and concentrations. Students focused on investigative
reporting, for instance, would benefit from additional coursework on finding
stories in data, perhaps even as an additional requirement.
HISTORY OF JOURNALISM
how and why to integrate data: Understanding history is especially
valuable during times of apparent change. To observe the field of journalism
evolving over the centuries can make journalism students more conscious
participants in the process of inventing its future. It may also help to temper
the widespread view that journalism is witnessing unprecedented upheaval
due to technology. Looking back, we see that institutions come and go, new
technologies are often disruptive before settling into routine, and the mission
and practice of the profession are perennially under revision. Data and compu-
tation are in many ways emblematic of our time, but not exclusive to it. These
topics have a long history in journalism. This class needs to tell that story.
Two distinct strands of historical concern should be covered. One is to
recount the historical uses of data in the news. For example, a striking and
memorable early case of data-driven journalism dates to the antebellum period
in the United States, when Harriet Beecher Stowe compiled the accounts
of several escaped slaves, aggregated advertisements from Southern news-
papers offering rewards for their return, and published several tables of data
as a rebuttal to claims that her novel Uncle Tom’s Cabin had exaggerated the
reality of slavery. Likewise, one might point to Philip Meyer’s use of data to
undermine racial stereotypes in the coverage of the 1967 Detroit riots. These
two cases highlight the enduring value of data for asserting truths that might
otherwise be denied. More broadly, when these stories place data journalism
in historical context, they not only form a canon to orient students in this
area of practice, but also reveal that data journalism, for all its glamorous
novelty, is rooted in a tradition of quality work.
skills to integrate: Acquiring a sense of how the journalistic profession
has developed over time, especially in terms of how journalists have chosen to
depict the world to their audiences. Appreciating how data and computational
journalism fit into historical context.
possible assignments:
» homework: Find and analyze a chart, graph, map, or other data
visualization published in a newspaper at least 50 years ago.
» term paper: Consider a contemporary concern surrounding emerging
technology, such as algorithmic transparency or the Snowden leaks, in the
context of other historical cases.
INVESTIGATIVE REPORTING
how and why to integrate data: Many of the tools and methods of
computational and data-driven journalism were developed through investiga-
tive reporting. Fluency with spreadsheets, databases, and other mainstays of
computer-assisted reporting will enable students to conduct deep investiga-
tions with the full range of resources at their disposal.
skills to integrate: Compiling the backgrounds of people and organiza-
tions with the use of data. Turning documents into data. Making public records
requests and negotiating for data.
possible assignments:
» Tracing shell company ownership through public records.
» Examining medical device reports for problems in devices sold by specific
companies.
CORE CLASSES
Required for Concentration in Data & Computation
INTRODUCTION TO JOURNALISTIC
PROGRAMMING
course description: The purpose of this course is to introduce students
to several foundational computer-programming skills that they will use to find
and tell stories. This should be a requirement for those who concentrate in data
and computation, but also open to students from other tracks.
course structure: Meets twice weekly, first for lecture and then for an
intensive workshop.
skills: The Unix command line; basic Python programming for scraping,
parsing, connecting to APIs; introduction to JavaScript for web work.
tools: Bash utilities, Jupyter/IPython Notebook, Pandas, Matplotlib, JavaS-
cript.
example assignments:
» Test proficiency with the command line with a quiz, or even a screencast
demonstrating completion of a series of tasks using Bash alone.
» Story assignment reported and submitted in Jupyter/IPython notebook.
DISTRIBUTION OF ELECTIVES
For the concentration, the school may offer elective courses to fulfill require-
ments in two or three areas of data and computational work. We have divided
these into three categories: presentation/visualization, analysis for story, and
journalistic programming. As a matter of designing degree requirements, a
program might choose to require at least one class from each category in addi-
tion to fulfilling overall credit requirements.
presentation & visualization
» Data Visualization
» Visual Journalism with Data and Computation
» Advanced Data Visualization
» Advanced Journalistic Mapping
analysis for story
» Writing About Data
» Statistical Analysis for Journalism
» Advanced Computational Reporting Methods (Using CAR)
journalistic programming
» Introduction to Journalistic Programming
» Methods of Collecting Data and Automating Reporting
» News App Development
» Advanced Computational Journalism
ELECTIVE COURSEWORK
Graduate Degree with Concentration
in Data & Computation
CYBERSECURITY SKILLS
We recommend that students concentrating in data and computation take a
module on digital security because it will be a necessary consideration in their
work.
Every news organization should have an information technology staff that
is capable of securing its digital infrastructure and advising staff about security
risks and countermeasures. In practice, this is not enough. More editorial staff
should be trained in digital security in order to assess and address risks to the
organization, its sources, and its readers. This training would dovetail with the
technical skills that students in a computational journalism course are already
learning.
Cybersecurity has become an increasingly salient ethical concern, espe-
cially in the wake of the Snowden leaks, but digital security skills are rarely
taught in journalism schools today. Just as no journalism student should grad-
uate without some sense of libel law, no student should leave without knowing
at least the dangers of insecure communication channels and practices, and
ideally also some solutions.
For introductions to encryption and digital security for journalists, see
Micah Lee’s “Encryption Works” handbook, published by the Freedom of the
Press Foundation, and “Security for Journalists” by Jonathan Stray.
COURSES
ELECTIVES
One of the goals of a mid-career, expertise-driven degree in journalism is for
students to develop a deep understanding of the field they are reporting. To
this end, this degree should offer several elective slots for taking classes in other
departments that contribute directly to the subject of the thesis.
Students may also consider auditing courses with skill requirements above
their level (for example, if assignments must be submitted in the C program-
ming language, which is still the case in some traditional computer science
classes).
example electives and justification:
» An earth science or geology course focused on climate data.
» A digital humanities course that uses computational techniques to explore
historical archives, literary works, or leaked caches of documents, to name
just a few examples.
» Any number of computer science courses in which students could learn
the technical basis and academic concerns surrounding issues of interest in
their reporting, such as computer vision or cryptography.
» A course in digital security could help a journalist not only to protect
sensitive sources, but also to report on such matters as public key
encryption or onion routing, and to assess new developments in these
fields.
» A graduate course in statistical modeling, whether taken in the statistics
department or in a quantitative social science such as sociology.
The point of elective courses should be to permit students to craft a coursework
plan that is suitable to their own unique interests as they develop the capacity
for expertise-driven reporting in some area related to data, computation, and
emerging technologies.
MODEL 5.
ADVANCED GRADUATE DEGREE:
Emerging Journalistic Techniques and Technologies
Investigative reporting is in many ways the research and development wing of
journalism. According to Brant Houston, “It’s the only place where people have
had an extensive amount of time to try out new techniques.”
CAR, data journalism, and computational journalism are some of the
clearest examples of this phenomenon at work. These practices have developed
where reporters have had the time or inclination to work with new tools and
platforms. Universities are ideally suited to cultivate this stance toward jour-
nalistic practice—not merely teaching the wisdom of the field as it exists, but
developing entirely new approaches based on encounters with other disci-
plines and unexplored tools.
If journalism schools were to take up the mantle of encouraging work that
seems to happen only under these permissive conditions—not just through
grants and innovation labs, but perhaps through coursework as well—then
universities could also act as R&D labs in the way that investigative reporting has
in the past.
This curriculum is in many ways the least structured and most speculative
one we offer. It is an open question whether these degrees should be offered at
the master’s or doctoral level. One might also ask whether the degree should
require any coursework or simply provide an open platform for research.
A WORD ON SAFETY
Most innovation labs will feature at least one tool or device that requires safety
training. Most journalism students will arrive without having had experience
handling soldering irons or electrical wiring.
Any lab that includes these devices must provide some safety infrastruc-
ture. Soldering irons should be used with some means of ventilation. Fire
hazards require a nearby extinguisher. And many circumstances may require
safety gloves or goggles.
When Amanda Hickman arrived at the BuzzFeed lab in San Francisco, one
of her first tasks was to evaluate safety. The BuzzFeed team has purchased safety
goggles and fire extinguishers, for example, because it is working with saws and
soldering irons.
Tinkering equipment can be quite cheap, but any technology lab must cover
some basics. These components are the bread and butter of hacker and maker
circles, so they are easy to find. Because they offer such useful inroads to
experimenting with technology, they are valuable for journalism schools that
want to cultivate spaces of innovation.
Most electrical prototyping starts with a solderless breadboard, a flat
plastic case with an underlying grid of connections for building circuits. A
simple electronic device like an air quality meter can be built from scratch by
placing components like wires, resistors, knobs, buttons, and sensors across
the grid. And connecting a breadboard device to a simple computer like an
Arduino or a Raspberry Pi enables users to issue commands and gather data
from the equipment. A starter set for such a project would generally run under
$100, far less than cameras and other equipment that journalism students are
often required to purchase.
Beyond the small computers used for prototyping, more substantial
computers should be on hand for projects that call for them. If possible, an
emerging technologies lab should have machines that allow students to gain
firsthand experience working with news-bound technology such as immer-
sive 3D cameras, VR headsets, and drones instead of relying on secondhand
accounts. These skills and literacies fit into a larger constellation of technical
concerns that may give rise to media innovation along unforeseen paths.
CHAPTER 5:
INSTITUTIONAL
RECOMMENDATIONS
Steps Toward Bringing Data and Computation
into Your Journalism School
DATA JOURNALISM
DATA REPORTING
» “Drugging Our Kids,” San Jose Mercury News, 2014
» “Methadone and the Politics of Pain,” The Seattle Times, 2012
PROGRAMMING LANGUAGES
C is a heavy-lifting programming language that is the language of choice for the
Computer Science Department. It’s far faster than Python or JavaScript and
introduces you to the nitty-gritty of computer science.
Git is something called a version control system—it’s not a programming
language, but programmers use it often. Version control is a way of keeping
track of the history of your code, along with providing a structure that
encourages collaboration. GitHub is a popular cloud-based service that makes
use of git, and we make heavy use of it during the Lede Program.
HTML isn’t technically a programming language; it’s a markup language
(HyperText Markup Language, to be exact). HTML is used to explain what
different parts of web pages are to your browser, and you use it extensively
when learning to scrape web pages.
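
As a rough illustration of what scraping means in practice, the sketch below pulls link text and addresses out of a small page using only Python's standard library. The page content, the file paths in it, and the LinkCollector class name are invented for this example; a real scraper would fetch a live page and would likely need to handle messier markup.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (link text, href) pairs from <a> tags. Hypothetical helper."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag we are currently inside
        self._text = []     # pieces of text seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


# Invented sample page standing in for a downloaded document.
SAMPLE_PAGE = """
<html><body>
  <h1>Council minutes</h1>
  <ul>
    <li><a href="/minutes/2015-09.pdf">September 2015</a></li>
    <li><a href="/minutes/2015-10.pdf">October 2015</a></li>
  </ul>
</body></html>
"""

parser = LinkCollector()
parser.feed(SAMPLE_PAGE)
links = parser.links
for text, href in links:
    print(text, href)
```

The tags tell the parser which parts of the page are links, which is the sense in which HTML "explains" a page to software.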
JavaScript is a programming language that’s in charge of interactivity on the
Web. When images wiggle or pop-ups annoy you, that’s all JavaScript. The
popular interactive data visualization framework D3 is built using JavaScript.
Python is a multipurpose programming language that is at home crunching
numbers, parsing text, or building Twitter bots. We use Python extensively in the Lede.
R is a programming language that is used widely for mathematical and
statistical processing.
DATA FORMATS
An API (application programming interface) is a way for computers to
communicate with one another. For us, this generally means sharing data. We’ll
be coding up Python scripts to talk to and request data from machines around
the world, from Twitter to the U.S. government.
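
A minimal sketch of what such a script involves, assuming a made-up service: the address api.example.org, its parameters, and the response fields below are all hypothetical, and the network call itself is left as a comment so the example runs offline against a canned response.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; a real API documents its own address and parameters.
BASE_URL = "https://api.example.org/v1/inspections"


def build_request_url(base_url, **params):
    """Attach query parameters to the endpoint address."""
    return base_url + "?" + urlencode(sorted(params.items()))


def parse_response(body):
    """Turn the JSON text an API sends back into (name, score) rows."""
    payload = json.loads(body)
    return [(item["name"], item["score"]) for item in payload["results"]]


url = build_request_url(BASE_URL, city="Seattle", year=2015)

# In real use the body would come from the network, for example:
#   body = urllib.request.urlopen(url).read().decode("utf-8")
# Here a canned, invented response stands in for it.
canned_body = (
    '{"results": [{"name": "Cafe A", "score": 92},'
    ' {"name": "Diner B", "score": 71}]}'
)
rows = parse_response(canned_body)
print(url)
print(rows)
```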
CSVs (comma-separated values) are the most common format for data. A CSV is a
quick export away from Excel or Google Spreadsheets, and you’ll find yourself
working from CSVs more often than any other format. Although “comma-
separated” is in the name, a CSV can arguably also use tabs, pipes, or any other
character as a field delimiter (although the tab-separated one can also be called
a TSV).
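
To illustrate the point about delimiters, here is a short sketch using Python's built-in csv module. The read_rows helper and the sample rows are invented for this example; the only thing that changes between the three reads is the delimiter argument.

```python
import csv
import io


def read_rows(text, delimiter=","):
    """Parse delimited text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# Invented sample data, written three ways.
comma_text = "city,population\nSpringfield,1000\nShelbyville,800\n"
tab_text = comma_text.replace(",", "\t")    # what a TSV looks like
pipe_text = comma_text.replace(",", "|")

comma_rows = read_rows(comma_text)
tab_rows = read_rows(tab_text, delimiter="\t")
pipe_rows = read_rows(pipe_text, delimiter="|")

print(comma_rows[0]["city"], comma_rows[0]["population"])
```

Note that every value comes back as text, so numbers such as the population column need to be converted before doing arithmetic.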
GeoJSON and TopoJSON are specifically formatted JSON files that contain
geographic data.
JSON stands for JavaScript Object Notation, and it’s a slightly more
complicated format than a CSV. It can contain lists, numbers, strings, sub-items,
and all sorts of complexities that are great for expressing the nuance of
real-world data. Data from an API is often formatted as JSON.
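
The nesting described above can be seen in a small sketch. The record below is invented; it holds a list of inspections inside a single clinic entry, which a flat CSV row could not express directly, and the code flattens it into one row per inspection.

```python
import json

# Invented record: one clinic with a nested list of inspections.
record_text = """
{
  "name": "Example Clinic",
  "inspections": [
    {"year": 2014, "violations": ["storage", "labeling"]},
    {"year": 2015, "violations": []}
  ]
}
"""

record = json.loads(record_text)

# Flatten the nested structure into spreadsheet-style rows.
rows = [
    {
        "name": record["name"],
        "year": inspection["year"],
        "violation_count": len(inspection["violations"]),
    }
    for inspection in record["inspections"]
]

for row in rows:
    print(row)
```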
SQL (Structured Query Language) is a language to talk to databases. You’ll
sometimes find data sets in SQL format, ready to be imported into your
database system of choice.
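
As a hedged illustration of "talking to" a database, the sketch below uses SQLite through Python's built-in sqlite3 module. The table name, columns, and donation figures are all invented; the query totals the amounts by candidate.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE donations (donor TEXT, candidate TEXT, amount REAL)")

# Invented sample rows.
conn.executemany(
    "INSERT INTO donations VALUES (?, ?, ?)",
    [
        ("A. Donor", "Smith", 250.0),
        ("B. Donor", "Smith", 100.0),
        ("C. Donor", "Jones", 500.0),
    ],
)

# Ask the database a question: how much did each candidate receive?
totals = conn.execute(
    "SELECT candidate, SUM(amount) FROM donations "
    "GROUP BY candidate ORDER BY SUM(amount) DESC"
).fetchall()
conn.close()

print(totals)
```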
PHILIP MEYER’S
RECOMMENDED TEXTS
» John Tukey, Exploratory Data Analysis (Upper Saddle River, NJ: Pearson
Education, 1977)
» James A. Davis, The Logic of Causal Order (Thousand Oaks, CA: Sage,
1985)
» Robert P. Abelson, Statistics as Principled Argument (Hillsdale, NJ:
Lawrence Erlbaum Associates, 1995)
MOOC EXAMPLES:
» Cairo, Alberto. “Recommended Resources for My Infographics and
Visualization Courses.” Personal. The Functional Art: An Introduction to
Information Graphics and Visualization, October 11, 2012. http://www.
thefunctionalart.com/2012/10/recommended-readings-for-infographics.
html.
» “Cameroon—Cameroon Budget Inquirer.” Accessed September 23, 2015.
http://cameroon.openspending.org/en/.
» Downs, Kat, Dan Hill, Ted Mellnik, Andrew Metcalf, Cory O’Brien,
Cheryl Thompson, and Serdar Tumgoren. “Homicides in the District of
Columbia—The Washington Post.” News. The Washington Post, October 14,
2012. http://apps.washingtonpost.com/investigative/homicides/.
» “Find My School .Ke.” Accessed September 23, 2015. http://findmyschool.
co.ke/.
» Keefe, John, Steven Melendez, and Louise Ma. “Flooding and Flood
Zones | WNYC.” News. WNYC. Accessed September 23, 2015. http://
project.wnyc.org/flooding-sandy-new/index.html.
» Kirk, Chris, and Dan Kois. “How Many People Have Been Killed by Guns
Since Newtown?” Slate, September 16, 2013. http://www.slate.com/
articles/news_and_politics/crime/2012/12/gun_death_tally_every_
american_gun_death_since_newtown_sandy_hook_shooting.html.
» Lewis, Jason. “Revealed: The £1 Billion High Cost Lending
Industry | The Bureau of Investigative Journalism.” Journalism. The
Bureau of Investigative Journalism, June 13, 2013. https://www.
thebureauinvestigates.com/2013/06/13/revealed-the-1billion-high-cost-
lending-industry/.
» Nguyen, Dan. “Who in Congress Supports SOPA and PIPA/
PROTECT-IP? | SOPA Opera.” News. ProPublica, January 20, 2012.
http://projects.propublica.org/sopa/.
» Rogers, Simon. “Government Spending by Department, 2011-12: Get
the Data.” The Guardian, December 4, 2012, sec. UK news. http://www.
theguardian.com/news/datablog/2012/dec/04/government-spending-
department-2011-12.
» ———. “John Snow’s Data Journalism: The Cholera Map That Changed
the World.” The Guardian, March 15, 2013, sec. News. http://www.
theguardian.com/news/datablog/2013/mar/15/john-snow-cholera-map.
» ———. “Wikileaks Data Journalism: How We Handled the Data.” The
Guardian, January 31, 2011, sec. News. http://www.theguardian.com/
news/datablog/2011/jan/31/wikileaks-data-journalism.
» ———. “Wikileaks Iraq War Logs: Every Death Mapped.” The Guardian,
October 22, 2010. http://www.theguardian.com/world/datablog/
interactive/2010/oct/23/wikileaks-iraq-deaths-map.
» Rogers, Simon, and John Burn-Murdoch. “Superstorm Sandy: Every
Verified Event Mapped and Detailed.” The Guardian, October 30, 2012.
http://www.theguardian.com/news/datablog/interactive/2012/oct/30/
superstorm-sandy-incidents-mapped.
» Serra, Laura, Maia Jastreblansky, Ivan Ruiz, Ricardo Brom, and Mariana
Trigo Viera. “Argentina’s Senate Expenses 2004-2013.” News. La Nacion,
April 3, 2013. http://blogs.lanacion.com.ar/ddj/data-driven-investigative-
journalism/argentina-senate-expenses/.
» Shaw, Al, Jeremy B. Merrill, and Amanda Zamora. “Free the Files: Help
ProPublica Unlock Political Ad Spending.” ProPublica, September 4, 2015.
https://projects.propublica.org/free-the-files/.
» “Where Does My Money Go?” Accessed September 23, 2015. http://
wheredoesmymoneygo.org/.
FOUNDATIONS OF COMPUTING
During this introduction to the ins and outs of the Python programming
language, students build a foundation upon which their later, more coding-in-
tensive classes will depend. Dirty, real-world data sets will be cleaned, parsed
and processed while recreating modern journalistic projects. The course
will also touch upon basic visualization and mapping, and how to use public
resources such as Google and Stack Overflow to build self-reliance.
focus: Familiarize yourself with the data-driven landscape
topics & tools include: Python, basic statistical analysis, OpenRefine,
CartoDB, pandas, HTML, CSVs, algorithmic story generation, narrative work-
flow, csvkit, git/GitHub, Stack Overflow, data cleaning, command line tools,
and more
ALGORITHMS
Machine learning and data science are integral to processing and under-
standing large data sets. Whether you’re clustering schools or crime data,
analyzing relationships between people or businesses, or searching for a single
fact in a large data set, algorithms can help. Through supervised and unsuper-
vised learning, students will generate leads, create insights, and figure out how
to best focus their efforts with large data sets. A critical eye toward applications
of algorithms will also be developed, uncovering the pitfalls and biases to look
for in your own and others’ work.
focus: Analyzing your data
topics & tools include: linear regression, clustering, text mining, natural
language processing, decision trees, machine learning, scikit-learn, Python,
and more
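
Of the topics listed, linear regression is the simplest to show in a few lines. The sketch below fits a straight line by ordinary least squares in plain Python rather than with scikit-learn, purely to keep the example self-contained; the fit_line function and the data points are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented points that lie exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

With real data the points will not sit on the line, and the gaps between each point and the fitted line are often where a reporter would start asking questions.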
Moretti, Franco. 2007. Graphs, Maps, Trees: Abstract Models for Literary History.
London: Verso.
Royal, Cindy. 2010. “The Journalist as Programmer: A Case Study of The New
York Times Interactive News Technology Department.” Presented at the
International Symposium for Online Journalism, Austin, TX, April 23.