Seshat Project
Cliodynamics
Title: Seshat: The Global History Databank
Permalink: https://escholarship.org/uc/item/9qx38718
Journal: Cliodynamics, 6(1)
Authors: Turchin, Peter; Brennan, Rob; Currie, Thomas; et al.
Publication Date: 2015
DOI: 10.21237/C7clio6127917
Supplemental Material: https://escholarship.org/uc/item/9qx38718#supplemental
Abstract
The vast amount of knowledge about past human societies has not
been systematically organized and, therefore, remains inaccessible
for empirically testing theories about cultural evolution and
historical dynamics. For example, what evolutionary mechanisms
were involved in the transition from the small-scale, uncentralized
societies, in which humans lived 10,000 years ago, to the large-scale
societies with an extensive division of labor, great differentials in
wealth and power, and elaborate governance structures of today?
Why do modern states sometimes fail to meet the basic needs of their
populations? Why do economies decline, or fail to grow? In this
article, we describe the structure and uses of a massive databank of
historical and archaeological information, Seshat: The Global History
Databank. The data that we are currently entering in Seshat will allow
us and others to test theories explaining how modern societies
evolved from ancestral ones, and why modern societies vary so much
in their capacity to satisfy their members’ basic human needs.
Introduction
In 1919, the American archaeologist and historian James Henry Breasted, who held
the first chair in Egyptology and Oriental History in the United States (and, in 1928,
became President of the American Historical Association), issued a call for action:
Here, then, is a large and comprehensive task—the systematic
collection of the facts from the monuments, from the written records,
and from the physical habitat, and the organization of these facts into
a great body of historical archives. The scattered fragments of man's
story have never been brought together by anyone. Yet they must be
brought together by some efficient organization and collected under
one roof before the historian can draw out of them and reveal to
modern man the story of his own career. The most important missing
chapters in that story, the ones which will reveal to us the earliest
transition from the savagery of the prehistoric hunter to the social
and ethical development of the earliest civilized communities of our
own cultural ancestors—these are the lost chapters of the human
career which such a body of organized materials from the Near East
will enable us to recover (Breasted 1919).
Today, almost a century after Breasted wrote these words, his grand vision
remains unfulfilled. The vast amount of knowledge about past human societies,
held collectively by thousands of historians and archaeologists, has not been
systematically organized. This knowledge is scattered across heterogeneous
databases, innumerable books, publications in academic journals, and reports in
the ‘grey’ literature, as well as notes in the private archives of scholars. Much of it
is not even written, but resides in the heads of various experts and is permanently
lost when they pass away.
The store of knowledge about past societies is now far larger than
it was in 1919, but it remains inaccessible for answering Big Questions about us
and our societies, such as the one formulated by Breasted. Translating his question
into more modern terms, we might state it as follows.
For most of our evolutionary history, humans have lived in small-scale societies
of nomadic foraging bands that were integrated by face-to-face interactions and
lacked both central authorities and high levels of structural inequality (Fried 1967,
Lee and DeVore 1968, Service 1975, Mullins et al. 2013). The first large-scale,
complex societies, characterized by an extensive division of labor, great
differentials in wealth and power, elaborate governance structures, and large
urban centers, appeared roughly 5,000 years ago (Liverani 2006, Algaze 2008,
Wilkinson et al. 2014). How this “major evolutionary transition” (Maynard Smith
and Szathmáry 1995) occurred is one of the biggest questions of social evolution,
and is a question for which we still do not have a widely accepted answer. However,
Turchin et al.: The Seshat Databank. Cliodynamics 6:1 (2015)
we do know that a first step occurred in the Near East during the Early Neolithic
period (ca. 9700/9500 – 6400/6200 BC) with the emergence of sedentary farming
communities living in larger settlements, supported by a stable economy with
surplus production (Price and Feinman 1995, Bar-Yosef 2001, Zeder 2011).
During the last 10,000 years, large-scale, complex societies have gradually
replaced small-scale, foraging societies. Despite their ubiquity, the ability of today’s
large-scale societies to construct viable states and nurture productive economies
varies enormously from country to country. Why do states sometimes fail to meet
the basic needs of their populations? Why do economies decline, or fail to grow? In
many ways, differences between present-day societies can be as large as
differences between our foraging ancestors 15,000 years ago and us today.
In their search for explanations of what makes some societies succeed and
others “fail,” most economists and political scientists focus on the current
conditions or the recent past. Yet modern societies did not suddenly appear 30 or
even 100 years ago; they gradually evolved from pre-existing societies over many
centuries and millennia. If we want to live in better—more peaceful, wealthy, and
just—societies, we need to understand the major evolutionary transition (or
transitions) that occurred in our past and why they resulted in such divergent
outcomes in the present.
Fortunately, today, we are finally able to rise to Breasted’s challenge. Our
ability to do so is based on the remarkable evolution of information technology that
has taken place over the last several decades. Even more important than
improvements in hardware are recent developments in “knowledge engineering”
techniques—approaches that will enable us to convert the gigantic and
unorganized multitude of facts into structured knowledge, with which we can
finally perform comprehensive tests of the many theories in history and cultural
evolution. This body of organized knowledge will enable us to see much more
clearly, if not fully recover, the lost chapters of the human career, thus fulfilling
Breasted’s dream.
This paper describes our vision for accomplishing this goal. We will do so by
building a massive databank of historical and archaeological information, which
we call Seshat: the Global History Databank. At the time of writing (the first
half of 2015), Seshat was rapidly becoming much more than a vision. Thanks to the
generous support from government funding agencies, private foundations, and
individual supporters (see Acknowledgments), we have already started the job of
building it.
The goal of this article, thus, is to describe the structure and uses of Seshat. We
start with a theoretical background—a brief overview of various theories that have
been advanced to explain the emergence of large-scale, complex societies over the
last 10,000 years or so, and the changing levels of inequality in the long-term
history of our species. The main question in this section is, how do we extract
predictions from theories? Next, we explain how data on past societies are
collected and organized in Seshat. In the conclusion, we discuss where we are and
where we are going.
1 It is important to note that, although various members of our research network have
particular theories that they are interested in testing, our overarching goal is to test rival
theories against each other. Thus, data that we gather in Seshat focus on theoretically
relevant variables (variables invoked by various theories, see Building the Databank below),
but overall, the Databank is theory-neutral, and all effort is made to ensure that no particular
theory is privileged. In the long run, theoretical neutrality is enforced by the open-ended
nature of this collective enterprise, which allows for alternative conceptualizations and re-
coding of any variable by proponents of different theoretical frameworks.
how much wealth can be inherited, and whether inequalities can persist over
multiple generations (Borgerhoff Mulder et al. 2009). Sedentary agricultural
societies are able to produce food surpluses that can be used to support non-
productive (‘elite’) members of society, whereas hunter-gatherer societies
generally do not produce a surplus (Morrison 1994, Hayden 1995). The historical
and archaeological records suggest that hierarchy and inequality increased after
the advent of agriculture with the development of ranked societies and chiefdoms
(Trigger 2003, Ames 2007). Sedentary farming may have served as a precondition
for the rise of early states: a centralized political organization wielding its political,
economic, and military authority over a territory and a defined group of people,
guaranteeing the division of labor, the storage of food surplus, and the extraction
of resources (Scheidel 2013). However, the most unequal—even despotic—human
societies ever were ‘archaic states’ that first appeared c. 3000 BCE (Feinman and
Marcus 1998). These early states were characterized by extreme forms of
structural inequality such as human sacrifice, slavery, unequal rights of
commoners and nobles, and deification of the rulers (Trigger 2003, Kirch 2010).
Religious and ritual mechanisms that evolved for the legitimation of hierarchy and
structural inequality, initially serving the interests of society at large, may have
been hijacked by coercive elites and rulers to drive inequality levels to
unprecedented heights.
A major hypothesis, which we will test empirically, is that the Axial Age
introduced another sea change in the evolution of inequality, starting a move
towards greater egalitarianism that has been continuing to the present. The Axial
Age (Jaspers 1953) refers to a series of religious and philosophical developments
that occurred in such far-flung regions of Eurasia as Greece, the Near East, India,
and East Asia between roughly 800 and 200 BCE. The last two centuries have seen
the spread of democratic forms of governance and widespread acceptance of
fundamental human rights and equality. These are part of two related
developments that may ultimately have begun in the Axial Age: (1) rulers have
been increasingly constrained to act in ways that promote the public good, rather
than their own interests, and (2) structural forms of human inequality have been
gradually disappearing (the abolition of human sacrifice, slavery, etc.).
Religion may have played an extremely important, yet little-appreciated, role in
this second turn. Robert Bellah (2011) has recently argued that a major driver in
the evolution of religion was the need to reconcile the tension between the benefits
of hierarchy and the need for legitimacy and equity, resulting in the new forms of
spirituality associated with the rise of world religions during the Axial Age. One
aspect of this change was the first appearance of a universally egalitarian ethic,
which was largely due to the emergence of “prophet-like figures who, at great peril
to themselves, held the existing power structures to a moral standard that they
clearly did not meet” (Bellah 2011).
Hypothesis: More-equal societies do better in between-group competition.
Rationale: More equal societies are more cohesive, better coordinated in military conflicts, and have individuals more willing to defend their group.
Alternatives: More unequal societies can more effectively coerce larger numbers of people to fight in conflicts. Another alternative is that military effectiveness is relatively independent of broader societal cohesion (this may be particularly relevant to situations where societies have a permanent standing army).

Hypothesis: Axial religions made societies more equal by curtailing the coercive power of despots.
Rationale: The need to solve collective action problems entailed a return to more consensual forms of hierarchy, rather than forcing individuals to obey self-serving elites. Political and religious authority became more closely entwined, and leaders’ legitimacy came to rely more on persuasion (e.g., through public, credibility-enhancing displays) and less on the naked exercise of power.
Alternatives: Religions have been mainly used to support structural inequalities, such as the divine nature of rulers, legal distinctions between elites and commoners, or favor for one ethnicity over others. Ruling elites could use religion as ‘opium for the masses,’ which would legitimate the existing order (involving huge levels of inequality).
century, however, little trace of their social system remained. The European
settlers brought their own set of genes, crops, domesticated animals, material
culture, language, and institutions, which largely replaced the Native American
equivalents. Thus, an alternative approach to geo-temporal sampling is ‘ethno-
temporal’ sampling: following ethnic groups as they migrate and expand, shrink
and die out, and recording the characteristics of the ethnic group at each point in
time.
Neither approach is entirely satisfactory. Eventually, when our geographic
coverage becomes more complete, we will be able to include the effects of both
spatial proximity and ‘cultural proximity’ (for example, linguistic similarity as a
proxy) in the analysis. Until then, however, we need to select one or the other.
Because tracing the ethnic roots of populations can be much more contentious than
delimiting geographic areas, we opted for the geo-temporal approach. However, we emphasize that this is only the
first phase in our long-term project. Eventually, our data should be complete
enough to allow a simultaneous estimation of spatial and cultural proximity effects.
We selected 30 areas across the globe for our initial worldwide sample,
stratified by world region and history of social complexity. We divided the world
into ten major regions (see Figure 2 and Table 2) and then selected three natural
geographic areas (NGAs, explained below) within each region. We looked for NGAs
that sampled the diversity of a world region with respect to the relative antiquity
of complex societies within it. Accordingly, one NGA was selected in an area that
developed complex state-level societies very early. Another sampling point was the
opposite in terms of the antiquity of complex societies—ideally, it was free of
centralized polities (chiefdoms and states) until the colonial period. Finally, the
third NGA was intermediate in social complexity. Because different world regions
acquired complex societies at very different times, ‘high complexity’ NGAs cannot
be directly compared historically. For example, Susiana in Southwest Asia has a
much longer history of complex societies than Hawaii in the Pacific region.
In summary, the World Sample-30 was designed with two goals in mind: (1) to
include as much variation amongst sampled societies as possible, at least along the
social complexity dimension, and (2) to ensure that representation of different
parts of the world was maximized.
example is the area within which the practice of iron smelting was known (exists
within) during a certain period of time.
Table 2. The World Sample-30. The numbers of NGAs correspond to the numbers
in Figure 2.
| World Region | Low Complexity | Medium Complexity | High Complexity |
|---|---|---|---|
| Africa | 1 Ghanaian Coast | 11 Niger Inland Delta | 21 Upper Egypt |
| Europe | 2 Iceland | 12 Paris Basin | 22 Latium |
| Central Eurasia | 3 Lena River Valley | 13 Orkhon Valley | 23 Sogdiana |
| Southwest Asia | 4 Yemeni Coastal Plain | 14 Konya Plain | 24 Susiana |
| South Asia | 5 Garo Hills | 15 Deccan | 25 Kachi Plain |
| Southeast Asia | 6 Kapuasi Basin | 16 Central Java | 26 Cambodian Basin |
| East Asia | 7 Southern China Hills | 17 Kansai | 27 Middle Yellow River Valley |
| North America | 8 Finger Lakes | 18 Cahokia | 28 Valley of Oaxaca |
| South America | 9 Lowland Andes | 19 North Colombia | 29 Cuzco |
| Oceania-Australia | 10 Oro, PNG | 20 Chuuk Islands | 30 Big Island Hawaii |
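The stratified design can be made concrete with a short sketch. The Python below is an illustration, not Seshat software: it rebuilds Table 2 as a ten-region by three-level grid. The numbering rule (region index plus ten times the complexity level) is inferred from the table's layout rather than stated by the authors, and the name `build_sample` is hypothetical.

```python
# Hypothetical sketch: the World Sample-30 as a region x complexity grid.
# Region order and NGA names are copied from Table 2; the numbering rule
# (region index + 10 * complexity level) is inferred from the table layout.

REGIONS = [
    ("Africa", "Ghanaian Coast", "Niger Inland Delta", "Upper Egypt"),
    ("Europe", "Iceland", "Paris Basin", "Latium"),
    ("Central Eurasia", "Lena River Valley", "Orkhon Valley", "Sogdiana"),
    ("Southwest Asia", "Yemeni Coastal Plain", "Konya Plain", "Susiana"),
    ("South Asia", "Garo Hills", "Deccan", "Kachi Plain"),
    ("Southeast Asia", "Kapuasi Basin", "Central Java", "Cambodian Basin"),
    ("East Asia", "Southern China Hills", "Kansai", "Middle Yellow River Valley"),
    ("North America", "Finger Lakes", "Cahokia", "Valley of Oaxaca"),
    ("South America", "Lowland Andes", "North Colombia", "Cuzco"),
    ("Oceania-Australia", "Oro, PNG", "Chuuk Islands", "Big Island Hawaii"),
]
LEVELS = ("Low", "Medium", "High")


def build_sample() -> dict[int, tuple[str, str, str]]:
    """Return {nga_number: (world_region, complexity_level, nga_name)}."""
    sample = {}
    for region_idx, (region, *ngas) in enumerate(REGIONS, start=1):
        # Exactly one NGA per complexity level within each region.
        for level_idx, (level, nga) in enumerate(zip(LEVELS, ngas)):
            sample[region_idx + 10 * level_idx] = (region, level, nga)
    return sample


sample = build_sample()
```

With this layout, `sample[24]` is `("Southwest Asia", "High", "Susiana")`, and every region contributes one NGA at each of the three complexity levels, which is the stratification described above.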
Figure 3. Seshat Meta-model, showing entity class hierarchy and the most
important relationships between entity classes (note: classes with black text are
abstract and thus never directly coded; they define sets of variables that are
common to all sub-classes). Thin arrows indicate subclass relationships. For
example, FFA is a subclass of Territory, and NGA is a subclass of FFA. Thick arrows
are examples of relationships.
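The subclass relationships in Figure 3 map naturally onto class inheritance. The sketch below assumes that Territory is one of the abstract classes (the caption does not say which classes are abstract), and its attribute and method names are hypothetical. It only shows how an abstract class can define shared variables while refusing to be instantiated, and therefore coded, directly.

```python
from abc import ABC, abstractmethod


class Territory(ABC):
    """Abstract (assumed): never coded directly; defines variables shared by subclasses."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.variables: dict[str, object] = {}  # shared variable slots

    def code(self, variable: str, value: object) -> None:
        """Record a value for a variable on a concrete territory."""
        self.variables[variable] = value

    @abstractmethod
    def kind(self) -> str:
        """Name of the concrete entity class."""


class FFA(Territory):  # FFA is a subclass of Territory, per the caption
    def kind(self) -> str:
        return "FFA"


class NGA(FFA):  # NGA is a subclass of FFA, per the caption
    def kind(self) -> str:
        return "NGA"
```

Calling `Territory("Latium")` raises `TypeError`, because the abstract method is unimplemented, whereas `NGA("Latium")` succeeds and inherits the shared `variables` and `code` members from Territory.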
period as a different polity. For example, we divide Roman history into the
following polities: Regal; Early, Middle, and Late Republic; Principate; and
Dominate.
The current version of the Seshat Code Book has been developed primarily for
historical societies. Coding data for societies that are known only archaeologically
poses an additional set of challenges. We are currently developing an archaeological
Seshat Code Book that will address these challenges (Marciniak et al. forthcoming;
Palmisano et al. forthcoming).
Coding Procedure
In populating the databank, the following instructions are provided to coders
(using approaches based on NGAs as an example):
• Identify an NGA within the larger World Region. This should ideally be an area
of around 100 by 100 km, or 10,000 sq. km. The dimensions of the NGA,
however, may vary quite substantially.
• In the NGA data page, list chronologically all polities that were located in the
NGA, or encompassed it (see Latium as an example). For periods when the
NGA was fragmented among many small-scale polities, use the quasi-polity
approach. In the intermediate case, when there were several large polities
(for example, the NGA was on a frontier between two states), focus on the
one that controlled the largest proportion of the NGA.
• As a coding convention, we try to avoid overly long spans of time on the
same polity data sheet. Try to limit the length to 200–300 years, but at the
same time don't slice it too thinly. We aim for chunks of roughly 200 years
(range 100–300), but are guided by actual historical events that result in major
change. As an example, we have split the Rome-Republican Period into three
polities (Early, Middle, and Late Republic).
• Next, switch to the data page for the polity and code all polity-based
variables there. In other words, you don't put any polity-related codes
within the NGA sheet. The NGA is used purely as a sampling scheme, and all
codes go into the relevant polity sheets.
• When coding NGA-based variables (such as resources, agriculture, and
population), list chronologically the general 'epochs' and then code these
variables for each period separately.
• The same approach is used to add lists of religious systems and cities, in
which case each entry is linked to the page for the religious system (RS) or city.
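Two of the rules above are mechanical enough to sketch: choosing the polity that controlled the largest share of a fragmented NGA, and cutting a long polity span into chunks of roughly 200 years. The Python below is a hypothetical helper, not part of the Seshat tooling. The even split is only a default, because the instructions say coders should follow actual historical breaks. Years are plain integers (negative for BCE), and the absence of a year zero is ignored.

```python
# Hypothetical helpers for two coding rules; names are illustrative only.


def dominant_polity(shares: dict[str, float]) -> str:
    """Return the polity controlling the largest share of the NGA."""
    return max(shares, key=shares.get)


def split_span(start: int, end: int, target: int = 200) -> list[tuple[int, int]]:
    """Split [start, end) into n near-equal chunks, n = round(length / target).

    For spans of at least 100 years, every chunk falls between 100 and 300
    years; shorter spans are kept whole. Coders would then move these default
    boundaries to match real historical breaks.
    """
    length = end - start
    n = max(1, round(length / target))
    bounds = [start + (length * i) // n for i in range(n + 1)]
    return list(zip(bounds[:-1], bounds[1:]))
```

For example, `dominant_polity({"Polity A": 0.7, "Polity B": 0.3})` returns `"Polity A"`, and `split_span(1000, 1650)` returns `[(1000, 1216), (1216, 1433), (1433, 1650)]`. Dividing by a rounded chunk count, rather than cutting fixed 200-year slices, avoids leaving a very short remainder at the end of a polity.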
input tools). Ultimately, we would like to make this open and robust enough that
any web user can easily suggest updates to the datasets (crowdsourcing).
Seshat Editors. Data administrators can moderate, correct, and manage the data
in the system over time. The downside to harvesting data from a broad community
is that inaccuracies, disagreements, user errors, and malicious use must be
managed or else the dataset will degrade in quality over time.
Seshat Data Architects. Knowledge engineers can make changes to the schema
(data structures) over time and manage transitions between versions of the
schema without breaking databank integrity (maintaining and assuring the
accuracy and consistency of data over its entire life-cycle).
Seshat Analysts. Statisticians and mathematical modelers will prepare and
analyze time series data to investigate big questions about human societies.
Seshat Readers. General end-users can browse, search, download, and view
arbitrary slices of the data in a very wide range of attractive and helpful ways.
Seshat Administrators. Technical administrators can manage the data curation
and publication platform or servers to deal with changes in data, schemata,
collection tools and publication formats or tools (e.g., visualizations) over time in a
scalable fashion.
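The division of labor among these roles can be pictured as a permission table. The sketch below is hypothetical: the action names, the assumption that every role may also read, and the `can` helper are illustrations, not a description of how Dacura actually enforces access.

```python
from enum import Enum, auto


class Action(Enum):
    READ = auto()            # browse, search, download, view slices
    ANALYZE = auto()         # prepare and analyze time series
    EDIT_DATA = auto()       # moderate and correct data
    CHANGE_SCHEMA = auto()   # change data structures, manage schema versions
    ADMIN_PLATFORM = auto()  # manage servers, tools, publication formats


# Assumption: every role can also read; other grants follow the role descriptions.
ROLE_PERMISSIONS: dict[str, frozenset[Action]] = {
    "Reader": frozenset({Action.READ}),
    "Analyst": frozenset({Action.READ, Action.ANALYZE}),
    "Editor": frozenset({Action.READ, Action.EDIT_DATA}),
    "Data Architect": frozenset({Action.READ, Action.CHANGE_SCHEMA}),
    "Administrator": frozenset({Action.READ, Action.ADMIN_PLATFORM}),
}


def can(role: str, action: Action) -> bool:
    """True if the (hypothetical) permission table grants `action` to `role`."""
    return action in ROLE_PERMISSIONS.get(role, frozenset())
```

Keeping schema changes separate from data edits mirrors the distinction drawn above between Editors, who manage data quality, and Data Architects, who manage schema versions.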
Seshat will be managed through the Dacura Linked Data platform
(http://dacura.scss.tcd.ie), developed at Trinity College Dublin. Dacura provides
support for dataset capturing, curation, and publication (Feeney 2014).
Nonetheless, when dealing with a complex and multi-faceted domain, the
development of formal schema and tools to facilitate convenient and accurate data
input requires considerable experimental and developmental effort. The nature of
the Seshat data is such that there are many opportunities to take advantage of maps
and timelines to capture the spatial and temporal aspects of the data. However,
these tasks are labor-intensive, so the development of the databank system will
proceed incrementally. The Seshat databank will be progressively migrated from
the current Wiki to the Dacura Linked Data platform in a number of phases, each
designed to add functionality to the system, progressively increasing its ability to
gather and present high quality structured data, without interrupting researchers’
ability to add new data. The following text describes the initial phases in the
migration.
The two most important dimensions of Seshat expansion in the medium term
are (1) adding more variables and (2) increasing the databank coverage to a
progressively greater fraction of the world’s surface. As far as new variables are
concerned, it would be extremely interesting to add data on the evolution of
technology and on linguistic evolution. These variable classes are interesting in
themselves because each is associated with a developed body of theory (for recent
reviews, see Boyd et al. 2013, Gray et al. 2013), and they can be important
explanans variables in helping us explain other aspects of the cultural evolution of
human societies. Another exciting area of research is gene-culture coevolution,
which has, so far, been primarily investigated with theoretical models. There is
little reason to doubt that the ongoing methodological advances in ancient DNA
will soon enable us to test these theories with data (Callaway 2015).
Increasing the thematic and geographic coverage, however, will come at a cost.
Projecting from the current rate of data accumulation, we estimate that by 2017,
the size of the Seshat databank will approach—or exceed—the symbolic threshold
of one million ‘facts.’ 3 But even this enormous amount of data will be restricted in
its thematic and, especially, geographic coverage. It is already clear that extending
the geographic coverage to the whole world is not feasible using our current data
collection approach.
One direction that we are currently exploring is crowdsourcing: developing
software that will support recruitment of volunteers to assist in manual data
processing. More radically, we will need to transition to a technology that
automates the harvesting of the required variable values from open web-sources,
including, but not limited to, the repositories of scholarly publications. Thus, we
intend to explore the possibility of harvesting data from sources such as JSTOR’s
archive of academic publications.
While harvesting data from unstructured massive repositories, such as JSTOR,
is a future direction of Seshat development, there are also numerous sources of
structured data available, which contain data relevant to the Seshat Databank. The
Linked Data technology, on which the Dacura platform is based, facilitates
interlinking the Seshat data with other datasets and supports applications and
analyses that integrate Seshat data with other sources, as long as they also use the
RDF technology. Given the richness of the Seshat databank and the expanding web
of data, there will be many important insights to be gained by interlinking data that
go beyond the scope of the Seshat big questions themselves.
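Interlinking works because RDF datasets identify things by URIs, so statements from different sources about the same URI can be merged. The sketch below uses plain Python tuples rather than an RDF library. The example.org URIs and predicates are hypothetical placeholders, owl:sameAs declares that two URIs denote the same area, and only one level of aliasing is resolved.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical Seshat-side triples (placeholder URIs and predicates).
seshat = {
    ("http://example.org/seshat/Latium", "http://example.org/seshat/worldRegion", "Europe"),
    ("http://example.org/seshat/Latium", OWL_SAME_AS, "http://example.org/gazetteer/lazio-area"),
}
# Hypothetical external dataset that names the same area with its own URI.
external = {
    ("http://example.org/gazetteer/lazio-area", "http://example.org/gazetteer/modernName", "Lazio"),
}


def merge(*graphs: set) -> set:
    """Union the graphs, then rewrite sameAs aliases to the first URI (one hop only)."""
    union = set().union(*graphs)
    alias = {o: s for s, p, o in union if p == OWL_SAME_AS}
    return {(alias.get(s, s), p, alias.get(o, o)) for s, p, o in union if p != OWL_SAME_AS}


def describe(graph: set, subject: str) -> set:
    """All (predicate, object) pairs stated about `subject`."""
    return {(p, o) for s, p, o in graph if s == subject}
```

After `merged = merge(seshat, external)`, a single query, `describe(merged, "http://example.org/seshat/Latium")`, returns statements from both sources, which is the kind of cross-dataset analysis the Linked Data approach is meant to support.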
Another very important direction of development is improving data quality.
Once the databank has been migrated to the triplestore, each data point in it will
have searchable, structured provenance records associated with it, which will
detail the precise source of each variable value. This will go beyond logging
information on data changes and will connect data values with the source of
3 We are using ‘facts’ here to refer to individual RDF triples. A typical Seshat ‘data point,’
depending on its informational complexity (such as temporal extent, presence of
uncertainty or disagreement, etc.), is encoded with 3–10 triples.
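Footnote 3 implies simple arithmetic: at 3–10 triples per data point, one million triples correspond to roughly 100,000 to 333,000 data points. The sketch below illustrates why the per-point count varies by expanding a single hypothetical data point into triples, adding optional triples for temporal extent, uncertainty, and a provenance source of the kind described above. The predicate names and `DataPoint` fields are assumptions, not the Seshat schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataPoint:
    subject: str                   # e.g. a polity URI (hypothetical)
    variable: str                  # variable name used as predicate
    value: object
    start: Optional[int] = None    # temporal extent, if known
    end: Optional[int] = None
    uncertain: bool = False        # flag for uncertainty or disagreement
    source: Optional[str] = None   # provenance: where the value came from


def to_triples(dp: DataPoint, node: str) -> list[tuple[str, str, object]]:
    """Expand one data point into (subject, predicate, object) triples."""
    triples: list[tuple[str, str, object]] = [
        (dp.subject, dp.variable, node),  # link subject to an intermediate node
        (node, "rdf:type", "DataPoint"),
        (node, "value", dp.value),
    ]
    if dp.start is not None:
        triples.append((node, "startYear", dp.start))
    if dp.end is not None:
        triples.append((node, "endYear", dp.end))
    if dp.uncertain:
        triples.append((node, "confidence", "uncertain"))
    if dp.source is not None:
        triples.append((node, "source", dp.source))
    return triples


def data_points_for(n_triples: int, triples_per_point: int) -> int:
    """How many whole data points a given number of triples can hold."""
    return n_triples // triples_per_point
```

A bare value yields 3 triples, while one with a date range, an uncertainty flag, and a source yields 7; `data_points_for(1_000_000, 10)` gives 100000 and `data_points_for(1_000_000, 3)` gives 333333.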
Acknowledgments
This work was supported by a John Templeton Foundation grant to the Evolution
Institute, entitled "Axial-Age Religions and the Z-Curve of Human Egalitarianism,"
a Tricoastal Foundation grant to the Evolution Institute, entitled "The Deep Roots
of the Modern World: The Cultural Evolution of Economic Growth and Political
Stability," an ESRC Large Grant to the University of Oxford, entitled "Ritual,
Community, and Conflict" (REF RES-060-25-0085), and a grant from the European
Union Horizon 2020 research and innovation programme (grant agreement No
644055 [ALIGNED, www.aligned-project.eu]). We gratefully acknowledge the
References
Algaze, G. 2008. Ancient Mesopotamia at the Dawn of Civilization: The Evolution of
an Urban Landscape. University of Chicago Press, Chicago.
Ames, K. 2007. The Archaeology of Rank. Pages 487-513 in R. A. Bentley, H. D. G.
Maschner, and C. Chippendale, editors. Handbook of Archaeological Theories.
AltaMira Press, Lanham.
Atkinson, Q. D., and H. Whitehouse. 2010. The cultural morphospace of ritual form:
Examining modes of religiosity cross-culturally.
Axelrod, R., and W. D. Hamilton. 1981. The evolution of cooperation. Science
211:1390-1396.
Bar-Yosef, O. 2001. From Sedentary Foragers to Village Hierarchies: the Emergence
of Social Institutions. Pages 1-38 in G. Runciman, editor. The Origin of Human
Social Institutions. British Academy, London.
Bellah, R. N. 2011. Religion in Human Evolution: From the Paleolithic to the Axial
Age. Harvard University Press, Cambridge, MA.
Boehm, C. 2001. Hierarchy in the Forest: The Evolution of Egalitarian Behavior.
Harvard University Press, Cambridge, MA.
Boehm, C. 2012. Moral Origins: The Evolution of Virtue, Altruism, and Shame. Basic
Books, New York.
Borgerhoff Mulder, M., S. Bowles, T. Hertz, A. Bell, J. Beise, G. Clark, I. Fazzio, M.
Gurven, K. Hill, P. L. Hooper, W. Irons, H. Kaplan, D. Leonetti, B. Low, F.
Marlowe, R. McElreath, S. Naidu, D. Nolin, P. Piraino, R. Quinlan, E. Schniter, R.
Sear, M. Shenk, E. A. Smith, C. v. Rueden, and P. Wiessner. 2009.
Intergenerational Wealth Transmission and the Dynamics of Inequality in
Small-Scale Societies. Science 326:682-688.
Bowles, S. 2009. Did Warfare Among Ancestral Hunter-Gatherers Affect the
Evolution of Human Social Behaviors? Science 324:1293-1298.
Boyd, R., P. J. Richerson, and J. Henrich. 2013. The Cultural Evolution of Technology:
Facts and Theories. Pages 119-142 in P. J. Richerson and M. H. Christiansen,
editors. Cultural Evolution: Society, Technology, Language, and Religion. MIT
Press, Cambridge, MA.
Breasted, J. H. 1919. The Oriental Institute of the University of Chicago. American
Journal of Semitic Languages and Literatures 35:196-204.