Trusting Digital Records
Luciana Duranti†
Translated by Koga Takashi‡
The Goal of InterPARES 1 and 2 (1998-2006)
InterPARES began in 1998 with the purpose of developing the body of theory and methods necessary to
ensure that digital records produced in databases and office systems, as well as in dynamic, experiential, and
interactive systems in the course of artistic, scientific, and e-government activities can be created in an accurate
and reliable form, and maintained and preserved in an authentic form, both in the long and short term, for the
use of those who created them and for society at large, regardless of technological obsolescence and media
fragility.
In other words, InterPARES research was meant to develop new theory and new methodology for digital
preservation, based on the understanding that preservation begins at creation.
† Professor, School of Library, Archival and Information Studies, The University of British Columbia
‡ Associate Professor, Faculty of Human Studies, Tenri University
guidelines guide any person, office or organization who uses them to create digital records so that they can be
maintained in the right way, and to keep and use them while ensuring that their accuracy, reliability and
authenticity are protected and that they remain accessible through time.
These guidelines have already been translated into several languages. On the website they are posted in
English, French, Spanish, Catalan, Portuguese, and Chinese. They have not yet been translated into Japanese,
but, if it would be useful, the Japanese professional community is very welcome to do so.
The third key product is the "Preserver Guidelines": recommendations for digital preservation for archival
institutions, programs, units, and organizations. They are essentially the complement to the Creator Guidelines,
as they build on them. These are for archivists or whoever is in charge of preservation. They provide all the
guidance that is needed to preserve digital records received from the creator. These guidelines have
been translated into the same languages as the Creator Guidelines.
The fourth key product is the "Benchmarks and Baseline Requirements for Authenticity." Benchmark
requirements for authenticity are the requirements for those who create and maintain the records, to make
sure that the records can be proven to be authentic at any given time in their active life. Baseline requirements
are the requirements for archivists or any preserver to maintain authenticity over the long term and to be able
to demonstrate it.
The fifth key product is the "File Format Selection Guidelines," which articulate principles and criteria for
selecting the file formats, wrappers or encoding schemes that are the most appropriate for preservation.
The sixth key product is the "Terminology Database," which is composed of three parts: a glossary, a
dictionary, and three ontologies. The glossary comprises the terms contained in InterPARES documents, and
defines them as used by the InterPARES researchers. The dictionary includes the same terms, but defined in
several other ways, using as sources existing glossaries and dictionaries, including those from other fields. The
three ontologies are graphic representations that show the relationships among terms and concepts.
The seventh key product consists of two records management models. These are extremely important
for those who want to design systems, or to analyze the systems they have in order to identify possible gaps. The
"Chain of Preservation Model," or COP model, follows the concept of the lifecycle of records from creation to
preservation. The "Business-driven Recordkeeping Model," or BRM, in contrast to the COP model, follows the
Australian concept of the records continuum. If you believe in that concept, you can use that specific model to
design systems that enact it. One might guess from what I will discuss in the course of this presentation
that I do not support the BRM. I do support the COP model. But InterPARES wishes to serve the records
professional community worldwide; thus, it developed models that can accommodate all points of view.
The eighth and ninth products are two books which are available online. The first resulted from the
InterPARES 1 project, and the second from InterPARES 2.(1)
Trusting Digital Records: the Major Findings of the InterPARES Project (Duranti)
and Humanities Research Council of Canada and the University of British Columbia (UBC) in Vancouver,
Canada. The project headquarters resided in the School of Library, Archival and Information Studies at UBC. I
was the Director.
InterPARES 3 Findings
The findings of InterPARES 3 were of three types: conceptual findings (I will discuss, among these, the
concept of trustworthiness), methodological findings (I will discuss, among these, preservation methods), and
strategic findings (I will discuss, among these, the role of the archivist).
Trust is based on rules, and these rules relate to those who give trust and those who receive it. The
bond between those who trust (i.e. the trusters) and those who are trusted (i.e. the trustees) is based on
four characteristics of the trustees.
Characteristics of Trustees
What are the characteristics trustees are expected to have?
The first is reputation. One evaluates the trustee's past actions and conduct, and if they are good, then the
trustee can be trusted.
The second is performance, which means that one considers the present actions of the trustee and compares
them with what is required to fulfill the responsibilities in question.
The third is confidence, which means that one is reasonably sure that expectations of performance will be fulfilled.
The fourth, and most important, characteristic of the trustee is competence, which means that the trustee has
the knowledge, skills, talents and traits required to perform a task to a given standard.
for their content, but as evidence of the action one has carried out: they attest to the fact that one used the bank
ATM, did certain things, and withdrew a certain amount of money.
The third category is a combination of the other two. For example, if one creates a spreadsheet, one includes
in the spreadsheet one's own data, which is a human statement. But, the program of the spreadsheet processes
the data, so the result is a record that is both stored in and generated by the computer. When one has a record
like a spreadsheet, one cannot just preserve the documentary form, but has to preserve its functionality, the
way it works in the system.
So, when one preserves computer-stored records, it is enough to maintain what one sees on the screen.
When one preserves computer-generated records, one must preserve the way in which they interact with each
other. But when one preserves a spreadsheet, one has to preserve both.
Types of Trustworthiness
Before proceeding to how we do that, we have to think about what trustworthiness means traditionally.
Records trustworthiness encompasses three attributes of the records: reliability, accuracy and authenticity.
Reliability is the trustworthiness of the record as a statement of fact. For example, my certificate of citizenship
is evidence of my citizenship, so I trust it as proof of my citizenship. When we trust the record for what it says,
we traditionally accept it at face value without question. We look at who the author is, and, if we trust the author,
we trust the record. If the record is complete, if all its parts are there, then we can trust it. If its creation is
controlled, then we can trust the record. We can trust the content of the record because we trust the author,
the form, and the process, by inference. For example, if a diagnosis is signed by a doctor, one trusts it, but if it is
signed by a nurse, one does not.
Accuracy refers to how correct and precise the data inside the record are. We base its assessment on the
same factors on which we assess reliability, but also on the controls over the way the content is recorded and
transmitted. For example, when a table with columns and rows is transmitted, the data might shift out of
place. Transmission is the weakest link in the chain of preservation of a record.
Authenticity is the trustworthiness of the record as a record. That means that no one has tampered with it, and
that the record has not been corrupted and has not changed since creation. Authenticity means that a record
retains its identity and its integrity.
Reliability
I explained how we used to assess reliability, accuracy and authenticity in the traditional record environment.
In the digital environment, if we look at reliability, we can see that the source of the record is still the key. We
consider the record reliable if we trust the source. However, with digital records, the source is no longer only a
reliable person or a reliable procedure, but can be a process or software: if one trusts the software that
generates the record, then one may trust the record.
This implies that the software should be open-source, because, if we want to assess the reliability
of the record on the basis of a reliable process of creation, then we need to know what that process is; if the
software is proprietary, we don't know what it is. So, we need to be able to describe the process, or the system
producing a certain result, or we have to demonstrate the process or the system, and show that it does produce
an identical result.
Accuracy
In order to assess accuracy with traditional records, it was enough to demonstrate that the records were
original, because one could not have changed just the data within the record without risking being found out.
With digital entities, however, one can change the data without being spotted, so one can only demonstrate that
the record is accurate if one can repeat the same process of creation and obtain the same result. So, repeatability
is one of the fundamental precepts of digital forensics, the discipline that has been developed to identify
evidence and to prove that evidence has not been forged. The test of accuracy must be supported by the
documentation of every action carried out on the record, so one must be able to document everything done to
the record.
Of course, in order to repeat a process, one needs open-source software, so as to know what the process
was. This is especially important for archivists, who convert and migrate digital records, moving
them to different media and, when the media become obsolete, to a different operating system, and so on.
Archivists have to be able to prove that, no matter who carries out the process, or under what conditions it is
carried out, the same process will give the same outcome.
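The repeatability test described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an InterPARES procedure: SHA-256 digests are assumed as the means of comparing outputs, and `convert_latin1_to_utf8` is a hypothetical migration step standing in for whatever open-source conversion tool an archives actually uses. The check runs the same conversion several times on the same input and accepts it only if every run yields bit-identical output.

```python
import hashlib
from typing import Callable


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def convert_latin1_to_utf8(source: bytes) -> bytes:
    """Hypothetical migration step: re-encode Latin-1 text as UTF-8.

    The function is deterministic, so the same input always yields the same bytes.
    """
    return source.decode("latin-1").encode("utf-8")


def is_repeatable(source: bytes, convert: Callable[[bytes], bytes], runs: int = 2) -> bool:
    """Run `convert` several times on `source`; True only if all outputs are bit-identical."""
    digests = {sha256_hex(convert(source)) for _ in range(runs)}
    return len(digests) == 1


if __name__ == "__main__":
    record = "Résumé of the meeting of 12 March".encode("latin-1")
    print(is_repeatable(record, convert_latin1_to_utf8))  # True
```

A conversion that embeds a run-specific value, such as the current time, in its output would fail this check even if the content were otherwise unchanged.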
Authenticity
When it comes to authenticity, one has to rely on the contexts of the digital records (the procedural context,
the documentary context, and the technological context), but also on the identity and integrity of the records.
The identity of the records used to be provided by the date, the author, the signature, the seal, the
classification code, the registry number, etc. Now the identity of the records is in the metadata. So, it is very
important that the required metadata are preserved, especially the metadata which show the relationship of the
record with the other records, that is, the documentary context of the record.
Integrity means that the message that the record is supposed to communicate has not been substantially
altered. But what does that imply?
Integrity
With traditional records, when would one say that a record no longer has integrity? When it has a
hole in it? Two holes? When it becomes yellow? When the ink bleeds through? When it is cut? When it is
wrinkled? We used common sense to make such decisions.
Data Integrity
In the digital environment, the most important thing about integrity is bitwise integrity, which means that the
data are not modified either intentionally or accidentally without proper authorization.
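Bitwise integrity is commonly monitored with a cryptographic digest: the digest of a record is computed once, stored separately from the record, and recomputed later; if even one bit has changed, the digests no longer match. The sketch below assumes SHA-256 from Python's standard `hashlib`; the file name and contents are purely illustrative. A matching digest shows that the bits are unchanged, but it does not by itself record who made a change or why.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def has_bitwise_integrity(path: Path, recorded_digest: str) -> bool:
    """True if the file's current bits hash to the digest recorded earlier."""
    return file_digest(path) == recorded_digest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        record = Path(tmp) / "minutes.txt"  # illustrative record
        record.write_bytes(b"Item 4: approved.")
        recorded = file_digest(record)  # computed at ingest, stored apart from the record
        print(has_bitwise_integrity(record, recorded))  # True
        record.write_bytes(b"Item 4: rejected.")  # an unauthorized change
        print(has_bitwise_integrity(record, recorded))  # False
```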
records.
In a digital environment, every time one turns the computer on or off, one can create change in the records in
the system. And every time one accesses the system, one changes the environment in which the records exist.
Police experts who seize records as evidence of a crime first turn off the computer, then make an image of the
hard drive; they do not try to search anything in the computer, because that can accidentally change the bits.
Rather, they search the hard drive image.
Prevention is very important, but it is equally important to be able to find out when change has occurred.
This is why preservation of digital records is a very laborious effort, because one must continually check to see
that change has not happened.
How does one check? By using logs. Logs are files which are automatically created by the system to track all
the actions taken by the people who interact with the system.
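The kind of information such logs capture can be illustrated with a small, hypothetical audit log. In practice logs are produced automatically by the operating system or the records management software; the `AuditLog` class and its fields (`user`, `action`, `record_id`) are assumptions made for this sketch only. Each entry carries a UTC timestamp, and the class exposes no way to edit or delete entries, so the history of any record can be read back in order.

```python
import json
from datetime import datetime, timezone


class AuditLog:
    """Hypothetical append-only log; each entry is stored as one JSON line."""

    def __init__(self) -> None:
        self._lines: list[str] = []

    def record(self, user: str, action: str, record_id: str) -> dict:
        """Append one entry describing who did what to which record, and when."""
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "record_id": record_id,
        }
        self._lines.append(json.dumps(entry, sort_keys=True))
        return entry

    def history(self, record_id: str) -> list[dict]:
        """Return every logged action on one record, oldest first."""
        entries = (json.loads(line) for line in self._lines)
        return [e for e in entries if e["record_id"] == record_id]


if __name__ == "__main__":
    log = AuditLog()
    log.record("archivist_a", "open", "R-001")
    log.record("clerk_b", "open", "R-002")
    log.record("archivist_a", "export", "R-001")
    print([e["action"] for e in log.history("R-001")])  # ['open', 'export']
```

A real log would itself need protection against alteration, since a log that can be silently edited no longer helps to detect change.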
We will shortly return to logs, but now we have to talk about duplication integrity.
Duplication Integrity
In the traditional environment, when we make copies of records, we are concerned about their accuracy, but
we still have the original, so we are not very concerned about the integrity of the copy.
In the digital environment, every time we create a duplicate, that is our record; we no longer have an original.
Duplication integrity is defined as the fact that, if we have a record, or a data set, the process of creating the
duplicate does not modify the record or data, and the duplicate is an exact digital copy of the original record or
data.
In fact, every time we make a copy, what we produce is slightly different from what we had before. This is the
reason why it is good to have a time stamp on the copy that one makes: if we have copies taken at different
times, we would know that they are different because they were made at different times, not because one is a
forgery.
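The two conditions in this definition (the duplication process leaves the source unmodified, and the duplicate is bit-for-bit identical to it), together with the time stamp recommended above, can be checked as sketched below. This is an illustrative sketch that assumes SHA-256 digests and Python's `shutil.copyfile`, which copies a file's contents but not its file-system metadata; the function name `duplicate_with_verification` is hypothetical. One concrete sense in which every copy differs is visible here: the duplicate gets its own modification time even when its content is identical.

```python
import hashlib
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """SHA-256 digest of a file's contents (whole file read at once; fine for small files)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def duplicate_with_verification(source: Path, target: Path) -> dict:
    """Copy `source` to `target` and report whether duplication integrity holds."""
    digest_before = sha256_of(source)
    shutil.copyfile(source, target)  # copies the bytes only, not the metadata
    return {
        "copied_at": datetime.now(timezone.utc).isoformat(),  # time stamp of this copy
        "source_unchanged": sha256_of(source) == digest_before,
        "exact_copy": sha256_of(target) == digest_before,
        "sha256": digest_before,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "report.pdf"  # illustrative file name
        original.write_bytes(b"%PDF-1.4 illustrative bytes")
        result = duplicate_with_verification(original, Path(tmp) / "report_copy.pdf")
        print(result["source_unchanged"], result["exact_copy"])  # True True
```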
It is important to understand the difference between a copy and an image, because if we tell a forensic expert
that we want a copy of a hard drive, what the forensic expert hears is "image of the hard drive," and they are
two very different things.
A disk image is a bit-by-bit reproduction of the hard drive. This full disk copy of the data on the hard drive
also includes all the empty space and all the deleted files.
I know of university archives that take images of all the hard drives of the heads of departments because,
this way, they have a complete and accurate record of what was generated if the computer fails or material is
accidentally deleted. However, they are on very shaky legal ground, because the images also preserve all
the files that the heads of departments had deleted. So, there are ethical problems as well as privacy
problems with this procedure.
It is better to make a real copy, which is a selective duplicate of files. One only copies what one can see, not
everything that was ever inscribed on the hard drive, because one would need permission to access certain
files. So, one will have an incomplete picture of the digital device, and, in carrying out our preservation
responsibilities, we cannot expect to have a complete picture.
On the other hand, after an archives has acquired the records and holds all the material on its own hard
drives, the best way of reproducing it is to image the hard drive.
Process Integrity
Process integrity is the most important kind of integrity for archivists. In the digital environment, we have to
be able to prove what we have done to the records, step by step, from the moment we received them and
forever after. What we have to prove is either that we did not interfere with the records (that is, the methods we
used to gather, capture, use, manage and preserve them did not change them), or that, if we changed them, we
documented the changes.
Authentication
Now a word about authentication is in order. Many legislative texts in many countries, especially in Europe,
confuse authenticity with authentication. Legislators think that, if they prescribe a method of authentication,
they have guaranteed the authenticity of the records, but that is not true.
Authentication is simply one of the means of declaring that a record is authentic. But, it can only declare that
a record is authentic in one specific moment in time, when the declaration is made. Authentication does not
keep the record authentic.
The digital signature, for example, has more the function of a seal than the function of a signature, because
the signature assigns responsibility for the content of the record and it is a necessary component of the record,
while the digital signature is an attachment to a complete record.
The problem with the digital signature is that it cannot be preserved with the record, so it is useful for the
transmission of the record, but when one receives it, one cannot preserve it with the record, because it
becomes obsolete before the record, and cannot be migrated with the record.
that the record has been under responsible, trusted custody from the moment it has been generated.
The digital chain of custody is the recording of information about the record and its changes, and it shows
that specific data were in a particular state at a given time and date. Thus, this too is a good authentication
method.
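One common way of making such a chain-of-custody record tamper-evident, offered here as an illustration rather than as a method prescribed in this lecture, is to hash-chain its entries: each entry stores the digest of the record at that moment, who held it, what happened, a UTC time stamp, and the hash of the previous entry. Altering an earlier entry then breaks the links that follow it. The field names and the all-zero starting value (`GENESIS`) are assumptions of this sketch.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(body: dict) -> str:
    """Hash an entry's fields in a canonical (sorted-key) JSON form."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def add_custody_event(chain: list, custodian: str, event: str, record_sha256: str) -> dict:
    """Append an entry stating that the record had a given digest at a given time."""
    body = {
        "custodian": custodian,
        "event": event,
        "record_sha256": record_sha256,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prev": chain[-1]["hash"] if chain else GENESIS,
    }
    entry = dict(body, hash=entry_hash(body))  # the hash covers every field in body
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """True if every entry is intact and links to the entry before it."""
    prev = GENESIS
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev"] != prev or entry_hash(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


if __name__ == "__main__":
    digest = hashlib.sha256(b"illustrative record").hexdigest()
    chain: list = []
    add_custody_event(chain, "Creator office", "created", digest)
    add_custody_event(chain, "Records centre", "transferred", digest)
    add_custody_event(chain, "Archives", "acquired", digest)
    print(verify_chain(chain))  # True
    chain[1]["custodian"] = "Someone else"  # tamper with history
    print(verify_chain(chain))  # False
```

Dropping the most recent entries would not be detected by this check alone; the latest hash would also need to be kept somewhere outside the chain.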
A declaration made by an expert on the trustworthiness of the recordkeeping and the preservation systems is
more important than any digital signature.
Preservation
What we have discussed has very important consequences for the meaning of the concept of preservation,
and for the function of the archivist.
The concept of preservation in the digital environment must include all the processes necessary to transmit
the record through time from creation to forever, including conversion and migration. Preservation is not just
keeping what we have, but ensuring that we create records in such a way that we will be able to preserve them.
The unbroken chain of preservation must begin with the creation of the record and continue from the
record-making system (the system in which the records are generated) to the recordkeeping system, and then to
the record preservation system. When we describe a preservation process, we must begin from the moment in
which the system where the records are going to be created is designed.
does, because we cannot wait for scholars, let alone legislators or bureaucrats, to tell us what to do. Things
change too fast, and archivists must constantly be on the leading edge of research and of testing the findings of
all the research projects carried out worldwide, because otherwise they will never catch up.
InterPARES 3 Products
To start with, you can look at the list of InterPARES 3 products. I am not going to describe them, as they can
be found on the InterPARES website, under "Products," and then "InterPARES 3." Products can be accessed
by case study or general study, through the reports of each team, or by keyword or subject matter.
Furthermore, all InterPARES products can be used freely, as you wish.
Thank you.
[Notes]
(1) All the key products listed here can be easily accessed through the InterPARES portal at www.interpares.org.
(2) The status of transmission of a record is its degree of perfection. Thus, a draft is incomplete and meant for
correction, an original is the first complete record capable of reaching the purposes for which it was intended, and a copy is
a reproduction of a draft, an original, or another copy. In the paper world, each record continued to exist in the status
of transmission in which it was filed. In the digital world, an original only exists after reception until the moment it is
saved. After that we have only copies. A draft only exists as such for as long as one works on it. After it is saved we can
only retrieve copies of it.
Slide: Terminology Database
• Including a glossary, a dictionary and ontologies
Slide: Two Records Management Models
• Chain of Preservation (COP) Model (lifecycle)
• Business-driven Recordkeeping (BDR) Model (continuum)
Slide: Accuracy
Digital entities are guaranteed accurate if they are repeatable. Repeatability, which is one of the fundamental
precepts of digital forensics, is supported by the documentation of each and every action carried out on the
record. Open source software is again the best choice for assessing accuracy, especially when conversion or
migration occurs, because it allows for a practical demonstration that nothing could be altered, lost, planted, or
destroyed in the process.
Slide: Authenticity
Context: The procedural, documentary and technological environment in which the record was created and used
over time.
Identity: The whole of the attributes of a record that characterize it as unique, and that distinguish it from other
records (e.g. date, author, addressee, subject, identifier).
Integrity: A record has integrity if the message it is meant to communicate in order to achieve its purpose is
unaltered (e.g. text and form fidelity, absence of technical changes).
Slide: Loss of Integrity
Slide: Analog vs. Digital Loss of Integrity (cont.)
• If original bits 101
• Change state to 110
• Continue to 011
Slide: The Archivist's New Role (cont.)
Determines a preservation strategy based on:
- a controlled process of migration of the acquired records to the archives' technological environment (always
keeping the records also in the format in which they were acquired)
- the accurate documentation of any change that the records undergo during such process and every time that
the archives' technological environment is upgraded
- the implementation and monitoring of privileges concerning the access, use and reproduction of the records
within the archives
Slide: The Archivist's New Role (cont.)
• Establishes procedures to prevent, discover, and correct loss or corruption of records, as well as
• Establishes procedures to guarantee the continuing identity and integrity (i.e. authenticity) of the records
against media deterioration and across technological changes; and
• Authenticates individual records according to the rules that determine responsibility for and means of
authentication.
• Controls the accuracy of the records after each conversion or migration
• Develops procedures that address issues of intellectual rights and privacy
• Recognizes to archival description a primary authentication function
• Is constantly involved in research and development projects similar to those carried out by the industry
• Terminology Database
• Cost-benefit Models
• Directory of Digital Preservation Projects
• Directory of International Standards Relevant to IP3
• E-mail Preservation
• Protocol Registry
• Community Archives e-Records Assessment
• Public Sector Audit Report for Digital Recordkeeping
• Records Management Policies and Procedures Template
• Ethical Models
• File Viewers Assessment
• Open Source Records Management Software Assessment
• Preservation Metadata Applications Profiles
• Web 2.0/Social Media
• Organizational Culture & Risk Assessment
• Education Modules (with ICA)