Tutorial 1
Introduction to Fedora
Copyright: ©2005 The Rector and Visitors of The University of Virginia and Cornell
University
Purpose: This tutorial introduces the basic development questions, design concepts and
project goals of the Flexible Extensible Digital Object Repository Architecture (Fedora).
Audience: This tutorial is intended for anyone who will be using the Fedora software in
any capacity, or who is generally interested in Fedora and its development.
Table of Contents
Section 1: What is Fedora?
    What is Fedora?
    Fedora History
Section 2: Motivation
    The Problem of Digital Content
    Key Research Questions
    Fedora Goals
    Design Advantages: Where the Rubber Hits the Road
        Fedora’s Digital Object Model
        Distributed Repositories
        Preservation & Archiving
        Content Repurposing
        Web Services
        Easy Integration with Other Applications and Systems
Section 3: Digital Object Model
Section 4: Fedora Repository Server
Table of Figures
Figure 1: Fedora Digital Object Architectural View
Figure 2: Fedora Digital Object Image Example
Figure 3: Fedora System Architecture (simplified)
Figure 4: Client and Web Services Interaction
Section 1: What is Fedora?
What is Fedora?
Fedora is an acronym for Flexible Extensible Digital Object Repository Architecture.
Fedora’s flexibility makes it capable of serving as a digital repository for a variety of use
cases. Among these are digital asset management, institutional repositories, digital
archives, content management systems, scholarly publishing enterprises, and digital
libraries. Fedora is open-source software licensed under the Mozilla Public License.
Fedora History
Fedora began in 1997 as a DARPA and NSF funded research project at Cornell
University, where the initial reference implementation was developed by Sandra Payette,
Carl Lagoze, and Naomi Dushay. Work at Cornell included a CORBA-based technical
implementation, work on policy enforcement, and extensive interoperability testing with
CNRI.
The first practical application of Fedora was the digital library prototype developed at
UVa by Thornton Staples and Ross Wayland in 1999, where the software was adapted to
the web and an RDBMS was added to improve performance. The initial work done on the
Fedora prototype included scalability testing for 10 million objects.
A full-scale development project began in 2002 with grant funding from the Andrew
W. Mellon Foundation. This project’s charge was to create a production-quality Fedora
system, using XML and web services to deliver digital content. Fedora 1.0 was released
in May 2003. Subsequent releases, following approximately every quarter, have
added functionality and corrected bugs discovered by users and the Fedora development
team.
In June 2004, the Andrew W. Mellon Foundation funded Fedora Phase 2 as an additional
three-year project. Planning for Phase 2 development is underway as of this writing.
Section 2: Motivation
This section provides information on the issues and questions that have motivated the
development of Fedora.
Conventional Objects: books and other text objects, geospatial data, images,
maps
Complex, Compound, Dynamic Objects: video, numeric data sets and their
associated code books, timed audio
As users become more sophisticated at creating and using complex digital content, digital
repositories must also become more sophisticated. As digital collections grow and are
used in previously unconsidered ways, repository managers face management tasks of
increasing complexity. Collections are being built that contain multiple data types, and
organizations have discovered a need to archive and preserve complex objects like those
listed above, as well as web sites and other complex, multi-part documents. Finally, as
collections grow in both size and complexity, the need to establish relationships between
data objects in a repository becomes more and more apparent.
Fedora Goals
These five key research questions led logically to ten goals for Fedora’s development.
• Abstraction: The object model is the same whether the object is data,
behavior definitions, or behavior mechanism. It also does not matter
what kind of data the digital object represents: text, images, maps,
audio, video, and geospatial data are all the same to Fedora.
• Flexibility: Implementers of Fedora can design their content models to
best represent their data and the presentation requirements of their
specific use case.
• Generic: Metadata and content are tightly linked within the digital
object.
• Aggregation: Fedora objects can refer to data that is stored locally or
that is stored on any web accessible server.
• Extensibility: Fedora’s behavior interfaces are extensible because
services are directly associated with data within a Fedora object. As
the services change, the objects change along with them.
Distributed Repositories
The Fedora Architecture, as originally designed by Payette and Lagoze, was
intended to support distributed repositories. This vision is described in the Fedora
Specification and placeholders exist in the current software to build this
functionality. Repository federation is important for several reasons. First,
federation is a natural requirement for delivering integrated access to digital
resources that are owned or managed by several institutions. Second, federation
makes it easy for digital library and other applications to interface with multiple
information sources in a seamless manner. Third, federation can help with
scalability or performance issues for very large repositories. Specifically, a local
federation of repositories can be established as a means of distributing load and
object storage among several running repository instances, so that together these
separate instances can be treated as one ‘virtual repository.’
• XML: Fedora objects’ XML and the schema upon which they are
based are preserved at ingest, during storage, and at export.
• Content Versioning: Fedora repositories offer implementers the
option of versioning data objects. When a data object is versioned, the
object’s audit trail is updated to record what was changed, when, and
by whom, and a new version of the modified data is added to the
object’s XML. This new datastream cascades from the original and is
numbered to show the relationship between original and version.
Users can then retrieve an older version of a data object by including
a date/time in the search, or the most current version by omitting the
date/time criteria.
• Object to Object Relationships: Relationships between objects can
be stored via the metadata included in the objects. This allows
implementers to link together related objects into parent/child
relationships.
• Event History: Every object in a Fedora repository contains an audit
trail, which preserves a record of every change made to the object.
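The versioning and audit trail behavior described in the bullets above can be pictured with a minimal Python sketch. It is an illustration only, not Fedora's implementation: the class names, the `modify_datastream` and `get_datastream` methods, and the "DS1.0", "DS1.1" numbering scheme are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class DatastreamVersion:
    version_id: str   # hypothetical numbering, e.g. "DS1.0", "DS1.1"
    created: datetime
    content: bytes


@dataclass
class AuditRecord:
    when: datetime
    who: str
    action: str


@dataclass
class DataObject:
    """Illustrative stand-in for a versioned data object (not Fedora's API)."""
    pid: str
    versions: Dict[str, List[DatastreamVersion]] = field(default_factory=dict)
    audit_trail: List[AuditRecord] = field(default_factory=list)

    def modify_datastream(self, ds_id: str, content: bytes,
                          who: str, when: datetime) -> DatastreamVersion:
        # Keep every earlier version; append the new one with the next number.
        history = self.versions.setdefault(ds_id, [])
        version = DatastreamVersion(f"{ds_id}.{len(history)}", when, content)
        history.append(version)
        # Record what changed, when, and by whom.
        self.audit_trail.append(AuditRecord(when, who, f"modified {ds_id}"))
        return version

    def get_datastream(self, ds_id: str,
                       as_of: Optional[datetime] = None) -> Optional[DatastreamVersion]:
        # With no date/time, every version qualifies and the newest is returned.
        candidates = [v for v in self.versions.get(ds_id, [])
                      if as_of is None or v.created <= as_of]
        if not candidates:
            return None
        return max(candidates, key=lambda v: v.created)
```

Calling `get_datastream("DS1")` returns the most current version, while passing `as_of` returns the version that was current at that date/time, or `None` if the datastream did not yet exist.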
Content Repurposing
Application of different stylesheets to the data and metadata of a Fedora object
allows multiple views of the object’s content and metadata; for example, one view
for a domain scholar and another for a K-12 audience. Additionally, content
referenced in a Fedora data object can be dynamically transformed as it is called
by a user by use of custom disseminators. Because of this inherent strength and
flexibility, new views and data transformations are simple to add over time as the
implementer’s and user’s requirements change.
Web Services
Fedora is exposed via web services and can interact with other web services. The
interfaces and XML transmission are defined in WSDL.
Section 3: Digital Object Model
Data Object
A data object in a Fedora repository describes content (data and metadata) and a set of
associated behaviors or services that can be applied to that content. Data objects comprise
the bulk of a repository.
The diagram below shows the architectural view of a Fedora digital object.
[Diagram components, top to bottom:
Persistent ID (PID): the digital object identifier.
FOXML Metadata (Object Properties, Relationship Metadata): the descriptive
perspective, key metadata necessary to manage and discover the object and its
relationships to other objects.
Datastreams (items): the item perspective, a set of content or metadata items.
Default Disseminator and Custom Disseminators: the service perspective,
methods for disseminating “views” of content.]
Figure 1: Fedora Digital Object Architectural View
1. Digital Object Identifier: A unique, persistent identifier for the digital object.
2. Descriptive Perspective: The FOXML metadata for a digital object is the
metadata that must be recorded with every digital object to facilitate the
management of that object. It is the metadata required by the Fedora
repository architecture, and it is distinct from other metadata that is
stored in the digital object as content. All other metadata (e.g.,
descriptive metadata, technical metadata) is considered optional from the
repository standpoint, and is treated as a datastream in a digital object.
Object Properties describe the object’s type, its state, the content model to which
it subscribes, the created and last modified dates of the object, and its label.
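As a rough aid to reading the object model, the parts named so far can be laid out as plain Python data classes. This is a sketch under stated assumptions rather than the FOXML schema: the field names, the `mime_type` and `location` attributes, and the representation of relationships and disseminators as simple lists of strings are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class ObjectProperties:
    # The properties listed in the text: type, state, content model,
    # created and last modified dates, and label.
    object_type: str
    state: str
    content_model: str
    created: datetime
    last_modified: datetime
    label: str


@dataclass
class Datastream:
    ds_id: str
    mime_type: str   # assumption: a media type recorded per item
    location: str    # assumption: local storage or a web-accessible URL


@dataclass
class DigitalObject:
    pid: str                                   # unique, persistent identifier
    properties: ObjectProperties               # required by the repository
    relationships: List[str] = field(default_factory=list)
    # Optional metadata (descriptive, technical) lives here as datastreams.
    datastreams: List[Datastream] = field(default_factory=list)
    # Every object starts with a default disseminator; custom ones are optional.
    disseminators: List[str] = field(default_factory=lambda: ["default"])
```

The split mirrors the distinction drawn above: `properties` and `relationships` stand for metadata the repository requires, while any other metadata is just another entry in `datastreams`.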
Datastreams
object creation containing the Dublin Core metadata required for initial
indexing.
The diagram below shows an example digital object modeled on the UVa MrSid Image
digital object content model.
So a disseminator is a set of service subscriptions between the data object and a pair of
behavior objects. The data object defines requirements for presentation of the data
referenced within it by explicitly referencing the behavior objects.
Default Disseminator
Each Fedora data object is created with a default disseminator. The default disseminator
allows repository administrators to get information about the object: for example, get the
object’s profile, list items/get item, list methods, and get OAI_DC.
[Diagram components, top to bottom:
Persistent ID (PID)
FOXML Metadata: Object Properties, Relationship Metadata
Datastreams: Image (mrsid), DC (xml), Thumbnail (jpg)
Disseminators: Default Views, Image Views, Metadata Views]
Figure 2: Fedora Digital Object Image Example
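The operations attributed to the default disseminator can be sketched as a small Python class. This is a hypothetical illustration, not Fedora's interface: the method names are invented to echo the operations listed above, and the assumption that the Dublin Core record is stored under the item ID "DC" is made only for this example.

```python
from typing import Dict, List


class DefaultDisseminator:
    """Hypothetical stand-in for the default disseminator of one data object."""

    def __init__(self, pid: str, label: str, items: Dict[str, bytes]) -> None:
        self.pid = pid
        self.label = label
        self.items = items  # item (datastream) ID -> stored content

    def get_profile(self) -> Dict[str, str]:
        # Basic identifying information about the object.
        return {"pid": self.pid, "label": self.label}

    def list_items(self) -> List[str]:
        # IDs of all items held by the object, in sorted order.
        return sorted(self.items)

    def get_item(self, item_id: str) -> bytes:
        # Raises KeyError if the object has no such item.
        return self.items[item_id]

    def list_methods(self) -> List[str]:
        # The operations this disseminator offers.
        return ["get_profile", "list_items", "get_item",
                "list_methods", "get_oai_dc"]

    def get_oai_dc(self) -> bytes:
        # Assumption: the Dublin Core record is the item with ID "DC".
        return self.items["DC"]
```

Because every object carries the same default operations, an administrator's tool could inspect any object in the same way, regardless of its content model.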
Custom Disseminator
Repository managers can optionally add as many custom disseminators to a Fedora object
as they desire. In this way, each Fedora data object can be designed for maximum
usability. In the example above, the image views disseminator allows users to retrieve the
content of the object in the views designed by the repository administrators. In this
example, a user could retrieve a thumbnail/preview sized image, a pre-defined
medium-sized image, or a pre-defined high-resolution image. The latter two are both
generated from a MrSid encoded image rather than retrieved as static versions. The
metadata views retrieve the metadata from the object. Users may retrieve the Dublin Core metadata or
metadata from an XML type datastream, or both.
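The image views example can be sketched in Python to show the difference between a static view and a generated one. Everything here is illustrative: `fake_mrsid_render` only tags its input and stands in for a real MrSID decoder, and the item IDs "THUMBNAIL" and "IMAGE" and the pixel widths are assumptions, not values from Fedora or the UVa content model.

```python
from typing import Callable, Dict

# A transform takes source image bytes and a target width and returns
# derived image bytes.
Transform = Callable[[bytes, int], bytes]


def fake_mrsid_render(source: bytes, width: int) -> bytes:
    # Placeholder for a real MrSID decoder: it only labels the output.
    return b"rendered@%d:" % width + source


class ImageViews:
    """Hypothetical custom disseminator offering three image views."""

    MEDIUM_WIDTH = 640    # assumed size for the medium view
    HIGH_WIDTH = 2048     # assumed size for the high-resolution view

    def __init__(self, datastreams: Dict[str, bytes], render: Transform) -> None:
        self.datastreams = datastreams
        self.render = render

    def get_thumbnail(self) -> bytes:
        # Static view: served directly from the stored thumbnail item.
        return self.datastreams["THUMBNAIL"]

    def get_medium(self) -> bytes:
        # Generated on request from the MrSID-encoded source image.
        return self.render(self.datastreams["IMAGE"], self.MEDIUM_WIDTH)

    def get_high(self) -> bytes:
        # Generated on request from the same source, at a larger size.
        return self.render(self.datastreams["IMAGE"], self.HIGH_WIDTH)
```

Passing the transform in as a parameter keeps the view definitions separate from the service that performs the conversion, which is one way to read the split between a data object and the behavior objects it references.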
Section 4: Fedora Repository Server
Thus far, we have talked about the component parts of a Fedora repository, but the larger
picture is also important. A repository is made up of digital objects, but in what context
do those objects exist and how is it that users interact with them?
This diagram shows in very general terms the structure of the entire repository. Users
interact with the content of the repository by means of client applications, web browsers,
batch programs, or server applications. These applications access the repository’s data by
means of the four APIs by which Fedora is exposed: the management, access, and search
APIs, which are exposed via HTTP or SOAP, and the OAI provider API, which is exposed
via HTTP.
[Diagram components, top to bottom:
Clients: Web Browser, Client App, Batch Program, Server App
Protocols: HTTP, SOAP
User Authentication
Management, Security, and Access Subsystems
Storage Subsystem]
Figure 3: Fedora System Architecture (simplified)
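The pairing of APIs and protocols described above is small enough to write down as a lookup table. The Python sketch below is only a restatement of that paragraph: the dictionary keys and the `supports` helper are hypothetical names, not part of Fedora.

```python
from typing import Dict, FrozenSet

# Which protocols each of the four APIs is exposed over, per the text above.
API_TRANSPORTS: Dict[str, FrozenSet[str]] = {
    "management": frozenset({"HTTP", "SOAP"}),
    "access": frozenset({"HTTP", "SOAP"}),
    "search": frozenset({"HTTP", "SOAP"}),
    "oai": frozenset({"HTTP"}),
}


def supports(api: str, transport: str) -> bool:
    """Return True if the named API is exposed over the given protocol."""
    return transport.upper() in API_TRANSPORTS.get(api.lower(), frozenset())
```

A client could use a check like `supports("oai", "SOAP")` to decide up front which protocol to use for a given API instead of discovering a mismatch at request time.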
Client and Web Service Interactions
This diagram gives another view of the larger context of a Fedora repository. Users
perform common tasks such as ingesting objects, searching the repository, or accessing
objects via client applications or a web browser. These client applications mediate this
interaction with the repository via web services on the frontend, and on the backend, the
repository interacts with web services to perform any data transformations that are
requested by users. The transformed data is then passed back to the user via the frontend
web services.
It is important to note that users interact with the repository only via the APIs. Even
though it may sometimes seem that they are interacting directly with an object, they are
not.
[Diagram components:
Frontend: users work through client applications or a web browser, which reach
the Fedora Repository System via web services to ingest objects, search the
repository, and access objects.
Fedora Repository System: a set of digital objects, each with a Persistent ID
(PID), FOXML Metadata, Object Properties, Relationship Metadata, datastreams
(items), a default disseminator, and custom disseminators.
Backend: content services and transform services that the repository calls via
web services.]
Figure 4: Client and Web Services Interaction