Jena: Implementing the RDF Model and Syntax Specification
Brian McBride
Hewlett Packard Laboratories
Filton Road, Stoke Gifford
Bristol, UK
+44 117 312 9560
brian_mcbride@hp.com

ABSTRACT
Some aspects of W3C's RDF Model and Syntax Specification require careful reading and interpretation to produce a conformant implementation. Issues have arisen around anonymous resources, reification and RDF Graphs. These and other issues are identified and discussed, and an interpretation of each is proposed. Jena, an RDF API in Java based on this interpretation, is described.

Keywords
RDF, XML

1. INTRODUCTION
Since the W3C's Resource Description Framework (RDF) Model and Syntax specification [1] completed its path to W3C recommendation, several implementations have been developed. These differ in some aspects of their interpretation of the specification. There has been much discussion of these issues on the RDF Interest Mailing List [2] [3] [4], which, so far, has not produced a resolution. Intermixed with those discussions have been others about changes and extensions to the specification.

All this has caused confusion and uncertainty that is inhibiting the acceptance and deployment of RDF. Tool builders wish to build tools that are correct and conformant. This they cannot do, because it is not clear what it means to be correct and conformant. Similarly, producers and consumers of RDF wish to produce RDF whose interpretation is well defined. Uncertainty of interpretation inhibits them from doing so.

One reason for the lack of resolution is that issues are discussed individually. The issues themselves, however, are interlinked. It is hard for a community discussing, say, the subtleties of reification to agree when they have fundamentally different views on the nature of resources and their identification.

An implementer setting out to develop an implementation of an RDF tool must have an interpretation of the specification. This paper describes the interpretation developed for Jena [5], an RDF API in Java. The guiding principle for this interpretation was to implement, as far as possible, the specification as it is, without embellishment. It is documented here in the hope it will prove helpful to other developers.

Only issues concerning the RDF data model are discussed here; issues of RDF XML syntax are not considered.

2. INTERPRETING THE RDF MODEL AND SYNTAX SPECIFICATION
The RDF Model and Syntax specification defines an abstract data model. The model is abstract because it is defined in terms of abstract mathematical structures such as triples and sets. It is a data model only, because no formal semantics is given. It is suggested that RDF statements represent facts, but nothing formal is defined. Others [6] [7] have offered formal interpretations defined in terms of first order predicate logic.

The Model and Syntax specification also defines how to represent data conforming to this data model in XML. The XML serialization is a representation of the abstract model. Other representations are also possible. For example, an RDF graph may be represented by a data structure in computer memory or by tables in a relational database. This structure is represented in figure 1.
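As an illustration only, the following Java sketch shows one way such an in-memory data structure might look, assuming Java 16 or later (for records). It anticipates the interpretation argued for in the rest of section 2: a resource's URI is optional (anonymous resources), a literal carries an optional language identifier, resources and literals are distinct kinds of value, a statement is nothing more than its three components, and a graph is a set of statements. All names here (RdfModelSketch, Resource, Literal, Statement, Graph and the example.org URIs) are hypothetical and are not Jena's API.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch of an in-memory representation of the abstract RDF model.
 * The type names are illustrative only; they are not Jena's API.
 */
public class RdfModelSketch {

    /** Marker for anything that can be the object of a statement. */
    public interface Node {}

    /**
     * A resource whose URI is optional: a null URI marks an anonymous resource.
     * Named resources are equal when their URIs are equal; an anonymous resource
     * is equal only to itself, so the same object must be reused to refer to it.
     */
    public static final class Resource implements Node {
        private final String uri; // null for an anonymous resource

        public Resource(String uri) { this.uri = uri; }

        public static Resource anonymous() { return new Resource(null); }

        public boolean isAnonymous() { return uri == null; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Resource)) return false;
            Resource other = (Resource) o;
            // Only named resources can be equal to a different object.
            return uri != null && uri.equals(other.uri);
        }

        @Override
        public int hashCode() {
            return uri != null ? uri.hashCode() : System.identityHashCode(this);
        }
    }

    /** A literal: a string plus an optional language identifier (null if absent). */
    public record Literal(String value, String language) implements Node {}

    /** A statement is just its three components: equal components, same statement. */
    public record Statement(Resource subject, Resource predicate, Node object) {}

    /** A graph as a set of statements: re-adding a statement has no effect. */
    public static final class Graph {
        private final Set<Statement> statements = new HashSet<>();

        /** Returns true only if the statement was not already in the graph. */
        public boolean add(Statement s) { return statements.add(s); }

        public int size() { return statements.size(); }
    }

    public static void main(String[] args) {
        // Hypothetical URIs used only for illustration.
        Resource doc = new Resource("http://example.org/doc");
        Resource seeAlso = new Resource("http://example.org/seeAlso");

        Graph g = new Graph();
        // The literal "http://foo" and the resource http://foo are different
        // values, so these are two different statements.
        g.add(new Statement(doc, seeAlso, new Literal("http://foo", null)));
        g.add(new Statement(doc, seeAlso, new Resource("http://foo")));
        // Adding an equal statement again leaves the set unchanged.
        g.add(new Statement(doc, seeAlso, new Resource("http://foo")));
        System.out.println(g.size()); // prints 2

        // Literals that differ only in language are not equal.
        System.out.println(new Literal("chat", "en").equals(new Literal("chat", "fr"))); // prints false

        // Two anonymous resources are distinct even though neither has a URI.
        System.out.println(Resource.anonymous().equals(Resource.anonymous())); // prints false
    }
}
```

The design choice worth noting is equality: named resources compare by URI, while an anonymous resource is identified only by object identity, so the sketch never needs to invent URIs for anonymous resources.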

[Figure 1]

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission by the authors.
Semantic Web Workshop 2001 Hong Kong, China
Copyright by the authors.

It is important, as will be seen below, to distinguish between the abstract data model and its representations. The specifications define constraints which apply to the abstract data model. The abstract model is infinite; representations of the abstract data model must be finite and incomplete. The Model and Syntax specification defines no formal semantics for RDF.

2.1 Resources and URIs
RFC 2396 [8] defines a resource to be a conceptual mapping:

    The resource is the conceptual mapping to an entity or set of entities, not necessarily the entity which corresponds to that mapping at any particular instance in time. Thus, a resource can remain constant even when its content---the entities to which it currently corresponds---changes over time, provided that the conceptual mapping is not changed in the process.

For example, a resource, identified by a specific URI, may represent the W3C logo. When a browser uses HTTP to request a representation of that logo, the particular representation it receives may depend on a number of factors such as time (the logo may change over time) and the file format (jpg, gif or png representation) required. In this case, the URI identifies the abstract concept of the W3C logo. A particular representation, say the JPEG representation, may have its own different URI.

Can a resource have more than one URI? This is a question not just for RDF, but for web and internet architecture as a whole, which, at the time of writing, has not finally been resolved.

The RDF Model and Syntax specification, however, takes a position on this question. No provision is made in the RDF data model for a resource to have multiple URIs. Provision is made for a resource to have one URI. Other URIs could be associated with a resource through some property, but the RDF specifications define no such property. The implication is clear: as far as RDF is concerned, resources have a distinguished URI.

Web principles [9], however, dictate that there can be no central authority to allocate URIs to conceptual mappings. There is no way to stop many individuals independently assigning URIs to represent, say, the trees in a park. Each such URI defines a new resource. Thus there may be many resources that represent the same tree. The RDF specifications do not define a mechanism for stating the equivalence of resources, i.e. that multiple resources represent the same conceptual mapping. This is left to higher layers of the stack such as DAML-ONT [10].

2.2 Anonymous Resources
The Model and Syntax specification is unclear about anonymous resources. In section 2.1 it states:

    Resources are always named by URIs plus optional anchor ids

However, in figure 2 of the specification and its preceding text, it introduces the concept of an anonymous resource, that is a resource that does not have a URI, and subsequently refers to such resources in three places in section 6.

The repeated references to anonymous resources clearly indicate the intention of the authors that an RDF graph should be able to represent a resource without representing its URI. This can be reconciled with the statement quoted above if it is interpreted as meaning that whilst a resource must always have a URI, that is a constraint that applies to the abstract model. A particular representation of a resource need not include the URI.

An alternative interpretation, that all representations of RDF must have a URI for each resource, is inconsistent with the rest of the Model and Syntax specification, seems draconian and is not enforceable.

Anonymous resources can be thought of as existentially quantified variables. The graph in figure 2 shows an anonymous resource with a number of properties. This graph can be thought of as stating that Ora created a specification, whose URI is not represented, called "RDF M&S".

[Figure 2: an anonymous resource with arcs labelled dc:title, rdf:type and dc:creator; node labels M&S Spec, w3c:spec, Ora Lassila]

Applications creating RDF models are not required to supply a URI for all resources. In particular, RDF XML parsers should distinguish between resources for which a URI was encoded in the serialization and those that were anonymous. Parsers that fail to do so prevent an application from 'round tripping', i.e. the application is unable to write an RDF graph to a file and recreate the same graph when the XML serialization is read back in.

It is unfortunate that the XML serialization defined for RDF does not permit the representation of all possible graphs containing anonymous resources.

2.3 Properties
Properties are resources that are identified by URIs. In an XML serialization of RDF, properties are often represented by XML QNames of the form nsprefix:LocalPart, in which case the URI of the property is the concatenation of the URI associated with the nsprefix and the LocalPart of the QName.

Care is needed in interpreting what the Model and Syntax specification says about the relationship between properties and namespaces. Section 2.2.3 states:

    In RDF, each predicate used in a statement must be identified with exactly one namespace, or schema.

In section 6 it states:

    It is recommended that property names always be qualified with a namespace prefix to unambiguously connect the property definition with the corresponding schema.

Two issues arise with these statements:

· the second statement seems to undermine the first, in that it merely recommends that properties be connected with a namespace, whilst the former requires it.
· the first statement suggests that it is the use of a property that is associated with a namespace, whilst the latter suggests it is the property itself that is associated with a namespace.

The first issue is resolved by taking the first statement as definitive. The second statement is explained by the fact that an RDF processor, given the URI of a property, cannot always determine unambiguously the namespace with which it is associated. Given a property with URI http://foo/bar, it is not possible to determine algorithmically whether the namespace is http://foo/ or http://foo/b or http://foo/ba. All are possible. The usual algorithm employed by processors is to search back from the end of the URI for the first character that cannot appear in the LocalPart of an XML QName. This, however, is not guaranteed to be correct. The second statement is therefore an admonition to the creators of XML representations of RDF to remove this ambiguity by specifying the namespace explicitly.

RDF XML parsers and other RDF processors should retain this information, representing properties not just by their URI, but by the pair consisting of their namespace URI and LocalPart. This will enable them to acquire and process the RDF Schema [11] that describes each property and to write an RDF graph correctly as XML.

The second issue is that the first statement quoted above allows an interpretation in which the property identified by http://foo/bar could be associated with the namespace http://foo/ in one statement and the namespace http://foo/b in another. This would imply that a property identified by a particular URI could have multiple interpretations. RDF Schema would be undermined by this interpretation, as it would not be possible, when asserting, say, a domain or range constraint on a property, to specify to which interpretation of the property the constraint applied. This interpretation is therefore rejected.

2.4 Literals
Though the Model and Syntax specification is clear, the nature of literals is commonly misunderstood. A literal is not just a string of characters; it also optionally encodes a language identifier. This language identifier is part of the value of the literal and must be represented by implementations.

2.5 Statements
An RDF statement is defined to be a triple consisting of a predicate, a subject and an object. A triple is a mathematical structure that is uniquely defined by its three components. Thus, there can be only one statement with a given subject, predicate and object. There can be many representations of a single triple, e.g. in multiple XML files, databases or computer memories, but those are representations of a triple, not the triple itself.

The subject of a statement is defined to be a resource. The subject of a statement is not the URI of a resource; it is the resource itself. Representations of statements typically use URIs as part of the representation of a resource, but it is important to understand that the representations are not the same thing as the actual statements and resources.

Whilst section 5 of the Model and Syntax specification, the formal model for RDF, does not explicitly say so, the set of resources and the set of literals are disjoint. If literal is the literal "http://foo" and resource is the resource whose URI is http://foo, then the statement (predicate, subject, literal) is not the same statement as (predicate, subject, resource). Implementations, therefore, cannot use just the URI or the literal string to represent a resource or literal; they must have some way of distinguishing the two.

2.6 Reified Statements
RDF statements are not resources. Through a mechanism known as reification, there are resources that represent RDF statements. The Model and Syntax specification (in section 5, rule 9) defines the reification of an RDF statement to be a resource r which represents the statement, along with four statements: one which defines the type of the resource to be an RDF statement, and three others which describe the subject, the predicate and the object of the statement. The reification of a statement is thus a small RDF graph containing these four statements.

Section 5 goes on to state:

    The resource r in the definition above is called the reified statement. When a resource represents a reified statement; that is, it has an RDF:type property with a value of RDF:Statement, then that resource must have exactly one RDF:subject property, one RDF:object property, and one RDF:predicate property.

The language here is rather loose. The phrase "When a resource represents a reified statement" should be read as "When a resource is a reified statement" to be consistent with the first sentence of the paragraph. Thus a reified statement is the single resource that represents a statement.

The paragraph quoted above applies to the RDF abstract data model. In the abstract data model, every reified statement does have all four properties. A representation may represent only part of the abstract data model, and so need not include all the properties.

As with trees in the park, or any other object or concept, there is nothing to preclude statements being given multiple URIs. Thus, whilst there can only be one statement with a given subject, predicate and object, there may be many reified statements representing that statement. Since each such reified statement represents the same statement, the simplest semantics for RDF implies that any property of one is a property of them all.

2.7 Statements, Statings and Occurrences
An RDF statement is defined to be a triple of the form (predicate, subject, object). Some applications need to represent occurrences of statements. For example, an application may wish to represent the fact that a particular statement occurred in a particular document at a
particular time. Occurrences of statements are often called 'statings'.

The term "occurrences" is preferred to "statings". Has a statement that occurs in a collection of fallacies been stated? It certainly occurs in that collection, but it is not clear that it has been stated.

The Model and Syntax specification states that a reified statement represents a statement. For example, in section 4.1, paragraph 6:

    A new resource with the above four properties represents the original statement...

Despite this, there has been a suggestion in the RDF community that reified statements represent occurrences of statements. This can only be consistent with the Model and Syntax specification if a resource can represent both a statement and an occurrence of a statement. For any such resource, it is easy to construct a contradiction.

Consider a statement S that occurs in two documents, http://foo and http://bar. Let RS be a reified statement representing both S and its occurrence in http://foo. Then the statement (occursIn, RS, http://foo) is true. Is the statement (occursIn, RS, http://bar) true? It is true of the statement S, but it is not true of the occurrence of S in http://foo. So this statement is both true and false of RS, a contradiction.

Thus reified statements represent statements, not occurrences of statements or statings.

2.8 RDF Graphs
The Model and Syntax specification refers to the concept of an RDF graph, i.e. a specific collection of RDF statements, but omits this concept from the formal model. Implementations deal with specific collections of statements and generally implement the concept of a graph, though it is frequently called a model.

There is a need to name a specific collection of RDF statements with a URI. For example, RDF Schema is represented by a specific collection of RDF statements. Accessing the URI of RDF Schema will return an XML representation of that collection of statements. Implementations must manipulate specific named collections of statements. There is also a need to make statements about specific collections of statements, e.g. to state that the title of the collection of statements representing RDF Schema is "RDF Schema".

Since the RDF Model and Syntax Specification does not provide any formal terminology for graphs, some is suggested here.

A collection of RDF statements is known as an RDF graph. So that RDF may be used to describe an RDF graph, a graph may be represented by a resource. The reification of an RDF graph G consists of a resource g of type rdf:Bag together with a set of statements S of the form (rdf:_n, g, RSn) for n = 1 to the number of statements in G. For each statement s in G, there is an element of S with RSn = a reified statement representing s. g is known as a reified graph, or alternatively. It is permitted to represent a partial reification of a graph or model.

Is a graph a set of statements, i.e. each statement may appear only once in a graph, or is it a bag? The specification does not say, and implementers are divided on this question. An informal poll of implementers found a majority implementing a graph as a set of statements.

The suggested interpretation of an RDF statement is as a fact. There is little point in including the same fact in a collection more than once. When graphs are merged, it is wasteful if statements that occur in more than one of the source graphs occur more than once in the resulting graph. Under this interpretation, a graph is a set of statements.

3. THE JENA RDF API
Jena is an API in the Java programming language for the creation and manipulation of RDF graphs. It implements the interpretation of the RDF specifications described in section 2 above.

Jena was developed to satisfy two goals:

· to provide an API that was easier for the programmer to use than alternative implementations
· to be conformant to the RDF specifications

An open source implementation of the Jena API is available from:
http://www-uk.hpl.hp.com/people/bwm/rdf/jena

3.1 API Features
The Jena API is designed specifically for the Java programming language. APIs can be programming language neutral and are sometimes, like the Document Object Model (DOM) API [12], defined using an interface definition language (IDL). A language binding can then be defined for any given programming language. This approach prevents an API from exploiting the features of a specific programming language. The alternative approach, as exemplified by JDom [13], is to define an API that takes advantage of the features of a specific programming language and environment. Jena adopts the latter approach.

Previous RDF APIs had adopted either a statement centric or a resource centric approach. In the statement centric approach, as implemented by SiRPAC [14], method calls are defined in terms of statements, which reflects the underlying implementation of an RDF graph as a collection of triples. Applications, however, are often more conveniently written in terms of resources and their properties, as in DATAX [15].

Jena integrates both programming styles into a single API. Applications can be written using a statement centric approach, a resource centric approach or a mixture of both. For example¹:

    Resource res = model.createResource();
    model.addStatement(res, RDF.type, RDFS.Class);
    model.addStatement(res, RDFS.label, "example");
    model.addStatement(res, RDFS.comment, "…");

may also be written as:

    model.createResource()
         .addProperty(RDF.type, RDFS.Class)
         .addProperty(RDFS.label, "example")
         .addProperty(RDFS.comment, "…");

¹ Jena uses the term 'model' for an RDF graph.

The RDF data model supports only string values in literals, whereas applications often need to represent integers, floats or application defined types. Jena provides convenience methods for the automatic conversion of both Java built-in and application defined types to and from property values, e.g.:

    r.addProperty(RDF.value, 5.5);
    r.addProperty(FOO.date, myDate);
    Double d = r.getProperty(RDF.value).getDouble();

Resources may be sub-classed to provide behaviour, a feature that is used to provide specific support for RDF containers. Subclasses of Resource implement general container behaviour and specific behaviour for Bags, Seqs and Alts. For example:

    bag.remove("value");
    seq.add(index, "value");

The first call deletes the appropriate value from the bag; the second inserts a new value into a sequence. In both cases the other members are renumbered as needed.

A flexible query API is provided. All the query methods take a selector object as an argument. By defining new selector classes, new query languages can be added without disturbing the core API. A query on a graph may return either a new graph which is a sub-graph of the original, an iterator which will return all the statements matching the query, or a table of values (represented as a JDBC ResultSet) matching variables in the query.

3.2 Implementing the Interpretation
The Jena API implements anonymous resources, i.e. resources need not have a known URI. The implementation internally tracks the identity of resources, so it is able to determine when two anonymous resources are in fact the same resource. RDFFilter [16], the RDF XML parser that is integrated into Jena, does not create URIs (so-called genids) for anonymous resources as it parses.

Properties in Jena have an associated namespace. Property objects can be queried to determine that namespace. When a property object is constructed, either a namespace must be provided as an argument, or the implementation will attempt to determine the namespace URI by splitting the property URI at the last character that is illegal in the LocalPart of an XML QName. The parser integrated into Jena retains the structure of the QName from the XML serialization and constructs property objects with the correct namespace.

Literals in Jena have an associated language encoding. Literals are not equal unless their language encodings are equal.

RDF graphs are implemented as sets of statements. Adding a statement that is already present to a graph has no effect.

Statements are implemented as a sub-class of resource. Whilst in the formal model statements are not resources, it is convenient in an API to be able to use a statement to represent its reified statement. For example, to add to a model the fact that the statement (RDF:value, res, "value") occurs in http://foo:

    m.createStatement(res, RDF.value, "…")
     .addProperty(FOO.occursIn, "http://foo");

The Jena triple store uses a statement object to represent the reification of a statement. The presence of a statement object as either the subject or object of a statement in a graph is equivalent to representing the four triples of the reification of the statement explicitly in the graph. This permits efficient representation of reification.

3.3 Jena API Implementation Architecture
The structure of the Jena implementation is shown in figure 3.

[Figure 3: component diagram with boxes labelled XML parser, XML writer, Jena API, common classes, model, query engine, memory store, SQL store, prolog store]

The implementation has been designed to permit the easy integration of alternative processing modules such as parsers, serializers, stores and query processors.

The API itself consists of a collection of Java interfaces representing resources, properties, literals, containers, statements and models. A common set of classes implements these interfaces, though these may be sub-classed or replaced to optimize particular implementations. The model class is a generic implementation of an RDF graph. A standard interface connects the model to classes that implement storage and basic querying of RDF statements. A standard interface also enables integration of specialized query processors.

4. CONCLUSIONS
This paper has discussed a number of issues in the interpretation of the RDF Model and Syntax specification that implementers of RDF tools must address. A resolution of those issues, consistent with the specification as written, has been described. Jena, a Java API for RDF, and its implementation have also been described.

Acknowledgements
The dedicated community of the RDF Interest Group have greatly helped my understanding of the RDF specifications, as have many discussions with my friend and colleague, Stuart Williams. The motivation to develop the Jena API came originally from Ian Dickinson. The flexible query interface was suggested by Gabe Beged-Dov. Returning JDBC ResultSets from queries was suggested by Dan Brickley. I am greatly indebted to Dave Reynolds for his support and encouragement.
References
1. O. Lassila, R. Swick (eds), Resource Description Framework (RDF) Model and Syntax Specification, http://www.w3.org/TR/REC-rdf-syntax/
2. Discussion Archive for the RDF Interest Group, http://lists.w3.org/Archives/Public/www-rdf-interest/
3. RDF Interest Group - Issue Tracking, http://www.w3.org/2000/03/rdf-tracking/
4. B. McBride, Issues Raised in the RDF Interest Group Mailing List, http://www-uk.hpl.hp.com/people/bwm/rdf/issues.htm
5. B. McBride, Jena, An RDF API in Java, http://www-uk.hpl.hp.com/people/bwm/rdf/jena
6. Wolfram Conen, Reinhold Klapsing, A Logical Interpretation of RDF, http://nestroy.wi-inf.uni-essen.de/rdf/logical_interpretation/index.html
7. Richard Fikes, Deborah L. McGuinness, An Axiomatic Semantics for RDF, RDF Schema, and DAML-ONT, http://www.ksl.stanford.edu/people/dlm/daml-semantics/
8. T. Berners-Lee, R. Fielding, L. Masinter, RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax, http://www.ietf.org/rfc/rfc2396.txt?number=2396
9. Tim Berners-Lee, Web Architecture from 50,000 feet, http://www.w3.org/DesignIssues/Architecture.html
10. Lynn Stein, Dan Connolly, Deborah McGuinness (eds), DAML-ONT Initial Release, http://www.daml.org/2000/10/daml-ont.html
11. Dan Brickley, R. V. Guha (eds), Resource Description Framework (RDF) Schema Specification 1.0, http://www.w3.org/TR/2000/CR-rdf-schema-20000327/
12. Arnaud Le Hors et al. (eds), Document Object Model (DOM) Level 2 Core Specification, http://www.w3.org/TR/2000/REC-DOM-Level-2-Core-20001113/
13. JDom, http://www.jdom.org/
14. Janne Saarela, Art Barstow, Sergey Melnik, Dan Brickley, SiRPAC - A Simple RDF Parser and Compiler, http://www.w3.org/RDF/Implementations/SiRPAC/
15. David Megginson, DATAX: Data Exchange in XML, http://www.megginson.com/DATAX/index.html
16. David Megginson, RDF Filter, http://www.megginson.com/Software/
