
The Role of Ontologies in Data Integration

Isabel F. Cruz Huiyong Xiao

ADVIS Lab
Department of Computer Science
University of Illinois at Chicago, USA
{ifc | hxiao}@cs.uic.edu

Abstract
In this paper, we discuss the use of ontologies for data integration. We consider two different settings depending on the system architecture: central and peer-to-peer data integration. Within those settings, we discuss five different case studies that illustrate the use of ontologies in metadata representation, in global conceptualization, in high-level querying, in declarative mediation, and in mapping support. Each case study is described in detail and accompanied by examples.

1 Introduction
1.1 Data Integration
Data integration provides the ability to manipulate data transparently across multiple data sources. It is relevant to a number of applications, including enterprise information integration, medical information management, geographical information systems, and e-commerce applications. Based on the architecture, there are two different kinds of systems: central data integration systems [1, 3, 7, 10, 18, 22] and peer-to-peer data integration systems [2, 4, 5, 11, 16, 19]. A central data integration system usually has a global schema, which provides the user with a uniform interface to access information stored in the data sources. In contrast, in a peer-to-peer data integration system, there is no global point of control over the data sources (or peers). Instead, any peer can accept user queries for the information distributed in the whole system.
The two most important approaches for building a data integration system are Global-as-View (GaV) and Local-as-View (LaV) [22, 17]. In the GaV approach, every entity in the global schema is associated with a view over the local source schemas. Therefore, querying strategies are simple (a query over the global schema is answered by unfolding the view definitions), but the evolution of the local source schemas is not easily supported. Conversely, the LaV approach permits changes to source schemas without affecting the global schema, since the local schemas are defined as views over the global schema, but query processing can be complex.
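Under GaV, answering a query over the global schema amounts to substituting (unfolding) the view that defines each global relation. The sketch below illustrates this with two hypothetical in-memory sources and a hypothetical global relation WrittenBy(title, person); the relation names and tuples are assumptions for illustration only. LaV, which would require rewriting queries using views, is not shown.

```python
# Hypothetical source relations; names and tuples are illustrative only.
source1_book = [("b1", "a1"), ("b2", "a2")]   # (title, author)
source2_writes = [("w3", "b2")]               # (writer, title)

def written_by():
    """GaV: the global relation WrittenBy(title, person) is *defined* as a
    view over the sources, here the union of two projections."""
    view = set(source1_book)
    view |= {(title, writer) for writer, title in source2_writes}
    return view

def authors_of(title):
    """A query over the global schema is answered by unfolding the view."""
    return sorted(person for t, person in written_by() if t == title)

print(authors_of("b2"))   # ['a2', 'w3']
```

Adding a new source here means editing the view definition in `written_by`, which is exactly the schema-evolution cost of GaV described above.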

1.2 Data Heterogeneity

Data sources can be heterogeneous in syntax, schema, or semantics, which makes data interoperation a difficult task [6]. Syntactic heterogeneity is caused by the use of different models or languages. Schematic heterogeneity results from structural differences. Semantic heterogeneity is caused by different meanings or interpretations of data in various contexts. To achieve data interoperability, the issues posed by data heterogeneity need to be resolved.
The advent of XML has created a syntactic platform for Web data standardization and exchange. However, schematic data heterogeneity may persist, depending on the XML schemas used (e.g., different nesting hierarchies). Likewise, semantic heterogeneity may persist even when neither syntactic nor schematic heterogeneity occurs (e.g., when the same concept is named differently). In this paper, we are concerned with resolving all three kinds of heterogeneity across different sources.

1.3 Semantic Data Integration using Ontologies

We call semantic data integration the process of using a conceptual representation of the data and of their relationships to eliminate possible heterogeneities. At the heart of semantic data integration is the concept of ontology, which is an explicit specification of a shared conceptualization [13, 14].
Ontologies were developed by the Artificial Intelligence community to facilitate knowledge sharing and reuse [15]. Carrying semantics for particular domains, ontologies are widely used for representing domain knowledge. A common use of ontologies is data standardization and conceptualization via a formal, machine-understandable ontology language. For example, the global schema in a data integration system may be an ontology, which then acts as a mediator for reconciling the heterogeneities between different sources. As an example of the use of ontologies in peer-to-peer data integration, we can produce a local ontology for each source schema and make it accessible to other peers so as to support semantic mappings between different local ontologies.

1.4 Paper Overview

We review the use of ontologies in heterogeneous data integration systems. Based on existing approaches to ontology-based data integration, and in particular on our work on central and peer-to-peer data integration, we discuss how ontologies can be used to facilitate data interoperation and integration. In Section 2, we present an overview of the concept of ontology and of the languages that are used for representing ontologies. In Section 3, we give a high-level description of the use of ontologies in data integration. In the following two sections, we discuss five case studies describing typical uses of ontologies. Three of those case studies relate to central data integration and are presented in Section 4. The other two case studies relate to peer-to-peer data integration and are presented in Section 5. We conclude in Section 6.

2 Ontologies
An ontology is a formal, explicit specification of a shared conceptualization [13]. In this definition, “conceptualization” refers to an abstract model of some domain knowledge in the world that identifies that domain’s relevant concepts. “Shared” indicates that an ontology captures consensual knowledge, that is, it is accepted by a group. “Explicit” means that the type of concepts in an ontology and the constraints on these concepts are explicitly defined. Finally, “formal” means that the ontology should be machine-understandable.
Typical “real-world” ontologies include taxonomies on the Web (e.g., Yahoo! categories), catalogs for online shopping (e.g., Amazon.com’s product catalog), and domain-specific standard terminology (e.g., UMLS¹ and Gene Ontology²). As an online lexical database, WordNet³ is widely used for discovering semantic relationships between concepts.
¹ http://www.nlm.nih.gov/research/umls/
Existing ontology languages include:
XML Schema. Strictly speaking, XML Schema is a semantic markup language for Web data. The database-compatible data types supported by XML Schema provide a way to specify a hierarchical model.⁴ However, there are no explicit constructs for defining classes and properties in XML Schema; therefore, ambiguities may arise when mapping an XML-based data model to a semantic model.

RDF and RDFS. RDF (Resource Description Framework) is a data model developed by the W3C for describing Web resources.⁵ RDF allows for the specification of the semantics of data in a standardized, interoperable manner. In RDF, a pair of resources (nodes) connected by a property (edge) forms a statement: (resource, property, value). RDFS (RDF Schema)⁶ is a language for describing vocabularies of RDF data in terms of primitives such as rdfs:Class, rdf:Property, rdfs:domain, and rdfs:range. In other words, RDFS is used to define the semantic relationships between properties and resources.

DAML+OIL. DAML+OIL (DARPA Agent Markup Language + Ontology Inference Layer) is a full-fledged Web-based ontology language developed on top of RDFS.⁷ It features an XML-based syntax and a layered architecture. DAML+OIL provides modeling primitives commonly used in frame-based approaches to ontology engineering, as well as the formal semantics and reasoning support found in description logic approaches. It also integrates XML Schema data types for semantic interoperability in XML.

OWL. OWL (Web Ontology Language) is a semantic markup language for publishing and sharing ontologies on the Web. It was developed as a vocabulary extension of RDF and is derived from DAML+OIL.⁸
² http://www.geneontology.org
³ http://www.cogsci.princeton.edu/~wn/
⁴ http://www.w3.org/TR/xmlschema-2
⁵ http://www.w3.org/TR/rdf-primer
⁶ http://www.w3.org/TR/rdf-schema
⁷ http://www.w3.org/TR/daml+oil-reference
⁸ http://www.w3.org/TR/owl-ref

Other ontology languages include SHOE (Simple HTML Ontology Extensions),⁹ XOL (Ontology Exchange Language),¹⁰ and UML (Unified Modeling Language).¹¹
Among all these ontology languages, we are most interested in XML Schema and RDFS for their particular roles in data integration and the “Semantic Web” [12]. More specifically, XML Schema and RDFS use the same (XML) syntax and can both be used for data modeling and ontology representation. However, they differ in the kind of structure they capture: XML data has a document structure, given by the nesting of elements in an individual XML document, whereas RDF data has a domain structure, formed by the concepts and the relationships between concepts [11, 16]. We discuss this issue in detail in Section 4.
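To make the RDF statement model and the RDFS primitives concrete, the following minimal sketch stores statements as plain Python (resource, property, value) tuples rather than using an RDF library. The class and property names follow the local ontology S1' of Example 2; the instance `book2` is hypothetical. Note that every fact is a flat triple with no nesting, which is the “domain structure” contrasted above with the document structure of XML.

```python
# Each RDF statement is a (resource, property, value) triple.
# Class/property names follow Example 2; "book2" is a hypothetical instance.
TRIPLES = {
    ("Book", "rdf:type", "rdfs:Class"),
    ("Author", "rdf:type", "rdfs:Class"),
    ("booktitle", "rdf:type", "rdf:Property"),
    ("booktitle", "rdfs:domain", "Book"),
    ("name", "rdf:type", "rdf:Property"),
    ("name", "rdfs:domain", "Author"),
    ("book2", "rdf:type", "Book"),
    ("book2", "booktitle", "b2"),
}

def match(s=None, p=None, o=None):
    """All triples matching a pattern; None acts as a wildcard."""
    return {t for t in TRIPLES
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}

def properties_of(cls):
    """Properties whose rdfs:domain is the given class."""
    return sorted(s for s, _, _ in match(p="rdfs:domain", o=cls))

def instances_of(cls):
    """Resources declared with rdf:type cls."""
    return sorted(s for s, _, _ in match(p="rdf:type", o=cls))

print(properties_of("Book"), instances_of("Book"))   # ['booktitle'] ['book2']
```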

3 Ontologies for Data Integration

Ontologies have been extensively used in data integration systems because they provide an explicit and machine-understandable conceptualization of a domain. They have been used in one of the following three ways [23]:

Single ontology approach. All source schemas are directly related to a shared global ontology that provides a uniform interface to the user [9]. However, this approach requires that all sources have nearly the same view on a domain, with the same level of granularity. A typical example of a system using this approach is SIMS [3].

Multiple ontology approach. Each data source is described separately by its own (local) ontology. Instead of using a common ontology, local ontologies are mapped to each other. For this purpose, an additional representation formalism is necessary for defining the inter-ontology mappings. The OBSERVER system [18] is an example of this approach.

Hybrid ontology approach. A combination of the two preceding approaches is used. First, a local ontology is built for each source schema; this local ontology is not mapped to other local ontologies, however, but to a global shared ontology. New sources can be easily added with no need for modifying existing mappings. Our layered framework [9] is an example of this approach.
⁹ http://www.cs.umd.edu/projects/plus/shoe
¹⁰ http://www.ai.sri.com/pkarp/xol/
¹¹ http://www.uml.org/

The single and hybrid approaches are appropriate for building central data integration systems, the former being more appropriate for GaV systems and the latter for LaV systems. A hybrid peer-to-peer system, where a global ontology exists in a “super-peer”, can also use the hybrid ontology approach [11]. The multiple ontology approach is best suited to constructing pure peer-to-peer data integration systems, where there are no super-peers.
We identify the following five uses of ontologies in data integration:

Metadata Representation. Metadata (i.e., source schemas) in each data source can be explicitly represented by a local ontology, using a single language.

Global Conceptualization. The global ontology provides a conceptual view over the schematically heterogeneous source schemas.

Support for High-level Queries. Given a high-level view of the sources, as provided by a global ontology, the user can formulate a query without specific knowledge of the different data sources. The query is then rewritten into queries over the sources, based on the semantic mappings between the global and local ontologies.

Declarative Mediation. Query processing in a hybrid peer-to-peer system uses the global ontology as a declarative mediator for query rewriting between peers.

Mapping Support. A thesaurus, formalized in terms of an ontology, can be used in the mapping process to facilitate its automation.

In the following sections, we discuss five case studies, which correspond to the above five uses. The first three case studies are in the context of centralized data integration systems (Section 4), while the last two are in the context of peer-to-peer data integration systems (Section 5). We base our discussion on our previous work [9, 10, 11, 24, 25].
4 Central Data Integration
In this section, we describe three case studies of the use of ontologies in the context of central data integration. To make the issues concrete, we use a running example involving two XML sources and demonstrate how to enable semantic interoperation between them.

Example 1 Figure 1 displays two XML schemas (S1 and S2) and their respective documents (D1 and D2), which are represented as trees. The two XML documents conform to different schemas but represent data with similar semantics. In particular, both schemas represent a many-to-many relationship between two concepts: book and author in S1 (equivalently denoted by article and writer in S2). However, they are structurally different: S1 (a book-centric schema) has the author element nested under the book element, whereas S2 (an author-centric schema) has the article element nested under the writer element.
Semantically equivalent data elements, such as the authors of publication “b2”, can be reached using different XML path patterns for schema S1 and schema S2, respectively:

/books/book[@booktitle="b2"]/author/@name

and

/writers/writer[article/@title="b2"]/@fullname

where the expressions in square brackets are predicates that constrain the search patterns.
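The two path patterns can be tried out with Python's standard `xml.etree.ElementTree`. The document contents below are hypothetical, loosely modeled on Figure 1 (the exact values of D1 and D2 are assumptions). ElementTree implements only a subset of XPath: `[@attr='value']` predicates work, but a predicate on a child's attribute such as `[article/@title="b2"]` does not, so that condition is checked in Python instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical document contents, loosely modeled on Figure 1.
D1 = """<books>
  <book booktitle="b1"><author name="a1"/></book>
  <book booktitle="b2"><author name="a2"/><author name="a3"/></book>
</books>"""
D2 = """<writers>
  <writer fullname="w1"><article title="b1"/></writer>
  <writer fullname="w2"><article title="b2"/></writer>
  <writer fullname="w3"><article title="b2"/></writer>
</writers>"""

def authors_s1(root, title):
    # /books/book[@booktitle=title]/author/@name
    return [a.get("name")
            for a in root.findall(f"book[@booktitle='{title}']/author")]

def authors_s2(root, title):
    # /writers/writer[article/@title=title]/@fullname
    # The child-attribute predicate is evaluated in Python, since
    # ElementTree's XPath subset does not support it.
    return [w.get("fullname") for w in root.findall("writer")
            if w.find(f"article[@title='{title}']") is not None]

print(authors_s1(ET.fromstring(D1), "b2"))   # ['a2', 'a3']
print(authors_s2(ET.fromstring(D2), "b2"))   # ['w2', 'w3']
```

The same question needs structurally different code for each source, which is the schematic heterogeneity the following case studies address.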

[Figure: XML schema S1 (books containing book* elements with attribute @booktitle, each containing 1 to 10 author elements with attribute @name) and XML document D1 ("books.xml"); XML schema S2 (writers containing writer* elements with attribute @fullname, each containing article* elements with attribute @title) and XML document D2 ("writers.xml").]
Figure 1: Two XML sources with heterogeneous schemas.
The example demonstrates that multiple XML schemas (or structures) can exist for a single conceptual model. In comparison, the schema or ontology languages that operate on the conceptual level (e.g., RDFS, DAML+OIL, and OWL) are structurally flat, so that the user can formulate a query from a conceptual perspective without considering the structure of the source [1, 7, 23, 10].
Figure 2 shows the architecture of a system that interoperates among schematically heterogeneous data sources [10]. The following three case studies examine in detail the principles embodied in this architecture.
[Figure: an RDF-based global ontology connected through a mapping table to local RDF ontologies 1 to n (ontology integration), each built over a local XML source; a query translator handles queries in the data-integration direction and in the peer-to-peer direction.]
Figure 2: An architecture for XML data integration.
Case Study 1 - Metadata Representation
As a first step for bridging the heterogeneities of diverse local sources, a local ontology must be generated from each source database schema (e.g., relational, XML, or RDF). A local ontology is a conceptualization of the elements and relationships between elements in each source schema. To facilitate interoperation, those ontologies should be expressed using the same model. Furthermore, for the sake of correct query processing, the structure of source schemas and the integrity constraints (e.g., relational foreign keys) expressed on the schemas should be preserved in the local ontology. We choose RDFS to represent each local ontology.
In our approach, ontology generation from source schemas is accomplished by model-based schema transformation [9]. In particular, the following approaches are taken for the relational and XML schema transformation:
Relational Schema. Relations are converted into RDF classes and attributes into RDF properties, which are attached to the class corresponding to the relation to which the attributes belong. A foreign key dependency between two relations is represented in the target local ontology by two properties (one for each relation) that share the same value.

XML Schema. Complex-type elements are converted into RDF classes, and simple-type elements and attributes are converted into RDF properties. This transformation process encodes the mapping information between each concept in the local RDF ontology and the path to the corresponding element in the XML source. Nesting relationships between XML elements are represented using a meta-property rdfx:contains, where rdfx stands for the namespace in which contains is defined. This meta-property enables the RDF representation of the XML nesting structure by connecting the two RDF classes that represent two nested XML elements.
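The two transformation rules can be sketched as follows. This is a minimal illustration under several assumptions, not the implementation of [9]: schemas are given as simple Python structures rather than parsed DDL or XSD, relational properties are qualified as `Relation.attribute`, the meta-property `rdfx:sameValueAs` used to record a foreign key is a hypothetical name (the text only says that the two properties share the same value), and class names are obtained by capitalizing element names to match Figure 3.

```python
def relational_to_rdfs(relations, foreign_keys):
    """relations: {relation: [attribute, ...]};
    foreign_keys: [((relation, attribute), (ref_relation, ref_attribute)), ...]."""
    triples = set()
    for rel, attrs in relations.items():
        triples.add((rel, "rdf:type", "rdfs:Class"))          # relation -> class
        for attr in attrs:
            prop = f"{rel}.{attr}"                              # qualified property name
            triples.add((prop, "rdf:type", "rdf:Property"))    # attribute -> property
            triples.add((prop, "rdfs:domain", rel))            # attached to its class
    for (rel, attr), (ref, ref_attr) in foreign_keys:
        # Hypothetical meta-property marking two properties that share values.
        triples.add((f"{rel}.{attr}", "rdfx:sameValueAs", f"{ref}.{ref_attr}"))
    return triples

def xml_schema_to_rdfs(element, parent=None, path="", triples=None, paths=None):
    """element: (name, [attribute names], [children]); a child given as a
    string is a simple-type element, a tuple is a complex-type element.
    Returns the triples and a concept -> XML path mapping."""
    if triples is None:
        triples, paths = set(), {}
    name, attrs, children = element
    cls = name.capitalize()                        # assumed class naming convention
    path = f"{path}/{name}"
    triples.add((cls, "rdf:type", "rdfs:Class"))   # complex type -> class
    paths[cls] = path
    if parent is not None:
        triples.add((parent, "rdfx:contains", cls))   # nesting -> rdfx:contains
    props = {a: f"{path}/@{a}" for a in attrs}      # attributes -> properties
    props.update({c: f"{path}/{c}" for c in children if isinstance(c, str)})
    for prop, prop_path in props.items():
        triples.add((prop, "rdf:type", "rdf:Property"))
        triples.add((prop, "rdfs:domain", cls))
        paths[prop] = prop_path
    for child in children:
        if not isinstance(child, str):
            xml_schema_to_rdfs(child, cls, path, triples, paths)
    return triples, paths

# Schema S1 of Example 1: books/book(@booktitle)/author(@name).
S1 = ("books", [], [("book", ["booktitle"], [("author", ["name"], [])])])
s1_triples, s1_paths = xml_schema_to_rdfs(S1)
print(s1_paths["name"])   # /books/book/author/@name
```

The recorded paths are what later allow a query on the local RDF ontology to be translated back into a path expression over the XML source.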

Example 2 Following Example 1, Figure 3 shows the local RDF ontologies S1' and S2', which are generated from the XML source schemas S1 and S2, respectively.

[Figure: local RDF ontology S1' (classes Books, Book, and Author linked by rdfx:contains; properties booktitle and name with rdfs:domain Book and Author, respectively) and local RDF ontology S2' (classes Writers, Writer, and Article linked by rdfx:contains; properties fullname and title with rdfs:domain Writer and Article, respectively).]
Figure 3: RDF-based local ontologies generated from XML schemas.

Case Study 2 - Global Conceptualization

To make the integration system accessible through the uniform interface of the global ontology, semantic mappings are established between the global ontology and the local ontologies. In our approach, this mapping process is accomplished during the construction of the global ontology, which is generated by merging the local ontologies, for example, using a GaV approach.
We assume that each local ontology is merged into the global ontology, which is the target ontology. The process of ontology merging consists of several operations:
• Copying a class and/or its properties: classes and properties that do not exist in the target ontology are copied into it.

• Class Merging: conceptually equivalent classes in the local and target ontologies are combined into one class in the target ontology.

• Property Merging: conceptually equivalent properties of a class in the local and target ontologies are combined into one property in the target ontology.

• Relationship Merging: conceptually equivalent relationships from one class c1 to another class c2 in the local and target ontologies are combined into a single relationship in the target ontology (i.e., an RDF property having c1 as its domain and c2 as its range).

• Class Generalization: related classes in the local and target ontologies can be generalized into a superclass. The superclass can be obtained by searching an existing knowledge domain (e.g., the DAML Ontology Library¹²) or by reasoning over a thesaurus.

We note that semantic correspondences are established along with the above operations. For example, for each element pL in a local ontology, if there exists a semantically equivalent element pG in the global ontology, the two elements will be merged and a correspondence between pL and pG will be generated.
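The copying, class-merging, and property-merging operations, together with the correspondences they generate, can be sketched as below. This is a simplified illustration, not the authors' merging algorithm: ontologies are modeled hypothetically as `{class: set of properties}`, the equivalences in `equiv` are assumed to be given (e.g., by a user or a matcher), and relationship merging and class generalization are omitted. The global property name `title` follows Figure 4.

```python
def merge(local, target, equiv, correspondences):
    """Merge a local ontology into the target (global) ontology.
    Ontologies are modeled as {class: set of properties}; `equiv` maps a
    local class or property name to its conceptually equivalent global name."""
    for cls, props in local.items():
        g_cls = equiv.get(cls, cls)
        target.setdefault(g_cls, set())      # copy the class or merge into an existing one
        correspondences.append((cls, g_cls))
        for prop in props:
            g_prop = equiv.get(prop, prop)
            target[g_cls].add(g_prop)        # copy the property or merge into an existing one
            correspondences.append((f"{cls}.{prop}", f"{g_cls}.{g_prop}"))
    return target

global_onto, corr = {}, []
merge({"Book": {"booktitle"}, "Author": {"name"}}, global_onto,
      {"booktitle": "title"}, corr)                                       # S1'
merge({"Article": {"title"}, "Writer": {"fullname"}}, global_onto,
      {"Article": "Book", "Writer": "Author", "fullname": "name"}, corr)  # S2'
print(global_onto)   # {'Book': {'title'}, 'Author': {'name'}}
```

After both merges, `corr` links, for example, `Writer.fullname` in S2' to `Author.name` in the global ontology; these correspondences drive the query rewriting in Case Study 3.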

Example 3 Figure 4 shows the global RDF ontology generated by merging the local ontologies S1' and S2' of Example 2. Note that the classes (properties) shown in grey are merged classes (properties), and that the classes Book and Author are also extended with the superclasses Publication and Person, respectively.

Case Study 3 - Support for High-level Queries

Given a conceptual view of available information sources, the user may pose a query in terms of the global ontology. We say the query is a high-level query if its formulation does not require awareness of particular source schemas. The query is then reformulated by a rewriting algorithm into a subquery for each source. The subqueries over sources are subject to the structure of the source schemas and may be expressed in a language different from that of the high-level query. An inference mechanism may be needed in the query rewriting, for example, when a concept involved in the query has super-concepts or sub-concepts.
¹² http://www.daml.org/ontologies/
[Figure: the global RDF ontology, with classes Books, Book, Author, and Authors linked by rdfx:contains, superclasses Publication and Person (rdfs:subClassOf), and properties title and name, connected by correspondences to the local RDF ontologies S1' and S2'.]
Figure 4: A conceptual view on local sources.
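The rewriting step, including the sub-concept inference it may require, can be sketched as follows. The paper expresses high-level queries in an RDF query language; here, as a hypothetical simplification, a query is reduced to a (concept, title) pair, and each global class maps directly to per-source XPath templates taken from Example 1 (the case in which data is not materialized under the local RDF ontologies). The correspondence table is an assumption for illustration.

```python
# rdfs:subClassOf edges of the global ontology (cf. Example 3).
SUBCLASS_OF = {"Book": "Publication", "Author": "Person"}
# Hypothetical correspondence table: global class -> {source: XPath template}.
# The templates are the path patterns of Example 1.
CORRESPONDENCES = {
    "Author": {
        "S1": '/books/book[@booktitle="{t}"]/author/@name',
        "S2": '/writers/writer[article/@title="{t}"]/@fullname',
    },
}

def with_subclasses(concept):
    """The concept plus all its transitive subclasses (simple RDFS inference)."""
    result, frontier = {concept}, [concept]
    while frontier:
        sup = frontier.pop()
        for sub, parent in SUBCLASS_OF.items():
            if parent == sup and sub not in result:
                result.add(sub)
                frontier.append(sub)
    return result

def rewrite_high_level(concept, title):
    """Rewrite 'find the <concept>s who wrote <title>' into per-source subqueries."""
    subqueries = {}
    for c in sorted(with_subclasses(concept)):
        for source, template in CORRESPONDENCES.get(c, {}).items():
            subqueries.setdefault(source, []).append(template.format(t=title))
    return subqueries

for source, queries in rewrite_high_level("Person", "b2").items():
    print(source, queries)
```

The query mentions only Person, yet both sources are reached because inference first expands Person to its sub-concept Author, whose correspondences point into S1 and S2.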
In addition to handling high-level queries on the global ontology, a bidirectional query translation algorithm is also supported [10] (see Figure 2). In this case, we can translate a query posed against an XML source into an equivalent query against any other XML source.
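The vocabulary part of bidirectional translation can be illustrated by routing local names through the global ontology. In this hypothetical sketch, a query is reduced to a (selected property, constrained property, value) triple, and the correspondence tables are assumptions modeled on Examples 1 and 3. The structural part of the algorithm in [10], which rebuilds the target path expression when the nesting is reversed, is not shown.

```python
# Hypothetical correspondence tables: local property -> global property.
TO_GLOBAL = {
    "S1": {"booktitle": "Book.title", "name": "Author.name"},
    "S2": {"title": "Book.title", "fullname": "Author.name"},
}

def translate(query, source, target):
    """query = (selected property, constrained property, value) over the local
    names of `source`; returns the equivalent query over `target`'s names."""
    from_global = {g: local for local, g in TO_GLOBAL[target].items()}
    select, where, value = query
    return (from_global[TO_GLOBAL[source][select]],
            from_global[TO_GLOBAL[source][where]],
            value)

q1 = ("name", "booktitle", "b2")          # names of the authors of b2, in S1
q2 = translate(q1, "S1", "S2")
print(q2)                                 # ('fullname', 'title', 'b2')
print(translate(q2, "S2", "S1") == q1)    # True
```

Because every source maps to the same global vocabulary, one correspondence table per source suffices for translation in either direction.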

Example 4 Suppose the user asks the query “Find the persons who have written publication b2.” This query will be expressed in an RDF query language such as RDQL.¹³ First, Person has the sub-concept Author, which corresponds to two different concepts (Author and Writer) in two different local RDF databases. Therefore, the initial query will be rewritten as two subqueries to those databases. In turn, those queries may be further rewritten using an XML query language incorporating the path expressions of Example 1 (unless the data was materialized under the local RDF ontologies). Using the bidirectional query translation mechanism, a query involving the concepts Book and Author in one source will be translated into a query involving Article and Writer in another data source, by using the correspondences established by the global ontology.
¹³ http://www.hpl.hp.com/semweb/rdql.htm
[Figure: peers 1 to n, each with an XML schema, an XML-to-local-RDF wrapper, and a mapping table, connected to a super-peer that holds the global RDF ontology and mapping tables; queries Q1 and Q2 are rewritten into per-peer queries (Q11' to Q1n', Q2i' to Q2n'), with query processing in data-integration fashion and in hybrid P2P fashion.]
Figure 5: The hybrid peer-to-peer architecture of PEPSINT.

5 Peer-to-Peer Data Integration

We consider again the two XML sources of Figure 1. However, this time they are connected in a peer-to-peer architecture. We consider a hybrid peer-to-peer architecture with two types of peers: super-peers, which contain the global RDF ontology, and peers, each containing a data source and an ontology. Each peer represents an autonomous information system and connects to a super-peer via semantic mappings. Peer-to-peer data integration systems or frameworks include LRM (Local Relational Model) [5], Hyperion [2], Piazza [16], PeerDB [19], SEWASIE [4], and PEPSINT [11].
Case Study 4 - Declarative Mediation
The PEPSINT system is a hybrid peer-to-peer system whose architecture is shown in Figure 5. PEPSINT uses a GaV approach. The global ontology in a super-peer serves two functions: (1) it provides the user with a uniform high-level view of the data sources in the distributed peers, and (2) it serves as a mediator for query translation from one peer to another. The former function is similar to the one described in Case Study 3. The latter function is discussed in detail here.
The user can pose a query against the local XML or RDF data source in any peer. Locally, the query is executed on the local source to get a local answer. Meanwhile, the source query is rewritten into a target query over every connected peer. The query rewriting uses the global ontology and the composition of the mappings from the original peer to the super-peer with the mappings from the super-peer to the target peers. By executing the target query, each peer returns an answer to the original peer, called the remote answer. The local and remote answers are integrated and returned to the user at the site of the originating peer.
[Figure: query Q1 (“List all publications”) over the XML source in peer p1 (publication elements with a title attribute and author and ISBN children) is mediated by the global RDF ontology in the super-peer (class Publication with subclasses Paper and Book) and rewritten into query Q2 over the XML source in peer p2 (book elements with a booktitle attribute and author and price children).]
Figure 6: Mediation for peer-to-peer query rewriting.

Example 5 Consider two XML sources, one in peer p1 and the other in peer p2, and a global ontology expressed in RDF in a super-peer. As shown in Figure 6, the global ontology consists of a class Publication and two subclasses, Paper and Book. The Publication class is mapped to the publication element of the XML source in p1, while the class Book corresponds to the book element of the XML source in p2. An XML query Q1 on p1 involving publication will be rewritten to a target query Q2 on p2 involving book. The XML fragments inside the dashed-line boxes are integrated and returned as answers.
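The mapping composition used for peer-to-peer rewriting in Example 5 can be sketched as below. The mapping tables follow the example, but their representation as dictionaries is an assumption, subclasses are expanded one level only, and query Q1 (“List all publications”) is reduced, hypothetically, to the path `//publication`. Merging the local and remote answers is not shown.

```python
# Hypothetical mapping tables, following Example 5.
P1_TO_GLOBAL = {"publication": "Publication"}
P2_TO_GLOBAL = {"book": "Book"}
SUBCLASSES = {"Publication": {"Paper", "Book"}}   # direct subclasses only

def compose(src_to_global, tgt_to_global, subclasses):
    """Compose peer -> super-peer mappings with super-peer -> peer mappings.
    A source element also reaches target elements mapped to subclasses of
    its global class."""
    global_to_tgt = {}
    for elem, cls in tgt_to_global.items():
        global_to_tgt.setdefault(cls, []).append(elem)
    composed = {}
    for elem, cls in src_to_global.items():
        classes = {cls} | subclasses.get(cls, set())
        composed[elem] = sorted(t for c in classes for t in global_to_tgt.get(c, []))
    return composed

def rewrite_for_peer(element, mapping):
    """Rewrite a query selecting //element into queries over the target peer."""
    return [f"//{t}" for t in mapping.get(element, [])]

p1_to_p2 = compose(P1_TO_GLOBAL, P2_TO_GLOBAL, SUBCLASSES)
print(rewrite_for_peer("publication", p1_to_p2))   # ['//book']
```

The peers never map directly to each other; adding a peer only requires its mapping to the super-peer, after which composition yields mappings to every other peer.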

Case Study 5 - Mapping Support

A thesaurus can be used in data integration to facilitate the automation of the schema mapping process [21, 9]. In particular, it can help discover the semantic relationships between concepts in different schemas or ontologies. WordNet is an example of such a thesaurus. It consists of a network of terms and their semantic relations (e.g., synonymy, hypernymy, and hyponymy). A term may have multiple senses, each represented by a synset.
A thesaurus-based schema matching approach has been devised for peer-to-peer data integration [24]; this approach consists of the following three steps (as illustrated in Figure 7):
1. Path Exploration. Among the semantic relations between synsets in WordNet, we choose those of synonymy, hyponymy/hypernymy (i.e., more specific/more general), and related-to when enumerating the paths between two arbitrary concepts from different local ontologies in peers. As shown in Figure 7, six paths are found from Quantity to Number.
2. Path Selection. When multiple paths are found between two concepts, we choose the optimal path, which corresponds to the most likely semantic relation between the two concepts. For this purpose, semantic similarities (i.e., the number above each path in the figure) are calculated for all the paths. The calculation assigns a weight to each semantic relation (e.g., 1 for synonymy and 0.8 for hypernymy) and then takes the average of the weights along the path. The path with the highest similarity is then chosen as the optimal path. If there is more than one such path, the user’s intervention is needed.
3. Semantic Derivation. The last step is to derive the (direct) semantic relationship, Sem, between the two concepts by reasoning on the semantic relations along the optimal path p between them. More specifically, Sem(p) = Sem(p_n) is computed with the following recursive algorithm, where p_n = (r_1, r_2, ..., r_n) and r_i (1 ≤ i ≤ n) are the edges (semantic relations) along p:

$$\mathrm{Sem}(p_n) = \mathrm{Sem}(p_{n-1}) \wedge \mathrm{Sem}(r_n), \quad \text{if } n > 1; \tag{1}$$

$$\mathrm{Sem}(p_n) = \mathrm{Sem}(r_1) \in \{\approx, \supseteq, \subseteq, \sim\}, \quad \text{if } n = 1. \tag{2}$$

In the above formulas, the symbols ≈, ⊇, ⊆, and ∼ stand for the semantic relations of synonymy, hypernymy, hyponymy, and related-to, respectively. The operation ∧ obeys the rules shown in Table 1.
[Figure: relation weights SYN (synonym) 1, HYPER (hypernym) 0.8, HYPO (hyponym) 0.8, and REL (related-to) 0.5; 1. Path Exploration finds six WordNet paths from Quantity to Number through intermediate synsets such as Amount, Total, Definite Quantity, Product, Constant, and Sum, each labeled with its similarity; 2. Path Selection and 3. Semantic Derivation then yield the relation between Quantity and Number.]
Figure 7: Thesaurus-based schema mapping process.

∧ ≈ ⊇ ⊆ ∼
≈ ≈ ⊇ ⊆ ∼
⊇ ⊇ ⊇ ? ∼
⊆ ⊆ ? ⊆ ∼
∼ ∼ ∼ ∼ ∼

Table 1: Inference rules for semantic relations: the entry at the intersection of a row and a column gives the result of the operation ∧ on the relations labeling that row and column, and a question mark indicates that human intervention is needed.
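The three steps can be sketched end to end as follows. This is a minimal illustration, not the implementation of [24]: the thesaurus fragment in `EDGES` is hypothetical (not actual WordNet data), edges are treated as directed, and the weights are those listed in Figure 7. Step 3 evaluates recursion (1)-(2) as a left fold over the path using Table 1, returning `None` where the table has a question mark.

```python
# Hypothetical WordNet-like fragment: directed edges labeled with relations.
EDGES = {
    ("Quantity", "Amount"): "SYN",
    ("Amount", "Number"): "SYN",
    ("Quantity", "Definite Quantity"): "HYPO",
    ("Definite Quantity", "Number"): "HYPO",
    ("Quantity", "Sum"): "REL",
    ("Sum", "Number"): "SYN",
}
WEIGHTS = {"SYN": 1.0, "HYPER": 0.8, "HYPO": 0.8, "REL": 0.5}   # as in Figure 7
SYMBOL = {"SYN": "≈", "HYPER": "⊇", "HYPO": "⊆", "REL": "∼"}
# Table 1; None stands for "?" (human intervention needed).
COMBINE = {
    ("≈", "≈"): "≈", ("≈", "⊇"): "⊇", ("≈", "⊆"): "⊆", ("≈", "∼"): "∼",
    ("⊇", "≈"): "⊇", ("⊇", "⊇"): "⊇", ("⊇", "⊆"): None, ("⊇", "∼"): "∼",
    ("⊆", "≈"): "⊆", ("⊆", "⊇"): None, ("⊆", "⊆"): "⊆", ("⊆", "∼"): "∼",
    ("∼", "≈"): "∼", ("∼", "⊇"): "∼", ("∼", "⊆"): "∼", ("∼", "∼"): "∼",
}

def explore(start, goal, max_len=4):
    """Step 1: enumerate simple paths start -> goal as lists of relation labels."""
    paths = []

    def dfs(node, visited, rels):
        if node == goal:
            paths.append(rels)
            return
        if len(rels) == max_len:
            return
        for (a, b), rel in EDGES.items():
            if a == node and b not in visited:
                dfs(b, visited | {b}, rels + [rel])

    dfs(start, {start}, [])
    return paths

def similarity(rels):
    """Average of the relation weights along a path."""
    return sum(WEIGHTS[r] for r in rels) / len(rels)

def select(paths):
    """Step 2: the unique most similar path, or None on a tie (user decides)."""
    top = max(similarity(p) for p in paths)
    winners = [p for p in paths if abs(similarity(p) - top) < 1e-9]
    return winners[0] if len(winners) == 1 else None

def sem(rels):
    """Step 3: Sem(p_n) = Sem(p_{n-1}) ∧ Sem(r_n), with Sem(p_1) = Sem(r_1)."""
    result = SYMBOL[rels[0]]
    for r in rels[1:]:
        result = COMBINE[(result, SYMBOL[r])]
        if result is None:
            return None
    return result

paths = explore("Quantity", "Number")
best = select(paths)
print(len(paths), best, sem(best))   # 3 ['SYN', 'SYN'] ≈
```

For example, a path made of one synonymy edge and one hyponymy edge scores (1 + 0.8) / 2 = 0.9, and its derived relation is ≈ ∧ ⊆ = ⊆.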

6 Conclusions
The advent of XML has created a syntactic platform for Web data standardization and exchange. However, XML has several problems. First of all, documents expressed in XML share the same syntax but can be otherwise heterogeneous, for example by having different structures and naming conventions. Also, an XML document does not explicitly express the semantics of the elements or of the relationships among elements. Therefore, it is not a suitable language for metadata representation.
Ontologies provide an explicit and formal specification of a shared conceptualization and are able to facilitate knowledge sharing and reuse. We use ontologies expressed in RDFS, a semantically rich schema language, to bridge syntactic, schematic, and semantic heterogeneities in data sources.
In this paper, we have presented five different case studies that illustrate
the role that ontologies play in the process of data integration, in centralized
and peer-to-peer architectures.
Related research includes research on ontology generation, ontology mapping, and ontology evolution. An ontology can be generated manually using an authoring tool or (semi-)automatically from various knowledge sources (e.g., database schemas). Techniques used for ontology mapping, including ontology alignment and ontology merging [20, 8], overlap to a large extent with techniques for schema matching [21]. Finally, ontology evolution, also called ontology versioning, involves changes in the representation, structure, and semantics of ontologies. Each step of such an evolution must ensure consistency between the old version and the improved version of the ontology, just as the evolution of a database schema must guarantee the consistency of the new schema with the data.

References
[1] B. Amann, C. Beeri, I. Fundulaki, and M. Scholl. Ontology-Based Integration of XML Web Resources. In Proceedings of the 1st International Semantic Web Conference (ISWC 2002), pages 117–131, 2002.

[2] M. Arenas, V. Kantere, A. Kementsietsidis, I. Kiringa, R. J. Miller, and J. Mylopoulos. The Hyperion Project: From Data Integration to Data Coordination. SIGMOD Record, 32(3):53–58, 2003.

[3] Y. Arens, C. A. Knoblock, and C. Hsu. Query Processing in the SIMS Information Mediator. In The AAAI Press, May 1996.

[4] S. Bergamaschi, F. Guerra, and M. Vincini. A Peer-to-Peer Information System for the Semantic Web. In Proceedings of the International Workshop on Agents and Peer-to-Peer Computing (AP2PC 2003), July 2003.

[5] P. A. Bernstein, F. Giunchiglia, A. Kementsietsidis, J. Mylopoulos, L. Serafini, and I. Zaihrayeu. Data Management for Peer-to-Peer Computing: A Vision. In WebDB 2002, pages 89–94, 2002.

[6] Y. A. Bishr. Overcoming the semantic and other barriers to GIS interoperability. International Journal of Geographical Information Science, 12(4):299–314, 1998.

[7] S. D. Camillo, C. A. Heuser, and R. dos Santos Mello. Querying Heterogeneous XML Sources through a Conceptual Schema. In Proceedings of the 22nd International Conference on Conceptual Modeling (ER 2003), pages 186–199, 2003.

[8] I. F. Cruz, W. Sunna, and A. Chaudhry. Semi-Automatic Ontology Alignment for Geospatial Data Integration. In GIScience 2004, LNCS, pages 51–66. Springer Verlag, 2004.
[9] I. F. Cruz and H. Xiao. Using a Layered Approach for Interoperability
on the Semantic Web. In Proceedings of the 4th International Conference
on Web Information Systems Engineering (WISE 2003), pages 221–232,
Rome, Italy, December 2003.

[10] I. F. Cruz, H. Xiao, and F. Hsu. An Ontology-based Framework for
Semantic Interoperability between XML Sources. In Proceedings of the 8th
International Database Engineering & Applications Symposium (IDEAS
2004), pages 217–226, July 2004.

[11] I. F. Cruz, H. Xiao, and F. Hsu. Peer-to-Peer Semantic Integration of
XML and RDF Data Sources. In The 3rd International Workshop on
Agents and Peer-to-Peer Computing (AP2PC 2004), July 2004.

[12] S. Decker, S. Melnik, F. van Harmelen, D. Fensel, M. C. A. Klein,
J. Broekstra, M. Erdmann, and I. Horrocks. The Semantic Web: The
Roles of XML and RDF. IEEE Internet Computing, 4(5):63–74, 2000.

[13] T. R. Gruber. A Translation Approach to Portable Ontology
Specifications. Knowledge Acquisition, 5(2):199–220, 1993.

[14] T. R. Gruber and G. R. Olsen. An Ontology for Engineering
Mathematics. In Proceedings of the 4th International Conference on Principles
of Knowledge Representation and Reasoning (KR 1994), pages 258–269,
1994.

[15] N. Guarino. Formal Ontology and Information Systems. In Proceedings
of the 1st International Conference on Formal Ontologies in Information
Systems (FOIS 1998), pages 3–15, 1998.

[16] A. Y. Halevy, Z. G. Ives, P. Mork, and I. Tatarinov. Piazza: Data
Management Infrastructure for Semantic Web Applications. In Proceedings
of the 12th International World Wide Web Conference (WWW 2003),
pages 556–567, 2003.

[17] M. Lenzerini. Data Integration: A Theoretical Perspective. In
Proceedings of the 21st ACM SIGACT-SIGMOD-SIGART Symposium on
Principles of Database Systems (PODS 2002), pages 233–246, Madison,
Wisconsin, June 2002. ACM.

[18] E. Mena, V. Kashyap, A. P. Sheth, and A. Illarramendi. OBSERVER:
An Approach for Query Processing in Global Information Systems based
on Interoperation across Pre-existing Ontologies. In Proceedings of the
1st IFCIS International Conference on Cooperative Information Systems
(CoopIS 1996), pages 14–25, 1996.

[19] W. S. Ng, B. C. Ooi, K. L. Tan, and A. Zhou. PeerDB: A P2P-based
System for Distributed Data Sharing. In Proceedings of the 19th
International Conference on Data Engineering (ICDE 2003), pages 633–644,
2003.

[20] N. F. Noy and M. A. Musen. PROMPT: Algorithm and Tool for
Automated Ontology Merging and Alignment. In Proceedings of the 17th
National Conference on Artificial Intelligence and 12th Conference on
Innovative Applications of Artificial Intelligence (AAAI/IAAI 2000),
pages 450–455, 2000.

[21] E. Rahm and P. A. Bernstein. A Survey of Approaches to Automatic
Schema Matching. VLDB J., 10(4):334–350, 2001.

[22] J. D. Ullman. Information Integration Using Logical Views. In
Proceedings of the 6th International Conference on Database Theory (ICDT
1997), pages 19–40, 1997.

[23] H. Wache, T. Vögele, U. Visser, H. Stuckenschmidt, G. Schuster,
H. Neumann, and S. Hübner. Ontology-Based Integration of
Information - A Survey of Existing Approaches. In Proceedings of the IJCAI-01
Workshop on Ontologies and Information Sharing, 2001.

[24] H. Xiao and I. F. Cruz. RDF-based Metadata Management in
Peer-to-Peer Systems. In The 2nd IST Workshop on Metadata Management in
Grid and P2P System (MMGPS 2004), 2004.

[25] H. Xiao, I. F. Cruz, and F. Hsu. Semantic Mappings for the Integration
of XML and RDF Sources. In Workshop on Information Integration on
the Web (IIWeb 2004), August 2004.


