ST 1
Michael Zakharyaschev
Department of Computer Science and Information Systems
– email: zmishaz@gmail.com
– homepage: http://www.dcs.bbk.ac.uk/~michael
– ST Web page: http://www.dcs.bbk.ac.uk/~michael/sw15/sw15.html
Acknowledgements
Semantic Technologies 1 2
Knowledge Graphs are everywhere
What is a Knowledge Graph?
The original “Knowledge Graph” (Google, 2012):
[Figure: Google Knowledge Graph (2012)]
So, what is a Knowledge Graph?
Graphs are ‘drawings’ with dots and (not necessarily straight) lines or arrows:
[Figure: several example graphs drawn as labelled dots joined by lines and arrows]
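The 'dots and lines' picture can also be written down as data. The following is a minimal Python sketch, not a definitive representation: the vertex names and edges are made up for illustration, and `neighbours` is a hypothetical helper.

```python
# A directed graph: each arrow is an ordered pair (source, target).
# Vertex names and edges are illustrative assumptions.
directed_edges = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")}

# An undirected graph: each line is an unordered pair of vertices.
undirected_edges = {frozenset(("x", "y")), frozenset(("y", "z"))}


def neighbours(vertex, edges):
    """Vertices reachable from `vertex` by following one arrow."""
    return {target for source, target in edges if source == vertex}


print(sorted(neighbours("a", directed_edges)))
# The line x-y is the same edge whichever way round it is written.
print(frozenset(("y", "x")) in undirected_edges)
```

Storing undirected edges as `frozenset` pairs makes the order of the two endpoints irrelevant, which is the difference between a line and an arrow.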
Different kinds of graphs
Example 1: Niche overlap graphs in ecology
[Figure: niche overlap graph with vertices Raccoon, Hawk, Owl, Opossum, Squirrel, Crow, Shrew, Mouse and Woodpecker]
Example 2: Road networks
[Figure: road network joining Oxford, London, Cambridge and Brighton]
⇝ multigraph
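A multigraph allows several edges between the same pair of vertices, for example two different roads between the same two towns. A small Python sketch follows; the choice of roads is an illustrative assumption, and `roads_between` is a hypothetical helper.

```python
from collections import Counter

# Each road is (name, town, town); the selection of roads is illustrative.
roads = [
    ("M40", "Oxford", "London"),
    ("A40", "Oxford", "London"),
    ("M11", "London", "Cambridge"),
    ("A23", "London", "Brighton"),
]

# Count the parallel edges between each unordered pair of towns.
parallel = Counter(frozenset((u, v)) for _, u, v in roads)


def roads_between(u, v):
    """Names of all roads joining towns u and v, in either direction."""
    return sorted(name for name, a, b in roads if {a, b} == {u, v})


print(parallel[frozenset(("Oxford", "London"))])
print(roads_between("London", "Oxford"))
```

Because the edges are kept in a list rather than a set of pairs, parallel edges survive; that is what distinguishes a multigraph from a simple graph.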
Example 3: ‘Knowledge Graph’
...
21st century: information society, digital economy
“I have a dream for the Web [in which computers] become capable
of analyzing all the data on the Web — the content, links,
and transactions between people and computers.
A Semantic Web, which should make this possible, has yet to emerge,
but when it does, the day-to-day mechanisms of trade, bureaucracy
and our daily lives will be handled by machines talking to machines. The
intelligent agents people have touted for ages will finally materialize.”
(Berners-Lee, 1999)
The Semantic Web is a ‘web of data’ that helps machines understand the
semantics, or meaning, of information on the WWW. It extends the network of
hyperlinked human-readable web pages by inserting machine-readable metadata
about pages and how they are related to each other, enabling automated agents
to access the Web more intelligently and perform tasks on behalf of users.
Berners-Lee is now the director of the World Wide Web Consortium (W3C),
which oversees the development of Semantic Web standards.
Since 2013, Semantic Web activities have been subsumed by
Web of Data activities
Understanding the problem with WWW
Task: can we make the data on the Web explicit and machine-readable?
How to make the data on the Web more accessible?
[Figure: data shown as a graph with edges labelled ‘works at’ and ‘published by’]
Linked Data basic principles
1. Use URIs to name (identify) things.
2. Use HTTP URIs so that these things can be looked up.
3. Provide useful information about what a name identifies when it’s looked up,
using open standards such as RDF, SPARQL, etc.
4. Refer to other things using their HTTP URI-based names when publishing
data on the Web.
– All kinds of conceptual things, they have names now that start with HTTP.
– If I take one of these HTTP names and I look it up, I will get back some data in a
standard format which is kind of useful data that somebody might like to know about
that thing, about that event.
– When I get back that information it’s not just got somebody’s height and weight and
when they were born, it’s got relationships. And when it has relationships, whenever
it expresses a relationship then the other thing that it’s related to is given one of
those names that starts with HTTP.
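The principles above can be mimicked without any network access. In this hedged Python sketch a dictionary stands in for the Web; every `http://example.org/...` name, the data, and the helpers `look_up` and `follow_links` are hypothetical.

```python
# A stand-in for the Web: looking up an HTTP name returns statements about it.
# All http://example.org/... names and the data are hypothetical.
WEB = {
    "http://example.org/people/alice": [
        ("http://example.org/vocab/name", "Alice"),
        ("http://example.org/vocab/worksAt", "http://example.org/orgs/acme"),
    ],
    "http://example.org/orgs/acme": [
        ("http://example.org/vocab/name", "Acme Ltd"),
    ],
}


def look_up(uri):
    """Simulate dereferencing a URI: return (property, value) pairs."""
    return WEB.get(uri, [])


def follow_links(uri):
    """Values that are themselves HTTP names and can be looked up in turn."""
    return [value for _, value in look_up(uri) if value.startswith("http://")]


for linked in follow_links("http://example.org/people/alice"):
    print(linked, look_up(linked))
```

The point of principle 4 shows up in `follow_links`: because the related thing is itself named by an HTTP URI, a program can keep looking things up.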
Another application of KGs: data integration
ID      Publisher       City
id_qpr  Harper Collins  London
[Figure: two book datasets merged into one graph. Nodes: Harper Collins; Ghosh, Amitav; http://.../isbn/2020386682; www.amitavghosh.com; Besse, Christianne. Edge labels: a:author, a:name, a:p_name, a:homepage, f:original, f:auteur, f:titre, f:nom, f:traducteur]
Query: give me the title of the original (Glass Palace)
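A rough Python sketch of this kind of integration, with each dataset held as a set of triples. The identifier `ORIGINAL`, the `_:` node names and the property `a:title` are assumptions made for illustration; the elided ISBN URL is kept exactly as it appears in the figure.

```python
# Triples (subject, property, value). ORIGINAL and the _:... node names are
# hypothetical identifiers; a:title is an assumed property name.
ORIGINAL = "urn:example:original-book"
TRANSLATION = "http://.../isbn/2020386682"  # elided in the source, kept as-is

dataset_a = {
    (ORIGINAL, "a:title", "The Glass Palace"),
    (ORIGINAL, "a:author", "_:ghosh"),
    ("_:ghosh", "a:name", "Ghosh, Amitav"),
    ("_:ghosh", "a:homepage", "www.amitavghosh.com"),
}
dataset_f = {
    (TRANSLATION, "f:original", ORIGINAL),
    (TRANSLATION, "f:titre", "Le Palais des miroirs"),
    (TRANSLATION, "f:traducteur", "_:besse"),
    ("_:besse", "f:nom", "Besse, Christianne"),
}

# Merging is set union: triples that mention the same identifier now
# belong to one graph.
merged = dataset_a | dataset_f


def values(subject, prop, graph):
    """All values of `prop` for `subject` in `graph`."""
    return {o for s, p, o in graph if s == subject and p == prop}


# Query: the title of the original of the French book.
titles = {
    title
    for original in values(TRANSLATION, "f:original", merged)
    for title in values(original, "a:title", merged)
}
print(titles)
```

Neither dataset can answer the query alone: the link `f:original` lives in one and the title in the other, and they meet only because both use the same identifier for the original book.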
Add more information
• The dataset can be further combined with other sources such as Wikipedia
Extending merged data
[Figure: the merged graph extended with extra statements. Nodes: London; Le Palais des miroirs; Harper Collins; Ghosh, Amitav; http://.../isbn/2020386682; www.amitavghosh.com; Besse, Christianne; foaf:Person. Edge labels include a:author, a:name, a:p_name, a:homepage, f:original, f:auteur, f:titre, f:nom, f:traducteur and r:type]
Query: give me the home page of the original’s auteur
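This query only works once the machine is told that the two vocabularies name the same relationship. The Python sketch below assumes, as extra knowledge, that `a:author` and `f:auteur` are equivalent; `ORIGINAL`, the `_:` node names and `close_under_equivalence` are hypothetical.

```python
ORIGINAL = "urn:example:original-book"        # hypothetical identifier
TRANSLATION = "http://.../isbn/2020386682"    # elided in the source, kept as-is

merged = {
    (ORIGINAL, "a:author", "_:ghosh"),
    ("_:ghosh", "a:name", "Ghosh, Amitav"),
    ("_:ghosh", "a:homepage", "www.amitavghosh.com"),
    (TRANSLATION, "f:original", ORIGINAL),
    (TRANSLATION, "f:auteur", "_:auteur"),
    ("_:auteur", "f:nom", "Ghosh, Amitav"),
}

# Assumed extra knowledge: two names for the same relationship.
EQUIVALENT = {("a:author", "f:auteur")}


def close_under_equivalence(graph, equivalent):
    """Copy every triple to the equivalent property name, in both directions."""
    result = set(graph)
    for p, q in equivalent:
        result |= {(s, q, o) for s, x, o in graph if x == p}
        result |= {(s, p, o) for s, x, o in graph if x == q}
    return result


def values(subject, prop, graph):
    """All values of `prop` for `subject` in `graph`."""
    return {o for s, p, o in graph if s == subject and p == prop}


enriched = close_under_equivalence(merged, EQUIVALENT)

# Query in the French vocabulary: home page of the original's auteur.
homepages = {
    page
    for auteur in values(ORIGINAL, "f:auteur", enriched)
    for page in values(auteur, "a:homepage", enriched)
}
print(homepages)
```

Before the equivalence is applied, `values(ORIGINAL, "f:auteur", merged)` is empty, so the same query returns nothing; the single extra statement is what makes the answer derivable.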
What did we do?
It can become even more powerful if we add extra knowledge such as:
• a full classification of various types of library data
• geographical information
• etc.
What are Semantic Technologies?
2. formal ‘rules’ that allow the machines to extract information from the data
(classify, query, etc.)
What is Semantics?
Semantics (Greek semantikos, giving signs, significant, symptomatic, from sema, sign)
refers to the aspects of meaning that are expressed in a language, code, or
other form of representation.
In other words, semantics refers to the meanings assigned to symbols and sets
of symbols in a language.
• to a human?
• to a computer?
Ontology in Philosophy
οντολογία: a philosophical discipline, a branch of philosophy that deals
with the nature and the organisation of reality
• What exists?
• Is existence a property?
• What is an object?
• Do non-physical (abstract) objects exist?
• How should things be classified?
[Figure: Aristotle’s ontology]
Ontology in Computer Science
Schema.org
They propose using the schema.org vocabulary along with the Microdata, RDFa,
or JSON-LD formats to mark up website content with metadata about itself.
Such markup can be recognised by search engine spiders and other parsers,
thus gaining access to the meaning of the sites.
Inspired by earlier formats such as Microformats, FOAF, OpenCyc.
To test the validity of data marked up with the schemas and Microdata,
validators such as the Google Structured Data Testing Tool, the Yandex Microformat
validator and the Bing Markup Validator can be used.
Some Schema markups such as Organization and Person are used to influence
Google’s Knowledge Graph results. http://schema.org/Person
How to mark up your content using microdata: http://schema.org/docs/gs.html
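As an illustration of such markup, the Python sketch below builds a JSON-LD description of a person using schema.org terms and wraps it in the script element used to embed JSON-LD in a page. The person, organisation and URL are hypothetical.

```python
import json

# JSON-LD description of a person using schema.org terms.
# The person, organisation and URL are hypothetical.
person = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Jane Doe",
    "jobTitle": "Professor",
    "url": "http://www.example.org/~jane",
    "worksFor": {"@type": "Organization", "name": "Example University"},
}

# JSON-LD is embedded in a page inside a script element of this type.
snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(person, indent=2)
    + "\n</script>"
)
print(snippet)
```

The keys `name`, `jobTitle`, `url` and `worksFor` are properties of the schema.org type Person; a parser that knows the vocabulary can read them without guessing from the page layout.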
Google’s Knowledge Graph
https://www.google.com/intl/bn/insidesearch/features/search/knowledge.html
Wikidata
Wikidata is a free and open knowledge base that can be read and edited by
both humans and machines. Wikidata acts as central storage for the structured
data of its Wikimedia sister projects including
Wikipedia, Wikivoyage, Wikisource, and others.
Wikidata is a document-oriented database, focused on items. Each item represents
a topic and is identified by a unique ID. Information is added to items by creating
statements. Statements take the form of key-value pairs.
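A deliberately simplified Python picture of an item and its statements; this is not Wikidata's actual export format. Q42 (Douglas Adams), P31 ("instance of") and Q5 ("human") are real Wikidata identifiers, while the dictionary layout, `LABELS` and `describe` are assumptions made for the example.

```python
# A simplified picture of a Wikidata item (not the real JSON export format).
# Q42 is the item for Douglas Adams; P31 is "instance of"; Q5 is "human".
item = {
    "id": "Q42",
    "label": "Douglas Adams",
    "statements": {
        "P31": ["Q5"],  # instance of: human
    },
}

# Tiny lookup table, used only to make the output readable.
LABELS = {"P31": "instance of", "Q5": "human"}


def describe(item):
    """Render each statement (property-value pair) in readable form."""
    return [
        f'{item["label"]} ({item["id"]}): {LABELS.get(p, p)} -> {LABELS.get(v, v)}'
        for p, values in item["statements"].items()
        for v in values
    ]


print(describe(item))
```

Both the property and the value are identifiers rather than strings of text, which is what lets machines as well as humans read and edit the data.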
also http://wiki.dbpedia.org
Ontologies in sciences
• Bioinformatics
BBC Online
Launched in the mid 1990s, the BBC website was focused on supporting
The BBC’s Web-based service is one of the most visited websites and the world’s largest news
website. As of 2007, it contained over two million pages.
It is difficult to find everything the BBC has published about any given object.
Creating a website for the Football World Cup 2010
“The underlying publishing framework does not author content directly; rather
it publishes data about the content — metadata. The published metadata
describes the world cup content at a fairly low-level of granularity, providing
rich content relationships and semantic navigation. By querying this
published metadata we are able to create dynamic page aggregations
for teams, groups and players.”
– Stats and scores from other sources are imported from XML and
mapped to ontological concepts
– The same technique was also used for the 2012 Olympic Games in London
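The idea of building pages by querying metadata rather than authoring them can be sketched in a few lines of Python. This is not the BBC's implementation: the item ids, the tag structure and `aggregate` are all hypothetical.

```python
from collections import defaultdict

# Metadata about content items, not the content itself.
# Item ids and the tagging scheme are hypothetical.
metadata = [
    {"id": "story-1", "type": "report", "about": ["England", "Group C"]},
    {"id": "story-2", "type": "video", "about": ["England", "Wayne Rooney"]},
    {"id": "story-3", "type": "report", "about": ["Spain", "Group H"]},
]


def aggregate(items):
    """Index content ids by every concept they are tagged with."""
    pages = defaultdict(list)
    for item in items:
        for concept in item["about"]:
            pages[concept].append(item["id"])
    return dict(pages)


pages = aggregate(metadata)
print(pages["England"])
```

Each key of `pages` corresponds to a team, group or player page that nobody had to assemble by hand; tagging a new story is enough for it to appear on every relevant page.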
The BBC Football World Cup 2010
The underlying architecture
Data access in industry
(from Norwegian Petroleum Directorate’s FactPages)
5 days later:
SELECT DISTINCT cores.wlbName, cores.lenghtM, wellbore.wlbDrillingOperator, wellbore.wlbCompletionYear
FROM
  ( (SELECT wlbName, wlbNpdidWellbore, (wlbTotalCoreLength * 0.3048) AS lenghtM  -- feet to metres
     FROM wellbore_core
     WHERE wlbCoreIntervalUom = '[ft ]' )
    UNION
    (SELECT wlbName, wlbNpdidWellbore, wlbTotalCoreLength AS lenghtM
     FROM wellbore_core
     WHERE wlbCoreIntervalUom = '[m ]' )
  ) as cores,
  ( (SELECT wlbNpdidWellbore, wlbDrillingOperator, wlbCompletionYear
     FROM wellbore_development_all )
    UNION
    (SELECT wlbNpdidWellbore, wlbDrillingOperator, wlbCompletionYear
     FROM wellbore_exploration_all )
    UNION
    (SELECT wlbNpdidWellbore, wlbDrillingOperator, wlbCompletionYear
     FROM wellbore_shallow_all )
  ) as wellbore
WHERE wellbore.wlbNpdidWellbore = cores.wlbNpdidWellbore
...
at Equinor (formerly Statoil):
– 1,000 TB of relational data
– 2,000 tables
– different schemas
– 30–70% of time on data gathering
Ontology-based data access (OBDA)
ontology: [Figure: ontology fragment with terms WellboreStratum and stratumForWellbore]
mappings:
[] rdf:type rr:TriplesMap;
   rr:logicalTable "select * from wellbore_core";
   rr:subjectMap [ a rr:TermMap;
                   rr:template "&npd-v2;wellbore/{wlbNpdidWellbore}/";];
   rr:propertyObjectMap [ rr:property npdv:coreIntervalBottom;
                          rr:column "wlbCoreIntervalBottom" ];
...
data sources:
CREATE TABLE wellbore_core (
  wlbName varchar(60) NOT NULL,
  wlbCoreNumber int(11) NOT NULL,
  wlbCoreIntervalTop decimal(13,6),
)
...
Ontology
– gives a high-level conceptual view of the data
– provides a convenient & natural vocabulary for user queries
– enriches incomplete data with background knowledge
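A toy Python sketch of the three layers (data source, mapping, ontology vocabulary), using the standard `sqlite3` module. The rows, the `PREFIX` standing in for `&npd-v2;`, the dictionary form of the mapping and `virtual_triples` are assumptions. A real OBDA system would typically rewrite the user's query into SQL rather than materialise triples as done here.

```python
import sqlite3

# Data source: a toy version of the wellbore_core table (rows are made up).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE wellbore_core ("
    " wlbNpdidWellbore INTEGER, wlbName TEXT, wlbCoreIntervalBottom REAL)"
)
db.executemany(
    "INSERT INTO wellbore_core VALUES (?, ?, ?)",
    [(1, "W-1", 2500.0), (2, "W-2", 3100.5)],
)

# Mapping: a logical table, a subject template and property-to-column pairs.
# PREFIX stands in for &npd-v2; and is an assumption.
PREFIX = "http://example.org/npd-v2/"
mapping = {
    "logical_table": "SELECT * FROM wellbore_core",
    "subject_template": PREFIX + "wellbore/{wlbNpdidWellbore}/",
    "properties": {"npdv:coreIntervalBottom": "wlbCoreIntervalBottom"},
}


def virtual_triples(db, mapping):
    """Apply the mapping to each row, yielding (subject, property, value)."""
    cursor = db.execute(mapping["logical_table"])
    columns = [d[0] for d in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        subject = mapping["subject_template"].format(**record)
        for prop, column in mapping["properties"].items():
            yield (subject, prop, record[column])


triples = list(virtual_triples(db, mapping))

# A user query phrased in the ontology vocabulary, not in SQL.
deep = [s for s, p, v in triples if p == "npdv:coreIntervalBottom" and v > 3000]
print(deep)
```

The user asks about `npdv:coreIntervalBottom` and never needs to know which table or column holds it; that knowledge sits in the mapping.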