RDF serialisation needs a specification #454

gouttegd · 2025-03-21T09:31:13Z

gouttegd
Mar 21, 2025
Maintainer

If the RDF format produced/consumed by sssom-py is supposed to be usable at the same level as the SSSOM/TSV format, it needs to be specified.

Having to infer the serialisation rules by looking at the output of the “reference implementation“ (which is what I had to do to implement RDF support in SSSOM-Java) is not acceptable for something that pretends to be a standard.

Same thing for the OWL/RDF serialisation, which is only specified (poorly) for the individual mappings, not for mapping set objects.

matentzn · 2025-03-27T14:04:06Z

matentzn
Mar 27, 2025
Maintainer

I think this is critical for some of our prospective user base. for me, the main problem is how to deal with #359 when the pressure mounts (I expect it to mount the latest in the summer). will we have "mapping_id" (as an "optional" property) hang off the reified RDF statement as sssom:mapping_id, or will we use it as the resource identifier for the mapping (which is more correct, but will create an entirely different RDF serialisation).

I expect people to favour the latter, but I find it hard to reconcile this with the "optionality" of the mapping_id.

0 replies

gouttegd · 2025-03-27T14:53:21Z

gouttegd
Mar 27, 2025
Maintainer Author

we use [the mapping_id] as the resource identifier for the mapping (which is more correct, but will create an entirely different RDF serialisation). I expect people to favour the latter

As you note, this would result in a RDF serialisation entirely different than the one currently being supported by SSSOM-Py (and SSSOM-Java, since in the absence of a spec SSSOM-Java basically did the same thing as SSSOM-Py).

That is not necessarily a blocker (RDF serialisation is not specified as of SSSOM 1.0, so it’s fine to radically change it), but still something to be carefully considered.

If people have already started using the current RDF serialisation, we might end up having to specify two different RDF serialisations: one that corresponds to the current output produced by SSSOM-Py (where individual mappings are blank nodes, without a resource identifier), and one that uses the to-be-introduced mapping_id slot as the resource identifier.

I find it hard to reconcile this with the "optionality" of the mapping_id.

(A) The new RDF serialisation format is only available for mapping sets that do use the new mapping_id slot. The slot itself remains generally optional, but the new RDF serialisation format requires it.

(B) Consequently, if you have a mapping set without mapping_id slots and you absolutely want to serialize it into the new RDF format, you must generate mapping_id slots on the fly.

0 replies

gouttegd · 2025-03-29T10:48:16Z

gouttegd
Mar 29, 2025
Maintainer Author

I find it hard to reconcile this with the "optionality" of the mapping_id.

In fact, thinking back about this, I don’t see how is that a problem, given that RDF has the concept of blank nodes?

When serialising to RDF:

If a mapping has a mapping_id slot, then serialise the mapping as a named node with the mapping_id as the resource identifier.
Otherwise, then serialise the mapping as a blank node.

When deserialising from RDF:

If a mapping has a resource identifier, store it into the mapping_id slot.
Otherwise (blank node), leave the mapping_id slot empty.

So, for example, with identifiers:

<https://example.org/myset> a sssom:MappingSet ;
   terms:title "My mapping set" ;
   sssom:mappings <https://example.org/mymapping1>, <https://example.org/mymapping2> .

<https://example.org/mymapping1> a owl:Axiom ;
   owl:annotatedSource UBERON:0000001 ;
   owl:annotatedProperty semapv:crossSpeciesExactMatch ;
   owl:annotatedTarget FBbt:00000001 ;
   sssom:mapping_justification semapv:ManualMappingCuration .
   
<https://example.org/mymapping2> a owl:Axiom ;
   owl:annotatedSource UBERON:0000002 ;
   owl:annotatedProperty semapv:crossSpeciesExactMatch ;
   owl:annotatedTarget FBbt:00000002 ;
   sssom:mapping_justification semapv:ManualMappingCuration .

The same set, without identifiers:

[] a sssom:MappingSet ;
   terms:title "My mapping set" ;
   sssom:mappings [ a  owl:Axiom ;
       owl:annotatedSource UBERON:0000001 ;
       owl:annotatedProperty semapv:crossSpeciesExactMatch ;
       owl:annotatedTarget FBbt:00000001 ;
       sssom:mapping_justification semapv:ManualMappingCuration
   ], [ a owl:Axiom ;
       owl:annotatedSource UBERON:0000002 ;
       owl:annotatedProperty semapv:crossSpeciesExactMatch ;
       owl:annotatedTarget FBbt:00000002 ;
       sssom:mapping_justification semapv:ManualMappingCuration
   ] .

1 reply

gouttegd Jun 18, 2025
Maintainer Author

An interesting observation that I just made with the latest version of SSSOM-Java (which implements the solution proposed above):

Writing a set like this:

<https://example.org/mymapping1> a owl:Axiom ;
   owl:annotatedSource UBERON:0000001 ;
   owl:annotatedProperty semapv:crossSpeciesExactMatch ;
   owl:annotatedTarget FBbt:00000001 ;
   sssom:mapping_justification semapv:ManualMappingCuration .
   
<https://example.org/mymapping2> a owl:Axiom ;
   owl:annotatedSource UBERON:0000002 ;
   owl:annotatedProperty semapv:crossSpeciesExactMatch ;
   owl:annotatedTarget FBbt:00000002 ;
   sssom:mapping_justification semapv:ManualMappingCuration .

<https://example.org/myset> a sssom:MappingSet ;
   terms:title "My mapping set" ;
   sssom:mappings <https://example.org/mymapping1>, <https://example.org/mymapping2> .

(with record_id values used as the resources representing the mapping records) is about 60 to 80 times slower (tested with a set containing ~30K records) than writing the same set like this:

<https://example.org/myset> a sssom:MappingSet ;
   terms:title "My mapping set" ;
   sssom:mappings [ a  owl:Axiom ;
       owl:annotatedSource UBERON:0000001 ;
       owl:annotatedProperty semapv:crossSpeciesExactMatch ;
       owl:annotatedTarget FBbt:00000001 ;
       sssom:mapping_justification semapv:ManualMappingCuration
   ], [ a owl:Axiom ;
       owl:annotatedSource UBERON:0000002 ;
       owl:annotatedProperty semapv:crossSpeciesExactMatch ;
       owl:annotatedTarget FBbt:00000002 ;
       sssom:mapping_justification semapv:ManualMappingCuration
   ] .

(without record_id slots, so mapping records are rendered as blank nodes).

The former takes ~2.5 minutes, while the latter takes ~2 seconds. (Also interestingly, this is only when writing to RDF; when reading from RDF, both sets can be parsed in about the same time.)

All that time is spent within the underlying RDF library used by SSSOM-Java (Eclipse Rdf4J), so it’s pretty much outside of SSSOM-Java’s control (though I’ll check if maybe I am using the library in a somehow inefficient way).

Now, that doesn’t mean that I am opposed to serialising a set as in the first example, but this does reinforce my conviction that SSSOM should not require that mapping records have a record_id in order to be serialisable into RDF. If mapping records have a record_id slot, then the value of that slot MUST be used as the resource identifier, but if they don’t, then it should be perfectly acceptable to serialise the records as blank nodes, and the SSSOM spec should not forbid that.

matentzn · 2025-04-02T13:19:05Z

matentzn
Apr 2, 2025
Maintainer

I agree in that this is an ok compromise, but, there are strict limitations. If you stick with blank nodes, for example, you cannot easily build an API on top of the sssom data model (without extending it for API purposes) which requires one to be able to not only go from a mapping set to a mapping, but also back; say you have a search, then you find all mappings related to your search, how to you retain which mapping set a mapping comes from without it being a named node in the graph?

I understand this is a bit detail, but at the very least we should contemplate this as we tripped all over this when trying to build a massive RDF graph with all mappings in the world and wanting to build a super thin API layer on top..

0 replies

gouttegd · 2025-04-02T13:34:27Z

gouttegd
Apr 2, 2025
Maintainer Author

Well then, just explicitly state that your API can only work on mapping sets whose mappings have a mapping_id slot. And if it is provided with mappings without such a slot, have it automatically inject auto-generated IDs (not the same thing as auto-generated blank node IDs!).

0 replies

gouttegd · 2025-04-02T13:39:01Z

gouttegd
Apr 2, 2025
Maintainer Author

And if it is provided with mappings without such a slot, have it automatically inject auto-generated IDs

If we do end up introducing an optional mapping_id slot in SSSOM 1.1, it is almost certain that SSSOM tools like SSSOM-Py or SSSOM-Java will have to provide a feature allowing to inject auto-generated IDs to mappings that do not have one.

Otherwise, how would you, for example, merge a mapping set that contains mapping IDs with a mapping set that does not? As I recall, it was kind of agreed, last time we discussed about a potential mapping_id slot, that it would be optional but with the constraint that within a set, either all mappings have a mapping_id, or none of them have one (it would not make sense to have some mappings with an identifier and some mappings without one).

0 replies

matentzn · 2025-04-11T11:59:21Z

matentzn
Apr 11, 2025
Maintainer

If we do end up introducing an optional mapping_id slot in SSSOM 1.1, it is almost certain that SSSOM tools like SSSOM-Py or SSSOM-Java will have to provide a feature allowing to inject auto-generated IDs to mappings that do not have one.

So what would be the behaviour if you merge two mapping sets without mapping_id? Using blank nodes syntax? You maintain then that in these cases, we do not auto-inject? Or wouldn't it better for rdf to always autoinject (for example by hashing the entire mapping object)?

0 replies

gouttegd · 2025-04-11T13:52:04Z

gouttegd
Apr 11, 2025
Maintainer Author

So what would be the behaviour if you merge two mapping sets without mapping_id?

If the two mapping sets are both without mapping_id slots, then there’s nothing special to do. The resulting set will also be without mapping_id slots. (And so, if the set is then exported to RDF, its mappings will be serialised as blank nodes.)

If one set has mapping_id slots and the other does not, then the user will have to decide between

a) drop the mapping_id slots from the first set;
b) fabricate mapping_id slots for the second set;
c) do nothing and proceed with the production of a set with a mix of mappings with a mapping_id slot and mappings without one – but in that case, warn the user than the resulting set will not be compliant with the spec, and therefore may not be handled correctly by all tools).

This is completely orthogonal with the question of what happens if/when the set is later exported to RDF. If the user chooses (a) and then writes to RDF, then the RDF model will contain mappings serialised as blank nodes (since they don’t have mapping_id slots); if she chooses (b) and then writes to RDF, then the RDF model will contain mappings serialised as named nodes (since they will have mapping_id slots); if she chooses (c) and then writes to RDF, then the RDF model will contain some mappings serialised as named nodes (those coming from the first set), and some mappings serialised as blank nodes (those coming from the second set) – arguably this would probably not be a very good idea, which is precisely why the user should be warned against doing that.

You maintain then that in these cases, we do not auto-inject?

I tend to think that auto-injection of mapping IDs should only ever happen upon the explicit request of the user, or at the very least with her informed consent. (“This program can only work if mappings have a unique identifier; if you load a mapping sets that do not contain such identifiers, auto-generated identifiers will be injected. Proceed? Yes/Cancel.”)

Or wouldn't it better for rdf to always autoinject (for example by hashing the entire mapping object)?

I don’t see why that would be better. I get that you are thinking about use cases where mapping identifiers would be needed anyway, but that may not always be the case. Given that neither SSSOM/TSV nor SSSOM/JSON currently mandate that mappings should have identifiers, why would RDF, specifically, mandate that? Knowing that RDF’s blank node syntax gives us a perfectly normal way to represent ID-less mappings?

(And I also think that auto-generated IDs should not be derived from the mapping itself by hashing its contents.)

0 replies

matentzn · 2025-05-19T10:01:16Z

matentzn
May 19, 2025
Maintainer

From @gouttegd on slack:

One thing in particular is absolutely not obvious (and would, in my opinion, deserve discussion): that all multivalued elements (including the list of mappings in a set) are simply represented by a list of triples associating the slot holding the list with the different values of the list.
For example, the list of mappings M1, M2, ..., Mn in a set S is represented by a list of triples like:
S sssom:mappings M1
S sssom:mappings M2
...
S sssom:mappings Mn
OK, this makes sense, and it is the easiest way to represent a list.
But we could have decided to use RDF containers or RDF collections instead (in fact when I started working on adding RDF support for SSSOM-Java I had assumed that RDF collections would be used – I had to look at the output of SSSOM-Py to realise that a much simpler approach had been chosen instead).

One reason I like

S sssom:mappings M1
S sssom:mappings M2
...
S sssom:mappings Mn

I am not a big fan of RDF containers, e.g:

S sssom:mappings (
  M1
  M2
  Mn
) .

or even worse, explicitly:

S sssom:mappings _:list1 .
_:list1 rdf:first M1 ;
        rdf:rest _:list2 .
_:list2 rdf:first M2 ;
        rdf:rest rdf:nil .

because it makes it very cumbersome to query with SPARQL. I am so far nearly convinced we should use the simple first option.

0 replies

gouttegd · 2025-05-19T10:15:30Z

gouttegd
May 19, 2025
Maintainer Author

I am so far nearly convinced we should use the simple first option.

No objection to that, but there’s at least two things that are at least worth discussing with the simple option.

(A) It does not guarantee the order of items. Maybe it’s a big deal, maybe not. I personally don’t think it is, but this may run afoul of some people’s expectations.

For example, some people may expect that there is a correlation between the list of author_id and the list of author_label – that is, when both author_id and author_label are present, the first author_label is the label of the first author_id, the second author_label is the label of the second author_id, and so on. This is only possible if we can guarantee that the order of items in lists is always preserved.

(I personally think they should not assume that, and in fact the spec should explicitly and unambiguously state that the two lists are completely unrelated.)

People may also assume things such as, the first author is the most important one, the author that did most of the work. This again assumes that there is an order in lists. (Again, I think people should not assume things like that, but I also recognise that this is not, at first sight, an unreasonable assumption to make.)

(B) RDF lists are closed. There is an explicit list terminator (rdf:nil), so even under an open world assumption you can safely assume that the list you have is the complete list, there can’t be no other items that you are not aware of (in fact you don’t have to assume that, you know it). With a simple unstructured “list” of triples, the list is open-ended.

Again, maybe this is a big deal, maybe not, but I think this is an important semantic difference that warrants at least a discussion.

0 replies

matentzn · 2025-05-19T10:38:08Z

matentzn
May 19, 2025
Maintainer

All good points. I am a bit concerned if we can impose a sort order at all if we rely on standard serialisers? Like if we take SSSOMs:

MUST sort the mappings in lexicographical order on all their slots, in the order the slots appear in the “Slots” table.

Can something like this be done with an out of the box RDF writer like say Jenas or rdflib, or do we have to implement our own serialisation procedure?

0 replies

gouttegd · 2025-05-19T10:53:41Z

gouttegd
May 19, 2025
Maintainer Author

I am a bit concerned if we can impose a sort order at all if we rely on standard serialisers?

Not if we stick to the “simple” representation of lists. If we want to prescribe an order, then we kinda must use RDF collections.

If we stick to the simple representation, then even if I wrote my own Turtle serialiser in SSSOM-Java to force the list items to be written in a given order, there would be no guarantee that whichever program reads the set would preserve the order. Unless you are somehow suggesting that we invent our own variant of Turtle that is almost identical to standard Turtle, except that the orders in which the triples are listed is significant and must be preserved. (I know you are not suggesting that! right?)

Note, however, that this:

MUST sort the mappings in lexicographical order on all their slots, in the order the slots appear in the “Slots” table.

comes from the spec of the SSSOM/TSV format. This is not relevant for the SSSOM/RDF serialisation.

And more precisely, it comes from the definition of the canonical SSSOM/TSV format, whose only purpose is to reduce spurious diffs due to serialisation differences. It does not mean that the order of mappings is meaningful in any way.

0 replies

gouttegd · 2025-05-19T11:28:36Z

gouttegd
May 19, 2025
Maintainer Author

To clarify: I am not suggesting that we should absolutely revise the way lists are represented in the (existing, undocumented) RDF output. I am perfectly happy with the “simple” representation we currently use, and in fact I would be quite in favour of keeping it if only for the simple reason that this is what SSSOM-Py has always produced – I see no reason for changing that now, unless we decide that the shortcomings of that representation (absence of inherent order, “open-endedness”) are somehow problematic.

I personally don’t think those shortcomings are problematic, I just want to be sure that the different options have been considered – and then that the representation that is ultimately chosen is properly documented.

0 replies

gouttegd · 2025-05-31T17:40:57Z

gouttegd
May 31, 2025
Maintainer Author

I thought you wanted to turn this issue into a discussion?

Anyway, from what I can tell here’s what need to happen before we can get a spec for RDF serialisation:

We need a mapping_id slot (or record_id or whatever: the point is, we need a slot that can hold a unique identifier for a mapping record, that can be turned into a resource identifier when serialising to RDF).
We need a decision on how to render sets that won’t have a mapping_id slot: do we allow rendering them with blank nodes, or do we mandate that all mappings MUST have a mapping_id slot before they can be serialised to RDF (which would mean that RDF serialisation would only be available for SSSOM 1.1+ sets, not for SSSOM 1.0 sets)?
We need a decision on how to render URI-typed slots. For now, both SSSOM-Py and SSSOM-Java render them as RDF literals with a xsd:anyURI type, but this doesn’t make everyone happy (mapping_set_id in RDF should be subject instead of literal object sssom-py#590).
If URI-typed slots are to be rendered as resource identifiers, then we should clarify once and for all that URI-typed slots are expected to contain absolute URIs only (Change definition of type Uri #448) — which does not seem to be the case of the LinkML uri type (Definition for Uri type should clarify what values are acceptable linkml/linkml#2726), so SSSOM probably needs its own absolute_uri type.
Regarding the serialisation of multivalued slots, I think the decision to render them with “simple” lists of triples rather than with complex structures like RDF containers has already been taken. That said, I’d like to hear what people actually interested in having a RDF serialisation (e.g. @andrawaag) think about that, because for now you (@matentzn) and I have been the only two people ever discussing that.

0 replies

gouttegd · 2025-06-02T07:54:38Z

gouttegd
Jun 2, 2025
Maintainer Author

For what it’s worth, for all the above questions, I propose:

Mapping id slot: already proposed (under the term record_id rather than mapping_id, to avoid possible confusion with a hypothetical “core mapping id”).
Allow rendering mappings (and mapping sets for that matter) without an explicit identifier using blank nodes: yes, because I can’t think of no reason not to do that.
Render URI-typed slots (other than mapping_set_id) as resource identifiers rather than xsd:anyURI literals: I’d say no when serialising to RDF, because not all URIs are in fact identifiers and I don’t think we should assume they are. But we can accept both URI literals and resource identifiers when deserialising from RDF.
Clarify than URI-typed slots expect absolute values: already proposed
Assign a property IRI to all enum values: I’d say yes, if only for consistency.

1 reply

matentzn Jun 2, 2025
Maintainer

Assign a property IRI to all enum values: I’d say yes, if only for consistency.

Agreed. I just checked in LinkML channel if there is anything a bit more out of the box for this, but in principle I agree..

gouttegd · 2025-06-15T14:14:36Z

gouttegd
Jun 15, 2025
Maintainer Author

As mentioned somewhere in a ticket, we should review all properties that are currently associated with slots (and that are therefore used to represent the slots in RDF) and check that we are fine with them.

Slot	Property	Comment
`sssom_version`	`sssom:sssom_version`	replace by `dcterms:conformsTo`?
`curie_map`		Irrelevant; in RDF the CURIE map is stored using whatever mechanism supported by the concrete RDF syntax
`mappings`	`sssom:mappings`
`subject_id`	`owl:annotatedSource`
`subject_label`	`sssom:subject_label`
`subject_type`	`sssom:subject_type`
`predicate_id`	`owl:annotatedProperty`
`predicate_modifier`	`sssom:predicate_modifier`
`predicate_type`	`sssom:predicate_type`
`object_id`	`owl:annotatedTarget`
`object_label`	`sssom:object_label`
`object_category`	`sssom:object_category`
`mapping_justification`	`sssom:mapping_justification`
`object_type`	`sssom:object_type`
`mapping_set_id`		Irrelevant; this slot is rendered as the resource identifier, not as a property
`mapping_set_version`	`owl:versionInfo`
`mapping_set_title`	`dcterms:title`
`mapping_set_description`	`dcterms:description`
`mapping_set_confidence`	`sssom:mapping_set_confidence`
`creator_id`	`dcterms:creator`
`creator_label`	`sssom:creator_label`
`author_id`	`pav:authoredBy`
`author_label`	`sssom:author_label`
`reviewer_id`	`sssom:reviewer_id`
`reviewer_label`	`sssom:reviewer_label`
`license`	`dcterms:license`
`subject_source`	`sssom:subject_source`
`subject_source_version`	`sssom:subject_source_version`
`object_source`	`sssom:object_source`
`object_source_version`	`sssom:object_source_version`
`mapping_set_source`	`prov:wasDerivedFrom`
`mapping_source`	`sssom:mapping_source`	could be `prov:wasDerivedFrom` as well? There would be no risk of confusion with `mapping_set_source` since the two slots are on different classes
`mapping_cardinality`	`sssom:mapping_cardinality`
`mapping_tool`	`sssom:mapping_tool`
`mapping_tool_id`	`sssom:mapping_tool_id`
`mapping_tool_version`	`sssom:mapping_tool_version`
`mapping_date`	`pav:authoredOn`	replace by `dcterms:created`? (#457)
`publication_date`	`dcterms:created`	replace by `dcterms:issued`? (#456)
`confidence`	`sssom:confidence`
`subject_match_field`	`sssom:subject_match_field`
`object_match_field`	`sssom:object_match_field`
`match_string`	`sssom:match_string`
`subject_preprocessing`	`sssom:subject_preprocessing`
`object_preprocessing`	`sssom:object_preprocessing`
`curation_rule`	`sssom:curation_rule`
`curation_rule_text`	`sssom:curation_rule_text`
`similarity_score`	`sssom:similarity_score`
`similarity_measure`	`sssom:similarity_measure`
`issue_tracker_item`	`sssom:issue_tracker_item`	replace by `IAO:0000233`? Or is that property too “OBO-specific”?
`issue_tracker`	`sssom:issue_tracker`
`see_also`	`rdfs:seeAlso`
`other`	`sssom:other`	Not really important, that slot should not be used anyway (extensions should be used instead)
`comment`	`rdfs:comment`
`record_id`		Irrelevant, this slos is to be rendered as the resource identifier, not as a property

Likewise, are we happy with the fact that a MappingSet is rendered as sssom:MappingSet and a Mapping is rendered as a owl:Axiom, as in:

<https://example.org/my-mapping-set> a sssom:MappingSet .
<https://example.org/my-mapping-set/mapping0001/> a owl:Axiom .

0 replies

matentzn · 2025-06-16T14:59:03Z

matentzn
Jun 16, 2025
Maintainer

From @nichtich

This follow-up discussion on how to express information about mapping tool in RDF belongs to RDF serialization but starting yet another thread there will unlikely improve the discussion. With three, single-value slots about the mapping tool (mapping_tool_id for URI, mapping_tool_version for version number, mapping_tool for name, description or URL) it is easier than other cases. Still there are multiple ways to model the same information in RDF and each form has its pros and cons.

At the moment I'd favour a hybrid approach with flat RDF properties like proposed by @gouttegd but with defined semantics of this properties how they map to additional RDF triples dcterms:isVersionOf and owl:versionInfo
?maping ex:created_with_mapping_tool wd:Q67641976 , [
            dcterms:isVersionOf wd:Q67641976 ;
            owl:versionInfo "1.11.0" ;
            ex:name_or_description_or_url "http://example.org/tool-homepage"
        ],
        ex:created_with_version_of_mapping_tool "1.11.0" .

wd:Q67641976 ex:name_or_description_or_url "http://example.org/tool-homepage" .

Originally posted by @nichtich in #449

0 replies

gouttegd · 2025-07-03T19:03:16Z

gouttegd
Jul 3, 2025
Maintainer Author

First draft of the RDF spec:

RDF serialisation of SSSOM

RDF formats

This specification defines how a SSSOM mapping set can be converted from or to a RDF model. The RDF model that represents a SSSOM mapping set is independent of the concrete format that may be used to serialise the model (Turtle, N-Triples, N3, RDF/XML, etc.).

Implementations SHOULD support reading and writing a SSSOM set from and to the Turtle serialisation format. They MAY support any other RDF serialisation formats.

Serialisation of slots

The basic principle is that a slot on any given SSSOM object (either a Mapping or a MappingSet) is rendered as a RDF triple where:

the subject is the resource representing the Mapping or MappingSet;
the predicate is either:
- the property indicated by the URI field in the LinkML description of the slot, if such a field is present;
- or a property constructed by catenating the https://w3id.org/sssom/ namespace and the name of the slot;
the object is the value of the slot.

Serialisation of slot values

The following rules determine how the value of a slot is rendered as the object of a RDF triple.

If the slot is typed as an Entity Reference

The value is rendered as a named RDF resource (IRI).

If the slot is typed as a URI

??? Proposition A
The value is rendered as a xsd:anyURI literal.

??? Proposition B
The value is rendered as a named RDF resource (IRI).

When reading from RDF, implementations MAY also accept a value rendered as a named RDF resource.

If the slot is typed as a date

The value is rendered as a xsd:date literal.

If the slot is typed as an enumeration

If the permissible values for the enumeration are defined in the LinkML model as having an associated meaning property, then the value is rendered as a named RDF resource with the indicated property. Otherwise, the value is represented as a xsd:string literal.

Examples:

?mapping sssom:subject_type owl:Class .
?mapping sssom:mapping_cardinality "1:1" .

If the slot is multi-valued

As an exception to the general principle that slots are rendered by a single RDF triple, multi-valued slots are rendered by as many triples as there are values, each value being the object of one triple.

For example, for a mapping whose creator_label slot lists two creators “Alice” and “Bob”:

?mapping sssom:creator_label "Alice", "Bob" .

RDF complex structures such as rdfs:Container or rdfs:List MUST NOT be used to represent multi-valued slots.

Serialisation of a Mapping object

The RDF type of a Mapping object is owl:Axiom.

If the Mapping object has a record_id slot, then the value of that slot is used as the named RDF resource that represents the object (and consequently that slot does not need to be serialised using the general rules listed above). Otherwise, the mapping is represented as a blank node.

Serialisation of a MappingSet object

The RDF type of a MappingSet object is sssom:MappingSet.

A MappingSet object is normally represented by a named RDF resource corresponding to the value of the mapping_set_id slot (which consequently does not need to be serialised using the general rules listed above).

When reading, implementations MAY support a MappingSet represented by a blank node, corresponding to the case where the MappingSet does not have a mapping_set_id slot.

The mappings contained within the set are represented by a series of triples (one per mapping) where:

the subject is the mapping set resource;
the predicate is sssom:mappings;
the object is the mapping resource (represented either as a blank node or as a named resource using the mapping’s record_id slot, as described above).

The curie_map slot MUST NOT be rendered as a normal slot. Instead, if it is needed it MUST be rendered using whatever mechanism is provided by the chosen RDF serialisation format (for example, @prefix declarations in Turtle, or xmlns namespace declarations in RDF/XML). The curie_map slot MAY be omitted entirely if all named resources and predicates are serialised as full-length IRIs and never as CURIEs.

Importantly, when named resources and predicates are serialised as CURIEs, then all used prefix names MUST be declared using the appropriate mechanism for the serialisation format, even prefix names that are considered built-in in the context of SSSOM (such as sssom, skos, semapv, etc.). This is because SSSOM mapping sets serialised as RDF MUST be readable by a generic RDF implementation which is not aware of SSSOM and therefore not aware of any built-in prefix.

10 replies

gouttegd Jul 11, 2025
Maintainer Author

Maybe clearly say sssom:NonRelativeURI?

That PR was not merged when I wrote this draft. ;p

The value is rendered as a named RDF resource (IRI).

I am fine with that as well

OK. For the record I slightly prefer proposition A (nonRelativeURI values are rendered as xsd:anyURI literals), but I don’t have a strong enough opinion to argue for it, so named resource it is.

I 100% agree but this sentence is more a comment than a rule right

Right. That’s another “non-normative clarification”.

The RDF type of a Mapping object is owl:Axiom.

This was a silly thing to do. In fact the whole reification thing fits much better in a OWL serialisation instead of the default RDF one, but well. It will be insane to change that now (?).

I’d say that if you want to change that, now would be moment to do it!

gouttegd Jul 11, 2025
Maintainer Author

I’d say that if you want to change that, now would be moment to do it!

More precisely, now would be the last opportunity to do it.

The RDF serialisation has never been officially specified in SSSOM 1.0, so we could drastically change it if we really wanted to.

I would be mildly against it, on the grounds that even if SSSOM/RDF was not specified, SSSOM-Py has been producing SSSOM/RDF files since, err, forever? and SSSOM-Java has been producing such files since end of last year, and therefore I would not like to change the format for the sole reason that suddenly we don’t like it anymore.

But I could be convinced otherwise.

matentzn Jul 11, 2025
Maintainer

I agree. I am also torn.

If I would volunteer to write the OWL spec on top of the RDF spec, would your objections somewhat alleviated as we can basically tell people: use the OWL serialisation instead? In my experience, the 10 people that use the tools to convert into RDF in fact want to merge these into ontologies. I know one use case for vanilla rdf sssom py translation (which would break with this decision), but I am willing to take the blame and make the fix.

Lets ask a bit around.

gouttegd Jul 11, 2025
Maintainer Author

In my experience, the 10 people that use the tools to convert into RDF in fact want to merge these into ontologies.

There’s at least one use case for RDF that is independent of OWL: being able to run arbitrarily complex queries on a set. Turn your set into a RDF model and load it into anything that allows executing SPARQL or Cypher queries. I know at least one person who does that.

Of course that can be done regardless of whether the RDF model is “vanilla RDF” or OWL/RDF. I am just noting that “merging into a OWL ontology” may not be the sole reason for wanting to export a set to RDF.

If I would volunteer to write the OWL spec on top of the RDF spec, would your objections somewhat alleviated as we can basically tell people: use the OWL serialisation instead?

Let’s answer something first: what would the differences between “vanilla RDF” and OWL/RDF be exactly?

Because if those differences are simply the following 5:

	vanilla RDF	OWL/RDF
type of the `MappingSet` object	`sssom:MappingSet`	`owl:Ontology`
type of the `Mapping` object	`sssom:Mapping`	`owl:Axiom`
predicate representing `subject_id`	`sssom:subject_id`	`owl:annotatedSubject`
predicate representing `predicate_id`	`sssom:predicate_id`	`owl:annotatedPredicate`
predicate representing `object_id`	`sssom:object_id`	`owl:annotatedTarget`

then frankly what is the point of having two formats?

Let’s have a single format with a “vanilla RDF” mode and a “OWL” mode.

Writers MUST support writing a set in OWL mode; they MAY support writing a set in “vanilla RDF” mode.
Readers MUST support reading a set in OWL mode; they MAY support reading a set in “vanilla RDF” mode. If they support both modes, the actual mode is selected as follows:
- does the RDF model contain a resource typed as owl:Ontology? then assume OWL mode;
- does the RDF model contain a resource typed as sssom:MappingSet? then assume “vanilla RDF” mode;
- otherwise, if the model does not contain any resource typed as either owl:Ontology or sssom:MappingSet, then the model does not contain a valid SSSOM representation, and the reader can abort.

Or, let’s just ditch that idea of a “vanilla RDF” and use instead what both SSSOM-Py and SSSOM-Java currently produce.

gouttegd Jul 11, 2025
Maintainer Author

Of note, what the spec currently calls “OWL/RDF” is a bit more complex than what you describe below with your ”SPARQL converter”.

Mappings are represented as owl:Axiom, sure, but the exact type of axiom is dependent on the nature of the predicate, so not all mappings would be represented as

?mapping a owl:Axiom ;
         owl:annotedSubject ?subject_id ;
         owl:annotatedPredicate ?predicate_id ;
         owl:annotatedTarget ?object_id .

For example, if the predicate is an object property, then the mapping is expected to be rendered as an existential restriction:

?mapping a owl:Axiom ;
         owl:annotatedSubject ?subject_id ;
         owl:annotatedPredicate rdfs:subClassOf ; # <- NOT predicate_id !
         owl:annotatedTarget [ a owl:Restriction ;
                               owl:onProperty ?predicate_id ;
                               owl:someValuesFrom ?object_id ] .

So what are talking about exactly here? Is there supposed to be two different “OWL/RDF” serialisations?

matentzn · 2025-07-11T18:05:43Z

matentzn
Jul 11, 2025
Maintainer

Aside from the decision to make SSSOM flat, I am a little conflicted now on modelling the core mapping triple using OWL reification in the RDF serialisation.

Since we do need the OWL serialisation separately, I wonder if:

We should not use OWL reification in the RDF serialisation and
Define the OWL serialisation as a SPARQL CONSTRUCT query and introduce reification there

I know this is a very bad thing to propose after SSSOM 1.0 but well, I wanted at least to say it out loud.

6 replies

matentzn Jul 11, 2025
Maintainer

Right?

Yes, that is what I mean.

Define the OWL serialisation as a SPARQL CONSTRUCT query

Instead of writing a spec for OWL, you just keep a mapping file that converts the RDF serialisation into OWL. This is pseudocode so you get an idea:

PREFIX sssom: <http://w3id.org/sssom/schema/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

CONSTRUCT {
# This is how it will look like in OWL
  [ ] a owl:Axiom ;
        owl:annotatedSubject ?subject ;
        owl:annotatedPredicate ?predicate ;
        owl:annotatedTarget ?object ;
        ?p ?x .
}
WHERE {
# This is how a mapping looks like in RDF
  ?mapping
    sssom:subject_id ?subject ;
    sssom:predicate_id ?predicate ;
    sssom:object_id ?object ;
   ?p ?x .

}

Obviously this is not thought through yet but thinking.

gouttegd Jul 11, 2025
Maintainer Author

Instead of writing a spec for OWL, you just keep a mapping file that converts the RDF serialisation into OWL.

Providing an easy and “executable” way to convert a SSSOM/RDF object into SSSOM/OWL (in the form of a SPARQL file) would be nice, but I’d argue that it should not dispense from writing a correct spec for SSSOM/OWL. If only because people might have some reason for wanting to do the conversion the other way around (they have a SSSOM/OWL file, they want to be able to read it and convert it to any other SSSOM format; asking them to infer how to read a SSSOM/OWL file by studying the SPARQL query would not be nice).

matentzn Jul 11, 2025
Maintainer

Ok! at the very least, an exectuable declarative conversion can serve as a guidepost to write a complete spec. I assume it will be OK to write a spec that is not self contained but rather phrased as a derivation of another?

nichtich Jul 16, 2025
Maintainer

From my experience, every specification is flawed until someone starts to build an implementation based on the specification. Only then minor or major questions, gaps, inconsistencies and errors are revealed and the specification can be improved to better match reality. So I'd strongly recommend to keep RDF serialization of SSSOM as explicitly preliminary until there is at least one reference implementation (this is independent from my differing opinion on whether to keep the RDF serialization "flat" following the underlying SSSOM model).

gouttegd Jul 16, 2025
Maintainer Author

So I'd strongly recommend to keep RDF serialization of SSSOM as explicitly preliminary until there is at least one reference implementation

Well that’s all good then, because SSSOM-Java does fully implement the spec as currently proposed, and SSSOM-Py is not too far off the mark. From the top of my head, SSSOM-Py’s current implementation differs on the three following points:

it does not render the mapping_set_id slot as the resource identifier for the MappingSet object;
it renders URI-typed slots as xsd:anyURI literals rather than resources;
it renders enumeration-typed slots as string literals.

That is, it produces something like this (prefix declarations omitted for brevity):

[] a sssom:MappingSet ;
   dcterms:license "https://creativecommons.org/licenses/by/4.0/"^^xsd:anyURI ;
   sssom:mapping_set_id "https://example.org/my-set"^^xsd:anyURI ;
   sssom:mappings [ a owl:Axiom ;
                      owl:annotatedSource EXA:0001 ;
                      owl:annotatedProperty skos:exactMatch ;
                      owl:annotatedTarget EXB:0010 ;
                      sssom:predicate_modifier "Not"^^xsd:string ] .

instead of

<https://example.org/my-set> a sssom:MappingSet ;
   dcterms:license <https://creativecommons.org/licenses/by/4.0/> ;
   sssom:mappings [ a owl:Axiom ;
                      owl:annotatedSource EXA:0001 ;
                      owl:annotatedProperty skos:exactMatch ;
                      owl:annotatedTarget EXB:0010 ;
                      sssom:predicate_modifier sssom:NegatedPredicate ] .

which should not be difficult to fix if/when we can get some developer time for SSSOM-Py.

(Incidentally, the fact that SSSOM-Py produces, and has always produced, this kind of output is precisely why the proposed spec has a section about backwards compatibility suggesting that SSSOM/RDF parsers accept those variations – something that is also fully implemented by SSSOM-Java.)

I very much believe in running code, and I have never proposed any addition or amendment to the spec for which I did not already have a working implementation.

gouttegd · 2025-07-15T11:47:41Z

gouttegd
Jul 15, 2025
Maintainer Author

It just occurred to me that we have not considered the case of literal mappings – mappings where the subject (respectively the object) is represented, not by a semantic identifier, but by a string literal in subject_label (resp. object_label) slot.

Given the absence of an identifier for the subject and/or object, such mappings obviously cannot be represented as:

?mapping a owl:Axiom ;
         owl:annotatedSource ?subject_id ;
         owl:annotatedProperty ?predicate_id ;
         owl:annotatedTarget ?object_id .

So what do we do with those?

I can think of several options.

In all the following, we consider the following literal mapping (prefix declarations omitted for brevity):

subject_label    subject_type    predicate_id       object_id
Alice            rdfs literal    skos:exactMatch    EX:0001

A. No literal mappings in RDF
We simply exclude literal mappings from the RDF serialisation. A set containing literal mappings cannot be represented in RDF (or at least, if it contains a mix of literal and non-literal mappings, only the non-literal mappings can be represented).

That would be unfortunate but it doesn’t seem unreasonable to me. After all, if the mapping is not between entities that can be represented as RDF resources, then maybe there is no point in having the mapping in RDF form?

B. Use subject_label where we would use subject_id
I mean doing something like this:

?mapping a owl:Axiom ;
         owl:annotatedSource "alice"^^xsd:string ; # <- literal instead of resource
         owl:annotatedProperty skos:exactMatch ;
         owl:annotatedTarget EX:0001 ;
         sssom:subject_type rdfs:literal .

I think that strictly speaking this should be valid, in that AFAIK it is not explicitly mandated anywhere that the object of a triple with a owl:annotatedSource predicate cannot be a literal – but I feel this is a bit like saying that the rules of football do not explicitly mandate that a player cannot be a horse…

Even if it is a legal construct strictly speaking, I am worried that RDF tools might barf on something like this – or even that RDF folks might want to put a contract on my head for even making such a suggestion.

C. Just omit the owl:annotatedSource
That is, we do

?mapping a owl:Axiom ;
         owl:annotatedProperty skos:exactMatch ;
         owl:annotatedTarget EX:0001 ;
         sssom:subject_label "alice"^^xsd:string ;
         sssom:subject_type rdfs:literal .

(Of note, this is what SSSOM-Java is currently producing – not by explicit decision, just as a logical consequence of the way the RDF serialiser works.)

This should be somewhat fine, but I wonder if the absence of the owl:annotatedSource could cause some issues with RDF consumers?

D. Use a blank node
If the absence of the owl:annotatedSource (as in option C) is deemed problematic, then we could do this:

?mapping a owl:Axiom ;
         owl:annotatedSource [] ; # <- blank node standing for the subject
         owl:annotatedProperty skos:exactMatch ;
         owl:annotatedTarget EX:0001 ;
         sssom:subject_label "alice"^^xsd:string ;
         sssom:subject_type rdfs:literal .

Not convinced it is much better.

E. Inject a fabricated ID
Remember that the literal profile does not preclude a literal subject from also having some kind of identifier (i.e. just because subject_type is set to rdfs literal, it does not mean that the mapping cannot also have a subject_id). So in the absence of a subject_id, we could just fabricate one, just to make the RDF serialiser happy:

?mapping a owl:Axiom ;
         owl:annotatedSource EX:qsdfjklm ; # randomly generated ID
         owl:annotatedProperty skos:exactMatch ;
         owl:annotatedTarget EX:0001 ;
         sssom:subject_label "alice"^^xsd:string ;
         sssom:subject_type rdfs:literal .

For now, I am inclined towards E+C. More precisely: it would be recommended that users of literal mappings generate pseudo-IDs prior to converting to RDF (E); if they don’t do it, and the RDF serialiser is given literal mappings without pseudo-IDs, then it should just omit the owl:annotatedSource predicate when serialising (C).

3 replies

matentzn Jul 16, 2025
Maintainer

😱 ahhhhhhh.

I think that strictly speaking this should be valid

No, its not; the subject/source of a triple must be an IRI. The object/target could be a literal. So we cant use B.

C. Just omit the owl:annotatedSource

I am not keen about this either. I don't even think its legal owl (it might be legal rdf, but still)

D. Use a blank node

I think this is a little better than C.

E. Inject a fabricated ID

Still not convinced.

I see two more options which are not great, but could be ok.

F. Explicitly not add reification to literal mappings:

?mapping a rdf:Statement ;
         sssom:predicate_id skos:exactMatch ;
         sssom:object_id EX:0001 ;
         sssom:subject_label "alice"^^xsd:string ;
         sssom:subject_type rdfs:literal .

G. Invert literal mappings prior to write out:

?mapping a owl:Axiom ;
         owl:annotatedSource EX:0001 ;
         owl:annotatedProperty skos:exactMatch ;  # <- This will have to be properly inverted if its broad / narrow etc
         owl:annotatedTarget "alice"^^xsd:string ; 
         sssom:subject_type rdfs:literal .

I only mention these two for completeness. I think my actually preferred option is A by default, with an OPTION --interpret-subject-label-as-resource to go to E.

But I am really not so sure here.. @nichtich what do you think?

gouttegd Jul 16, 2025
Maintainer Author

C. Just omit the owl:annotatedSource

I am not keen about this either. I don't even think its legal owl (it might be legal rdf, but still)

But we are precisely talking about RDF here. And RDF-wise, I think it is legal. None of the RDF tools I have tested (RDFLib, LightRDF, Jena) has ever complained about it.

F. Explicitly not add reification to literal mappings:

Really not keen on having two completely different representations depending on the type of mappings.

G. Invert literal mappings prior to write out:

This is assuming the mapping can be inverted, i.e. either the predicate is reversible, or it has a known inverse predicate.
SSSOM allows literal-to-literal mappings where both the subject and the object are literals.

I think my actually preferred option is A by default, with an OPTION --interpret-subject-label-as-resource to go to E.

What does that option name is even supposed to mean? Option E is to fabricate a subject_id (or object_id) so that we can basically avoid the entire issue, it has nothing to do with “interpreting the subject label as a resource“ (that would be option B).

But the name of the option notwithstanding, I am OK with A+E (default to A with option to go to E). In effect, this means: “a mapping with a literal subject (resp. object) can only be represented in RDF if it has a subject_id (resp. object_id)“.

matentzn Jul 18, 2025
Maintainer

interpreting the subject label as a resource

This was meant to mean "fabricate an id" and was poorly phrased; so --fabricate-id or some such would be the option.

This comment has been hidden.

Sign in to view

RDF serialisation needs a specification #454

Uh oh!

gouttegd Mar 21, 2025 Maintainer

Replies: 23 comments · 21 replies

Uh oh!

matentzn Mar 27, 2025 Maintainer

Uh oh!

Uh oh!

gouttegd Mar 27, 2025 Maintainer Author

Uh oh!

gouttegd Mar 29, 2025 Maintainer Author

Uh oh!

gouttegd Jun 18, 2025 Maintainer Author

Uh oh!

matentzn Apr 2, 2025 Maintainer

Uh oh!

gouttegd Apr 2, 2025 Maintainer Author

Uh oh!

gouttegd Apr 2, 2025 Maintainer Author

Uh oh!

matentzn Apr 11, 2025 Maintainer

Uh oh!

gouttegd Apr 11, 2025 Maintainer Author

Uh oh!

matentzn May 19, 2025 Maintainer

Uh oh!

Uh oh!

gouttegd May 19, 2025 Maintainer Author

Uh oh!

Uh oh!

matentzn May 19, 2025 Maintainer

Uh oh!

gouttegd May 19, 2025 Maintainer Author

Uh oh!

Uh oh!

gouttegd May 19, 2025 Maintainer Author

This comment has been hidden.

This comment has been hidden.

Uh oh!

gouttegd May 31, 2025 Maintainer Author

This comment has been hidden.

Uh oh!

Uh oh!

gouttegd Jun 2, 2025 Maintainer Author

Uh oh!

matentzn Jun 2, 2025 Maintainer

Uh oh!

gouttegd Jun 15, 2025 Maintainer Author

Uh oh!

matentzn Jun 16, 2025 Maintainer

Uh oh!

Uh oh!

gouttegd Jul 3, 2025 Maintainer Author

RDF serialisation of SSSOM

RDF formats

Serialisation of slots

Serialisation of slot values

If the slot is typed as an Entity Reference

If the slot is typed as a URI

If the slot is typed as a date

If the slot is typed as an enumeration

If the slot is multi-valued

Serialisation of a Mapping object

Serialisation of a MappingSet object

Uh oh!

gouttegd Jul 11, 2025 Maintainer Author

Uh oh!

gouttegd Jul 11, 2025 Maintainer Author

Uh oh!

matentzn Jul 11, 2025 Maintainer

Uh oh!

Uh oh!

gouttegd Jul 11, 2025 Maintainer Author

gouttegd
Mar 21, 2025
Maintainer

Replies: 23 comments 21 replies

matentzn
Mar 27, 2025
Maintainer

gouttegd
Mar 27, 2025
Maintainer Author

gouttegd
Mar 29, 2025
Maintainer Author

gouttegd Jun 18, 2025
Maintainer Author

matentzn
Apr 2, 2025
Maintainer

gouttegd
Apr 2, 2025
Maintainer Author

gouttegd
Apr 2, 2025
Maintainer Author

matentzn
Apr 11, 2025
Maintainer

gouttegd
Apr 11, 2025
Maintainer Author

matentzn
May 19, 2025
Maintainer

gouttegd
May 19, 2025
Maintainer Author

matentzn
May 19, 2025
Maintainer

gouttegd
May 19, 2025
Maintainer Author

gouttegd
May 19, 2025
Maintainer Author

gouttegd
May 31, 2025
Maintainer Author

gouttegd
Jun 2, 2025
Maintainer Author

matentzn Jun 2, 2025
Maintainer

gouttegd
Jun 15, 2025
Maintainer Author

matentzn
Jun 16, 2025
Maintainer

gouttegd
Jul 3, 2025
Maintainer Author

gouttegd Jul 11, 2025
Maintainer Author

gouttegd Jul 11, 2025
Maintainer Author

matentzn Jul 11, 2025
Maintainer

gouttegd Jul 11, 2025
Maintainer Author