-
I think this is critical for some of our prospective user base. For me, the main problem is how to deal with #359 when the pressure mounts (I expect it to mount by summer at the latest). Will we have `mapping_id` (as an "optional" property) hang off the reified RDF statement as `sssom:mapping_id`, or will we use it as the resource identifier for the mapping (which is more correct, but will create an entirely different RDF serialisation)? I expect people to favour the latter, but I find it hard to reconcile this with the "optionality" of `mapping_id`.
-
As you note, this would result in an RDF serialisation entirely different from the one currently supported by SSSOM-Py (and SSSOM-Java, since in the absence of a spec SSSOM-Java basically did the same thing as SSSOM-Py). That is not necessarily a blocker (the RDF serialisation is not specified as of SSSOM 1.0, so it's fine to radically change it), but it is still something to consider carefully. If people have already started using the current RDF serialisation, we might end up having to specify two different RDF serialisations: one that corresponds to the current output produced by SSSOM-Py (where individual mappings are blank nodes, without a resource identifier), and one that uses the to-be-introduced `mapping_id` as the resource identifier.
(A) The new RDF serialisation format is only available for mapping sets that do use the new `mapping_id` property. (B) Consequently, if you have a mapping set without `mapping_id`s, it can only be serialised in the current, blank-node-based format.
-
In fact, thinking back about this, I don’t see how that is a problem, given that RDF has the concept of blank nodes. When serialising to RDF:
When deserialising from RDF:
So, for example, with identifiers:

```turtle
<https://example.org/myset> a sssom:MappingSet ;
    terms:title "My mapping set" ;
    sssom:mappings <https://example.org/mymapping1>, <https://example.org/mymapping2> .

<https://example.org/mymapping1> a owl:Axiom ;
    owl:annotatedSource UBERON:0000001 ;
    owl:annotatedProperty semapv:crossSpeciesExactMatch ;
    owl:annotatedTarget FBbt:00000001 ;
    sssom:mapping_justification semapv:ManualMappingCuration .

<https://example.org/mymapping2> a owl:Axiom ;
    owl:annotatedSource UBERON:0000002 ;
    owl:annotatedProperty semapv:crossSpeciesExactMatch ;
    owl:annotatedTarget FBbt:00000002 ;
    sssom:mapping_justification semapv:ManualMappingCuration .
```

The same set, without identifiers:

```turtle
[] a sssom:MappingSet ;
    terms:title "My mapping set" ;
    sssom:mappings [
        a owl:Axiom ;
        owl:annotatedSource UBERON:0000001 ;
        owl:annotatedProperty semapv:crossSpeciesExactMatch ;
        owl:annotatedTarget FBbt:00000001 ;
        sssom:mapping_justification semapv:ManualMappingCuration
    ], [
        a owl:Axiom ;
        owl:annotatedSource UBERON:0000002 ;
        owl:annotatedProperty semapv:crossSpeciesExactMatch ;
        owl:annotatedTarget FBbt:00000002 ;
        sssom:mapping_justification semapv:ManualMappingCuration
    ] .
```
-
I agree that this is an OK compromise, but there are strict limitations. If you stick with blank nodes, for example, you cannot easily build an API on top of the SSSOM data model (without extending it for API purposes). Such an API requires one to go not only from a mapping set to a mapping, but also back: say you have a search, and you find all mappings matching your query; how do you retain which mapping set a mapping comes from, without the mapping being a named node in the graph? I understand this is a bit of a detail, but at the very least we should contemplate it, as we tripped all over this when trying to build a massive RDF graph with all mappings in the world and wanting to build a super thin API layer on top.
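The back-link concern can be made concrete with a short SPARQL query; the prefix and the example mapping IRI are the illustrative ones used in the Turtle example above, not fixed names:

```sparql
PREFIX sssom: <https://w3id.org/sssom/>

# Given a mapping found by a search, recover the set(s) it belongs to.
# This only works as a stable, cross-query reference if the mapping is
# a named node; a blank node cannot be addressed from outside the graph.
SELECT ?set WHERE {
  ?set a sssom:MappingSet ;
       sssom:mappings <https://example.org/mymapping1> .
}
```

With blank-node mappings, the `sssom:mappings` edge still exists, but there is no IRI to plug into the query from an API request.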
-
Well then, just explicitly state that your API can only work on mapping sets whose mappings have a `mapping_id`.
-
If we do end up introducing an optional `mapping_id`, we will need to specify how mapping sets with and without IDs interact. Otherwise, how would you, for example, merge a mapping set that contains mapping IDs with a mapping set that does not? As I recall, it was kind of agreed, the last time we discussed a potential `mapping_id`, that…
-
So what would be the behaviour if you merge two mapping sets without `mapping_id`? Using blank node syntax? Do you maintain that in these cases we do not auto-inject? Or wouldn't it be better for RDF to always auto-inject (for example by hashing the entire mapping object)?
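The hashing idea mentioned here can be sketched as follows; this is a hypothetical helper, not part of SSSOM-Py or any SSSOM specification, and the base IRI is made up:

```python
import hashlib
import json

def hash_based_mapping_id(mapping, base="https://example.org/mapping/"):
    """Derive a deterministic pseudo-identifier by hashing the mapping's
    own content (illustrative sketch only; SSSOM defines no such rule)."""
    # Canonicalise first (sorted keys, fixed separators) so that two
    # identical mappings always hash to the same IRI.
    canonical = json.dumps(mapping, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return base + digest[:16]

mapping = {
    "subject_id": "UBERON:0000001",
    "predicate_id": "semapv:crossSpeciesExactMatch",
    "object_id": "FBbt:00000001",
    "mapping_justification": "semapv:ManualMappingCuration",
}
print(hash_based_mapping_id(mapping))
```

Note the flip side discussed below: content-derived IDs change whenever any slot of the mapping changes, which is one reason to be wary of them.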
-
If the two mapping sets are both without `mapping_id`s, there is nothing special to do: the merged set simply has no IDs either. If one set has `mapping_id`s and the other does not, the options are to a) drop the `mapping_id`s from the first set, or b) inject generated IDs into the second set. This is completely orthogonal to the question of what happens if/when the set is later exported to RDF. If the user chooses (a) and then writes to RDF, then the RDF model will contain mappings serialised as blank nodes (since they don’t have `mapping_id`s).
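The two merge options can be sketched like this; `merge_mapping_sets` and the dict-based mapping shape are hypothetical illustrations, not a real SSSOM-Py or SSSOM-Java API:

```python
import uuid

def merge_mapping_sets(mappings_a, mappings_b, strategy="drop"):
    """Illustrative sketch of the two options discussed above:
       strategy="drop"   -> (a) drop all mapping_ids from the result;
       strategy="inject" -> (b) inject generated IDs where missing."""
    merged = [dict(m) for m in mappings_a + mappings_b]
    for m in merged:
        if strategy == "drop":
            m.pop("mapping_id", None)       # option (a): uniform, ID-less set
        elif strategy == "inject":
            # option (b): uniform, fully identified set
            m.setdefault("mapping_id", "urn:uuid:" + str(uuid.uuid4()))
    return merged

with_id = [{"subject_id": "A:1", "object_id": "B:1",
            "mapping_id": "https://example.org/m1"}]
without_id = [{"subject_id": "A:2", "object_id": "B:2"}]
print(merge_mapping_sets(with_id, without_id, strategy="drop"))
```

Either way the merged set is internally consistent, which is the point of forcing an explicit choice.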
I tend to think that auto-injection of mapping IDs should only ever happen upon the explicit request of the user, or at the very least with her informed consent. (“This program can only work if mappings have a unique identifier; if you load mapping sets that do not contain such identifiers, auto-generated identifiers will be injected. Proceed? Yes/Cancel.”)
I don’t see why that would be better. I get that you are thinking about use cases where mapping identifiers would be needed anyway, but that may not always be the case. Given that neither SSSOM/TSV nor SSSOM/JSON currently mandates that mappings have identifiers, why would RDF, specifically, mandate it? Especially knowing that RDF’s blank node syntax gives us a perfectly normal way to represent ID-less mappings. (And I also think that auto-generated IDs should not be derived from the mapping itself by hashing its contents.)
-
From @gouttegd on Slack:
One reason I like the simple option:
I am not a big fan of RDF containers, e.g.:
or even worse, explicitly:
because it makes it very cumbersome to query with SPARQL. I am so far nearly convinced we should use the simple first option.
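For reference, the stripped examples presumably contrasted something like the following: the simple repeated-triple form versus an RDF collection, first in Turtle shorthand and then written out explicitly. The property and the ORCID values are made up for illustration:

```turtle
@prefix sssom: <https://w3id.org/sssom/> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix orcid: <https://orcid.org/> .

# Simple repeated triples (the "first option"):
[] sssom:creator_id orcid:0000-0000-0000-0001, orcid:0000-0000-0000-0002 .

# RDF collection, Turtle shorthand:
[] sssom:creator_id ( orcid:0000-0000-0000-0001 orcid:0000-0000-0000-0002 ) .

# The same collection written out explicitly:
[] sssom:creator_id [
    rdf:first orcid:0000-0000-0000-0001 ;
    rdf:rest  [ rdf:first orcid:0000-0000-0000-0002 ;
                rdf:rest  rdf:nil ]
] .
```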
-
No objection to that, but there are at least two things worth discussing with the simple option.

(A) It does not guarantee the order of items. Maybe it’s a big deal, maybe not. I personally don’t think it is, but this may run afoul of some people’s expectations. For example, some people may expect that there is a correlation between the list of `creator_id` values and the list of `creator_label` values. (I personally think they should not assume that, and in fact the spec should explicitly and unambiguously state that the two lists are completely unrelated.) People may also assume things such as: the first author is the most important one, the author that did most of the work. This again assumes that there is an order in lists. (Again, I think people should not assume things like that, but I also recognise that this is not, at first sight, an unreasonable assumption to make.)

(B) RDF lists are closed. There is an explicit list terminator (`rdf:nil`), so a list explicitly says where it ends; with simple repeated triples, nothing prevents further values from being added. Again, maybe this is a big deal, maybe not, but I think this is an important semantic difference that warrants at least a discussion.
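The query-side trade-off behind the two representations can be shown with two alternative SPARQL queries (property names are the illustrative ones used above):

```sparql
PREFIX sssom: <https://w3id.org/sssom/>
PREFIX rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

# Simple repeated triples: one pattern, but no order and no closure.
SELECT ?creator WHERE { ?set sssom:creator_id ?creator . }

# RDF collection: order and closure (rdf:nil) are explicit, but the
# list must be traversed with a property path.
SELECT ?creator WHERE { ?set sssom:creator_id/rdf:rest*/rdf:first ?creator . }
```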
-
All good points. I am a bit concerned about whether we can impose a sort order at all if we rely on standard serialisers. Like, if we take SSSOM’s:
Can something like this be done with an out-of-the-box RDF writer like, say, Jena’s or rdflib’s, or do we have to implement our own serialisation procedure?
-
Not if we stick to the “simple” representation of lists. If we want to prescribe an order, then we kinda must use RDF collections. If we stick to the simple representation, then even if I wrote my own Turtle serialiser in SSSOM-Java to force the list items to be written in a given order, there would be no guarantee that whichever program reads the set would preserve that order. Unless you are somehow suggesting that we invent our own variant of Turtle that is almost identical to standard Turtle, except that the order in which the triples are listed is significant and must be preserved. (I know you are not suggesting that! Right?) Note, however, that this:
comes from the spec of the SSSOM/TSV format, so it is not relevant for the SSSOM/RDF serialisation. And more precisely, it comes from the definition of the canonical SSSOM/TSV format, whose only purpose is to reduce spurious diffs due to serialisation differences. It does not mean that the order of mappings is meaningful in any way.
-
To clarify: I am not suggesting that we should absolutely revise the way lists are represented in the (existing, undocumented) RDF output. I am perfectly happy with the “simple” representation we currently use, and in fact I would be quite in favour of keeping it if only for the simple reason that this is what SSSOM-Py has always produced – I see no reason for changing that now, unless we decide that the shortcomings of that representation (absence of inherent order, “open-endedness”) are somehow problematic. I personally don’t think those shortcomings are problematic, I just want to be sure that the different options have been considered – and then that the representation that is ultimately chosen is properly documented.
-
I thought you wanted to turn this issue into a discussion? Anyway, from what I can tell, here’s what needs to happen before we can get a spec for RDF serialisation:
-
For what it’s worth, for all the above questions, I propose:
-
As mentioned somewhere in a ticket, we should review all properties that are currently associated with slots (and that are therefore used to represent the slots in RDF) and check that we are fine with them.
Likewise, are we happy with the fact that a `Mapping` is represented as an `owl:Axiom`?
-
From @nichtich
-
First draft of the RDF spec:

## RDF serialisation of SSSOM

### RDF formats

This specification defines how a SSSOM mapping set can be converted from or to an RDF model. The RDF model that represents a SSSOM mapping set is independent of the concrete format that may be used to serialise the model (Turtle, N-Triples, N3, RDF/XML, etc.). Implementations SHOULD support reading and writing a SSSOM set from and to the Turtle serialisation format. They MAY support any other RDF serialisation formats.

### Serialisation of slots

The basic principle is that a slot on any given SSSOM object (either a `Mapping` or a `MappingSet`) is rendered by a single RDF triple.

### Serialisation of slot values

The following rules determine how the value of a slot is rendered as the object of an RDF triple.

#### If the slot is typed as an Entity Reference

The value is rendered as a named RDF resource (IRI).

#### If the slot is typed as a URI

Proposition A: ???

Proposition B: ???

When reading from RDF, implementations MAY also accept a value rendered as a named RDF resource.

#### If the slot is typed as a date

The value is rendered as a literal typed as `xsd:date`.

#### If the slot is typed as an enumeration

If the permissible values for the enumeration are defined in the LinkML model as having an associated `meaning` IRI, the value is rendered as the corresponding named resource; otherwise it is rendered as a string literal. Examples:

```turtle
?mapping sssom:subject_type owl:Class .
?mapping sssom:mapping_cardinality "1:1" .
```

#### If the slot is multi-valued

As an exception to the general principle that slots are rendered by a single RDF triple, multi-valued slots are rendered by as many triples as there are values, each value being the object of one triple. For example, for a mapping whose `creator_label` slot contains the values "Alice" and "Bob":

```turtle
?mapping sssom:creator_label "Alice", "Bob" .
```

RDF complex structures such as containers and collections are NOT used.

### Serialisation of a Mapping object

The RDF type of a `Mapping` object is `owl:Axiom`. If the Mapping object has a `mapping_id`, it is represented by a named RDF resource with that IRI; otherwise it is represented by a blank node.

### Serialisation of a MappingSet object

The RDF type of a `MappingSet` object is `sssom:MappingSet`. A MappingSet object is normally represented by a named RDF resource corresponding to the value of the `mapping_set_id` slot. When reading, implementations MAY support a MappingSet represented by a blank node, corresponding to the case where the MappingSet does not have a `mapping_set_id`. The mappings contained within the set are represented by a series of triples (one per mapping) where the subject is the resource representing the set, the predicate is `sssom:mappings`, and the object is the resource (named or blank) representing the mapping.

Importantly, when named resources and predicates are serialised as CURIEs, all used prefix names MUST be declared using the appropriate mechanism for the serialisation format, even prefix names that are considered built-in in the context of SSSOM (such as `sssom:` itself).
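To make the draft concrete, here is a hedged sketch of a minimal set serialised under the rules above. All IRIs, prefix choices and slot values are illustrative, and `terms:title` is simply carried over from the earlier examples, not a settled decision:

```turtle
@prefix sssom:  <https://w3id.org/sssom/> .
@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix semapv: <https://w3id.org/semapv/vocab/> .
@prefix terms:  <http://purl.org/dc/terms/> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .
@prefix UBERON: <http://purl.obolibrary.org/obo/UBERON_> .
@prefix FBbt:   <http://purl.obolibrary.org/obo/FBbt_> .

<https://example.org/myset> a sssom:MappingSet ;   # named via mapping_set_id
    terms:title "My mapping set" ;
    sssom:mappings [
        a owl:Axiom ;                              # blank node: no mapping_id
        owl:annotatedSource UBERON:0000001 ;
        owl:annotatedProperty semapv:crossSpeciesExactMatch ;
        owl:annotatedTarget FBbt:00000001 ;
        sssom:mapping_justification semapv:ManualMappingCuration ;
        sssom:mapping_date "2024-01-01"^^xsd:date ;  # date rule
        sssom:subject_type owl:Class ;               # enum with a 'meaning' IRI
        sssom:creator_label "Alice", "Bob"           # multi-valued slot
    ] .
```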
-
Aside from the decision to make SSSOM flat, I am now a little conflicted about modelling the core mapping triple using OWL reification in the RDF serialisation. Since we do need the OWL serialisation separately, I wonder if:
I know this is a very bad thing to propose after SSSOM 1.0, but well, I wanted at least to say it out loud.
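For comparison only (this is not necessarily what the stripped list above proposed), the current OWL-reification shape can be set against the RDF-star annotation syntax, which asserts the core triple directly while still allowing metadata on it:

```turtle
@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix skos:   <http://www.w3.org/2004/02/skos/core#> .
@prefix sssom:  <https://w3id.org/sssom/> .
@prefix semapv: <https://w3id.org/semapv/vocab/> .
@prefix UBERON: <http://purl.obolibrary.org/obo/UBERON_> .
@prefix FBbt:   <http://purl.obolibrary.org/obo/FBbt_> .

# Current approach: the core triple exists only inside the reification.
[] a owl:Axiom ;
   owl:annotatedSource UBERON:0000001 ;
   owl:annotatedProperty skos:exactMatch ;
   owl:annotatedTarget FBbt:00000001 ;
   sssom:mapping_justification semapv:ManualMappingCuration .

# RDF-star (Turtle-star) annotation: the core triple is asserted
# directly, with the justification attached to it.
UBERON:0000001 skos:exactMatch FBbt:00000001
    {| sssom:mapping_justification semapv:ManualMappingCuration |} .
```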
-
It just occurred to me that we have not considered the case of literal mappings – mappings where the subject (respectively the object) is represented, not by a semantic identifier, but by a string literal in `subject_label` (respectively `object_label`). Given the absence of an identifier for the subject and/or object, such mappings obviously cannot be represented as:

```turtle
?mapping a owl:Axiom ;
    owl:annotatedSource ?subject_id ;
    owl:annotatedProperty ?predicate_id ;
    owl:annotatedTarget ?object_id .
```

So what do we do with those? I can think of several options. In all the following, we consider the following literal mapping (prefix declarations omitted for brevity):

**A. No literal mappings in RDF**

That would be unfortunate but it doesn’t seem unreasonable to me. After all, if the mapping is not between entities that can be represented as RDF resources, then maybe there is no point in having the mapping in RDF form?

**B. Use a literal as the object of `owl:annotatedSource`**

```turtle
?mapping a owl:Axiom ;
    owl:annotatedSource "alice"^^xsd:string ; # <- literal instead of resource
    owl:annotatedProperty skos:exactMatch ;
    owl:annotatedTarget EX:0001 ;
    sssom:subject_type rdfs:literal .
```

I think that strictly speaking this should be valid, in that AFAIK it is not explicitly mandated anywhere that the object of a triple with an `owl:annotatedSource` predicate must be a named resource. Even if it is a legal construct strictly speaking, I am worried that RDF tools might barf on something like this – or even that RDF folks might want to put a contract on my head for even making such a suggestion.

**C. Just omit the `owl:annotatedSource` triple**

```turtle
?mapping a owl:Axiom ;
    owl:annotatedProperty skos:exactMatch ;
    owl:annotatedTarget EX:0001 ;
    sssom:subject_label "alice"^^xsd:string ;
    sssom:subject_type rdfs:literal .
```

(Of note, this is what SSSOM-Java is currently producing – not by explicit decision, just as a logical consequence of the way the RDF serialiser works.) This should be somewhat fine, but I wonder if the absence of the `owl:annotatedSource` triple could be a problem for some consumers.

**D. Use a blank node**

```turtle
?mapping a owl:Axiom ;
    owl:annotatedSource [] ; # <- blank node standing for the subject
    owl:annotatedProperty skos:exactMatch ;
    owl:annotatedTarget EX:0001 ;
    sssom:subject_label "alice"^^xsd:string ;
    sssom:subject_type rdfs:literal .
```

Not convinced it is much better.

**E. Inject a fabricated ID**

```turtle
?mapping a owl:Axiom ;
    owl:annotatedSource EX:qsdfjklm ; # randomly generated ID
    owl:annotatedProperty skos:exactMatch ;
    owl:annotatedTarget EX:0001 ;
    sssom:subject_label "alice"^^xsd:string ;
    sssom:subject_type rdfs:literal .
```

For now, I am inclined towards E+C. More precisely: it would be recommended that users of literal mappings generate pseudo-IDs prior to converting to RDF (E); if they don’t do it, and the RDF serialiser is given literal mappings without pseudo-IDs, then it should just omit the `owl:annotatedSource` triple (C).
-
If the RDF format produced/consumed by `sssom-py` is supposed to be usable at the same level as the SSSOM/TSV format, it needs to be specified. Having to infer the serialisation rules by looking at the output of the “reference implementation” (which is what I had to do to implement RDF support in SSSOM-Java) is not acceptable for something that pretends to be a standard.

Same thing for the OWL/RDF serialisation, which is only specified (poorly) for the individual mappings, not for mapping set objects.