

@gouttegd
Contributor

This PR implements option "A" mentioned in #628. That is, it replaces the LinkML-based code for serialising/deserialising to/from RDF with completely custom code that implements the new SSSOM/RDF spec exactly as we want it.

All the new code is contained within the new sssom.rdf_internal module, which as the name implies is not really supposed to be used outside of SSSOM-Py (though of course people can still do whatever they want!). The public interface to read from RDF is still the from_sssom_rdf method in sssom.parsers, and the public interface to write to RDF is still the to_rdf_graph method in sssom.writers. Both methods are rewritten to use the MappingSetRDFConverter class of the sssom.rdf_internal module, instead of methods from the LinkML runtime.
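The layering described above, where thin public functions delegate to an internal converter class, can be sketched roughly as follows. This is a toy, stdlib-only illustration: `ToyRDFConverter`, `parse_rdf` and `write_rdf` are hypothetical stand-ins for `MappingSetRDFConverter`, `from_sssom_rdf` and `to_rdf_graph`, triples are plain tuples rather than RDFLib objects, and every value is written as a plain string, which is not how SSSOM/RDF actually encodes entity references.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
MappingDict = Dict[str, str]  # hypothetical: one mapping as slot -> value

SSSOM = "https://w3id.org/sssom/"


class ToyRDFConverter:
    """Internal class doing the actual work, in both directions."""

    def to_triples(self, mappings: List[MappingDict]) -> List[Triple]:
        triples: List[Triple] = []
        for i, mapping in enumerate(mappings):
            node = f"_:m{i}"  # one blank node per mapping
            for slot, value in mapping.items():
                triples.append((node, SSSOM + slot, value))
        return triples

    def from_triples(self, triples: List[Triple]) -> List[MappingDict]:
        by_node: Dict[str, MappingDict] = {}
        for node, predicate, value in triples:
            if predicate.startswith(SSSOM):  # ignore non-SSSOM predicates
                by_node.setdefault(node, {})[predicate[len(SSSOM):]] = value
        return list(by_node.values())  # insertion order = order of first triple


_CONVERTER = ToyRDFConverter()


def parse_rdf(triples: List[Triple]) -> List[MappingDict]:
    """Public entry point (plays the role of from_sssom_rdf)."""
    return _CONVERTER.from_triples(triples)


def write_rdf(mappings: List[MappingDict]) -> List[Triple]:
    """Public entry point (plays the role of to_rdf_graph)."""
    return _CONVERTER.to_triples(mappings)
```

Callers only ever see the two public functions, so the internal class can be rewritten (as this PR does) without changing the public interface.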

closes #628

Since the normalised variant of SSSOM/RDF matches neither the output
produced by the LinkML runtime's dumper nor the input expected by its
loader, we implement our own serialisation/deserialisation routines for
RDF.

All the code for RDF import/export is contained within the
sssom.rdf_internal module. The main class expected to be used outside of
that module is the MappingSetRDFConverter class.

The `from_sssom_rdf` and `to_rdf_graph` methods in the sssom.parsers and
sssom.writers modules are rewritten to use the new converter.

The new RDF converter revealed several issues with our existing tests,
which are fixed here.

* test_convert.py (test_to_rdf_hydrated): That test used an incorrect
  "NOT" value for a predicate modifier slot (the correct value is "Not",
  and there is nothing in the spec to suggest it is case-insensitive);
  it also forgot to declare some prefixes;

* test_writers.py (test_rdflib_endpoint): That test used
  "sssom:MappingCuration" as a column name in a MappingSetDataFrame,
  instead of the expected "mapping_justification"; it also forgot to
  declare some prefixes.
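
Both test fixes involve CURIEs whose prefixes were missing from the prefix map. A check for that situation can be sketched in a few lines; `find_undeclared_prefixes` is a hypothetical helper for illustration, not part of SSSOM-Py, and it assumes every value passed in is a CURIE of the form `prefix:local_id`.

```python
from typing import Dict, Iterable, Set


def find_undeclared_prefixes(curies: Iterable[str], curie_map: Dict[str, str]) -> Set[str]:
    """Return the prefixes used in `curies` that have no entry in `curie_map`.

    Hypothetical helper; assumes every value is a CURIE (prefix:local_id).
    """
    missing: Set[str] = set()
    for curie in curies:
        prefix, sep, _ = curie.partition(":")
        if not sep:
            raise ValueError(f"not a CURIE: {curie!r}")
        if prefix not in curie_map:
            missing.add(prefix)
    return missing
```

For example, with only `HP` and `semapv` declared, `find_undeclared_prefixes(["HP:0000001", "MP:0000001", "semapv:ManualMappingCuration"], curie_map)` reports `{"MP"}`.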

The `override` attribute is not available in the `typing` module in
Python 3.11 and lower; it needs to be obtained from `typing_extensions`
instead.
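
A common way to apply this is a version-conditional import, sketched below. It assumes `typing_extensions` is installed on Python 3.11 and lower (`typing.override` was added in Python 3.12); the two classes are hypothetical and only show the decorator in use.

```python
import sys

if sys.version_info >= (3, 12):
    from typing import override
else:  # Python 3.11 and lower: use the backport
    from typing_extensions import override


class ValueConverter:
    """Hypothetical base class, for illustration only."""

    def to_rdf(self, value: str) -> str:
        return value


class UpperCaseConverter(ValueConverter):
    # A type checker reports an error if this method stops overriding
    # anything in the base class; at runtime the decorator only marks it.
    @override
    def to_rdf(self, value: str) -> str:
        return value.upper()
```

At runtime both implementations simply set an `__override__` attribute on the decorated method; the actual check is done by static type checkers.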
@gouttegd
Contributor Author

Still missing (and the reason why the PR is a draft):

  • More tests (the PR passes the test suite at least on my machine, but I have very little trust in the existing tests).
  • Support for extension slots (this should be entirely feasible, since we no longer use the LinkML-derived classes at all).

@gouttegd gouttegd self-assigned this Oct 21, 2025
gouttegd and others added 4 commits October 21, 2025 17:04
Amend most value converters to exploit RDFLib's typing features whenever
possible.
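
The general idea of a typed value converter (turning a Python value into an RDF literal with an explicit datatype, and back) can be sketched without RDFLib. This is a stdlib-only illustration with hypothetical names; the commit itself relies on RDFLib's own typing features instead (for instance, `rdflib.Literal` maps a `datetime.date` to `xsd:date` on its own).

```python
from dataclasses import dataclass
from datetime import date

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class TypedLiteral:
    """Stand-in for an RDF literal: lexical form plus datatype IRI."""
    lexical: str
    datatype: str


class DateValueConverter:
    """Hypothetical converter for date-valued slots such as mapping_date."""

    def to_rdf(self, value: date) -> TypedLiteral:
        return TypedLiteral(value.isoformat(), XSD + "date")

    def from_rdf(self, literal: TypedLiteral) -> date:
        # Reject literals that do not carry the expected datatype.
        if literal.datatype != XSD + "date":
            raise ValueError(f"expected xsd:date, got {literal.datatype}")
        return date.fromisoformat(literal.lexical)
```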

Overhaul the EnumValueConverter so that it only uses the EnumDefinition
from the LinkML schema, without ever using the actual enum
implementation generated from the schema. This notably ensures that
parsed enum values are typed as strings, not as something like
`PermissibleValueText` or whatever.
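
An enum converter that works only from the enum definition can be sketched as below: the permissible values are plain strings taken from the schema, matching is exact (so `NOT` is rejected where the schema defines `Not`, as in the `test_to_rdf_hydrated` fix), and parsed values come back as `str`. `EnumValueConverterSketch` and its constructor arguments are hypothetical; the real class reads a LinkML `EnumDefinition`.

```python
from typing import Iterable


class EnumValueConverterSketch:
    """Hypothetical enum converter built only from permissible value names."""

    def __init__(self, enum_name: str, permissible_values: Iterable[str]) -> None:
        self.enum_name = enum_name
        self.permissible_values = frozenset(permissible_values)

    def _check(self, value: str) -> str:
        # Exact, case-sensitive match against the schema's permissible values.
        if value not in self.permissible_values:
            raise ValueError(f"{value!r} is not a permissible value of {self.enum_name}")
        return str(value)  # always a plain str, never a generated enum class

    def from_rdf(self, lexical: str) -> str:
        return self._check(lexical)

    def to_rdf(self, value: str) -> str:
        return self._check(value)
```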

Add a helper object to facilitate the creation of value converters and
ensure we only create the value converters we need for a given type of
object, based on the slot definitions for that object.
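
One way to build only the converters an object actually needs is to walk its slot definitions, look at each slot's range, and instantiate (and cache) one converter per range. The sketch below assumes slot definitions can be reduced to a `{slot_name: range_name}` dictionary; `ConverterFactory`, the registry and the two converter classes are hypothetical.

```python
from typing import Callable, Dict


class StringConverter:
    def to_rdf(self, value: object) -> str:
        return str(value)


class DoubleConverter:
    def to_rdf(self, value: object) -> float:
        return float(value)


# Hypothetical registry: range name -> converter class.
REGISTRY: Dict[str, Callable[[], object]] = {
    "string": StringConverter,
    "double": DoubleConverter,
}


class ConverterFactory:
    """Creates converters lazily and shares one instance per range."""

    def __init__(self, registry: Dict[str, Callable[[], object]]) -> None:
        self._registry = registry
        self._cache: Dict[str, object] = {}
        self.created = 0  # number of converters actually instantiated

    def get(self, range_name: str) -> object:
        if range_name not in self._cache:
            self._cache[range_name] = self._registry[range_name]()
            self.created += 1
        return self._cache[range_name]

    def converters_for(self, slots: Dict[str, str]) -> Dict[str, object]:
        """Map each slot name to the (shared) converter for its range."""
        return {slot: self.get(range_name) for slot, range_name in slots.items()}
```

With two string-typed slots, only one `StringConverter` is created and `DoubleConverter` is never instantiated.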

Introduce a `CurieConverterProvider` interface so that we can construct
all value converters even before we get the CURIE converter that the
EntityReferenceValueConverter will need to convert EntityReference-typed
values.

This is the first step towards making it possible to create only one
instance of the MappingSetRDFConverter class (maybe even making it a
Singleton) that can be used to serialize/deserialize many different MSDF
objects -- currently, the class can only really be used once, because it
expects to get the CURIE converter at construction time, and in most
cases the CURIE converter will be specific to only one MSDF or one RDF
graph.
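
The point of the indirection is that the entity-reference converter holds a provider rather than a CURIE converter, and only asks it for the current converter at conversion time. In the sketch below a plain prefix dictionary stands in for the CURIE converter; only the name `CurieConverterProvider` comes from the commit message, while its `get_curie_converter` method and the classes `SwappableProvider` and `EntityRefConverterSketch` are assumptions made for illustration.

```python
from typing import Dict, Optional, Protocol

PrefixMap = Dict[str, str]  # stand-in for a real CURIE converter


class CurieConverterProvider(Protocol):
    def get_curie_converter(self) -> PrefixMap: ...


class SwappableProvider:
    """Hypothetical provider whose prefix map can change between conversions."""

    def __init__(self) -> None:
        self._current: Optional[PrefixMap] = None

    def set(self, prefix_map: PrefixMap) -> None:
        self._current = prefix_map

    def get_curie_converter(self) -> PrefixMap:
        if self._current is None:
            raise RuntimeError("no CURIE converter available yet")
        return self._current


class EntityRefConverterSketch:
    def __init__(self, provider: CurieConverterProvider) -> None:
        self._provider = provider  # no prefix map needed at construction time

    def to_rdf(self, curie: str) -> str:
        # Expand the CURIE with whatever prefix map is current *now*.
        prefix, _, local_id = curie.partition(":")
        return self._provider.get_curie_converter()[prefix] + local_id
```

Because the converter is created before any prefix map exists, the same instance can then serve several mapping sets in turn, each with its own prefix map.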

Overhaul the ObjectConverter class and its subclasses so that a single
instance of the MappingSetRDFConverter class can be used to convert many
different MSDF objects (in any direction: RDF to MSDF or MSDF to RDF),
instead of having to create one instance for each conversion.

Add a couple of tests to check the behaviour of the new RDF parser.
Also fix an issue (revealed by one of those tests) with the RECORD_ID
slot, which must be compressed upon parsing since it is an
EntityReference-typed slot.
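
Because `record_id` holds an entity reference, the parser receives it as a full IRI and has to compress it back into a CURIE, as for the other EntityReference-typed slots. A stdlib-only sketch of that compression step, using longest-namespace matching over a prefix map; `compress_iri` is hypothetical and stands in for the CURIE converter used by the real code.

```python
from typing import Dict, Optional


def compress_iri(iri: str, prefix_map: Dict[str, str]) -> Optional[str]:
    """Compress `iri` into a CURIE using the longest matching namespace.

    Returns None when no namespace in `prefix_map` matches.
    """
    best_prefix: Optional[str] = None
    best_ns = ""
    for prefix, namespace in prefix_map.items():
        if iri.startswith(namespace) and len(namespace) > len(best_ns):
            best_prefix, best_ns = prefix, namespace
    if best_prefix is None:
        return None
    return f"{best_prefix}:{iri[len(best_ns):]}"
```

Preferring the longest namespace matters when one declared namespace is a prefix of another, as in the usage below.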
@gouttegd
Contributor Author

The test failure is due to the fact that the property associated with the mapping_date slot has changed since the 1.1.0a2 release of the SSSOM schema (as part of the new SSSOM/RDF specification, mapping_date went from being associated with pav:authoredOn to being associated with dcterms:created).

This will be fixed automatically (and the test suite will pass) once there is yet another release of the SSSOM schema and SSSOM-Py has been updated to use the new version.

@gouttegd
Contributor Author

@matentzn ⬆️ Can we get a 1.1.0a3 release for the SSSOM schema?

@matentzn
Collaborator

Can we get a 1.1.0a3 release for the SSSOM schema?

https://github.com/mapping-commons/sssom/releases/tag/v1.1.0a3

@gouttegd
Contributor Author

Thanks!

Use the latest pre-release of the SSSOM schema, which includes the
latest draft of the SSSOM/RDF specification.
@gouttegd gouttegd force-pushed the custom-rdf-converter branch 2 times, most recently from 3a063ec to 49846cc October 28, 2025 23:09
@gouttegd gouttegd force-pushed the custom-rdf-converter branch from 49846cc to ace9ef1 October 28, 2025 23:17
@gouttegd
Contributor Author

The last failure seems to be related to the latest Scipy release (1.16.3), which happened today. It looks like that version cannot work with NumPy <1.24, which we require to test with Pandas<2.0.0.

It seems that the very latest release of Scipy (1.16.3) cannot work with
NumPy<1.24 (apparently Scipy tries to import a `numpy.exceptions` module
that it cannot find; presumably that module was added after NumPy
1.24, though I did not check).

So when we test that SSSOM-Py can still work with Pandas<2.0.0, we need
to cap Scipy to <1.16.3 in addition to capping NumPy to <1.24.
@gouttegd gouttegd marked this pull request as ready for review October 29, 2025 00:46
@gouttegd
Contributor Author

Support for extension slots is still missing, but as the rest of SSSOM-Py does not support them either, this is not really important – and I don’t want to start working on extension slots until there has been a discussion on how to support them in all of SSSOM-Py, not just in the RDF parser/writer.

@gouttegd
Contributor Author

Superseded by #635

@gouttegd gouttegd closed this Oct 29, 2025
@gouttegd gouttegd deleted the custom-rdf-converter branch October 29, 2025 01:09

Development

Successfully merging this pull request may close these issues.

Support for “new” RDF serialisation
