Use custom code for RDF (de)serialisation #632
Since the normalised variant of SSSOM/RDF does not match the output produced (respectively expected) by the LinkML runtime's dumper (resp. loader), we implement our own serialisation/deserialisation routines for RDF. All the code for RDF import/export is contained within the sssom.rdf_internal module. The main class expected to be used outside of that module is the MappingSetRDFConverter class. The `from_sssom_rdf` and `to_rdf_graph` methods in the sssom.parsers and sssom.writers modules, respectively, are rewritten to use the new converter.
The new RDF converter revealed several issues with our existing tests, which are fixed here.
* test_convert.py (test_to_rdf_hydrated): That test used an incorrect "NOT" value for a predicate modifier slot (the correct value is "Not", and there is nothing in the spec to suggest it is case-insensitive); it also forgot to declare some prefixes;
* test_writers.py (test_rdflib_endpoint): That test used "sssom:MappingCuration" as a column name in a MappingSetDataFrame, instead of the expected "mapping_justification"; it also forgot to declare some prefixes.
The `override` decorator is not available in the `typing` module in Python 3.11 and lower; there, it needs to be obtained from `typing_extensions` instead.
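A minimal sketch of such a version-dependent import, for illustration only. The `ValueConverter` and `UpperCaseConverter` classes are hypothetical and are not the classes of this PR, and the no-op fallback is an assumption added so that the snippet runs even where `typing_extensions` is not installed.

```python
import sys

if sys.version_info >= (3, 12):
    # typing.override exists from Python 3.12 onwards.
    from typing import override
else:
    try:
        from typing_extensions import override
    except ImportError:
        # Assumption for this sketch only: fall back to a no-op decorator
        # so the example still runs without typing_extensions.
        def override(method):
            return method


class ValueConverter:
    """Hypothetical base class, not the actual SSSOM-Py class."""

    def to_rdf(self, value: str) -> str:
        return value


class UpperCaseConverter(ValueConverter):
    """Hypothetical subclass that overrides the base method."""

    @override
    def to_rdf(self, value: str) -> str:
        return value.upper()
```

At runtime the decorator returns the method essentially unchanged; its purpose is to let a static type checker flag a method that claims to override something but does not.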
Still missing (and the reason why the PR is a draft):
Amend most value converters to exploit RDFLib's typing features whenever possible. Overhaul the EnumValueConverter so that it only uses the EnumDefinition from the LinkML schema, without ever using the actual enum implementation generated from the schema. This notably ensures that parsed enum values are typed as strings, not as something like `PermissibleValueText` or whatever.
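The idea of validating against the schema-level definition only, and returning plain strings, could look roughly like the sketch below. `EnumSketch` and `EnumValueConverterSketch` are hypothetical stand-ins written for this illustration; they do not reproduce the LinkML `EnumDefinition` API or the actual `EnumValueConverter`. The only detail taken from this PR is that "Not" is a valid predicate modifier and "NOT" is not.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class EnumSketch:
    """Hypothetical stand-in for a schema-level enum definition."""

    name: str
    permissible_values: FrozenSet[str]


class EnumValueConverterSketch:
    """Hypothetical converter that relies on the definition only."""

    def __init__(self, definition: EnumSketch) -> None:
        self._definition = definition

    def parse(self, raw: str) -> str:
        # Case-sensitive membership test: "NOT" does not match "Not".
        if raw not in self._definition.permissible_values:
            raise ValueError(
                f"{raw!r} is not a permissible value of {self._definition.name}"
            )
        # Return a plain str, not an instance of a generated enum class.
        return str(raw)


# The enum name here is illustrative.
predicate_modifier = EnumSketch("predicate_modifier_enum", frozenset({"Not"}))
converter = EnumValueConverterSketch(predicate_modifier)
parsed = converter.parse("Not")
```

With this shape, `converter.parse("NOT")` raises a `ValueError`, which is the behaviour the fixed test in test_convert.py now relies on.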
Add a helper object to facilitate the creation of value converters and ensure we only create the value converters we need for a given type of object, based on the slot definitions for that object.
Introduce a `CurieConverterProvider` interface so that we can construct all value converters even before we get the CURIE converter that the EntityReferenceValueConverter will need to convert EntityReference-typed values. This is the first step towards making it possible to create only one instance of the MappingSetRDFConverter class (maybe even making it a Singleton) that can be used to serialize/deserialize many different MSDF objects -- currently, the class can only really be used once, because it expects to get the CURIE converter at construction time, and in most cases the CURIE converter will be specific to only one MSDF or one RDF graph.
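A rough sketch of the provider idea, under stated assumptions: the class names `CurieConverterProvider` and `EntityReferenceValueConverter` come from the commit message above, but every method name, the `PrefixCompressor` stand-in, and the `ConversionContext` class are hypothetical and written only for this illustration.

```python
from typing import Dict, Optional, Protocol


class PrefixCompressor:
    """Hypothetical, minimal stand-in for a real CURIE converter."""

    def __init__(self, prefix_map: Dict[str, str]) -> None:
        self._prefix_map = prefix_map

    def compress(self, iri: str) -> str:
        # Return the first matching CURIE, or the IRI unchanged.
        for prefix, namespace in self._prefix_map.items():
            if iri.startswith(namespace):
                return f"{prefix}:{iri[len(namespace):]}"
        return iri


class CurieConverterProvider(Protocol):
    """Anything able to hand out the CURIE converter currently in use."""

    def get_curie_converter(self) -> PrefixCompressor: ...


class EntityReferenceValueConverter:
    """Resolves the CURIE converter lazily, at conversion time."""

    def __init__(self, provider: CurieConverterProvider) -> None:
        self._provider = provider

    def from_rdf(self, iri: str) -> str:
        return self._provider.get_curie_converter().compress(iri)


class ConversionContext:
    """Hypothetical provider whose converter can be swapped per conversion."""

    def __init__(self) -> None:
        self._converter: Optional[PrefixCompressor] = None

    def set_curie_converter(self, converter: PrefixCompressor) -> None:
        self._converter = converter

    def get_curie_converter(self) -> PrefixCompressor:
        if self._converter is None:
            raise RuntimeError("no CURIE converter set for this conversion")
        return self._converter


context = ConversionContext()
# The value converter is built before any prefix map is known.
value_converter = EntityReferenceValueConverter(context)

context.set_curie_converter(PrefixCompressor({"EX": "http://example.org/"}))
first = value_converter.from_rdf("http://example.org/0001")

# A second conversion with a different prefix map reuses the same instance.
context.set_curie_converter(PrefixCompressor({"OTHER": "http://example.org/"}))
second = value_converter.from_rdf("http://example.org/0001")
```

The point of the indirection is that the value converter holds a reference to the provider rather than to a specific CURIE converter, so one instance can serve several conversions with different prefix maps.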
Overhaul the ObjectConverter class and its subclasses so that a single instance of the MappingSetRDFConverter class can be used to convert many different MSDF objects (in any direction: RDF to MSDF or MSDF to RDF), instead of having to create one instance for each conversion.
Add a couple of tests to check the behaviour of the new RDF parser. Also fix an issue (revealed by one of those tests) with the RECORD_ID slot, which must be compressed upon parsing since it is an EntityReference-typed slot.
Test failure is due to the fact that the property associated to the This will be automatically fixed (and the test suite will pass) once there is another release of the SSSOM schema and SSSOM-Py has been updated to use the new version.
@matentzn ⬆️ Can we get a 1.1.0a3 release for the SSSOM schema?
https://github.com/mapping-commons/sssom/releases/tag/v1.1.0a3
Thanks!
Use the latest pre-release of the SSSOM schema, which includes the latest draft of the SSSOM/RDF specification.
3a063ec to 49846cc
49846cc to ace9ef1
The last failure is seemingly related to the latest SciPy release (1.16.3), which happened today. It looks like that version cannot work with NumPy<1.24, which we require to test with Pandas<2.0.0.
It seems that the very latest release of SciPy (1.16.3) cannot work with NumPy<1.24 (apparently SciPy tries to import a `numpy.exceptions` module that it cannot find; presumably that module was added after NumPy 1.24, though I did not check). So when we test that SSSOM-Py can still work with Pandas<2.0.0, we need to cap SciPy to <1.16.3 in addition to capping NumPy to <1.24.
Support for extension slots is still missing, but as the rest of SSSOM-Py does not support them either, this is not really important – and I don’t want to start working on extension slots until there has been a discussion on how to support them in all of SSSOM-Py, not just in the RDF parser/writer.
Superseded by #635
This PR implements option "A" mentioned in #628. That is, it replaces the LinkML-based code for serialising/deserialising to/from RDF with completely custom code that implements the new SSSOM/RDF spec exactly as we want it.
All the new code is contained within the new `sssom.rdf_internal` module, which as the name implies is not really supposed to be used outside of SSSOM-Py (though of course people can still do whatever they want!). The public interface to read from RDF is still the `from_sssom_rdf` method in `sssom.parsers`, and the public interface to write to RDF is still the `to_rdf_graph` method in `sssom.writers`. Both methods are rewritten to use the `MappingSetRDFConverter` class of the `sssom.rdf_internal` module, instead of methods from the LinkML runtime.
closes #628