-
Notifications
You must be signed in to change notification settings - Fork 228
Metadata FBC3 #1440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: devel
Are you sure you want to change the base?
Metadata FBC3 #1440
Conversation
…t cases, improved implementation. Almost all tests are still passing, except some that seem to be caused by lacking implementations in libsbml.
…c-20304). Only remaining failing test is missing implementation of saving nested annotations of KeyValuePairs (fbc-21502).
|
Fantastic, thank you for picking up the work. I'll need to reacquaint myself with the changes as it has been too long ago, but I will try to review soon so we can finally publish this work. |
|
Thanks! |
…RLs and compact urls with integrated namespace) should all work correctly now. Changed Resource equality to namespace+identifier equality, instead of URI equality. Added tests for these cases.
…identifiers.org identifiers as URI. Added the appropriate tests for that. Made hashing method consistent with equality method.
…or standardized.py.
|
For reference, I fixed/started fixing the following:
I opened an issue in libsbml and got a response (sbmlteam/libsbml#429).
Looking at this issue in the libsml repo, it should not be an issue: sbmlteam/libsbml#360 At least from a technical point of view, mixing FBC3 with L3 v2 Core should be fine.
Added some, will add more.
I took a look at the Resource/identifiers.org logic and found out that the problem is slightly more complex. There are two types of urls, an old variant: Another issue with the two types of URLs is that their patterns overlap. An identifier can have a colon (:), which is the case for biocyc. The biocyc identifiers.org url There are two instances where the compact identifier does not 1:1 match the But all in all, the new identifiers.org interpretation function should handle almost all cases correctly. In other cases, you can always set
This is fixed now. |
…ons to speed up copying of models.
…s not being written to SBML. Updated tests. Added xfail test for nested kvps.
…n adapted accordingly and some tests are marked for skipping when run with python 3.8.
|
The latest commits addressed the following issues:
The new metadata implementation still makes cobrapy slower, which is expected, since extra complexity is added to each object. The comparison between the The worst performance hit still happens for copying-related benchmarks, maybe that can be optimized further. The highest increase in runtime happens for |
…overhead. Metadata should not have any recurring (identical) objects, so those checks are not necessary.
|
This is shaping up very nicely. Thanks!
Not an issue from my side. It's not a massive change and we are talking about a fairly fast operation already (~1s). Does it affect SBML parsing as well? I think we don't have a benchmark for this but it might be good to know how long it takes to read Recon3 for example with the old and new version. The review might take a while because so many files got changed, but the following would help me at least:
I am personally really excited for the custom annotations because that would really fix some pain points we have in MICOM. |
|
@cdiener Thank you, great to hear!
Yes, it will. I would estimate that for files with a lot of annotations it will be in the same range (let's say >2x slower). I will look at adding a benchmark for this. Optionally, we could add a keyword argument (e.g.
Yeah, that's understandable.
I added a jupyter notebook with an example workflow to the documentation, so that this is a chapter/page of the docs. I will look at that again and provide some more scripts so that you can play around with the new metadata system.
It should be 99% backwards compatible (as in: old scripts should almost always still work). The The new save formats (SBML l3 v2 core + fbc3 and json schema 2) can of course not be read reliably by older cobrapy versions.
Great! I am still a proponent of renaming I saw you ran the CI/CD again and it failed for some environments. The failing tests are all |
|
I had to focus on some other work, but will add some example workflows soon.
I added loading performance benchmarks for RECON and RECON3D. The changes in this branch caused a ±1.5x slowdown, which isn't bad compared to the model copy operation and better than I expected. @cdiener Do you want me to include these tests/benchmarks in this pull request, or start a new one? In a separate pull request, the benchmarks can be added first, before this pull request gets merged. This may make it easier to reproduce the comparison I did. |
|
Hi, before sounds good. Thanks. I also started reviewing this now. Sorry for the delay. |
| .idea/**/dynamic.xml | ||
| .idea/**/uiDesigner.xml | ||
| .idea/**/dbnavigator.xml | ||
| .idea/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are still some files in the .idea folder that need to be removed from the PR.
| import sys | ||
| from os.path import dirname, join | ||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recently fixed the docs so this will need to be rebased there.
| str | ||
| """ | ||
| return ( | ||
| f"{self.__class__.__module__}.{self.__class__.__qualname__}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just the class name to make it more compact in the repr (also for html)?
| if resource.namespace is not None and resource.identifier is not None | ||
| ] | ||
|
|
||
| def to_records(self) -> List[Dict[str, Any]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is neat. I would also add a to_frame method like shown in the docs directly because I think this will be useful to many (pandas is already a req anyway).
| str | ||
| """ | ||
| return ( | ||
| f"{self.__class__.__module__}.{self.__class__.__qualname__}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just the class name because the module path is quite long.
|
|
||
| def clear(self) -> None: | ||
| """Remove all annotations.""" | ||
| # Could probably handle this better by directly removing all standardized |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that something you still want to implement?
| entry.remove_from_parent() | ||
| return | ||
| raise ValueError(f"No annotation found for '{value}'") | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A __repr__ would be nice too that shows the dict because that was the previous behavior.
| """Parse cobra annotations from a given SBase object. | ||
| Annotations are dictionaries with the providers as keys. | ||
| The annotation format has been changed. We no longer have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe put this in a Note block.
| skip_install = True | ||
| deps= | ||
| isort ==6.0.1 | ||
| isort >=5.13.2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for that?
| skip_install = True | ||
| deps= | ||
| black ==25.1.0 | ||
| black ==23.12.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this. What a great PR in general, extremely clean and well documented. Some small comments on things but it looks pretty good. I would not worry about the speed issues for now. This can also be optimized in later PRs.
Updated version of the SBML FBC version 3 implementation #1237 by @akaviaLab and earlier #988 by @Hemant27031999.
Implements:
metadataattribute.annotationattribute. This interface behaves like a dictionary for backward compatibility and allows access to an ObjectsStandardizedAnnotations.For reference, the FBC3 specification can be found here: https://github.com/sbmlteam/sbml-specifications/blob/develop/sbml-level-3/version-1/fbc/spec/sbml-fbc-version-3-release-1.pdf
Major differences with the two previous pull requests:
CVTermhas been renamed toStandardizedAnnotationandKeyValuePairtoCustomAnnotation.StandardizeAnnotationStoreandCustomAnnotationStore, respectively. The annotations keep track of their parent and can be removed usingannotation.remove_from_parent(), similar to how metabolites etc. do this. This is also implemented forResources with respect to their parentStandardizedAnnotation. Theres is also aStandardizedAnnotationList, which is basically the same class asStandardizedAnnotationStore, but it does not change the ownership/parent-child relationship. So this clas can be used as a return type for a selection of existing annotations (a bit like a view).StandardizedAnnotationhas been simplified a bit. Previously, it would contain a qualifier and a list ofExternalResources, which could contain resources and nested annotations. The external resources class has now been removed andStandardizedAnnotations directly contain resources and nested annotations. AResourceis now its own class that handles interepreting URIs.CustomAnnotations (key-value pairs) do not prominently feature anameandidanymore, since I think they make them more confusion. They can still be set, but they are not part of the__init__method etc. ACustomResourcenow also inherits from theObjectclass, so it can have its own metadata. The FBC3 specification is a bit vague here, since it is not explicitly mentioned thatKeyValuePairinherits fromSBase, but it can have a name and id,fbc-21501mentiones that it can have ansboTermandfbc-21502mentions that it can have notes and annotations/metadata, which is bascially what is implemented in the cobrapyObjectclass.Qualifiernames are now also more readable (e.g.Biological_isvs.bqb_is,Modelling_isDerivedFromvsbqm_isDerivedFrom).metadataattribute, instead ofannotation. Theannotationattribute is now a class instance that bascially is a dict-like view of themetadata.standardizedstandardized annotations (SBML CVTerms).to_recordsmethod, of which the output can directly be used in pandas (demonstrated in the metadata jupyter notebook).annotationinterface), this can probably be expanded/improved.Outstanding issues:
CustomAnnotation). As mentioned above, it is explicitly stated thatKeyValuePair, like anSBaseinstance, can have annotations. Reading files that have this does workm, but saving does not, since this was not implemented inlibsbml. I will open an issue in the libsbml repo addressing this.Smaller issues:
CustomAnnotation. An example could be the value offunctionalfor a gene, as proposed in issue [BUG] Gene.functional attribute not saved in SBML model #1422 . One could think of storing its value udner the key 'cobra_gene_functional' for example. Questions would be: should these values still be available undermetadata.custom? And what happens when that value changes? Should it have a way of storing the datatype (e.g. 'bool:true') and should that be implemented as part of theCustomAnnotationclass? This is all not really urgent, but if some changes are needed to the custom annotation interface, we might want to make them now.Resourceclass name, since it's a bit vague, but I wasn't sure what else to name it. In previous discussions it was also mentioned thatAuthormay be a better name thanCreator. TheStandardizedAnnotationandCustomAnnotationclass names are quite long, so we could try to rename those too. MaybeAnnotationinstead ofStandardizedAnnotationandProperty/Symbol/Attribute/Parameter/Flag/... instead ofCustomAnnotation.Fixes issues:
#810: The sbml_info storing basic information of SBML is written to JSON to store the basic SBML document information like packages, level, version, notes, annotation attached to the SBML component etc.
#684: The complete metadata structure has been redesigned. A compatibility interface remains in the
annotationattribute of each object, whilst all structured metadata can be accessed through themetadataattribute. I've split this up, since changing values through the old interface may change the structure/hierarchy or qualifiers. By having a separateannotationandmetadataattribute, it is ensured that changing anything inmetadatadoes not have a destructive effect. Additionally, old code and old file formats (e.g. json schema v1) can just write to and read from theannotationinterface as if nothing has changed, while new code and formats (e.g. json schema v2) use themetadataattribute.#937: As mentioned above, using the
metadatainterface should not lead to loss of any relation information.Let me know what you think of the changes. In the meantime, I will try to fix some of the outstanding issues.