
SSSOM-Py inconsistency with types #633

@gouttegd

(This is somewhat related to #630, but is a much larger problem.)

SSSOM-Py is woefully inconsistent about the type of the values one may find in a MappingSetDataFrame.

Depending on whether the slot is in the “metadata” part or the “data frame” part of the MSDF, and depending on whether the MSDF was parsed from a TSV file, a JSON file, or an RDF file, the same value can have completely different types, as shown in the following table:

| Slot type | TSV (metadata) | TSV (data frame) | JSON (metadata) | JSON (data frame) | RDF (metadata) | RDF (data frame) |
|---|---|---|---|---|---|---|
| date | datetime.date | str | str | str | str | str |
| enum (e.g. entity type) | str | PermissibleValueText | EntityTypeEnum | PermissibleValueText | EntityTypeEnum | PermissibleValueText |
| entity reference | str | EntityReference | EntityReference | EntityReference | EntityReference | EntityReference |

Admittedly the entity reference case is a minor issue, because at least EntityReference is a subtype of str and the two types are therefore compatible. In particular, one can directly compare a str-typed entity reference value and an EntityReference-typed value, and get the expected result:

>>> from sssom_schema import EntityReference
>>> er_as_str = "FBbt:1234"
>>> er_as_EntityReference = EntityReference("FBbt:1234")
>>> er_as_str == er_as_EntityReference
True

So that one is merely a small annoyance, fine.

But the other two cases are much more problematic.

The “date” case should be obvious. If I have a date-typed slot, I expect to be able to manipulate its values as date objects. I do not expect to have to call datetime.strptime or similar to turn the value into an actual date. And I expect to be able to compare two “date-typed” values regardless of whether the value comes from the metadata or from the data frame, and regardless of which format the mapping set data frame was parsed from in the first place.
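To make the “date” problem concrete, here is a minimal sketch using only the standard library. It does not call SSSOM-Py; the two sample values simply mimic what the table above says one gets from the TSV metadata (a datetime.date) and from the data frame (a str). The `as_date` helper is hypothetical, and it assumes the string form is an ISO 8601 `YYYY-MM-DD` date.

```python
from datetime import date, datetime
from typing import Union


def as_date(value: Union[date, datetime, str]) -> date:
    """Hypothetical helper: coerce a date-typed slot value to datetime.date.

    Assumes string values are ISO 8601 dates (YYYY-MM-DD).
    """
    # datetime is a subclass of date, so it must be checked first.
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    return date.fromisoformat(value)


# Same logical value, as it might come out of the two parts of an MSDF.
from_metadata = date(2024, 1, 15)   # datetime.date
from_data_frame = "2024-01-15"      # str

# Equality between a date and a str silently evaluates to False...
mixed_equal = from_metadata == from_data_frame

# ...and ordering comparisons raise TypeError.
try:
    from_metadata < from_data_frame
    mixed_ordering_ok = True
except TypeError:
    mixed_ordering_ok = False

# Only after normalizing both sides do the values compare as expected.
normalized_equal = as_date(from_metadata) == as_date(from_data_frame)
```

This is the kind of boilerplate a caller currently has to write around every date-typed slot; with consistent typing none of it would be needed.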

The “enum” case is almost comical. Not only do we have three different types of values (str; EntityTypeEnum, which is the LinkML-generated object normally supposed to represent values of the entity_type_enum; and PermissibleValueText, which is the generic LinkML type to represent any enum value), but all three types are completely incompatible with each other. If you have three variables holding the same enum value, one in each of the three types, no two of them will compare as equal.
