-
Notifications
You must be signed in to change notification settings - Fork 9
Converters refactor [utils]: Base utility improvements and updates to for refactor enablement #101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
📦 Build Artifacts Available |
4403a77
to
d7ec38f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the registry pattern from ClassRegistryMixin
to a generic RegistryMixin
, introduces comprehensive type annotations, and improves API consistency. It lays the foundation for a more extensible class-based converter architecture by providing better type safety, cleaner and expanded APIs, and enhanced auto-discovery capabilities.
Key changes include:
- Renamed
ClassRegistryMixin
toRegistryMixin
with generic type support - Enhanced registry methods with
registered_objects()
,is_registered()
, andget_registered_object()
- Improved polymorphic validation with automatic schema reloading in Pydantic utilities
- Updated type annotations using
from __future__ import annotations
Reviewed Changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
tests/unit/utils/test_registry.py | Comprehensive test suite for refactored RegistryMixin with new methods and type validation |
tests/unit/utils/test_pydantic_utils.py | Updated tests for PydanticClassRegistryMixin with enhanced type safety and generic support |
tests/unit/utils/test_auto_importer.py | Restructured tests for AutoImporterMixin with improved fixtures and edge case coverage |
tests/unit/test_model.py | Minor type comment additions for registry attribute access |
tests/unit/models/test_eagle_model.py | Minor type comment additions for registry attribute access |
tests/unit/models/test_eagle_config.py | Minor type comment additions for registry attribute access |
src/speculators/utils/registry.py | Core refactor from ClassRegistryMixin to generic RegistryMixin with enhanced API |
src/speculators/utils/pydantic_utils.py | Enhanced Pydantic integration with generic type support and schema reloading |
src/speculators/utils/auto_importer.py | Streamlined documentation and improved type annotations |
src/speculators/utils/init.py | Updated imports to reflect registry class rename |
src/speculators/model.py | Updated to use new RegistryMixin with generic type parameter |
src/speculators/convert/eagle/init.py | Minor documentation update |
pyproject.toml | Added eval_type_backport dependency for type evaluation support |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
99e7ff3
to
4f47c07
Compare
Signed-off-by: Mark Kurtz <[email protected]>
Signed-off-by: Mark Kurtz <[email protected]>
…tion happens Signed-off-by: Mark Kurtz <[email protected]>
Signed-off-by: Mark Kurtz <[email protected]>
4f47c07
to
c67c12f
Compare
Signed-off-by: Mark Kurtz <[email protected]>
Signed-off-by: Mark Kurtz <[email protected]>
Could you explain the use case for This seems fairly complicating to iterate through all of descendants of |
@fynnsu (cc @shanjiaz), here’s the rationale behind expanding the TL;DR
The high-level requirement is that we need to represent polymorphic types for the instance types/fields that extend our base classes and interfaces. These base classes/interfaces define a standard format in Speculators and ensure consistent, serializable/deserializable objects. We rely on Pydantic’s 1. Naive ImplementationThe simplest approach explicitly references classes. To make Pydantic handle custom reloads, we use generics to tell it which class is involved. This forces prior knowledge of what’s being constructed and requires referencing the specific types everywhere. It works, but only recovers fields when typed explicitly, which makes it brittle for real use. from typing import Generic, TypeVar
from pydantic import BaseModel
class BaseExample(BaseModel):
shared_field: str = ""
class NestedExample(BaseExample):
extra_field: int = 1
NestedT = TypeVar("NestedT", bound=BaseExample)
class ParentExample(BaseModel, Generic[NestedT]):
nested: NestedT
def test_marshalling():
nested = NestedExample()
parent = ParentExample(nested=nested)
parent_dict = parent.model_dump()
parent_rec_notype = ParentExample.model_validate(parent_dict)
print(parent_rec_notype.nested) # only BaseExample fields
parent_rec_typed = ParentExample[NestedExample].model_validate(parent_dict)
print(parent_rec_typed.nested) # includes NestedExample fields
if __name__ == "__main__":
test_marshalling() 2. Registry + Discriminated UnionsThe naive case fails because:
We can solve this with a registry pattern plus Pydantic’s discriminated unions. The registry enables runtime class lookup and construction. But discriminated unions are statically defined meaning external libraries, contrib packages, or prototypes can’t extend them without editing the source to add the new type and developers need to know about and update all container fields that add the polymorphic class. Which are hard limitations. from typing import Literal, Union
from pydantic import BaseModel, Field, ValidationError
from speculators.utils import RegistryMixin
class BaseExample(BaseModel, RegistryMixin):
@classmethod
def create(cls, type_: str, **kwargs) -> "BaseExample":
example_class = cls.registry[type_]
return example_class(**kwargs)
type_: Literal["base"] = "base"
shared_field: str = ""
@BaseExample.register()
class NestedExample(BaseExample):
type_: Literal["nested_internal"] = "nested_internal"
extra_field: int = 1
class ParentExample(BaseModel):
nested: Union[NestedExample, BaseExample] = Field(discriminator="type_")
@BaseExample.register()
class ExternalNestedExample(BaseExample):
type_: Literal["nested_external"] = "nested_external"
different_field: bool = True
def test_marshalling():
nested = BaseExample.create(type_="NestedExample")
print(type(nested)) # NestedExample
parent = ParentExample(nested=nested)
parent_dict = parent.model_dump()
parent_rec = ParentExample.model_validate(parent_dict)
print(parent_rec.nested) # NestedExample fields
external = BaseExample.create(type_="ExternalNestedExample")
print(type(external)) # ExternalNestedExample
try:
ParentExample(nested=external)
except ValidationError as error:
print(error) # fails, union not updated at runtime
if __name__ == "__main__":
test_marshalling() 3. Automatic Discriminated UnionsThe registry + discriminated unions case fails when:
To unlock external extension and remove risks of circular imports, we need a dynamic discriminated union. This can be achieved by:
This is implemented in Speculators via the The catch: Pydantic caches schemas. Unless we manually call from typing import ClassVar, Literal
from pydantic import BaseModel, Field, ValidationError
from speculators.utils import PydanticClassRegistryMixin
# Note, need to replace ReloadableBaseModule for BaseModule in
# PydanticClassRegistryMixin, otherwise it will run the later auto pathway
class BaseExample(PydanticClassRegistryMixin):
@classmethod
def create(cls, type_: str, **kwargs) -> "BaseExample":
example_class = cls.registry[type_]
return example_class(**kwargs)
@classmethod
def __pydantic_schema_base_type__(cls) -> type["BaseExample"]:
if cls.__name__ == "BaseExample":
return cls
return BaseExample
schema_discriminator: ClassVar[str] = "type_"
type_: Literal["base"] = "base"
shared_field: str = ""
@BaseExample.register("nested_internal")
class NestedExample(BaseExample):
type_: Literal["nested_internal"] = "nested_internal"
extra_field: int = 1
class ParentExample(BaseModel):
nested: BaseExample = Field()
@BaseExample.register("nested_external")
class ExternalNestedExample(BaseExample):
type_: Literal["nested_external"] = "nested_external"
different_field: bool = True
def test_marshalling():
nested = BaseExample.create(type_="nested_internal")
parent = ParentExample(nested=nested)
parent_dict = parent.model_dump()
parent_rec = ParentExample.model_validate(parent_dict)
print(parent_rec.nested) # NestedExample fields
external = BaseExample.create(type_="nested_external")
try:
ParentExample(nested=external)
except ValidationError:
print("Validation fails without reload")
# Fix with reload
ParentExample.model_rebuild(force=True)
parent_ext = ParentExample(nested=external)
parent_ext_rec = ParentExample.model_validate(parent_ext.model_dump())
print(parent_ext_rec.nested) # ExternalNestedExample fields
if __name__ == "__main__":
test_marshalling() 4. Automatic Reloadable Discriminated UnionsThe above approach requires calling The from typing import ClassVar, Literal
from pydantic import BaseModel, Field
from speculators.utils import PydanticClassRegistryMixin
class BaseExample(PydanticClassRegistryMixin):
@classmethod
def create(cls, type_: str, **kwargs) -> "BaseExample":
example_class = cls.registry[type_]
return example_class(**kwargs)
@classmethod
def __pydantic_schema_base_type__(cls) -> type["BaseExample"]:
if cls.__name__ == "BaseExample":
return cls
return BaseExample
schema_discriminator: ClassVar[str] = "type_"
type_: Literal["base"] = "base"
shared_field: str = ""
@BaseExample.register("nested_internal")
class NestedExample(BaseExample):
type_: Literal["nested_internal"] = "nested_internal"
extra_field: int = 1
class ParentExample(BaseModel):
nested: BaseExample = Field()
@BaseExample.register("nested_external")
class ExternalNestedExample(BaseExample):
type_: Literal["nested_external"] = "nested_external"
different_field: bool = True
def test_marshalling():
nested = BaseExample.create(type_="nested_internal")
parent = ParentExample(nested=nested)
print(parent.nested) # NestedExample fields
external = BaseExample.create(type_="nested_external")
parent_ext = ParentExample(nested=external)
print(parent_ext.nested) # ExternalNestedExample fields
if __name__ == "__main__":
test_marshalling() RisksThe reload logic is non-trivial since it traverses nested relationships, inspects Pydantic fields, and recursively reloads. With strong tests, though, risk is limited to one utility class that is a core extension only over the top of Python and Pydantic depending on core, stable APIs:
These are mature, stable APIs that rarely change except on major library updates. Potential ImprovementsRight now we traverse all |
@markurtz Thank you for the detailed walkthrough, I feel like I've learned a lot about pydantic just reading this! You're explanation makes sense to me, but I'm still wondering if there's a better way to solve this. Below I've attached my attempt at this. I think it correctly handles dynamic registration and serialization/deserialization without blocking external model or requiring overly complex type annotations. Please let me know what you think! (Also for simplicity I just created a simple registration system but we could probably get this to work with the from collections.abc import Mapping
from typing import Annotated, Literal
from pydantic import BaseModel, BeforeValidator, SerializeAsAny
registry = {}
def register(type_):
def decorator(cls):
registry[type_] = cls
return cls
return decorator
@register("base")
class BaseExample(BaseModel):
type_: Literal["base"] = "base"
shared_field: str = ""
@classmethod
def create(cls, type_: str, **kwargs) -> "BaseExample":
example_class = registry[type_]
return example_class(**kwargs)
@register("nested_internal")
class NestedExample(BaseExample):
type_: Literal["nested_internal"] = "nested_internal"
extra_field: int = 1
def validate_subclass(obj):
type_ = obj["type_"] if isinstance(obj, Mapping) else obj.type_
return registry[type_].model_validate(obj)
class ParentExample(BaseModel):
nested: Annotated[SerializeAsAny[BaseExample], BeforeValidator(validate_subclass)] # here
@register("nested_external")
class ExternalNestedExample(BaseExample):
type_: Literal["nested_external"] = "nested_external"
different_field: bool = True
def test_marshalling():
nested = BaseExample.create(type_="nested_internal")
parent = ParentExample(nested=nested)
print(parent.nested) # NestedExample fields
external = BaseExample.create(type_="nested_external")
parent_ext = ParentExample(nested=external)
print(parent_ext.nested) # ExternalNestedExample fields
parent_ext_rec = ParentExample.model_validate(parent_ext.model_dump())
print(parent_ext_rec.nested) # ExternalNestedExample fields
if __name__ == "__main__":
test_marshalling() In the line marked
|
…yMixin from review Signed-off-by: Mark Kurtz <[email protected]>
Adding in here as well, @fynnsu, it is a simpler implementation, but with that, we remove several core things that will lead to unexpected bugs, performance issues, and general docs/understanding issues. The core breaks down to the following:
I have gone through and pushed an update to build the dependency tree and target only the chains that would be affected. Please take a look through. |
Signed-off-by: Mark Kurtz <[email protected]>
Summary
Refactors the registry pattern from
ClassRegistryMixin
to a genericRegistryMixin
, introduces comprehensive type annotations, and improves API consistency. Lays the foundation for a more extensible class-based converter architecture by providing better type safety, cleaner and expanded APIs, and enhanced auto-discovery capabilities.Details
eval_type_backport
dependency for better type evaluation supportfrom __future__ import annotations
ClassRegistryMixin
toRegistryMixin
with generic type supportRegistryMixin[RegistryObjT]
RegistryObjT
,RegisterT
,BaseModelT
,RegisterClassT
registered_objects()
,is_registered()
, andget_registered_object()
PydanticClassRegistryMixin
to use generic types:PydanticClassRegistryMixin[BaseModelT]
registered_classes()
method for better Pydantic-specific functionalityTest Plan
Related Issues
N/A