Skip to content

Conversation

cheesecake100201
Copy link
Contributor

@cheesecake100201 cheesecake100201 commented Aug 12, 2025

What does this PR do?

Adds Default string to HuggingfacePostTrainingConfig as this was not letting llama stack server spin up for older builds (Not tried for newer ones, tried for starter-run.yaml that I had created before)

Error Received
Traceback (most recent call last): File "<frozen runpy>", line 198, in _run_module_as_main File "<frozen runpy>", line 88, in _run_code File "/Users/sarthakdeshpande/Desktop/open source/llama-stack/llama_stack/core/server/server.py", line 625, in <module> main() File "/Users/sarthakdeshpande/Desktop/open source/llama-stack/llama_stack/core/server/server.py", line 438, in main impls = loop.run_until_complete(construct_stack(config)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete return future.result() ^^^^^^^^^^^^^^^ File "/Users/sarthakdeshpande/Desktop/open source/llama-stack/llama_stack/core/stack.py", line 322, in construct_stack impls = await resolve_impls( ^^^^^^^^^^^^^^^^^^^^ File "/Users/sarthakdeshpande/Desktop/open source/llama-stack/llama_stack/core/resolver.py", line 168, in resolve_impls return await instantiate_providers(sorted_providers, router_apis, dist_registry, run_config, policy) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/sarthakdeshpande/Desktop/open source/llama-stack/llama_stack/core/resolver.py", line 294, in instantiate_providers impl = await instantiate_provider(provider, deps, inner_impls, dist_registry, run_config, policy) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/sarthakdeshpande/Desktop/open source/llama-stack/llama_stack/core/resolver.py", line 375, in instantiate_provider config = config_type(**provider.config) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/sarthakdeshpande/Desktop/open source/llama-stack/.venv/lib/python3.12/site-packages/pydantic/main.py", line 214, in __init__ validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ pydantic_core._pydantic_core.ValidationError: 1 validation error for HuggingFacePostTrainingConfig dpo_output_dir Field required [type=missing, input_value={'checkpoint_format': 'hu...: None, 'device': 'cpu'}, input_type=dict] For further information visit https://errors.pydantic.dev/2.10/v/missing ++ error_handler 128 ++ echo 'Error occurred in script at line: 128' Error occurred in script at line: 128 ++ exit 1

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Aug 12, 2025
@@ -71,7 +71,7 @@ class HuggingFacePostTrainingConfig(BaseModel):
dpo_beta: float = 0.1
use_reference_model: bool = True
dpo_loss_type: Literal["sigmoid", "hinge", "ipo", "kto_pair"] = "sigmoid"
dpo_output_dir: str
dpo_output_dir: str = ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dpo_output_dir: str = ""
dpo_output_dir: str | None = None

I think None is a more suitable default rather than an empty string, some checks for None would be great too in the code, thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense will make the same change

@@ -71,7 +71,7 @@ class HuggingFacePostTrainingConfig(BaseModel):
dpo_beta: float = 0.1
use_reference_model: bool = True
dpo_loss_type: Literal["sigmoid", "hinge", "ipo", "kto_pair"] = "sigmoid"
dpo_output_dir: str
dpo_output_dir: str | None = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, yes this is a backwards incompatible change as you noted. We should make that a runtime check where the provider yells if this value continues to be None at runtime as @cdoern notes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So do we need to change anything else here or is this good ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently its throwing stack trace error, do we need to show this a log error only or something else?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you'll need to add checks in the code which uses dpo_output_dir. This is the source of the errors you are seeing, also please run pre-commit to re-generate the openAPI schema

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could see only one place where dpo_output_dir was being used have added a check there also, please review again once

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cdoern I am unable to run pre-commit because of some dependencies issues coming up because of upgrading of my laptop, solely because of change in architecture

@cheesecake100201 cheesecake100201 force-pushed the HuggingfacePostTrainingConfig-branch branch from bfc4379 to 6520a6d Compare August 14, 2025 07:07
@cheesecake100201
Copy link
Contributor Author

@cdoern Please review

Copy link
Collaborator

@franciscojavierarceo franciscojavierarceo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix precommit. I have faith in you to resolve your upgrade issues. 😎

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Meta Open Source bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants