Conversation

@akram akram commented Oct 10, 2025

What does this PR do?

Fixes #3769

Test Plan

Create a config-registered-models.yaml with the following contents (the BOGUS API key is deliberate, to simulate a misconfigured provider):

```yaml
version: 2
image_name: config-registered-models
apis:
- inference
providers:
  inference:
  - provider_id: openai
    provider_type: remote::openai
    config:
      api_key: BOGUS
metadata_store:
  type: sqlite
  db_path: /tmp/config-registered-model.db
models:
- model_id: test-model
  provider_id: openai
  provider_model_id: custom-model
  model_type: llm
```

and run:

```
llama stack run config-registered-models.yaml
```

The server no longer crashes; the provider failure is logged at ERROR level and startup completes:

```
INFO     2025-10-10 23:54:35,214 uvicorn.error:84 uncategorized: Started server process [51980]
INFO     2025-10-10 23:54:35,215 uvicorn.error:48 uncategorized: Waiting for application startup.
INFO     2025-10-10 23:54:35,215 llama_stack.core.server.server:177 core::server: Starting up
INFO     2025-10-10 23:54:35,216 llama_stack.core.stack:421 core: starting registry refresh task
INFO     2025-10-10 23:54:35,221 uvicorn.error:62 uncategorized: Application startup complete.
INFO     2025-10-10 23:54:35,222 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321
         (Press CTRL+C to quit)
ERROR    2025-10-10 23:54:35,379 llama_stack.providers.utils.inference.openai_mixin:434 providers::utils:
         OpenAIInferenceAdapter.list_provider_model_ids() failed with: Error code: 401 - {'error': {'message':
         'Incorrect API key provided: BOGUS. You can find your API key at
         https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code':
         'invalid_api_key'}}
```
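
To double-check that the server survived startup, one can poll it after the logs above settle. A minimal sketch, assuming the stack exposes an OpenAI-compatible `/v1/models` route on the port shown in the Uvicorn log line; the route is an assumption, not something this PR adds:

```python
import requests  # third-party: pip install requests

# Port 8321 is taken from the Uvicorn log line above; /v1/models is an
# assumed OpenAI-compatible listing endpoint, not part of this PR.
resp = requests.get("http://localhost:8321/v1/models")
print(resp.status_code)  # any HTTP response proves the server is still up
```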

cc @mattf

akram added 2 commits October 10, 2025 22:19
When a provider fails during model registration or listing, the stack
should continue initializing rather than crashing. This allows the
stack to start even if some providers are misconfigured.

- Added error handling in register_resources() (see the sketch after these commit notes)
- Added unit tests to verify error handling behavior
- Improved error logging with provider context
- Removed @pytest.mark.asyncio decorators (pytest already configured with async-mode=auto)

Fixes llamastack#3769

Added tests to verify that the stack:
1. Continues initialization when providers fail to register models
2. Skips invalid models instead of crashing
3. Handles provider listing failures gracefully
4. Maintains partial functionality with mixed success/failure

Example:
- OpenAI provider fails to list models
- Stack logs error and continues with registered models
- Other providers remain functional

This prevents the entire stack from crashing when:
- Provider API keys are invalid
- Models are misconfigured
- Provider API is temporarily unavailable
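
A minimal sketch of the approach the commit notes describe. The function name `register_resources()` comes from the commit message; the loop body, the `impls` lookup, and the `logger` setup are illustrative assumptions, not the actual patch:

```python
import logging

logger = logging.getLogger(__name__)

async def register_resources(run_config, impls):
    """Register models declared in run.yaml, surviving provider failures."""
    for model in run_config.models:
        try:
            await impls["models"].register_model(
                model_id=model.model_id,
                provider_id=model.provider_id,
                provider_model_id=model.provider_model_id,
                model_type=model.model_type,
            )
        except Exception as e:
            # Log with provider context and keep going, so one misconfigured
            # provider does not take down the whole stack at startup.
            logger.error(
                "Failed to register model %s via provider %s: %s",
                model.model_id, model.provider_id, e,
            )
```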
@meta-cla meta-cla bot added the CLA Signed label Oct 10, 2025
@akram akram changed the title Fix/3769 graceful provider registration failure Fix: 3769 graceful provider registration failure Oct 10, 2025
@akram akram changed the title Fix: 3769 graceful provider registration failure fix: 3769 graceful provider registration failure Oct 10, 2025
@akram akram commented Oct 11, 2025

/assign @mattf
/assign @ashwinb

@ashwinb ashwinb left a comment


We should absolutely not do this. This means bogus things in run.yaml will continue to not get fixed. It is not an acceptable method to hide errors like this.

@akram akram commented Oct 13, 2025

Hi @ashwinb, thanks for the review.
The error is still logged at ERROR level, as shown in the PR description, so it is not hidden per se. Or am I getting it wrong?

@akram akram commented Oct 13, 2025

/hold

@ashwinb ashwinb commented Oct 13, 2025

> Hi @ashwinb, thanks for the review. The error is still logged at ERROR level, as shown in the PR description, so it is not hidden per se. Or am I getting it wrong?

Yes. We cannot catch the exception; the server should die if you misconfigure it like that.
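
In other words, the objection is not to the logging but to swallowing the exception. Relative to the earlier sketch, the behavior being defended re-raises instead of continuing (again an illustrative sketch with the same assumed setup, not the actual code):

```python
# Same illustrative setup as the earlier sketch; the only change is `raise`.
async def register_resources(run_config, impls):
    for model in run_config.models:
        try:
            await impls["models"].register_model(
                model_id=model.model_id,
                provider_id=model.provider_id,
                provider_model_id=model.provider_model_id,
                model_type=model.model_type,
            )
        except Exception:
            logger.exception("Failed to register model %s", model.model_id)
            raise  # fail fast: a misconfigured run.yaml stops the server at startup
```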

@ashwinb ashwinb closed this Oct 14, 2025
Labels

CLA Signed (managed by the Meta Open Source bot)

Development

Successfully merging this pull request may close these issues.

startup crash when config registers a model and provider returns an exception (#3769)
