
@are-ces (Contributor) commented Oct 13, 2025

What does this PR do?

This PR fixes issues with the WatsonX provider so it works correctly with LiteLLM.

The main problem was that WatsonX requests failed because the provider data validator didn’t properly handle the API key and project ID. This was fixed by updating the WatsonXProviderDataValidator and ensuring the provider data is loaded correctly.

The openai_chat_completion method was also updated to match the behavior of other providers while adding WatsonX-specific fields like project_id. It still calls await super().openai_chat_completion.func(self, params) to keep the existing setup and tracing logic.
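For readers skimming, here is a hedged sketch of the shape of that change; the field names, base class, and helper calls are assumptions patterned on the description above, not the exact merged code:

```python
from pydantic import BaseModel

# Base class and import path assumed for illustration.
from llama_stack.providers.utils.inference.litellm_openai_mixin import LiteLLMOpenAIMixin


class WatsonXProviderDataValidator(BaseModel):
    # Per-request provider data (API key and project ID); field names assumed.
    watsonx_api_key: str | None = None
    watsonx_project_id: str | None = None


class WatsonXInferenceAdapter(LiteLLMOpenAIMixin):
    async def openai_chat_completion(self, params):
        # Use per-request provider data when the client supplied it, otherwise
        # fall back to the static run config, and attach the WatsonX-specific
        # project_id before delegating.
        data = self.get_request_provider_data()  # helper assumed from the mixin
        if data and data.watsonx_project_id:
            params.project_id = data.watsonx_project_id
        # Delegate to the shared LiteLLM implementation so the existing setup
        # and tracing logic is preserved.
        return await super().openai_chat_completion.func(self, params)
```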

After these changes, WatsonX requests now run correctly.

Test Plan

The changes were tested by running chat completion requests and confirming that credentials and project parameters are passed correctly. I tested with my WatsonX credentials using the CLI: `uv run llama-stack-client inference chat-completion --session`.
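
As a concrete example of what "passed correctly" means here, this is roughly how a client can supply watsonx credentials per request via provider data; the header is the usual Llama Stack mechanism, but the field names, port, and client setup are assumptions:

```python
import json

from openai import OpenAI

# Point an OpenAI-compatible client at a locally running Llama Stack server
# (route prefix matches the server logs further down; port assumed).
client = OpenAI(
    base_url="http://localhost:8321/v1/openai/v1",
    api_key="none",
    default_headers={
        "X-LlamaStack-Provider-Data": json.dumps(
            {"watsonx_api_key": "<api-key>", "watsonx_project_id": "<project-id>"}
        )
    },
)

resp = client.chat.completions.create(
    model="meta-llama/llama-3-3-70b-instruct",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(resp.choices[0].message.content)
```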

@meta-cla bot added the CLA Signed label on Oct 13, 2025
@are-ces changed the title from "Fixed WatsonX remote inference provider" to "fix: Fixed WatsonX remote inference provider" on Oct 13, 2025
@are-ces (Contributor, Author) commented Oct 13, 2025

#3800

@are-ces (Contributor, Author) commented Oct 13, 2025

cc: @jwm4 @leseb @franciscojavierarceo
PTAL :)

@ashwinb (Contributor) left a comment

lgtm

@jwm4 (Contributor) left a comment

This works for me when I use the chat completions API, but it is not working when I use the Responses API. Here is my test code for Responses:

# client: an OpenAI-compatible client pointed at the Llama Stack server (assumed)
response = client.responses.create(
    model=WATSONX_MODEL_ID,
    input="What is the capital of France?",
)

Here is the error I see in the logs:

ERROR    2025-10-13 14:40:35,530 llama_stack.core.server.server:290 core::server: Error executing endpoint
         route='/v1/openai/v1/responses' method='post'
         ╭───────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────╮
         │ /Users/bmurdock/git/lls-wx-fix-2/llama-stack/llama_stack/core/server/server.py:280 in route_handler         │
         │                                                                                                             │
         │   277 │   │   │   │   │   return StreamingResponse(gen, media_type="text/event-stream")                     │
         │   278 │   │   │   │   else:                                                                                 │
         │   279 │   │   │   │   │   value = func(**kwargs)                                                            │
         │ ❱ 280 │   │   │   │   │   result = await maybe_await(value)                                                 │
         │   281 │   │   │   │   │   if isinstance(result, PaginatedResponse) and result.url is None:                  │
         │   282 │   │   │   │   │   │   result.url = route                                                            │
         │   283                                                                                                       │
         │                                                                                                             │
         │ /Users/bmurdock/git/lls-wx-fix-2/llama-stack/llama_stack/core/server/server.py:202 in maybe_await           │
         │                                                                                                             │
         │   199                                                                                                       │
         │   200 async def maybe_await(value):                                                                         │
         │   201 │   if inspect.iscoroutine(value):                                                                    │
         │ ❱ 202 │   │   return await value                                                                            │
         │   203 │   return value                                                                                      │
         │   204                                                                                                       │
         │   205                                                                                                       │
         │                                                                                                             │
         │ /Users/bmurdock/git/lls-wx-fix-2/llama-stack/llama_stack/providers/inline/agents/meta_reference/agents.py:3 │
         │ 42 in create_openai_response                                                                                │
         │                                                                                                             │
         │   339 │   │   max_infer_iters: int | None = 10,                                                             │
         │   340 │   │   shields: list | None = None,                                                                  │
         │   341 │   ) -> OpenAIResponseObject:                                                                        │
         │ ❱ 342 │   │   return await self.openai_responses_impl.create_openai_response(                               │
         │   343 │   │   │   input,                                                                                    │
         │   344 │   │   │   model,                                                                                    │
         │   345 │   │   │   instructions,                                                                             │
         │                                                                                                             │
         │ /Users/bmurdock/git/lls-wx-fix-2/llama-stack/llama_stack/providers/inline/agents/meta_reference/responses/o │
         │ penai_responses.py:285 in create_openai_response                                                            │
         │                                                                                                             │
         │   282 │   │   │   │   │   if failed_response and failed_response.error                                      │
         │   283 │   │   │   │   │   else "Response stream failed without error details"                               │
         │   284 │   │   │   │   )                                                                                     │
         │ ❱ 285 │   │   │   │   raise RuntimeError(f"OpenAI response failed: {error_message}")                        │
         │   286 │   │   │                                                                                             │
         │   287 │   │   │   if final_response is None:                                                                │
         │   288 │   │   │   │   raise ValueError("The response stream never reached a terminal state")                │
         ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
         RuntimeError: OpenAI response failed: 'ModelResponseStream' object has no attribute 'usage'
INFO     2025-10-13 14:40:35,559 console_span_processor:39 telemetry: 18:40:35.531 [END]
         WatsonXInferenceAdapter.openai_chat_completion [StatusCode.OK] (241.60ms)
INFO     2025-10-13 14:40:35,609 uvicorn.access:473 uncategorized: ::1:49922 - "POST /v1/openai/v1/responses HTTP/1.1"
         500
INFO     2025-10-13 14:40:35,609 console_span_processor:48 telemetry:     output:
         <litellm.litellm_core_utils.streaming_handler.CustomStreamWrapper object at 0x155ae7c50>
INFO     2025-10-13 14:40:35,610 console_span_processor:62 telemetry:  18:40:35.276 [INFO]
         LiteLLM completion() model= meta-llama/llama-3-3-70b-instruct; provider = watsonx
INFO     2025-10-13 14:40:35,612 console_span_processor:39 telemetry: 18:40:35.611 [END]
         InferenceRouter.openai_chat_completion [StatusCode.OK] (339.26ms)
INFO     2025-10-13 14:40:35,612 console_span_processor:48 telemetry:     output: <async_generator object
         InferenceRouter.stream_tokens_and_compute_metrics_openai_chat at 0x155ae7240>
INFO     2025-10-13 14:40:35,613 console_span_processor:39 telemetry: 18:40:35.613 [END]
         InferenceRouter.stream_tokens_and_compute_metrics_openai_chat [StatusCode.OK] (0.04ms)
INFO     2025-10-13 14:40:35,614 console_span_processor:62 telemetry:  18:40:35.609 [ERROR] Error executing endpoint
         route='/v1/openai/v1/responses' method='post'
         Traceback (most recent call last):
           File "/Users/bmurdock/git/lls-wx-fix-2/llama-stack/llama_stack/core/server/server.py", line 280, in
         route_handler
             result = await maybe_await(value)
                      ^^^^^^^^^^^^^^^^^^^^^^^^
           File "/Users/bmurdock/git/lls-wx-fix-2/llama-stack/llama_stack/core/server/server.py", line 202, in
         maybe_await
             return await value
                    ^^^^^^^^^^^
           File
         "/Users/bmurdock/git/lls-wx-fix-2/llama-stack/llama_stack/providers/inline/agents/meta_reference/agents.py",
         line 342, in create_openai_response
             return await self.openai_responses_impl.create_openai_response(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
             ...<13 lines>...
             )
             ^
           File
         "/Users/bmurdock/git/lls-wx-fix-2/llama-stack/llama_stack/providers/inline/agents/meta_reference/responses/open
         ai_responses.py", line 285, in create_openai_response
             raise RuntimeError(f"OpenAI response failed: {error_message}")
         RuntimeError: OpenAI response failed: 'ModelResponseStream' object has no attribute 'usage'
INFO     2025-10-13 14:40:35,616 console_span_processor:62 telemetry:  18:40:35.610 [INFO] ::1:49922 - "POST
         /v1/openai/v1/responses HTTP/1.1" 500
INFO     2025-10-13 14:40:35,617 console_span_processor:39 telemetry: 18:40:35.616 [END] /v1/openai/v1/responses
         [StatusCode.OK] (352.71ms)
INFO     2025-10-13 14:40:35,617 console_span_processor:48 telemetry:     raw_path: /v1/openai/v1/responses

@jwm4 (Contributor) commented Oct 13, 2025

I am not really sure what is going on with the Responses API, but here is my guess:

  1. LiteLLM appears to have a defect in which it omits the usage information from watsonx.ai streaming final response blocks. We discussed that defect and a work-around for it here.
  2. Since then, it looks like some additional error detection/handling has been added to the Responses implementation, maybe in this PR or another recent one, and it is also catching this same LiteLLM/watsonx defect.
  3. That suggests the people who argued against working around this apparent defect in LiteLLM at the server level were right after all: it really should be fixed in the provider itself. Ideally we'd do that by fixing LiteLLM directly and then migrating to a new LiteLLM version. However, I do think an easier work-around is possible where we fill in a usage block with 0's for everything so it doesn't crash (see the sketch below). The telemetry would still be wrong, so we'd want to open a separate defect for that.
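
For illustration, here is a minimal sketch of that zero-usage work-around, assuming the adapter yields OpenAI-style chat completion chunks; the helper name and wiring are mine, not the code merged here:

```python
from openai.types.completion_usage import CompletionUsage


async def _fill_missing_usage(stream):
    """Pass chunks through unchanged, but attach an all-zero usage block when
    the upstream (watsonx via LiteLLM) omits it, so downstream Responses
    handling doesn't crash on the missing attribute. Token telemetry for these
    requests will read as 0 until the underlying defect is fixed."""
    async for chunk in stream:
        if getattr(chunk, "usage", None) is None:
            chunk.usage = CompletionUsage(
                prompt_tokens=0, completion_tokens=0, total_tokens=0
            )
        yield chunk
```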

@are-ces (Contributor, Author) commented Oct 14, 2025

I have added your changes as they were working for me. TYVM @jwm4, nice work!

@leseb (Collaborator) left a comment

Hold, I'm running some more tests with our suite

leseb added 2 commits October 14, 2025 14:32
The tests/integration/inference/test_openai_completion.py tests fail on a few scenarios like:

tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n

FAILED tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_02] - AssertionError: assert 1 == 2
 +  where 1 = len({0: 'thethenamenameofofthetheususcapitalcapitalisiswashingtonwashington,,dd.c.c..'})

test_openai_completion_logprobs
E   openai.BadRequestError: Error code: 400 - {'error': {'detail': {'errors': [{'loc': ['body', 'logprobs'], 'msg': 'Input should be a valid boolean, unable to interpret input', 'type': 'bool_parsing'}]}}}

test_openai_completion_stop_sequence
E   openai.BadRequestError: Error code: 400 - {'detail': 'litellm.BadRequestError: OpenAIException - {"errors":[{"code":"json_type_error","message":"Json field type error: CommonTextChatParameters.stop must be an array, and the element must be of type string","more_info":"https://cloud.ibm.com/apidocs/watsonx-ai#text-chat"}],"trace":"f758b3bbd4f357aa9b16f3dc5ee1170e","status_code":400}'}

So we add the right skip exceptions while still providing some coverage for OpenAI through LiteLLM.
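
As an illustration, the skip guards added for unsupported parameters look roughly like this (a sketch; the helper is patterned on the skip messages in the run below, not copied from the test suite):

```python
import pytest

# Providers that reject the OpenAI `n` parameter; the watsonx entry comes from
# the failures above, the set itself is illustrative.
PROVIDERS_WITHOUT_N_PARAM = {"remote::watsonx"}


def skip_if_doesnt_support_n(provider_type: str, model_id: str) -> None:
    """Skip instead of fail when the backing provider can't honor `n`."""
    if provider_type in PROVIDERS_WITHOUT_N_PARAM:
        pytest.skip(
            f"Model {model_id} hosted by {provider_type} doesn't support n param."
        )
```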

Now tests pass:

```
INFO     2025-10-14 14:20:17,115 tests.integration.conftest:50 tests: Test stack config type: library_client
         (stack_config=None)
======================================================== test session starts =========================================================
platform darwin -- Python 3.12.8, pytest-8.4.2, pluggy-1.6.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.12.8', 'Platform': 'macOS-26.0.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.2', 'pluggy': '1.6.0'}, 'Plugins': {'anyio': '4.9.0', 'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval': '0.11.0'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1, nbval-0.11.0
asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 32 items

tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming[txt=meta-llama/llama-3-3-70b-instruct-inference:completion:sanity] PASSED [  3%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming_suffix[txt=meta-llama/llama-3-3-70b-instruct-inference:completion:suffix] SKIPPED [  6%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_streaming[txt=meta-llama/llama-3-3-70b-instruct-inference:completion:sanity] PASSED [  9%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=meta-llama/llama-3-3-70b-instruct] SKIPPED [ 12%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:non_streaming_01] PASSED [ 15%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_01] PASSED [ 18%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_01] SKIPPED [ 21%]
tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=meta-llama/llama-3-3-70b-instruct-True] PASSED [ 25%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=meta-llama/llama-3-3-70b-instruct-True] PASSED [ 28%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming_with_file[txt=meta-llama/llama-3-3-70b-instruct] SKIPPEDfiles.) [ 31%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_stop_sequence[txt=meta-llama/llama-3-3-70b-instruct-inference:completion:stop_sequence] SKIPPED [ 34%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_logprobs[txt=meta-llama/llama-3-3-70b-instruct-inference:completion:log_probs] SKIPPED [ 37%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_logprobs_streaming[txt=meta-llama/llama-3-3-70b-instruct-inference:completion:log_probs] SKIPPED [ 40%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_with_tools[txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling] PASSED [ 43%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_with_tools_and_streaming[txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling] PASSED [ 46%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_with_tool_choice_none[txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling] PASSED [ 50%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_structured_output[txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:structured_output] PASSED [ 53%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:non_streaming_02] PASSED [ 56%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_02] PASSED [ 59%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_02] SKIPPED [ 62%]
tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=meta-llama/llama-3-3-70b-instruct-False] PASSED [ 65%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=meta-llama/llama-3-3-70b-instruct-False] PASSED [ 68%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:non_streaming_01] PASSED [ 71%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_01] PASSED [ 75%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_01] SKIPPED [ 78%]
tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-True] PASSED [ 81%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-True] PASSED [ 84%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:non_streaming_02] PASSED [ 87%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_02] PASSED [ 90%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_02] SKIPPED [ 93%]
tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-False] PASSED [ 96%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-False] PASSED [100%]

======================================================== slowest 10 durations ========================================================
5.97s call     tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_with_tool_choice_none[txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling]
3.39s call     tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:non_streaming_02]
3.26s call     tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=meta-llama/llama-3-3-70b-instruct-True]
2.64s call     tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_with_tools_and_streaming[txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling]
1.78s call     tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_structured_output[txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:structured_output]
1.73s call     tests/integration/inference/test_openai_completion.py::test_openai_completion_streaming[txt=meta-llama/llama-3-3-70b-instruct-inference:completion:sanity]
1.58s call     tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-True]
1.51s call     tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming[txt=meta-llama/llama-3-3-70b-instruct-inference:completion:sanity]
1.41s call     tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_02]
1.20s call     tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:non_streaming_02]
====================================================== short test summary info =======================================================
SKIPPED [1] tests/integration/inference/test_openai_completion.py:85: Suffix is not supported for the model: meta-llama/llama-3-3-70b-instruct.
SKIPPED [1] tests/integration/inference/test_openai_completion.py:135: Model meta-llama/llama-3-3-70b-instruct hosted by remote::watsonx doesn't support vllm extra_body parameters.
SKIPPED [4] tests/integration/inference/test_openai_completion.py:115: Model meta-llama/llama-3-3-70b-instruct hosted by remote::watsonx doesn't support n param.
SKIPPED [1] tests/integration/inference/test_openai_completion.py:141: Model meta-llama/llama-3-3-70b-instruct hosted by remote::watsonx doesn't support chat completion calls with base64 encoded files.
SKIPPED [1] tests/integration/inference/test_openai_completion.py:514: Model meta-llama/llama-3-3-70b-instruct hosted by remote::watsonx doesn't support /v1/completions stop sequence.
SKIPPED [2] tests/integration/inference/test_openai_completion.py:72: Model meta-llama/llama-3-3-70b-instruct hosted by remote::watsonx doesn't support /v1/completions logprobs.
============================================ 22 passed, 10 skipped, 2 warnings in 35.11s =============================================
```

Signed-off-by: Sébastien Han <[email protected]>
Setting the dimension is not supported, see:

```
openai.BadRequestError: Error code: 400 - {'detail': "litellm.UnsupportedParamsError: watsonx does not support parameters: {'dimensions': 384}
```
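
For reference, this is roughly the call shape that produces that 400; the client setup is an assumption, while the model id matches the run below:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

# watsonx (via LiteLLM) rejects the optional `dimensions` parameter, so the
# corresponding embeddings test is skipped for this provider.
client.embeddings.create(
    model="watsonx/ibm/slate-30m-english-rtrvr",
    input="What is the capital of France?",
    dimensions=384,  # -> litellm.UnsupportedParamsError as shown above
)
```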

Successful run:

```
INFO     2025-10-14 14:32:20,353 tests.integration.conftest:50 tests: Test stack config type: library_client
         (stack_config=None)
======================================================== test session starts =========================================================
platform darwin -- Python 3.12.8, pytest-8.4.2, pluggy-1.6.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.12.8', 'Platform': 'macOS-26.0.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.2', 'pluggy': '1.6.0'}, 'Plugins': {'anyio': '4.9.0', 'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval': '0.11.0'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1, nbval-0.11.0
asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 20 items

tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [  5%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_multiple_strings[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 10%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_float[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 15%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_dimensions[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr] SKIPPED [ 20%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_user_parameter[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 25%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_empty_list_error[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 30%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_invalid_model_error[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 35%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_different_inputs_different_outputs[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 40%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_base64[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr] SKIPPED [ 45%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_base64_batch_processing[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 50%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 55%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_multiple_strings[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 60%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_float[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 65%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_dimensions[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr] SKIPPED [ 70%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_user_parameter[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 75%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_empty_list_error[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 80%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_invalid_model_error[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 85%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_different_inputs_different_outputs[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 90%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_base64[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr] SKIPPED [ 95%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_base64_batch_processing[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [100%]

======================================================== slowest 10 durations ========================================================
1.84s call     tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_multiple_strings[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr]
1.62s call     tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_empty_list_error[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr]
1.23s call     tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_empty_list_error[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr]
0.70s call     tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_different_inputs_different_outputs[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr]
0.69s call     tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_different_inputs_different_outputs[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr]
0.61s call     tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr]
0.41s call     tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_user_parameter[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr]
0.41s call     tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_float[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr]
0.41s call     tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_base64_batch_processing[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr]
0.38s call     tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_multiple_strings[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr]
====================================================== short test summary info =======================================================
SKIPPED [4] tests/integration/inference/test_openai_embeddings.py:63: Model watsonx/ibm/slate-30m-english-rtrvr hosted by remote::watsonx does not support variable output embedding dimensions.
============================================= 16 passed, 4 skipped, 1 warning in 10.23s ==============================================
```

Signed-off-by: Sébastien Han <[email protected]>
@jwm4 (Contributor) left a comment

LGTM now!

@leseb merged commit 0dbf79c into llamastack:main Oct 14, 2025
21 checks passed