fix: Fixed WatsonX remote inference provider #3801
cc: @jwm4 @leseb @franciscojavierarceo
lgtm
This works for me when I use the chat completions API, but it is not working when I use the Responses API. Here is my test code for Responses:
```python
response = client.responses.create(
    model=WATSONX_MODEL_ID,
    input="What is the capital of France?"
)
```
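(For context, `client` and `WATSONX_MODEL_ID` above come from my local setup, roughly like this; the base URL points at my local Llama Stack server and the model ID is illustrative:)

```python
# Rough sketch of the setup assumed by the snippet above; values are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")
WATSONX_MODEL_ID = "meta-llama/llama-3-3-70b-instruct"
```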
Here is the error I see in the logs:
```
ERROR 2025-10-13 14:40:35,530 llama_stack.core.server.server:290 core::server: Error executing endpoint
route='/v1/openai/v1/responses' method='post'
╭───────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────╮
│ /Users/bmurdock/git/lls-wx-fix-2/llama-stack/llama_stack/core/server/server.py:280 in route_handler │
│ │
│ 277 │ │ │ │ │ return StreamingResponse(gen, media_type="text/event-stream") │
│ 278 │ │ │ │ else: │
│ 279 │ │ │ │ │ value = func(**kwargs) │
│ ❱ 280 │ │ │ │ │ result = await maybe_await(value) │
│ 281 │ │ │ │ │ if isinstance(result, PaginatedResponse) and result.url is None: │
│ 282 │ │ │ │ │ │ result.url = route │
│ 283 │
│ │
│ /Users/bmurdock/git/lls-wx-fix-2/llama-stack/llama_stack/core/server/server.py:202 in maybe_await │
│ │
│ 199 │
│ 200 async def maybe_await(value): │
│ 201 │ if inspect.iscoroutine(value): │
│ ❱ 202 │ │ return await value │
│ 203 │ return value │
│ 204 │
│ 205 │
│ │
│ /Users/bmurdock/git/lls-wx-fix-2/llama-stack/llama_stack/providers/inline/agents/meta_reference/agents.py:3 │
│ 42 in create_openai_response │
│ │
│ 339 │ │ max_infer_iters: int | None = 10, │
│ 340 │ │ shields: list | None = None, │
│ 341 │ ) -> OpenAIResponseObject: │
│ ❱ 342 │ │ return await self.openai_responses_impl.create_openai_response( │
│ 343 │ │ │ input, │
│ 344 │ │ │ model, │
│ 345 │ │ │ instructions, │
│ │
│ /Users/bmurdock/git/lls-wx-fix-2/llama-stack/llama_stack/providers/inline/agents/meta_reference/responses/o │
│ penai_responses.py:285 in create_openai_response │
│ │
│ 282 │ │ │ │ │ if failed_response and failed_response.error │
│ 283 │ │ │ │ │ else "Response stream failed without error details" │
│ 284 │ │ │ │ ) │
│ ❱ 285 │ │ │ │ raise RuntimeError(f"OpenAI response failed: {error_message}") │
│ 286 │ │ │ │
│ 287 │ │ │ if final_response is None: │
│ 288 │ │ │ │ raise ValueError("The response stream never reached a terminal state") │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: OpenAI response failed: 'ModelResponseStream' object has no attribute 'usage'
INFO 2025-10-13 14:40:35,559 console_span_processor:39 telemetry: 18:40:35.531 [END]
WatsonXInferenceAdapter.openai_chat_completion [StatusCode.OK] (241.60ms)
INFO 2025-10-13 14:40:35,609 uvicorn.access:473 uncategorized: ::1:49922 - "POST /v1/openai/v1/responses HTTP/1.1"
500
INFO 2025-10-13 14:40:35,609 console_span_processor:48 telemetry: output:
<litellm.litellm_core_utils.streaming_handler.CustomStreamWrapper object at 0x155ae7c50>
INFO 2025-10-13 14:40:35,610 console_span_processor:62 telemetry: 18:40:35.276 [INFO]
LiteLLM completion() model= meta-llama/llama-3-3-70b-instruct; provider = watsonx
INFO 2025-10-13 14:40:35,612 console_span_processor:39 telemetry: 18:40:35.611 [END]
InferenceRouter.openai_chat_completion [StatusCode.OK] (339.26ms)
INFO 2025-10-13 14:40:35,612 console_span_processor:48 telemetry: output: <async_generator object
InferenceRouter.stream_tokens_and_compute_metrics_openai_chat at 0x155ae7240>
INFO 2025-10-13 14:40:35,613 console_span_processor:39 telemetry: 18:40:35.613 [END]
InferenceRouter.stream_tokens_and_compute_metrics_openai_chat [StatusCode.OK] (0.04ms)
INFO 2025-10-13 14:40:35,614 console_span_processor:62 telemetry: 18:40:35.609 [ERROR] Error executing endpoint
route='/v1/openai/v1/responses' method='post'
Traceback (most recent call last):
File "/Users/bmurdock/git/lls-wx-fix-2/llama-stack/llama_stack/core/server/server.py", line 280, in
route_handler
result = await maybe_await(value)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/bmurdock/git/lls-wx-fix-2/llama-stack/llama_stack/core/server/server.py", line 202, in
maybe_await
return await value
^^^^^^^^^^^
File
"/Users/bmurdock/git/lls-wx-fix-2/llama-stack/llama_stack/providers/inline/agents/meta_reference/agents.py",
line 342, in create_openai_response
return await self.openai_responses_impl.create_openai_response(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...<13 lines>...
)
^
File
"/Users/bmurdock/git/lls-wx-fix-2/llama-stack/llama_stack/providers/inline/agents/meta_reference/responses/open
ai_responses.py", line 285, in create_openai_response
raise RuntimeError(f"OpenAI response failed: {error_message}")
RuntimeError: OpenAI response failed: 'ModelResponseStream' object has no attribute 'usage'
INFO 2025-10-13 14:40:35,616 console_span_processor:62 telemetry: 18:40:35.610 [INFO] ::1:49922 - "POST
/v1/openai/v1/responses HTTP/1.1" 500
INFO 2025-10-13 14:40:35,617 console_span_processor:39 telemetry: 18:40:35.616 [END] /v1/openai/v1/responses
[StatusCode.OK] (352.71ms)
INFO 2025-10-13 14:40:35,617 console_span_processor:48 telemetry: raw_path: /v1/openai/v1/responses
```
I am not really sure what is going on with the Responses API, but here is my guess:
I have added your changes as they were working for me. TYVM @jwm4, nice work!
Hold, I'm running some more tests with our suite
The `tests/integration/inference/test_openai_completion.py` tests fail on a few scenarios, like:

```
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n FAILED
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_02] - AssertionError: assert 1 == 2
 +  where 1 = len({0: 'thethenamenameofofthetheususcapitalcapitalisiswashingtonwashington,,dd.c.c..'})

test_openai_completion_logprobs
E   openai.BadRequestError: Error code: 400 - {'error': {'detail': {'errors': [{'loc': ['body', 'logprobs'], 'msg': 'Input should be a valid boolean, unable to interpret input', 'type': 'bool_parsing'}]}}}

test_openai_completion_stop_sequence
E   openai.BadRequestError: Error code: 400 - {'detail': 'litellm.BadRequestError: OpenAIException - {"errors":[{"code":"json_type_error","message":"Json field type error: CommonTextChatParameters.stop must be an array, and the element must be of type string","more_info":"https://cloud.ibm.com/apidocs/watsonx-ai#text-chat"}],"trace":"f758b3bbd4f357aa9b16f3dc5ee1170e","status_code":400}'}
```

So I added the right skip exceptions for these cases, while we still provide some coverage for OpenAI through LiteLLM. Now the tests pass:

```
INFO 2025-10-14 14:20:17,115 tests.integration.conftest:50 tests: Test stack config type: library_client (stack_config=None)
======================================================== test session starts =========================================================
platform darwin -- Python 3.12.8, pytest-8.4.2, pluggy-1.6.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.12.8', 'Platform': 'macOS-26.0.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.2', 'pluggy': '1.6.0'}, 'Plugins': {'anyio': '4.9.0', 'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval': '0.11.0'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1, nbval-0.11.0
asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 32 items

tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming[txt=meta-llama/llama-3-3-70b-instruct-inference:completion:sanity] PASSED [ 3%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming_suffix[txt=meta-llama/llama-3-3-70b-instruct-inference:completion:suffix] SKIPPED [ 6%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_streaming[txt=meta-llama/llama-3-3-70b-instruct-inference:completion:sanity] PASSED [ 9%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=meta-llama/llama-3-3-70b-instruct] SKIPPED [ 12%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:non_streaming_01] PASSED [ 15%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_01] PASSED [ 18%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_01] SKIPPED [ 21%]
tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=meta-llama/llama-3-3-70b-instruct-True] PASSED [ 25%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=meta-llama/llama-3-3-70b-instruct-True] PASSED [ 28%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming_with_file[txt=meta-llama/llama-3-3-70b-instruct] SKIPPED [ 31%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_stop_sequence[txt=meta-llama/llama-3-3-70b-instruct-inference:completion:stop_sequence] SKIPPED [ 34%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_logprobs[txt=meta-llama/llama-3-3-70b-instruct-inference:completion:log_probs] SKIPPED [ 37%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_logprobs_streaming[txt=meta-llama/llama-3-3-70b-instruct-inference:completion:log_probs] SKIPPED [ 40%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_with_tools[txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling] PASSED [ 43%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_with_tools_and_streaming[txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling] PASSED [ 46%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_with_tool_choice_none[txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling] PASSED [ 50%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_structured_output[txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:structured_output] PASSED [ 53%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:non_streaming_02] PASSED [ 56%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_02] PASSED [ 59%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_02] SKIPPED [ 62%]
tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=meta-llama/llama-3-3-70b-instruct-False] PASSED [ 65%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=meta-llama/llama-3-3-70b-instruct-False] PASSED [ 68%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:non_streaming_01] PASSED [ 71%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_01] PASSED [ 75%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_01] SKIPPED [ 78%]
tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-True] PASSED [ 81%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-True] PASSED [ 84%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:non_streaming_02] PASSED [ 87%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_02] PASSED [ 90%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_02] SKIPPED [ 93%]
tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-False] PASSED [ 96%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-False] PASSED [100%]

======================================================== slowest 10 durations ========================================================
5.97s call tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_with_tool_choice_none[txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling]
3.39s call tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:non_streaming_02]
3.26s call tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=meta-llama/llama-3-3-70b-instruct-True]
2.64s call tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_with_tools_and_streaming[txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling]
1.78s call tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_structured_output[txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:structured_output]
1.73s call tests/integration/inference/test_openai_completion.py::test_openai_completion_streaming[txt=meta-llama/llama-3-3-70b-instruct-inference:completion:sanity]
1.58s call tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-True]
1.51s call tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming[txt=meta-llama/llama-3-3-70b-instruct-inference:completion:sanity]
1.41s call tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_02]
1.20s call tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:non_streaming_02]
====================================================== short test summary info =======================================================
SKIPPED [1] tests/integration/inference/test_openai_completion.py:85: Suffix is not supported for the model: meta-llama/llama-3-3-70b-instruct.
SKIPPED [1] tests/integration/inference/test_openai_completion.py:135: Model meta-llama/llama-3-3-70b-instruct hosted by remote::watsonx doesn't support vllm extra_body parameters.
SKIPPED [4] tests/integration/inference/test_openai_completion.py:115: Model meta-llama/llama-3-3-70b-instruct hosted by remote::watsonx doesn't support n param.
SKIPPED [1] tests/integration/inference/test_openai_completion.py:141: Model meta-llama/llama-3-3-70b-instruct hosted by remote::watsonx doesn't support chat completion calls with base64 encoded files.
SKIPPED [1] tests/integration/inference/test_openai_completion.py:514: Model meta-llama/llama-3-3-70b-instruct hosted by remote::watsonx doesn't support /v1/completions stop sequence.
SKIPPED [2] tests/integration/inference/test_openai_completion.py:72: Model meta-llama/llama-3-3-70b-instruct hosted by remote::watsonx doesn't support /v1/completions logprobs.
============================================ 22 passed, 10 skipped, 2 warnings in 35.11s =============================================
```

Signed-off-by: Sébastien Han <[email protected]>
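For context, the "right exception" here means skipping these cases for providers that can't support them. A rough sketch of the kind of skip guard this refers to is below; the function name and provider set are illustrative, not the actual diff:

```python
import pytest

# Illustrative sketch of a per-provider skip guard for unsupported
# /v1/completions features; names and the provider list are assumptions.
UNSUPPORTED_COMPLETIONS_LOGPROBS_PROVIDERS = {"remote::watsonx"}

def skip_if_doesnt_support_completions_logprobs(provider_type: str, model_id: str) -> None:
    # pytest.skip() raises, so tests just call this guard at the top.
    if provider_type in UNSUPPORTED_COMPLETIONS_LOGPROBS_PROVIDERS:
        pytest.skip(f"Model {model_id} hosted by {provider_type} doesn't support /v1/completions logprobs.")
```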
Setting the dimension is not supported, see:

```
openai.BadRequestError: Error code: 400 - {'detail': "litellm.UnsupportedParamsError: watsonx does not support parameters: {'dimensions': 384}
```

Successful run:

```
INFO 2025-10-14 14:32:20,353 tests.integration.conftest:50 tests: Test stack config type: library_client (stack_config=None)
======================================================== test session starts =========================================================
platform darwin -- Python 3.12.8, pytest-8.4.2, pluggy-1.6.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.12.8', 'Platform': 'macOS-26.0.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.2', 'pluggy': '1.6.0'}, 'Plugins': {'anyio': '4.9.0', 'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval': '0.11.0'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1, nbval-0.11.0
asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 20 items

tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 5%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_multiple_strings[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 10%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_float[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 15%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_dimensions[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr] SKIPPED [ 20%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_user_parameter[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 25%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_empty_list_error[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 30%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_invalid_model_error[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 35%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_different_inputs_different_outputs[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 40%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_base64[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr] SKIPPED [ 45%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_base64_batch_processing[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 50%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 55%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_multiple_strings[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 60%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_float[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 65%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_dimensions[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr] SKIPPED [ 70%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_user_parameter[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 75%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_empty_list_error[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 80%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_invalid_model_error[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 85%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_different_inputs_different_outputs[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 90%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_base64[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr] SKIPPED [ 95%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_base64_batch_processing[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [100%]

======================================================== slowest 10 durations ========================================================
1.84s call tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_multiple_strings[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr]
1.62s call tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_empty_list_error[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr]
1.23s call tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_empty_list_error[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr]
0.70s call tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_different_inputs_different_outputs[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr]
0.69s call tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_different_inputs_different_outputs[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr]
0.61s call tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr]
0.41s call tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_user_parameter[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr]
0.41s call tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_float[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr]
0.41s call tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_base64_batch_processing[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr]
0.38s call tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_multiple_strings[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr]
====================================================== short test summary info =======================================================
SKIPPED [4] tests/integration/inference/test_openai_embeddings.py:63: Model watsonx/ibm/slate-30m-english-rtrvr hosted by remote::watsonx does not support variable output embedding dimensions.
============================================= 16 passed, 4 skipped, 1 warning in 10.23s ==============================================
```

Signed-off-by: Sébastien Han <[email protected]>
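For clarity, this is roughly the kind of call that triggers the 400 above; `client` is an OpenAI client pointed at the Llama Stack server, and the request is illustrative:

```python
# Roughly the call that produces litellm.UnsupportedParamsError above:
# requesting a custom output dimension, which watsonx rejects for this model.
response = client.embeddings.create(
    model="watsonx/ibm/slate-30m-english-rtrvr",
    input="What is the capital of France?",
    dimensions=384,  # unsupported by this provider, hence the skip
)
```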
LGTM now!
What does this PR do?
This PR fixes issues with the WatsonX provider so it works correctly with LiteLLM.
The main problem was that WatsonX requests failed because the provider data validator didn’t properly handle the API key and project ID. This was fixed by updating the WatsonXProviderDataValidator and ensuring the provider data is loaded correctly.
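For reviewers, a minimal sketch of the shape this takes is below; the field names are my shorthand and are assumptions, not necessarily the exact ones in the diff:

```python
# Minimal sketch of a provider-data validator carrying per-request WatsonX
# credentials; field names are illustrative.
from pydantic import BaseModel

class WatsonXProviderDataValidator(BaseModel):
    watsonx_api_key: str | None = None
    watsonx_project_id: str | None = None
```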
The `openai_chat_completion` method was also updated to match the behavior of other providers while adding WatsonX-specific fields like `project_id`. It still calls `await super().openai_chat_completion.func(self, params)` to keep the existing setup and tracing logic.
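As a rough illustration (not the exact diff), the override looks along these lines; the parent mixin, helper, and field names are assumptions about the surrounding code:

```python
# Illustrative sketch only; the import path, helper, and field names are assumptions.
from llama_stack.providers.utils.inference.litellm_openai_mixin import LiteLLMOpenAIMixin  # assumed location

class WatsonXInferenceAdapter(LiteLLMOpenAIMixin):
    async def openai_chat_completion(self, params):
        provider_data = self.get_request_provider_data()  # assumed helper for per-request credentials
        if provider_data and getattr(provider_data, "watsonx_project_id", None):
            # assuming params is a pydantic model; attach the WatsonX-specific field
            params = params.model_copy(update={"project_id": provider_data.watsonx_project_id})
        # delegate to the parent implementation so its existing setup and tracing logic still runs
        return await super().openai_chat_completion.func(self, params)
```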
After these changes, WatsonX requests now run correctly.
Test Plan
The changes were tested by running chat completion requests and confirming that credentials and project parameters are passed correctly. I have tested with my WatsonX credentials, using the CLI with:

```
uv run llama-stack-client inference chat-completion --session
```
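For an OpenAI-client-based check, something along these lines should exercise the same path; this is a hedged sketch rather than the PR's test code, the header name follows Llama Stack's provider-data mechanism, and the JSON field names mirror the validator sketch above and are assumptions:

```python
# Hedged sketch: supplying per-request WatsonX credentials as provider data
# to a local Llama Stack server; field names are illustrative.
import json
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8321/v1/openai/v1",
    api_key="none",
    default_headers={
        "X-LlamaStack-Provider-Data": json.dumps(
            {"watsonx_api_key": "<api-key>", "watsonx_project_id": "<project-id>"}
        )
    },
)
completion = client.chat.completions.create(
    model="meta-llama/llama-3-3-70b-instruct",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
```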