fix: Fixed WatsonX remote inference provider #3801
cc: @jwm4 @leseb @franciscojavierarceo
lgtm
This works for me when I use the chat completions API, but it is not working when I use the Responses API. Here is my test code for Responses:
```python
response = client.responses.create(
    model=WATSONX_MODEL_ID,
    input="What is the capital of France?"
)
```
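(For context, `client` and `WATSONX_MODEL_ID` above come from my local setup, roughly like this; the base URL points at my local Llama Stack server and the model ID is illustrative:)

```python
# Rough sketch of the setup assumed by the snippet above; values are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")
WATSONX_MODEL_ID = "meta-llama/llama-3-3-70b-instruct"
```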
Here is the error I see in the logs:
```
ERROR 2025-10-13 14:40:35,530 llama_stack.core.server.server:290 core::server: Error executing endpoint
route='/v1/openai/v1/responses' method='post'
╭───────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────╮
│ /Users/bmurdock/git/lls-wx-fix-2/llama-stack/llama_stack/core/server/server.py:280 in route_handler │
│ │
│ 277 │ │ │ │ │ return StreamingResponse(gen, media_type="text/event-stream") │
│ 278 │ │ │ │ else: │
│ 279 │ │ │ │ │ value = func(**kwargs) │
│ ❱ 280 │ │ │ │ │ result = await maybe_await(value) │
│ 281 │ │ │ │ │ if isinstance(result, PaginatedResponse) and result.url is None: │
│ 282 │ │ │ │ │ │ result.url = route │
│ 283 │
│ │
│ /Users/bmurdock/git/lls-wx-fix-2/llama-stack/llama_stack/core/server/server.py:202 in maybe_await │
│ │
│ 199 │
│ 200 async def maybe_await(value): │
│ 201 │ if inspect.iscoroutine(value): │
│ ❱ 202 │ │ return await value │
│ 203 │ return value │
│ 204 │
│ 205 │
│ │
│ /Users/bmurdock/git/lls-wx-fix-2/llama-stack/llama_stack/providers/inline/agents/meta_reference/agents.py:3 │
│ 42 in create_openai_response │
│ │
│ 339 │ │ max_infer_iters: int | None = 10, │
│ 340 │ │ shields: list | None = None, │
│ 341 │ ) -> OpenAIResponseObject: │
│ ❱ 342 │ │ return await self.openai_responses_impl.create_openai_response( │
│ 343 │ │ │ input, │
│ 344 │ │ │ model, │
│ 345 │ │ │ instructions, │
│ │
│ /Users/bmurdock/git/lls-wx-fix-2/llama-stack/llama_stack/providers/inline/agents/meta_reference/responses/o │
│ penai_responses.py:285 in create_openai_response │
│ │
│ 282 │ │ │ │ │ if failed_response and failed_response.error │
│ 283 │ │ │ │ │ else "Response stream failed without error details" │
│ 284 │ │ │ │ ) │
│ ❱ 285 │ │ │ │ raise RuntimeError(f"OpenAI response failed: {error_message}") │
│ 286 │ │ │ │
│ 287 │ │ │ if final_response is None: │
│ 288 │ │ │ │ raise ValueError("The response stream never reached a terminal state") │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: OpenAI response failed: 'ModelResponseStream' object has no attribute 'usage'
INFO 2025-10-13 14:40:35,559 console_span_processor:39 telemetry: 18:40:35.531 [END]
WatsonXInferenceAdapter.openai_chat_completion [StatusCode.OK] (241.60ms)
INFO 2025-10-13 14:40:35,609 uvicorn.access:473 uncategorized: ::1:49922 - "POST /v1/openai/v1/responses HTTP/1.1"
500
INFO 2025-10-13 14:40:35,609 console_span_processor:48 telemetry: output:
<litellm.litellm_core_utils.streaming_handler.CustomStreamWrapper object at 0x155ae7c50>
INFO 2025-10-13 14:40:35,610 console_span_processor:62 telemetry: 18:40:35.276 [INFO]
LiteLLM completion() model= meta-llama/llama-3-3-70b-instruct; provider = watsonx
INFO 2025-10-13 14:40:35,612 console_span_processor:39 telemetry: 18:40:35.611 [END]
InferenceRouter.openai_chat_completion [StatusCode.OK] (339.26ms)
INFO 2025-10-13 14:40:35,612 console_span_processor:48 telemetry: output: <async_generator object
InferenceRouter.stream_tokens_and_compute_metrics_openai_chat at 0x155ae7240>
INFO 2025-10-13 14:40:35,613 console_span_processor:39 telemetry: 18:40:35.613 [END]
InferenceRouter.stream_tokens_and_compute_metrics_openai_chat [StatusCode.OK] (0.04ms)
INFO 2025-10-13 14:40:35,614 console_span_processor:62 telemetry: 18:40:35.609 [ERROR] Error executing endpoint
route='/v1/openai/v1/responses' method='post'
Traceback (most recent call last):
File "/Users/bmurdock/git/lls-wx-fix-2/llama-stack/llama_stack/core/server/server.py", line 280, in
route_handler
result = await maybe_await(value)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/bmurdock/git/lls-wx-fix-2/llama-stack/llama_stack/core/server/server.py", line 202, in
maybe_await
return await value
^^^^^^^^^^^
File
"/Users/bmurdock/git/lls-wx-fix-2/llama-stack/llama_stack/providers/inline/agents/meta_reference/agents.py",
line 342, in create_openai_response
return await self.openai_responses_impl.create_openai_response(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...<13 lines>...
)
^
File
"/Users/bmurdock/git/lls-wx-fix-2/llama-stack/llama_stack/providers/inline/agents/meta_reference/responses/open
ai_responses.py", line 285, in create_openai_response
raise RuntimeError(f"OpenAI response failed: {error_message}")
RuntimeError: OpenAI response failed: 'ModelResponseStream' object has no attribute 'usage'
INFO 2025-10-13 14:40:35,616 console_span_processor:62 telemetry: 18:40:35.610 [INFO] ::1:49922 - "POST
/v1/openai/v1/responses HTTP/1.1" 500
INFO 2025-10-13 14:40:35,617 console_span_processor:39 telemetry: 18:40:35.616 [END] /v1/openai/v1/responses
[StatusCode.OK] (352.71ms)
INFO 2025-10-13 14:40:35,617 console_span_processor:48 telemetry: raw_path: /v1/openai/v1/responses
```
I am not really sure what is going on with the Responses API, but here is my guess:
I have added your changes as they were working for me. TYVM @jwm4, nice work!
Hold, I'm running some more tests with our suite
The `tests/integration/inference/test_openai_completion.py` tests fail on a few scenarios, like:

```
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n FAILED
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_02] - AssertionError: assert 1 == 2
 +  where 1 = len({0: 'thethenamenameofofthetheususcapitalcapitalisiswashingtonwashington,,dd.c.c..'})

test_openai_completion_logprobs
E   openai.BadRequestError: Error code: 400 - {'error': {'detail': {'errors': [{'loc': ['body', 'logprobs'], 'msg': 'Input should be a valid boolean, unable to interpret input', 'type': 'bool_parsing'}]}}}

test_openai_completion_stop_sequence
E   openai.BadRequestError: Error code: 400 - {'detail': 'litellm.BadRequestError: OpenAIException - {"errors":[{"code":"json_type_error","message":"Json field type error: CommonTextChatParameters.stop must be an array, and the element must be of type string","more_info":"https://cloud.ibm.com/apidocs/watsonx-ai#text-chat"}],"trace":"f758b3bbd4f357aa9b16f3dc5ee1170e","status_code":400}'}
```

So I added the right skip exceptions for these cases, while we still provide some coverage for OpenAI through LiteLLM. Now the tests pass:

```
INFO 2025-10-14 14:20:17,115 tests.integration.conftest:50 tests: Test stack config type: library_client (stack_config=None)
======================================================== test session starts =========================================================
platform darwin -- Python 3.12.8, pytest-8.4.2, pluggy-1.6.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.12.8', 'Platform': 'macOS-26.0.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.2', 'pluggy': '1.6.0'}, 'Plugins': {'anyio': '4.9.0', 'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval': '0.11.0'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1, nbval-0.11.0
asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 32 items

tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming[txt=meta-llama/llama-3-3-70b-instruct-inference:completion:sanity] PASSED [ 3%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming_suffix[txt=meta-llama/llama-3-3-70b-instruct-inference:completion:suffix] SKIPPED [ 6%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_streaming[txt=meta-llama/llama-3-3-70b-instruct-inference:completion:sanity] PASSED [ 9%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=meta-llama/llama-3-3-70b-instruct] SKIPPED [ 12%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:non_streaming_01] PASSED [ 15%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_01] PASSED [ 18%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_01] SKIPPED [ 21%]
tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=meta-llama/llama-3-3-70b-instruct-True] PASSED [ 25%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=meta-llama/llama-3-3-70b-instruct-True] PASSED [ 28%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming_with_file[txt=meta-llama/llama-3-3-70b-instruct] SKIPPED [ 31%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_stop_sequence[txt=meta-llama/llama-3-3-70b-instruct-inference:completion:stop_sequence] SKIPPED [ 34%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_logprobs[txt=meta-llama/llama-3-3-70b-instruct-inference:completion:log_probs] SKIPPED [ 37%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_logprobs_streaming[txt=meta-llama/llama-3-3-70b-instruct-inference:completion:log_probs] SKIPPED [ 40%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_with_tools[txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling] PASSED [ 43%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_with_tools_and_streaming[txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling] PASSED [ 46%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_with_tool_choice_none[txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling] PASSED [ 50%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_structured_output[txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:structured_output] PASSED [ 53%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:non_streaming_02] PASSED [ 56%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_02] PASSED [ 59%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_02] SKIPPED [ 62%]
tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=meta-llama/llama-3-3-70b-instruct-False] PASSED [ 65%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=meta-llama/llama-3-3-70b-instruct-False] PASSED [ 68%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:non_streaming_01] PASSED [ 71%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_01] PASSED [ 75%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_01] SKIPPED [ 78%]
tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-True] PASSED [ 81%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-True] PASSED [ 84%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:non_streaming_02] PASSED [ 87%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_02] PASSED [ 90%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_02] SKIPPED [ 93%]
tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-False] PASSED [ 96%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-False] PASSED [100%]

======================================================== slowest 10 durations ========================================================
5.97s call tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_with_tool_choice_none[txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling]
3.39s call tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:non_streaming_02]
3.26s call tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=meta-llama/llama-3-3-70b-instruct-True]
2.64s call tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_with_tools_and_streaming[txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling]
1.78s call tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_structured_output[txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:structured_output]
1.73s call tests/integration/inference/test_openai_completion.py::test_openai_completion_streaming[txt=meta-llama/llama-3-3-70b-instruct-inference:completion:sanity]
1.58s call tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-True]
1.51s call tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming[txt=meta-llama/llama-3-3-70b-instruct-inference:completion:sanity]
1.41s call tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_02]
1.20s call tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=meta-llama/llama-3-3-70b-instruct-inference:chat_completion:non_streaming_02]
====================================================== short test summary info =======================================================
SKIPPED [1] tests/integration/inference/test_openai_completion.py:85: Suffix is not supported for the model: meta-llama/llama-3-3-70b-instruct.
SKIPPED [1] tests/integration/inference/test_openai_completion.py:135: Model meta-llama/llama-3-3-70b-instruct hosted by remote::watsonx doesn't support vllm extra_body parameters.
SKIPPED [4] tests/integration/inference/test_openai_completion.py:115: Model meta-llama/llama-3-3-70b-instruct hosted by remote::watsonx doesn't support n param.
SKIPPED [1] tests/integration/inference/test_openai_completion.py:141: Model meta-llama/llama-3-3-70b-instruct hosted by remote::watsonx doesn't support chat completion calls with base64 encoded files.
SKIPPED [1] tests/integration/inference/test_openai_completion.py:514: Model meta-llama/llama-3-3-70b-instruct hosted by remote::watsonx doesn't support /v1/completions stop sequence.
SKIPPED [2] tests/integration/inference/test_openai_completion.py:72: Model meta-llama/llama-3-3-70b-instruct hosted by remote::watsonx doesn't support /v1/completions logprobs.
============================================ 22 passed, 10 skipped, 2 warnings in 35.11s =============================================
```

Signed-off-by: Sébastien Han <[email protected]>
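For context, the "right exception" here means skipping these cases for providers that can't support them. A rough sketch of the kind of skip guard this refers to is below; the function name and provider set are illustrative, not the actual diff:

```python
import pytest

# Illustrative sketch of a per-provider skip guard for unsupported
# /v1/completions features; names and the provider list are assumptions.
UNSUPPORTED_COMPLETIONS_LOGPROBS_PROVIDERS = {"remote::watsonx"}

def skip_if_doesnt_support_completions_logprobs(provider_type: str, model_id: str) -> None:
    # pytest.skip() raises, so tests just call this guard at the top.
    if provider_type in UNSUPPORTED_COMPLETIONS_LOGPROBS_PROVIDERS:
        pytest.skip(f"Model {model_id} hosted by {provider_type} doesn't support /v1/completions logprobs.")
```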
Setting the dimension is not supported, see:

```
openai.BadRequestError: Error code: 400 - {'detail': "litellm.UnsupportedParamsError: watsonx does not support parameters: {'dimensions': 384}
```

Successful run:

```
INFO 2025-10-14 14:32:20,353 tests.integration.conftest:50 tests: Test stack config type: library_client (stack_config=None)
======================================================== test session starts =========================================================
platform darwin -- Python 3.12.8, pytest-8.4.2, pluggy-1.6.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.12.8', 'Platform': 'macOS-26.0.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.2', 'pluggy': '1.6.0'}, 'Plugins': {'anyio': '4.9.0', 'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval': '0.11.0'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1, nbval-0.11.0
asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 20 items

tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 5%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_multiple_strings[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 10%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_float[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 15%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_dimensions[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr] SKIPPED [ 20%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_user_parameter[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 25%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_empty_list_error[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 30%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_invalid_model_error[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 35%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_different_inputs_different_outputs[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 40%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_base64[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr] SKIPPED [ 45%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_base64_batch_processing[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 50%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 55%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_multiple_strings[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 60%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_float[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 65%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_dimensions[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr] SKIPPED [ 70%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_user_parameter[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 75%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_empty_list_error[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 80%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_invalid_model_error[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 85%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_different_inputs_different_outputs[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [ 90%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_base64[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr] SKIPPED [ 95%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_base64_batch_processing[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr] PASSED [100%]

======================================================== slowest 10 durations ========================================================
1.84s call tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_multiple_strings[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr]
1.62s call tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_empty_list_error[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr]
1.23s call tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_empty_list_error[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr]
0.70s call tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_different_inputs_different_outputs[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr]
0.69s call tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_different_inputs_different_outputs[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr]
0.61s call tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr]
0.41s call tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_user_parameter[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr]
0.41s call tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_float[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr]
0.41s call tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_base64_batch_processing[llama_stack_client-emb=watsonx/ibm/slate-30m-english-rtrvr]
0.38s call tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_multiple_strings[openai_client-emb=watsonx/ibm/slate-30m-english-rtrvr]
====================================================== short test summary info =======================================================
SKIPPED [4] tests/integration/inference/test_openai_embeddings.py:63: Model watsonx/ibm/slate-30m-english-rtrvr hosted by remote::watsonx does not support variable output embedding dimensions.
============================================= 16 passed, 4 skipped, 1 warning in 10.23s ==============================================
```

Signed-off-by: Sébastien Han <[email protected]>
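For clarity, this is roughly the kind of call that triggers the 400 above; `client` is an OpenAI client pointed at the Llama Stack server, and the request is illustrative:

```python
# Roughly the call that produces litellm.UnsupportedParamsError above:
# requesting a custom output dimension, which watsonx rejects for this model.
response = client.embeddings.create(
    model="watsonx/ibm/slate-30m-english-rtrvr",
    input="What is the capital of France?",
    dimensions=384,  # unsupported by this provider, hence the skip
)
```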
LGTM now!
What does this PR do?
This PR fixes issues with the WatsonX provider so it works correctly with LiteLLM.
The main problem was that WatsonX requests failed because the provider data validator didn’t properly handle the API key and project ID. This was fixed by updating the WatsonXProviderDataValidator and ensuring the provider data is loaded correctly.
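For reviewers, a minimal sketch of the shape this takes is below; the field names are my shorthand and are assumptions, not necessarily the exact ones in the diff:

```python
# Minimal sketch of a provider-data validator carrying per-request WatsonX
# credentials; field names are illustrative.
from pydantic import BaseModel

class WatsonXProviderDataValidator(BaseModel):
    watsonx_api_key: str | None = None
    watsonx_project_id: str | None = None
```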
The `openai_chat_completion` method was also updated to match the behavior of other providers while adding WatsonX-specific fields like `project_id`. It still calls `await super().openai_chat_completion.func(self, params)` to keep the existing setup and tracing logic.
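As a rough illustration (not the exact diff), the override looks along these lines; the parent mixin, helper, and field names are assumptions about the surrounding code:

```python
# Illustrative sketch only; the import path, helper, and field names are assumptions.
from llama_stack.providers.utils.inference.litellm_openai_mixin import LiteLLMOpenAIMixin  # assumed location

class WatsonXInferenceAdapter(LiteLLMOpenAIMixin):
    async def openai_chat_completion(self, params):
        provider_data = self.get_request_provider_data()  # assumed helper for per-request credentials
        if provider_data and getattr(provider_data, "watsonx_project_id", None):
            # assuming params is a pydantic model; attach the WatsonX-specific field
            params = params.model_copy(update={"project_id": provider_data.watsonx_project_id})
        # delegate to the parent implementation so its existing setup and tracing logic still runs
        return await super().openai_chat_completion.func(self, params)
```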
After these changes, WatsonX requests now run correctly.
Test Plan
The changes were tested by running chat completion requests and confirming that credentials and project parameters are passed correctly. I have tested with my WatsonX credentials, using the CLI with:

```
uv run llama-stack-client inference chat-completion --session
```
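For an OpenAI-client-based check, something along these lines should exercise the same path; this is a hedged sketch rather than the PR's test code, the header name follows Llama Stack's provider-data mechanism, and the JSON field names mirror the validator sketch above and are assumptions:

```python
# Hedged sketch: supplying per-request WatsonX credentials as provider data
# to a local Llama Stack server; field names are illustrative.
import json
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8321/v1/openai/v1",
    api_key="none",
    default_headers={
        "X-LlamaStack-Provider-Data": json.dumps(
            {"watsonx_api_key": "<api-key>", "watsonx_project_id": "<project-id>"}
        )
    },
)
completion = client.chat.completions.create(
    model="meta-llama/llama-3-3-70b-instruct",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
```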