Fix chat CLI GPU loading and request_id validation issues (#40230) #40232

Open · wants to merge 2 commits into main
Conversation

robin-ede
What does this PR do?

This PR fixes two critical bugs in the transformers chat CLI that prevent users from chatting at all:

Issue 1: Chat CLI doesn't use GPU by default

Problem: The chat CLI defaulted to CPU inference even when a GPU was available, leading to slow performance.

Root Cause: The device parameter in ChatArguments was hardcoded to "cpu" instead of "auto".

Solution: Changed the default from "cpu" to "auto" in src/transformers/commands/chat.py:249 to match the serving backend behavior.
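A minimal sketch of the change, not the actual source: the `ChatArguments` class and `device` field names come from the PR description, but the surrounding fields and help text here are hypothetical.

```python
# Hypothetical reduction of the ChatArguments dataclass in chat.py.
# The fix is the default value: "auto" instead of the old "cpu",
# letting the backend place the model on a GPU when one is available.
from dataclasses import dataclass, field


@dataclass
class ChatArguments:
    device: str = field(
        default="auto",  # was "cpu" before this PR
        metadata={"help": "Device to load the model on."},
    )


args = ChatArguments()
print(args.device)  # → auto
```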

Issue 2: Chat breaks on second message with validation error

Problem: After the first message, subsequent messages fail with error: "Unexpected keys in the request: {'request_id'}"

Root Cause: The chat client sends a request_id field in the request body, but the server's validation schema (TransformersCompletionCreateParamsStreaming) does not declare this field, so strict validation rejects it as an unexpected key.

Solution: Added request_id: Optional[str] = None to the schema in src/transformers/commands/serving.py:128 to allow this field to pass validation.
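The shape of the fix can be illustrated with a stdlib-only sketch; the real schema is a `TransformersCompletionCreateParamsStreaming` class in serving.py, and the `validate` helper and `model` field below are assumptions used only to mimic the strict-key check.

```python
# Hypothetical sketch of strict request validation: any key not
# declared on the schema is rejected, which is why the client's
# request_id triggered "Unexpected keys in the request: {'request_id'}".
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class CompletionParamsStreaming:
    model: str
    # The fix: declare request_id so it passes validation.
    request_id: Optional[str] = None


def validate(payload: dict) -> CompletionParamsStreaming:
    allowed = {f.name for f in fields(CompletionParamsStreaming)}
    unexpected = set(payload) - allowed
    if unexpected:
        raise ValueError(f"Unexpected keys in the request: {unexpected}")
    return CompletionParamsStreaming(**payload)


req = validate({"model": "demo", "request_id": "abc-123"})
print(req.request_id)  # → abc-123
```

With `request_id` declared, the second chat message validates cleanly; an actually unknown key would still raise.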

Testing

  • ✅ Code quality checks passed (ruff check, ruff format)
  • ✅ Existing chat CLI tests continue to pass
  • ✅ Verified device default now returns "auto" instead of "cpu"
  • ✅ Verified request_id field is properly added to validation schema

Fixes #40230

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@Rocketknight1
Member

cc @gante


@gante gante left a comment

Thank you for diving into the issues @robin-ede 🤗

One small change and it's good to go 💛

…e#40230)

This commit addresses two critical bugs in the transformers chat CLI:

1. **GPU Loading Issue**: Changed default device from "cpu" to "auto" in ChatArguments
   - Chat CLI now automatically uses GPU when available instead of defaulting to CPU
   - Matches the behavior of the underlying serving infrastructure

2. **Request ID Validation Error**: Added request_id field to TransformersCompletionCreateParamsStreaming schema
   - Fixes "Unexpected keys in the request: {'request_id'}" error on second message
   - Allows request_id to be properly sent and validated by the server

Both fixes target the exact root causes identified in issue huggingface#40230:
- Users will now get GPU acceleration by default when available
- Chat sessions will no longer break after the second message
@robin-ede robin-ede force-pushed the fix-chat-cli-issues-40230 branch from 2a9ab45 to ca5908e on August 18, 2025 at 18:04
@robin-ede
Author

@gante fixed!

Development

Successfully merging this pull request may close these issues.

Transfomers chat cli doesnt load on gpu and breaks on second message
3 participants