Fix chat CLI GPU loading and request_id validation issues (#40230) #40232

Open · wants to merge 2 commits into main
Conversation

robin-ede
What does this PR do?

This PR fixes two critical bugs in the transformers chat CLI that prevent users from chatting at all:

Issue 1: Chat CLI doesn't use GPU by default

Problem: The chat CLI defaulted to CPU inference even when a GPU was available, leading to slow performance.

Root Cause: The device parameter in ChatArguments was hardcoded to "cpu" instead of "auto".

Solution: Changed the default from "cpu" to "auto" in src/transformers/commands/chat.py:249 to match the serving backend behavior.
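A minimal sketch of the change, not the actual source: the `ChatArguments` class and `device` field names come from the PR description, but the surrounding fields and help text here are hypothetical.

```python
# Hypothetical reduction of the ChatArguments dataclass in chat.py.
# The fix is the default value: "auto" instead of the old "cpu",
# letting the backend place the model on a GPU when one is available.
from dataclasses import dataclass, field


@dataclass
class ChatArguments:
    device: str = field(
        default="auto",  # was "cpu" before this PR
        metadata={"help": "Device to load the model on."},
    )


args = ChatArguments()
print(args.device)  # → auto
```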

Issue 2: Chat breaks on second message with validation error

Problem: After the first message, subsequent messages fail with error: "Unexpected keys in the request: {'request_id'}"

Root Cause: The chat client sends a request_id field in the request body, but the server's validation schema (TransformersCompletionCreateParamsStreaming) does not declare this field, so strict validation rejects it as an unexpected key.

Solution: Added request_id: Optional[str] = None to the schema in src/transformers/commands/serving.py:128 to allow this field to pass validation.
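The shape of the fix can be illustrated with a stdlib-only sketch; the real schema is a `TransformersCompletionCreateParamsStreaming` class in serving.py, and the `validate` helper and `model` field below are assumptions used only to mimic the strict-key check.

```python
# Hypothetical sketch of strict request validation: any key not
# declared on the schema is rejected, which is why the client's
# request_id triggered "Unexpected keys in the request: {'request_id'}".
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class CompletionParamsStreaming:
    model: str
    # The fix: declare request_id so it passes validation.
    request_id: Optional[str] = None


def validate(payload: dict) -> CompletionParamsStreaming:
    allowed = {f.name for f in fields(CompletionParamsStreaming)}
    unexpected = set(payload) - allowed
    if unexpected:
        raise ValueError(f"Unexpected keys in the request: {unexpected}")
    return CompletionParamsStreaming(**payload)


req = validate({"model": "demo", "request_id": "abc-123"})
print(req.request_id)  # → abc-123
```

With `request_id` declared, the second chat message validates cleanly; an actually unknown key would still raise.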

Testing

  • ✅ Code quality checks passed (ruff check, ruff format)
  • ✅ Existing chat CLI tests continue to pass
  • ✅ Verified device default now returns "auto" instead of "cpu"
  • ✅ Verified request_id field is properly added to validation schema

Fixes #40230

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@Rocketknight1
Member

cc @gante


@gante gante left a comment

Thank you for diving into the issues @robin-ede 🤗

One small change and it's good to go 💛

…e#40230)

This commit addresses two critical bugs in the transformers chat CLI:

1. **GPU Loading Issue**: Changed default device from "cpu" to "auto" in ChatArguments
   - Chat CLI now automatically uses GPU when available instead of defaulting to CPU
   - Matches the behavior of the underlying serving infrastructure

2. **Request ID Validation Error**: Added request_id field to TransformersCompletionCreateParamsStreaming schema
   - Fixes "Unexpected keys in the request: {'request_id'}" error on second message
   - Allows request_id to be properly sent and validated by the server

Both fixes target the exact root causes identified in issue huggingface#40230:
- Users will now get GPU acceleration by default when available
- Chat sessions will no longer break after the second message
@robin-ede robin-ede force-pushed the fix-chat-cli-issues-40230 branch from 2a9ab45 to ca5908e on August 18, 2025 at 18:04
@robin-ede
Author

@gante fixed!

Development

Successfully merging this pull request may close these issues.

Transfomers chat cli doesnt load on gpu and breaks on second message
3 participants