fix:bug(chat): ensure sanitize_user_prompt handles UTF-8 boundaries correctly #3087
// Limit the size of input to first 4096 bytes, respecting character boundaries
const MAX_LEN: usize = 4096;

let truncated = if input.len() > MAX_LEN {
We have a util function that does this - https://github.com/aws/amazon-q-developer-cli/blob/main/crates/chat-cli/src/cli/chat/util/mod.rs#L19
Can this be used instead?
Thank you. I changed it to use the truncate_safe function.
There’s another PR that fixes a similar UTF-8 boundary bug. Can you review it as well?
Thank you for your reviews. Can we merge this PR?
Thank you!
Issue #, if available: #3086
Description of changes:
When executing a chat hook, the user’s prompt is sanitized through the sanitize_user_prompt(prompt) function. This function truncates the prompt by slicing it when it is longer than 4096 bytes. However, with multibyte characters such as those in CJK languages, the cut can land in the middle of a character instead of on a UTF-8 character boundary. Slicing a string at such an offset panics, and the program terminates.
This PR removes such crashes and fixes the sanitize_user_prompt function so that it works correctly.
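For illustration, here is a minimal sketch of boundary-safe truncation. The helper name truncate_on_char_boundary is hypothetical and is not the repository's truncate_safe utility, whose exact signature is not shown in this thread. It only assumes the standard library's str::is_char_boundary.

```rust
/// Hypothetical helper (an illustration, not the project's `truncate_safe`):
/// returns the longest prefix of `s` that is at most `max_bytes` bytes long
/// and ends on a UTF-8 character boundary.
fn truncate_on_char_boundary(s: &str, max_bytes: usize) -> &str {
    // Short inputs are returned unchanged.
    if s.len() <= max_bytes {
        return s;
    }
    // Walk back from the byte limit until we reach a character boundary.
    // Index 0 is always a boundary, so the loop terminates.
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    // Slicing here cannot panic because `end` is a character boundary.
    &s[..end]
}
```

With a prompt made of 2000 copies of "あ" (3 bytes each, 6000 bytes total), slicing with &prompt[..4096] would panic because byte 4096 falls inside a character, whereas the sketch above returns a 4095-byte prefix.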
Screen Capture