@joshrutkowski
Contributor

@joshrutkowski joshrutkowski commented Oct 3, 2025

Issue #, if available: #2339

Description of changes:

Current state

Image support required saving a screenshot to disk and referencing it as a file in chat.

New state

An image on the clipboard is copied to a tmp directory and can be referenced directly in chat, either with the CTRL+V keybinding or with a new /paste command. Multiple images are supported when using CTRL+V.
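As a rough illustration of the flow described above, the sketch below expands `[Image #N]` markers into the temp-file paths before the prompt is sent. This is an assumption about how the markers might be resolved, not the code in this PR; `expand_image_markers` is a hypothetical name.

```rust
use std::path::PathBuf;

/// Hypothetical helper: replace each `[Image #N]` marker (1-based) with the
/// N-th pasted temp-file path. Markers with no matching path are left as-is.
fn expand_image_markers(prompt: &str, paths: &[PathBuf]) -> String {
    // Put a space between adjacent markers so the expanded paths do not run together.
    let mut out = prompt.replace("][Image #", "] [Image #");
    for (i, path) in paths.iter().enumerate() {
        let marker = format!("[Image #{}]", i + 1);
        out = out.replace(&marker, &path.display().to_string());
    }
    out
}
```

With two pasted images, a prompt of `[Image #1][Image #2]` would expand to the two paths separated by a space, which matches the shape of the `fs_read` calls in the examples below.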

CTRL+V example:

[default] 7% > [Image #1]


> I see you've provided a file path to a PNG image. Let me read and analyze it for you.


🛠️  Using tool: fs_read (trusted)
 ⋮
 ● Reading images: /var/folders/1t/qsms0xsd7zx_lsn66s2zxtch0000gr/T/.tmpyh66w1.png

 ✓ Successfully read image

 ⋮
 ● Completed in 0.88s



> This is an adorable gray and white kitten with striking yellow-green eyes. The kitten has:

• A gray coat with white markings on the face (white blaze down the center and white muzzle)
• Large, expressive yellow-green eyes
• Prominent whiskers
• Large, alert ears
• A sweet, innocent expression looking directly at the camera

The photo appears to be a professional or high-quality portrait shot against a white background, capturing the kitten's charming features
beautifully.

Is there something specific you'd like me to help you with regarding this image?

/paste example:

[default] 7% > /paste



> I see you've provided a file path to a PNG image. Let me read it to see what you'd like help with.


🛠️  Using tool: fs_read (trusted)
 ⋮
 ● Reading images: /var/folders/1t/qsms0xsd7zx_lsn66s2zxtch0000gr/T/.tmpVjC68m.png

 ✓ Successfully read image

 ⋮
 ● Completed in 0.70s



> I can see the image shows a black Labrador Retriever with beautiful amber/brown eyes looking directly at the camera against a white background.
It's a professional-looking portrait photo.

How can I help you with this image? Would you like me to:
• Analyze something specific about it?
• Help you process or manipulate it in some way?
• Extract information from it?
• Something else?

Multiple images example:

[default] 7% > [Image #1][Image #2]


> I'll read these image files to see what you're sharing.


🛠️  Using tool: fs_read (trusted)
 ⋮
 ● Reading images: /var/folders/1t/qsms0xsd7zx_lsn66s2zxtch0000gr/T/.tmpInTzqp.png
/var/folders/1t/qsms0xsd7zx_lsn66s2zxtch0000gr/T/.tmpFGCzWa.png

 ✓ Successfully read image

 ⋮
 ● Completed in 0.94s



> I can see you've shared two images:

1. First image: A black Labrador Retriever with beautiful amber/brown eyes, looking directly at the camera with a calm, gentle expression.

2. Second image: A gray and white kitten with striking yellow-green eyes, also looking at the camera with an alert, curious expression.

Both are professional-looking portrait photos with white backgrounds. Is there something specific you'd like me to help you with regarding these
images?

Help details (showing paste command)

[default] 7% > /help

q (Amazon Q Chat)

Usage: /<COMMAND>

Commands:
  quit        Quit the application
  clear       Clear the conversation history
  agent       Manage agents
  context     Manage context files for the chat session
  editor      Open $EDITOR (defaults to vi) to compose a prompt
  reply       Open $EDITOR with the most recent assistant message quoted for reply
  compact     Summarize the conversation to free up context space
  tools       View tools and permissions
  issue       Create a new Github issue or make a feature request
  logdump     Create a zip file with logs for support investigation
  changelog   View changelog for Amazon Q CLI
  prompts     View and retrieve prompts
  hooks       View context hooks
  usage       Show current session's context window usage
  mcp         See mcp server loaded
  model       Select a model for the current conversation session
  experiment  Toggle experimental features
  subscribe   Upgrade to a Q Developer Pro subscription for increased query limits
  save        Save the current conversation
  load        Load a previous conversation
  todos       View, manage, and resume to-do lists
  paste       Paste an image from clipboard
  help        Print this message or the help of the given subcommand(s)

Options:
  -h, --help
          Print help (see a summary with '-h')

Error cases

Large images (> 10MB)
Screenshot 2025-10-03 at 11 55 49 AM

Too many images (>10)
Screenshot 2025-10-03 at 11 51 12 AM

No image

Screenshot 2025-10-03 at 11 49 51 AM
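
The three error cases above could be checked along these lines. This is a minimal sketch: the constant names, the `PasteError` type, and `check_paste` are hypothetical, and treating 10MB as 10 * 1024 * 1024 bytes is an assumption.

```rust
/// Limits taken from the error cases above; names and exact byte count are assumptions.
const MAX_IMAGE_BYTES: usize = 10 * 1024 * 1024;
const MAX_IMAGES: usize = 10;

#[derive(Debug, PartialEq)]
enum PasteError {
    NoImage,
    ImageTooLarge { bytes: usize },
    TooManyImages { count: usize },
}

/// `already_pasted` is the number of images attached so far;
/// `clipboard` is the image data, or `None` when the clipboard holds no image.
fn check_paste(already_pasted: usize, clipboard: Option<&[u8]>) -> Result<(), PasteError> {
    let bytes = clipboard.ok_or(PasteError::NoImage)?;
    if bytes.len() > MAX_IMAGE_BYTES {
        return Err(PasteError::ImageTooLarge { bytes: bytes.len() });
    }
    if already_pasted + 1 > MAX_IMAGES {
        return Err(PasteError::TooManyImages { count: already_pasted + 1 });
    }
    Ok(())
}
```

Returning a distinct variant per case keeps the user-facing message separate from the check itself.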

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@joshrutkowski joshrutkowski marked this pull request as ready for review October 3, 2025 14:18
#[derive(Debug)]
struct PasteStateInner {
    paths: Vec<PathBuf>,
    count: usize,
Contributor

Why is there a separate paths and count?

Contributor Author

The count field is used to generate the marker text [Image #N], but this can probably be simplified to paths.len().
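
A minimal sketch of that simplification, assuming the marker number is always the position of the most recently added path. The `add` method is a hypothetical name, not taken from the PR.

```rust
use std::path::PathBuf;

#[derive(Debug, Default)]
struct PasteStateInner {
    paths: Vec<PathBuf>,
}

impl PasteStateInner {
    /// Records a pasted image and returns its marker text.
    /// The marker number is derived from `paths.len()`, so no separate counter is needed.
    fn add(&mut self, path: PathBuf) -> String {
        self.paths.push(path);
        format!("[Image #{}]", self.paths.len())
    }
}
```

This only holds if paths are never removed individually; if they can be, the numbering would need to be reconsidered.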

})?;

// Try to guess format from raw bytes, fallback to PNG
let format = guess_format(&image_data.bytes).unwrap_or(ImageFormat::Png);
Contributor

This looks like a regression from the previous implementation? Checking the clipboard docs, it looks like clipboard content is always encoded as a list of RGBA values: https://docs.rs/arboard/latest/arboard/struct.ImageData.html

So this will always just be PNG-compatible, no?

Contributor Author

Good callout - let me go back to that approach. You're right, it's always RGBA values.
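
Since the clipboard image arrives as width, height, and raw RGBA bytes, format guessing is unnecessary. A simple length check before encoding to PNG might look like the sketch below. `RgbaImage` is a local stand-in that mirrors the `width`, `height`, and `bytes` fields of arboard's `ImageData` so the example compiles without the crate; `is_valid_rgba` is a hypothetical helper, and the PNG encoding step itself is not shown.

```rust
/// Local stand-in for the fields of arboard's `ImageData` (an assumption for this sketch).
struct RgbaImage<'a> {
    width: usize,
    height: usize,
    bytes: &'a [u8],
}

/// Returns true when the buffer holds exactly width * height RGBA pixels (4 bytes each).
/// Checked arithmetic guards against overflow on bogus dimensions.
fn is_valid_rgba(img: &RgbaImage<'_>) -> bool {
    img.width
        .checked_mul(img.height)
        .and_then(|pixels| pixels.checked_mul(4))
        .map_or(false, |expected| expected == img.bytes.len())
}
```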

Contributor

Unless I'm missing something, these tests don't seem to be verifying anything relevant in this PR? What are these doing?

Contributor Author

Primarily, these add more tests for the existing image handling. If that's not desired, I can remove them.

@dingfeli
Contributor

Hi Josh. Thanks for the contribution. If I understand this correctly, it looks like this is:

  1. creating a temporary version of images that are in the clipboard
  2. pasting the paths to these temp files
  3. asking the model to use fs_read to view these images

I think the API client already supports image types, so we can probably just use that instead of relying on another round trip. What do you think?

@kkashilk kkashilk merged commit b6f7819 into aws:main Oct 21, 2025
1 check passed
5 participants