Skip to content

Conversation

ChinmayBansal
Copy link
Contributor

Related Issues

Proposed Changes:

This PR adds multimodal support to AnthropicChatGenerator, enabling it to handle both text and image inputs
in user messages. The implementation follows the same patterns established by the HuggingFace and Amazon
Bedrock integrations.

Key changes:

  • Modified _convert_messages_to_anthropic_format() to handle ImageContent alongside TextContent
  • Added proper Anthropic vision API format using ImageBlockParam with base64 image encoding
  • Implemented content order preservation for mixed text/image messages
  • Added validation to prevent images in assistant messages (Anthropic API limitation)
  • Updated component docstring with multimodal usage example
  • Added comprehensive type safety with proper Anthropic types

Technical details:

  • Uses ImageBlockParam with proper media type casting for Anthropic's vision API
  • Supports all Anthropic-compatible image formats: JPEG, PNG, GIF, WebP
  • Maintains backward compatibility with existing text-only functionality
  • Integrates seamlessly with existing prompt caching and tool calling features

How did you test it?

Unit Tests:

  • ✅ All existing unit tests pass (49 passed, 4 skipped)
  • ✅ Added test_convert_message_to_anthropic_format_with_image to verify proper message conversion
  • ✅ Added validation test to ensure images in assistant messages raise appropriate errors

Integration Tests:

  • ✅ Added test_live_run_multimodal for real API testing (requires ANTHROPIC_API_KEY)
  • ✅ Uses test image from shared test assets (apple.jpg)

Code Quality:

  • ✅ Linting passes (hatch run fmt)
  • ✅ Type checking passes (hatch run test:types)
  • ✅ Follows existing code patterns and conventions

Manual Verification:

  • Tested with Claude Sonnet 3.5 using various image types
  • Verified proper error handling for unsupported scenarios
  • Confirmed multimodal messages work with different content ordering

Notes for the reviewer

  • The implementation closely follows the Amazon Bedrock integration and HuggingFace integration patterns for consistency
  • Image validation ensures only user messages can contain images (Anthropic API requirement)
  • Type annotations use proper Anthropic ImageBlockParam types for full type safety
  • Test image (apple.jpg) is shared from Bedrock integration's test assets
  • The cast() function is used for media type conversion to satisfy Anthropic's strict typing

Checklist

Suggested PR Title (using conventional commits):
feat: add multimodal support to AnthropicChatGenerator

@ChinmayBansal ChinmayBansal requested a review from a team as a code owner August 12, 2025 22:41
@ChinmayBansal ChinmayBansal requested review from mpangrazzi and removed request for a team August 12, 2025 22:41
@CLAassistant
Copy link

CLAassistant commented Aug 12, 2025

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added integration:anthropic type:documentation Improvements or additions to documentation labels Aug 12, 2025
@ChinmayBansal ChinmayBansal changed the title feat: add multimodal support to AnthropicChatGeneratorc feat: add multimodal support to AnthropicChatGenerator Aug 12, 2025
@anakin87 anakin87 self-requested a review August 13, 2025 07:30
Copy link
Member

@anakin87 anakin87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this contribution!

This PR is already good. I left some comments.

Copy link
Member

@anakin87 anakin87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@anakin87 anakin87 merged commit 66a8d9b into deepset-ai:main Aug 18, 2025
11 checks passed
@ChinmayBansal ChinmayBansal deleted the feat-anthropic-multimodal-support branch August 18, 2025 17:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integration:anthropic type:documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Image support in AnthropicChatGenerator
3 participants