@ChinmayBansal
Contributor

Related Issues

Proposed Changes:

This PR adds multimodal (image + text) support to LlamaCppChatGenerator, enabling the component to process
both text and images in chat messages. The implementation follows established patterns from the
AnthropicChatGenerator multimodal support (PR #2186).

Key Features Added:

  • Image format validation for supported formats (JPEG, PNG, GIF, WebP)
  • Proper message conversion to LlamaCpp OpenAI-compatible format with base64 data URIs
  • Support for multimodal models through chat handlers and CLIP models
  • Enhanced component initialization with chat_handler and clip_model_path parameters
  • Role-based image restrictions (images only allowed in user messages)
  • Comprehensive error handling for unsupported formats and edge cases
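As an illustration of the message format described above, here is a minimal sketch of building a LlamaCpp/OpenAI-compatible user message with a base64 data URI (the function and constant names are hypothetical, not the PR's actual code; the `image_url` content structure follows the OpenAI-compatible format the PR targets):

```python
import base64

# Hypothetical names for illustration; the PR's real identifiers may differ.
SUPPORTED_IMAGE_FORMATS = {"image/jpeg", "image/png", "image/gif", "image/webp"}

def to_llamacpp_user_message(text: str, image_bytes: bytes, mime_type: str) -> dict:
    """Build an OpenAI-compatible user message mixing text and one image."""
    if mime_type not in SUPPORTED_IMAGE_FORMATS:
        raise ValueError(
            f"Unsupported image format: {mime_type}. "
            f"Supported formats: {sorted(SUPPORTED_IMAGE_FORMATS)}"
        )
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            # Images are embedded as base64 data URIs, preserving content order.
            {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{b64}"}},
        ],
    }

msg = to_llamacpp_user_message("What is in this image?", b"\x89PNG", "image/png")
```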

Implementation Details:

  • Updated _convert_message_to_llamacpp_format() function to handle multimodal content while preserving
    order
  • Added multimodal model initialization in warm_up() method with Llava15ChatHandler support
  • Enhanced component docstring with detailed usage examples for multimodal scenarios
  • Added proper serialization/deserialization support for new parameters
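A rough sketch of how the `warm_up()` wiring might look (`Llama`, `Llava15ChatHandler`, `model_path`, `clip_model_path`, and `chat_handler` are real llama-cpp-python names; the surrounding class and its validation are illustrative, not the actual component):

```python
from typing import Optional

class LlamaCppChatGeneratorSketch:
    """Illustrative sketch only, not the actual component implementation."""

    def __init__(
        self,
        model: str,
        chat_handler_name: Optional[str] = None,
        clip_model_path: Optional[str] = None,
    ):
        # The PR validates that multimodal parameters come as a pair.
        if chat_handler_name and not clip_model_path:
            raise ValueError("clip_model_path is required when chat_handler_name is set")
        self.model = model
        self.chat_handler_name = chat_handler_name
        self.clip_model_path = clip_model_path
        self._llm = None

    def warm_up(self):
        # Imports are deferred so the component can be constructed (and
        # serialized via to_dict()) without llama-cpp-python installed.
        from llama_cpp import Llama
        from llama_cpp.llama_chat_format import Llava15ChatHandler

        handler = None
        if self.chat_handler_name == "Llava15ChatHandler":
            handler = Llava15ChatHandler(clip_model_path=self.clip_model_path)
        self._llm = Llama(model_path=self.model, chat_handler=handler)
```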

How did you test it?

Unit Tests:

  • test_convert_message_to_llamacpp_format_with_image() - Tests proper multimodal message conversion
  • test_convert_message_to_llamacpp_format_with_unsupported_mime_type() - Tests error handling for
    unsupported formats
  • test_convert_message_to_llamacpp_format_with_none_mime_type() - Tests edge case with None mime type
  • test_convert_message_to_llamacpp_format_image_in_non_user_message() - Tests role-based restrictions
  • test_multimodal_message_processing() - Tests end-to-end multimodal processing with mocked model

Code Quality Verification:

  • ✅ All linting checks pass: hatch run fmt
  • ✅ All type checking passes: hatch run test:types
  • ✅ All unit tests pass: hatch run test:unit

Manual Verification:

  • Tested multimodal message creation and conversion
  • Verified proper error messages for validation failures
  • Confirmed component initialization with multimodal parameters

Notes for the reviewer

  • The implementation closely follows the patterns established in AnthropicChatGenerator (lines 137-167 in
    anthropic/chat_generator.py)
  • Image validation uses the same error message format as Anthropic for consistency
  • The OpenAI-compatible format with image_url structure is required by LlamaCpp for multimodal processing
  • Added comprehensive test coverage that matches and exceeds the patterns used in Anthropic tests
  • All edge cases are properly handled including None mime types and role restrictions

Checklist

@ChinmayBansal ChinmayBansal requested a review from a team as a code owner August 19, 2025 01:48
@ChinmayBansal ChinmayBansal requested review from davidsbatista and removed request for a team August 19, 2025 01:48
@github-actions github-actions bot added integration:llama_cpp type:documentation Improvements or additions to documentation labels Aug 19, 2025
@davidsbatista davidsbatista requested a review from anakin87 August 19, 2025 05:44
@ChinmayBansal
Contributor Author

I think the Linux CI failure is due to a package download issue rather than an implementation issue, since all checks were successful when I opened the PR.

@anakin87
Member

Hello @ChinmayBansal, thanks for your work!

In general, I have mixed feelings when I work on this integration.
The main point is that we use llama-cpp-python bindings, a project that is not constantly maintained and up to date with llama.cpp features (for example, tool calling lags significantly behind current llama.cpp capabilities). In the long run, I am not sure that depending on this project is the best way to support Haystack users who want to use llama.cpp.

Diving into this PR:

  • a core requirement is that the dictionary produced by to_dict() is JSON serializable, so we can't directly use the chat_handler. I would propose using chat_handler_name instead, pointing users to handlers in https://github.com/abetlen/llama-cpp-python/blob/main/llama_cpp/llama_chat_format.py. I think this can then be imported using importlib.
  • if users specify a clip_model_path, they must also provide a chat_handler_name
  • let's choose a small model (maybe moondream2) and add one integration test
  • let's try to remove type:ignore wherever possible (sometimes LLMs and coding assistants are not bad at that if you force them for a while :-))
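The importlib-based lookup suggested in the first bullet could be sketched like this (the helper name is hypothetical; only the `llama_cpp.llama_chat_format` module path is real):

```python
import importlib

def resolve_class_by_name(module_path: str, class_name: str):
    """Look up a class by name in a module at runtime, so only the string
    name needs to be stored in the JSON-serializable to_dict() output."""
    module = importlib.import_module(module_path)
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise ValueError(f"{class_name!r} not found in {module_path!r}") from exc

# With llama-cpp-python installed, this would resolve a chat handler class:
# handler_cls = resolve_class_by_name("llama_cpp.llama_chat_format", "Llava15ChatHandler")
```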

I'm writing this based on reading the llama-cpp-python docs, so I might have missed some details. If that's the case, please let me know.

@ChinmayBansal
Contributor Author

Hi @anakin87,

I believe I have addressed your feedback in my latest commit. For multimodal support, the chat handlers are established and stable in the current version.

To address your feedback:

  1. I am now using chat_handler_name instead of the handler object.
  2. I added validation that if users specify chat_handler_name, they must also provide a clip_model_path.
  3. I added an integration test with moondream2, skipped when the model file isn't available.
  4. I reduced the instances of type: ignore from 3 to 1. I used cast() for most cases and kept one for the nested image URL structure; this is needed because llama-cpp-python's ChatCompletionMessage type system does not properly model the nested dictionary structure required for multimodal content.
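For context, the cast() approach mentioned in point 4 generally looks like this (the function and message shape here are illustrative, not the PR's actual code):

```python
from typing import Any, Dict, List, cast

def extract_image_urls(message: Dict[str, Any]) -> List[str]:
    # llama-cpp-python's typed dicts don't model the nested multimodal
    # content structure, so we cast to a concrete type instead of
    # scattering type: ignore comments.
    content = cast(List[Dict[str, Any]], message["content"])
    return [
        part["image_url"]["url"]
        for part in content
        if part.get("type") == "image_url"
    ]
```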

I think your suggestions were valid and I have implemented them. I did encounter one detail you might have missed, which is the reason for the remaining type: ignore.

@anakin87
Member
I think we are going in a good direction. I left some comments.

@anakin87 anakin87 changed the title feat: add multimodal support to LlamaCppChatGenerator feat: add image support to LlamaCppChatGenerator Aug 22, 2025
@anakin87
Member
Hey, @ChinmayBansal, thank you again!

I took the liberty of simplifying some aspects.
I also took the opportunity to use smaller models in the tests, hoping that CI gets faster, and made a few other minor adjustments.

I'll merge this PR as soon as tests pass.

@anakin87 anakin87 merged commit be68edd into deepset-ai:main Aug 22, 2025
11 checks passed
@ChinmayBansal
Contributor Author

Hi @anakin87,

I see you removed some of the complex logic; are we assuming that the user passes in the exact class name?

I wanted to confirm the experience from a user's point of view, since these changes are a little less user-friendly (they require exact class-name knowledge). Both approaches are valid; I just wanted to confirm that this is the right direction.

@anakin87
Member

Yes, I think that the user should pass the exact class name. This reduces complexity and maintenance effort.

Plus, I found this docs page with these names: https://llama-cpp-python.readthedocs.io/en/latest/#multi-modal-models. I linked it in the docstrings, so I hope that this will be clear for users.
(There is a small typo in that page but I opened a PR to fix it).

Thank you!

@ChinmayBansal ChinmayBansal deleted the feat/llama-cpp-multimodal-support branch August 22, 2025 17:08
Labels
integration:llama_cpp topic:CI type:documentation Improvements or additions to documentation
Development

Successfully merging this pull request may close these issues.

Image support in LlamaCppChatGenerator
3 participants