feat: add image support to LlamaCppChatGenerator #2197
Conversation
I think the Linux CI failure is due to a package download issue rather than an implementation issue, since all checks were successful when I opened the PR.
Hello @ChinmayBansal, thanks for your work! In general, I have mixed feelings when I work on this integration. Diving into this PR:
I'm writing this based on reading the llama-cpp-python docs, so I might have missed some details. If that's the case, please let me know.
Hi @anakin87, I believe I have addressed your feedback in my latest commit. For multimodal support, the chat handlers are established and stable in the current version. To address your feedback:
I think your suggestions were valid and I have implemented them. I did encounter one detail you might have missed, which is the reason for the remaining `type: ignore`.
I think we are going in a good direction. I left some comments.
Hey, @ChinmayBansal, thank you again!
I took the liberty of simplifying some aspects.
I also took the opportunity to use smaller models in the tests, hoping that CI gets faster, and made a few other minor adjustments.
I'll merge this PR as soon as tests pass.
Hi @anakin87, I see you removed some complex logic; are we assuming that the user passes in the exact class name? I wanted to confirm the experience from a user's point of view, since these changes are a little less user friendly (they require knowing the exact class name). Both approaches are valid; I just wanted to confirm that this is the right direction.
Yes, I think that the user should pass the exact class name. This reduces complexity and maintenance effort. Plus, I found this docs page with these names: https://llama-cpp-python.readthedocs.io/en/latest/#multi-modal-models. I linked it in the docstrings, so I hope this will be clear for users. Thank you!
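For illustration, here is a rough sketch of how this could look from the user's side. The model file names are placeholders, and the exact init signature and the multimodal `ChatMessage`/`ImageContent` usage are assumptions based on this thread rather than the merged code:

```python
from haystack.dataclasses import ChatMessage, ImageContent
from haystack_integrations.components.generators.llama_cpp import LlamaCppChatGenerator

# Placeholder GGUF files: any LLaVA-style checkpoint plus its CLIP projector should work.
generator = LlamaCppChatGenerator(
    model="llava-v1.5-7b.Q4_K_M.gguf",
    chat_handler="Llava15ChatHandler",        # exact class name from the llama-cpp-python docs
    clip_model_path="mmproj-model-f16.gguf",  # CLIP weights loaded by the chat handler
)
generator.warm_up()

image = ImageContent.from_file_path("photo.jpg")
messages = [ChatMessage.from_user(content_parts=["What is in this image?", image])]
result = generator.run(messages=messages)
print(result["replies"][0].text)
```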
Related Issues

- `LlamaCppChatGenerator` #2132

Proposed Changes:

This PR adds multimodal (image + text) support to `LlamaCppChatGenerator`, enabling the component to process both text and images in chat messages. The implementation follows established patterns from the `AnthropicChatGenerator` multimodal support (PR #2186).

Key Features Added:

- New `chat_handler` and `clip_model_path` parameters

Implementation Details:

- `_convert_message_to_llamacpp_format()` function to handle multimodal content while preserving part order (see the sketch below)
- `warm_up()` method with Llava15ChatHandler support
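As a rough illustration of what this conversion does, here is a simplified, hypothetical sketch rather than the actual implementation. It assumes Haystack's `TextContent`/`ImageContent` message parts and the OpenAI-style schema that llama-cpp-python's multimodal chat handlers accept:

```python
from haystack.dataclasses import ChatMessage, ImageContent, TextContent

def convert_message_sketch(message: ChatMessage) -> dict:
    """Simplified stand-in for _convert_message_to_llamacpp_format()."""
    content = []
    for part in message._content:  # content parts, kept in their original order
        if isinstance(part, TextContent):
            content.append({"type": "text", "text": part.text})
        elif isinstance(part, ImageContent):
            if message.role.value != "user":
                raise ValueError("Image content is only supported in user messages.")
            content.append(
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:{part.mime_type};base64,{part.base64_image}"},
                }
            )
    return {"role": message.role.value, "content": content}
```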
How did you test it?

Unit Tests:

- `test_convert_message_to_llamacpp_format_with_image()` - Tests proper multimodal message conversion
- `test_convert_message_to_llamacpp_format_with_unsupported_mime_type()` - Tests error handling for unsupported formats
- `test_convert_message_to_llamacpp_format_with_none_mime_type()` - Tests the edge case of a None mime type
- `test_convert_message_to_llamacpp_format_image_in_non_user_message()` - Tests role-based restrictions
- `test_multimodal_message_processing()` - Tests end-to-end multimodal processing with a mocked model
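As an example of the style of these tests, here is a minimal sketch of the unsupported-mime-type case. It is hypothetical: the import path of the private helper and the exact exception type are assumptions, not taken from the real test suite.

```python
import pytest
from haystack.dataclasses import ChatMessage, ImageContent
# Assumed import path of the private helper added by this PR.
from haystack_integrations.components.generators.llama_cpp.chat.chat_generator import (
    _convert_message_to_llamacpp_format,
)

def test_convert_message_with_unsupported_mime_type():
    # A tiny base64 payload is enough; only the mime type matters for this check.
    image = ImageContent(base64_image="iVBORw0KGgo=", mime_type="image/tiff")
    message = ChatMessage.from_user(content_parts=["What is this?", image])
    with pytest.raises(ValueError):
        _convert_message_to_llamacpp_format(message)
```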
Code Quality Verification:

- `hatch run fmt`
- `hatch run test:types`
- `hatch run test:unit`
Manual Verification:
Notes for the reviewer

- The implementation follows the multimodal pattern from the Anthropic integration (`anthropic/chat_generator.py`)
- The `image_url` structure is required by LlamaCpp for multimodal processing (see the example below)
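For reference, this is roughly the payload shape llama-cpp-python's LLaVA chat handlers expect for a user turn (the base64 payload below is a placeholder):

```python
user_turn = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {
            # The image is passed as an OpenAI-style image_url part with a base64 data URI.
            "type": "image_url",
            "image_url": {"url": "data:image/png;base64,<BASE64_IMAGE_DATA>"},
        },
    ],
}
```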
Checklist

- I have read the contribution guidelines and the code of conduct
- I have added one of the following prefixes to my PR title: `fix:`, `feat:`, `build:`, `chore:`, `ci:`, `docs:`, `style:`, `refactor:`, `perf:`, `test:`.