[Contribution - Hackaton_2025] Add VLM model and UI support to Virtual AI Assistant Demo #196 #225
Summary
This PR enhances the existing Virtual AI Assistant demo so it can understand visual inputs of up to 1,003,520 pixels at any resolution.
Detailed Description
This PR integrates a Vision-Language Model (VLM) into the Virtual AI Assistant demo, allowing it to describe or answer questions about any image. It also updates the User Interface to support visual inputs, so users can upload their own images.
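As a rough illustration of the UI change, the sketch below adds an image component alongside the usual text box. It assumes the interface is built with Gradio; the component names, layout, and the placeholder `respond` function are assumptions for illustration, not the demo's actual code.

```python
import gradio as gr

# Hypothetical sketch of the updated UI: an image input next to the text box.
def respond(message, image_path, history):
    note = "(image attached)" if image_path else "(text only)"
    reply = f"Placeholder reply {note}"  # the real demo would call the VLM or the personality LLM here
    return history + [(f"{message} {note}", reply)], "", None

with gr.Blocks() as demo:
    chatbot = gr.Chatbot(label="Virtual AI Assistant")
    with gr.Row():
        image_input = gr.Image(type="filepath",
                               sources=["upload", "webcam", "clipboard"],
                               label="Optional image")
        text_input = gr.Textbox(label="Your message")
    send = gr.Button("Send")
    send.click(respond, [text_input, image_input, chatbot],
               [chatbot, text_input, image_input])

demo.launch()
```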
The Vision-Language model chosen is Qwen2.5-VL, because its efficient vision encoder is capable of analyzing text, charts, icons, graphics, and layouts within images of up to 1,003,520 pixels at any resolution. The model can be integrated with any of the 5 virtual personalities provided, describing every detail of the image and helping us understand more about our queries. These changes have been tested with the agrobot personality.
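For reference, below is a minimal sketch of querying Qwen2.5-VL with an image using the Hugging Face transformers API. The checkpoint size, image path, and prompt are assumptions, and the PR's actual loading or optimization path (for example, an OpenVINO-converted model) may differ.

```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

model_id = "Qwen/Qwen2.5-VL-3B-Instruct"  # assumed checkpoint; the PR may use a different size
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "file:///path/to/uploaded_image.jpg"},  # hypothetical path
        {"type": "text", "text": "Describe this image in detail."},
    ],
}]

# Build the chat prompt and pack the image the way the Qwen processor expects.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=256)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```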
Example of the updated user interface
The updates were:
Deploying
To run this demo, follow all the steps in the README.
Testing
The inputs considered for integrating the model into the demo were an HD-resolution image and a text input containing the user query.
The chatbot has been tested on a local instance, ensuring that when an image is sent, the model detects it and temporarily saves it. The VLM runs separately from the principal LLM models with personalities, but the history is then shared with the principal LLM model, ensuring context is not lost after the VLM interaction (see the sketch below).
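A minimal sketch of how the VLM turn could be folded back into the shared history so the personality LLM keeps context. Every name here (`run_vlm`, `run_llm`, `chat_history`) is hypothetical and stands in for the demo's actual functions.

```python
# Hypothetical sketch of the history hand-off between the VLM and the personality LLM.
# The real demo may structure its chat history differently.
chat_history = []  # shared list of {"role": ..., "content": ...} messages

def run_vlm(text, image_path):
    return f"(VLM answer about {image_path})"   # stand-in for the Qwen2.5-VL call

def run_llm(messages):
    return "(LLM answer with personality)"      # stand-in for the personality LLM call

def handle_turn(user_text, image_path=None):
    if image_path is not None:
        # Image present: answer with the VLM, but record the exchange in the shared
        # history so the personality LLM keeps the context in later turns.
        vlm_answer = run_vlm(user_text, image_path)
        chat_history.append({"role": "user", "content": f"{user_text} (image attached)"})
        chat_history.append({"role": "assistant", "content": vlm_answer})
        return vlm_answer
    # No image: the personality LLM answers, conditioned on the full shared history.
    llm_answer = run_llm(chat_history + [{"role": "user", "content": user_text}])
    chat_history.append({"role": "user", "content": user_text})
    chat_history.append({"role": "assistant", "content": llm_answer})
    return llm_answer
```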
Challenges
Issues related to model optimization and reduced-bit (quantized) inputs were not tackled in this PR due to time constraints.
Continuous development and integration for this issue were slow because of all the chat models' dependencies and initializations; there is a great area of opportunity for future development of this demo.
To run the original model, we used a server with an Intel Atom GRR in a containerized environment, due to dependency mismatches and system capacity. Testing on a local PC is still pending.
Modifications needed in the future
A webcam UI interface was added; however, we found some challenges when accessing the webcam from the client desktop. A future enhancement would be to modify the demo to fix this issue.
The clipboard input option next to the webcam input is also out of service.