Feat: Add Analyze Image Task Type #226

lukasdotcom · 2025-07-01T17:01:26Z

The name for this in my opinion is not great. Some ideas I had: Image Question (Current), or Picture Chat. This should probably also be added as a task type on server.
Relevant server pr: nextcloud/server#53763

Signed-off-by: Lukas Schaefer <[email protected]>

julien-nc

Nice!
The name could be better indeed. How about "Analyze image"?

lib/TaskProcessing/ImageQuestionTaskType.php

lib/TaskProcessing/ImageQuestionProvider.php

kyteinsky

Thanks!

lib/TaskProcessing/ImageQuestionProvider.php

Signed-off-by: Lukas Schaefer <[email protected]>

julien-nc

Optional non-necessary change suggestion.

lib/TaskProcessing/AnalyzeImageProvider.php

kyteinsky

🚀

lib/TaskProcessing/AnalyzeImageTaskType.php

lib/TaskProcessing/AnalyzeImageProvider.php

Signed-off-by: Lukas Schaefer <[email protected]>

lib/TaskProcessing/AnalyzeImageProvider.php

Signed-off-by: Lukas Schaefer <[email protected]>

lukasdotcom · 2025-07-03T13:32:39Z

I changed the task type to allow multiple images as an input like recommended by @kyteinsky to allow for comparing images

kyteinsky

nice!

from the image requirements https://platform.openai.com/docs/guides/images-vision?api-mode=responses&format=url#image-input-requirements
the max payload size is 50 MB and the max no. of files is 500.
What do you think, should we let the request fail or adjust the request to the limit and make it succeed? It might not be accurate when some images have been dropped, but past 50 images, can the model even work coherently at that point?

lib/TaskProcessing/AnalyzeImageTaskType.php

lib/TaskProcessing/AnalyzeImageProvider.php

Signed-off-by: Lukas Schaefer <[email protected]>

lukasdotcom · 2025-07-03T15:24:49Z

nice!

from the image requirements https://platform.openai.com/docs/guides/images-vision?api-mode=responses&format=url#image-input-requirements the max payload size is 50 MB and the max no. of files is 500. What do you think, should we let the request fail or adjust the request to the limit and make it succeed? It might not be accurate when some images have been dropped, but past 50 images, can the model even work coherently at that point?

I implemented most of the changes given and renamed the file to analyzeimages while updating all that on the server pr to.

About adding a limit. Might also be a good idea to limit the amount of images to prevent the worker from crashing due to running out of memory. Right now all the data is loaded into memory when base64 encoded. Not sure if we should do file size and count though. File size requires a lot more work.

julien-nc · 2025-07-03T15:36:23Z

A simple approach would be to limit the number of images to a relatively low number like 20 or 30 and also limit sum of image sizes to something lower than 50 MB (or 40 MB seems reasonable to keep a margin for the prompt).
The task would fail if the input is above the limits.
The provider could set an informative message when failing, mentioning if it's because of the number of images or the total payload size.
Wdyt?

julien-nc · 2025-07-03T15:37:59Z

I'm not in favor of sending a subset of the input images to make the request to the service succeed. There is no way for the user to know this has happened and why the result is no pertinent.

lukasdotcom · 2025-07-03T16:21:36Z

A simple approach would be to limit the number of images to a relatively low number like 20 or 30 and also limit sum of image sizes to something lower than 50 MB (or 40 MB seems reasonable to keep a margin for the prompt). The task would fail if the input is above the limits. The provider could set an informative message when failing, mentioning if it's because of the number of images or the total payload size. Wdyt?

I think failing is probably the better option definitely after seeing that some models can actually attempt to understand 100s of images.

I did just try out a few questions on about 400 pictures and google's models could answer the questions.
Eg: "There is a picture of a farmers market. What is the name of the market?" or "There is a picture of police officers on horses. How many horses are there?"
So surprisingly many files could be understood in this simple test (might have gotten lucky though).

I'll probably just implement the official openai limits and fail if more than that is given.

Signed-off-by: Lukas Schaefer <[email protected]>

kyteinsky

looks good!

lib/TaskProcessing/AnalyzeImagesProvider.php

Co-authored-by: Anupam Kumar <[email protected]> Signed-off-by: Lukas Schaefer <[email protected]>

feat: add picture question

1c66b8c

Signed-off-by: Lukas Schaefer <[email protected]>

lukasdotcom requested a review from julien-nc July 1, 2025 17:01

julien-nc requested changes Jul 2, 2025

View reviewed changes

lib/TaskProcessing/ImageQuestionTaskType.php Outdated Show resolved Hide resolved

lib/TaskProcessing/ImageQuestionTaskType.php Outdated Show resolved Hide resolved

lib/TaskProcessing/ImageQuestionProvider.php Outdated Show resolved Hide resolved

kyteinsky reviewed Jul 2, 2025

View reviewed changes

lib/TaskProcessing/ImageQuestionProvider.php Outdated Show resolved Hide resolved

lib/TaskProcessing/ImageQuestionProvider.php Outdated Show resolved Hide resolved

lib/TaskProcessing/ImageQuestionProvider.php Outdated Show resolved Hide resolved

fix feedback and rename to AnalyzeImage

9705e8a

Signed-off-by: Lukas Schaefer <[email protected]>

lukasdotcom requested review from julien-nc and kyteinsky July 2, 2025 12:35

julien-nc approved these changes Jul 2, 2025

View reviewed changes

lib/TaskProcessing/AnalyzeImageProvider.php Outdated Show resolved Hide resolved

lukasdotcom changed the title ~~Feat: Add asking question about picture~~ Feat: Add Analyze Image Task Type Jul 2, 2025

kyteinsky approved these changes Jul 2, 2025

View reviewed changes

lib/TaskProcessing/AnalyzeImageTaskType.php Outdated Show resolved Hide resolved

lib/TaskProcessing/AnalyzeImageProvider.php Outdated Show resolved Hide resolved

add feedback and only load task type if needed

4a2841a

Signed-off-by: Lukas Schaefer <[email protected]>

kyteinsky reviewed Jul 2, 2025

View reviewed changes

lib/TaskProcessing/AnalyzeImageProvider.php Outdated Show resolved Hide resolved

correct tasktypeid

9ed215a

Signed-off-by: Lukas Schaefer <[email protected]>

kyteinsky mentioned this pull request Jul 2, 2025

Image generation with Mistral not working #223

Closed

support multiple images

7b78656

Signed-off-by: Lukas Schaefer <[email protected]>

lukasdotcom force-pushed the picture branch from 52f799a to 7b78656 Compare July 3, 2025 13:30

lukasdotcom requested review from kyteinsky and julien-nc July 3, 2025 13:31

lukasdotcom mentioned this pull request Jul 3, 2025

feat(TaskProcessing): Add AnalyzeImage TaskType nextcloud/server#53763

Merged

6 tasks

kyteinsky reviewed Jul 3, 2025

View reviewed changes

lib/TaskProcessing/AnalyzeImageTaskType.php Outdated Show resolved Hide resolved

lib/TaskProcessing/AnalyzeImageTaskType.php Outdated Show resolved Hide resolved

lib/TaskProcessing/AnalyzeImageProvider.php Outdated Show resolved Hide resolved

julien-nc approved these changes Jul 3, 2025

View reviewed changes

implement most feedback

40d91cb

Signed-off-by: Lukas Schaefer <[email protected]>

Add file size and file count limit

8f389f1

Signed-off-by: Lukas Schaefer <[email protected]>

lukasdotcom added enhancement New feature or request 3. to review labels Jul 3, 2025

julien-nc mentioned this pull request Jul 4, 2025

Context action for images to ask for description nextcloud/text#7393

Open

kyteinsky approved these changes Jul 8, 2025

View reviewed changes

lib/TaskProcessing/AnalyzeImagesProvider.php Outdated Show resolved Hide resolved

Update lib/TaskProcessing/AnalyzeImagesProvider.php

8f6c8cb

Co-authored-by: Anupam Kumar <[email protected]> Signed-off-by: Lukas Schaefer <[email protected]>

lukasdotcom merged commit b717552 into main Jul 8, 2025
32 of 34 checks passed

lukasdotcom deleted the picture branch July 8, 2025 15:11

kyteinsky mentioned this pull request Jul 9, 2025

3.6.0 #232

Merged

Feat: Add Analyze Image Task Type #226

Feat: Add Analyze Image Task Type #226

Uh oh!

Conversation

lukasdotcom commented Jul 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

julien-nc left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

kyteinsky left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

julien-nc left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

kyteinsky left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

lukasdotcom commented Jul 3, 2025

Uh oh!

kyteinsky left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

lukasdotcom commented Jul 3, 2025

Uh oh!

julien-nc commented Jul 3, 2025

Uh oh!

julien-nc commented Jul 3, 2025

Uh oh!

lukasdotcom commented Jul 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kyteinsky left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

lukasdotcom commented Jul 1, 2025 •

edited

Loading

kyteinsky left a comment •

edited

Loading

lukasdotcom commented Jul 3, 2025 •

edited

Loading