Feat: Add Analyze Image Task Type #226

Merged
merged 8 commits into from
Jul 8, 2025

Conversation

lukasdotcom
Member

@lukasdotcom lukasdotcom commented Jul 1, 2025

In my opinion, the name for this is not great. Some ideas I had: Image Question (current) or Picture Chat. This should probably also be added as a task type on the server.
Relevant server PR: nextcloud/server#53763

Signed-off-by: Lukas Schaefer <[email protected]>
@lukasdotcom lukasdotcom requested a review from julien-nc July 1, 2025 17:01
Member

@julien-nc julien-nc left a comment

Nice!
The name could be better indeed. How about "Analyze image"?

Contributor

@kyteinsky kyteinsky left a comment

Thanks!

Member

@julien-nc julien-nc left a comment

Optional change suggestion.

@lukasdotcom lukasdotcom changed the title from "Feat: Add asking question about picture" to "Feat: Add Analyze Image Task Type" Jul 2, 2025
Contributor

@kyteinsky kyteinsky left a comment

🚀

Signed-off-by: Lukas Schaefer <[email protected]>
Signed-off-by: Lukas Schaefer <[email protected]>
@lukasdotcom
Member Author

I changed the task type to accept multiple images as input, as recommended by @kyteinsky, so that images can be compared.

Contributor

@kyteinsky kyteinsky left a comment

nice!

from the image requirements https://platform.openai.com/docs/guides/images-vision?api-mode=responses&format=url#image-input-requirements
the max payload size is 50 MB and the max no. of files is 500.
What do you think, should we let the request fail or adjust the request to the limit and make it succeed? It might not be accurate when some images have been dropped, but past 50 images, can the model even work coherently at that point?

Signed-off-by: Lukas Schaefer <[email protected]>
@lukasdotcom
Member Author

nice!

from the image requirements https://platform.openai.com/docs/guides/images-vision?api-mode=responses&format=url#image-input-requirements the max payload size is 50 MB and the max no. of files is 500. What do you think, should we let the request fail or adjust the request to the limit and make it succeed? It might not be accurate when some images have been dropped, but past 50 images, can the model even work coherently at that point?

I implemented most of the suggested changes and renamed the file to analyzeimages, and updated the server PR to match too.

About adding a limit: it might also be a good idea to limit the number of images to prevent the worker from crashing due to running out of memory. Right now all the data is loaded into memory when base64 encoded. Not sure if we should limit both file size and count, though. File size requires a lot more work.
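
To put a number on the memory concern above, here is a minimal Python sketch of how much larger image data becomes once it is base64 encoded. It is illustrative only and is not the code from this PR; the helper name `base64_encoded_size` is hypothetical. It assumes standard padded base64, where every 3 input bytes become 4 output characters.

```python
import base64


def base64_encoded_size(raw_size: int) -> int:
    """Return the length of padded base64 output for raw_size input bytes.

    Hypothetical helper for illustration: each group of up to 3 input
    bytes is encoded as exactly 4 output characters.
    """
    return 4 * ((raw_size + 2) // 3)


if __name__ == "__main__":
    raw = bytes(1_000_000)  # stand-in for a 1 MB image
    encoded = base64.b64encode(raw)
    # The estimate matches the real encoder output length.
    assert len(encoded) == base64_encoded_size(len(raw))
    print(f"{len(raw)} raw bytes -> {len(encoded)} encoded bytes")
```

Because the encoded form is roughly a third larger than the file on disk, a size limit checked against raw file sizes leaves less headroom in the request payload than the raw numbers suggest.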

@julien-nc
Member

A simple approach would be to limit the number of images to a relatively low number like 20 or 30 and also limit the sum of image sizes to something lower than 50 MB (40 MB seems reasonable, to keep a margin for the prompt).
The task would fail if the input is above the limits.
The provider could set an informative message when failing, mentioning if it's because of the number of images or the total payload size.
Wdyt?
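
The fail-fast check proposed in this comment could look roughly like the following Python sketch. It is not the provider's actual implementation: the names `ImageLimitError` and `check_image_limits` are hypothetical, and the default limits (20 images, 40 MB counted as 40,000,000 bytes) are just the example values floated above, not agreed settings.

```python
MAX_IMAGES = 20  # assumption: example value from the discussion
MAX_TOTAL_BYTES = 40 * 1000 * 1000  # assumption: 40 MB, margin for the prompt


class ImageLimitError(ValueError):
    """Raised when the task input exceeds a configured limit (hypothetical)."""


def check_image_limits(
    image_sizes: list[int],
    max_images: int = MAX_IMAGES,
    max_total_bytes: int = MAX_TOTAL_BYTES,
) -> None:
    """Fail with an informative message instead of silently dropping images.

    image_sizes holds one size in bytes per input image.
    """
    if len(image_sizes) > max_images:
        raise ImageLimitError(
            f"Too many images: {len(image_sizes)} given, "
            f"at most {max_images} allowed."
        )
    total = sum(image_sizes)
    if total > max_total_bytes:
        raise ImageLimitError(
            f"Images are too large in total: {total} bytes given, "
            f"at most {max_total_bytes} bytes allowed."
        )


if __name__ == "__main__":
    check_image_limits([2_000_000] * 10)  # within both limits, no error
    try:
        check_image_limits([1_000] * 25)
    except ImageLimitError as err:
        print(err)
```

The two checks raise distinct messages so the user can tell whether the count or the total size was the problem. Which sizes to pass in (raw file sizes or encoded sizes) is left to the caller in this sketch.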

@julien-nc
Member

I'm not in favor of sending a subset of the input images to make the request to the service succeed. There is no way for the user to know that this has happened or why the result is not pertinent.

@lukasdotcom
Member Author

lukasdotcom commented Jul 3, 2025

A simple approach would be to limit the number of images to a relatively low number like 20 or 30 and also limit the sum of image sizes to something lower than 50 MB (40 MB seems reasonable, to keep a margin for the prompt). The task would fail if the input is above the limits. The provider could set an informative message when failing, mentioning if it's because of the number of images or the total payload size. Wdyt?

I think failing is probably the better option, especially after seeing that some models can actually attempt to understand hundreds of images.

I just tried a few questions on about 400 pictures, and Google's models could answer them.
E.g.: "There is a picture of a farmers market. What is the name of the market?" or "There is a picture of police officers on horses. How many horses are there?"
So a surprisingly large number of files could be understood in this simple test (might have gotten lucky though).

I'll probably just implement the official OpenAI limits and fail if more than that is given.

@lukasdotcom lukasdotcom added the "enhancement" (New feature or request) and "3. to review" labels Jul 3, 2025
Contributor

@kyteinsky kyteinsky left a comment

looks good!

Co-authored-by: Anupam Kumar <[email protected]>
Signed-off-by: Lukas Schaefer <[email protected]>
@lukasdotcom lukasdotcom merged commit b717552 into main Jul 8, 2025
32 of 34 checks passed
@lukasdotcom lukasdotcom deleted the picture branch July 8, 2025 15:11
@kyteinsky kyteinsky mentioned this pull request Jul 9, 2025
Labels
3. to review enhancement New feature or request
3 participants