Feat: Add Analyze Image Task Type #226
Signed-off-by: Lukas Schaefer <[email protected]>
Nice!
The name could be better indeed. How about "Analyze image"?
Thanks!
Signed-off-by: Lukas Schaefer <[email protected]>
Optional suggestion, not a required change.
🚀
Signed-off-by: Lukas Schaefer <[email protected]>
Signed-off-by: Lukas Schaefer <[email protected]>
Signed-off-by: Lukas Schaefer <[email protected]>
I changed the task type to accept multiple images as input, as recommended by @kyteinsky, so that images can be compared.
nice!
According to the image input requirements (https://platform.openai.com/docs/guides/images-vision?api-mode=responses&format=url#image-input-requirements), the maximum payload size is 50 MB and the maximum number of images is 500.
What do you think: should we let the request fail, or trim the request to fit the limits so that it succeeds? The result might not be accurate when some images have been dropped, but past 50 images, can the model even work coherently?
Signed-off-by: Lukas Schaefer <[email protected]>
I implemented most of the requested changes and renamed the file to analyzeimages, updating the server PR to match. About adding a limit: it might also be a good idea to limit the number of images to prevent the worker from crashing by running out of memory. Right now, all the image data is loaded into memory when it is base64 encoded. I'm not sure we should limit both file size and count, though. A file size limit requires a lot more work.
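To put a number on the memory concern: base64 turns every 3 input bytes into 4 output characters, so the encoded data is roughly a third larger than the files themselves. A minimal Python sketch of estimating the encoded size from the file size alone, without loading anything (the helper name `base64_encoded_size` is hypothetical and not part of this PR):

```python
import base64


def base64_encoded_size(raw_size: int) -> int:
    """Length in bytes of the standard, padded base64 encoding of raw_size bytes."""
    # Every group of up to 3 input bytes becomes exactly 4 output characters.
    return 4 * ((raw_size + 2) // 3)


# Sanity check against the standard library for a few small sizes.
for n in (0, 1, 2, 3, 10, 1000):
    assert base64_encoded_size(n) == len(base64.b64encode(b"\x00" * n))
```

With this, 30 MiB of raw image data comes out at 40 MiB once encoded, which is worth keeping in mind when picking a size limit.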
A simple approach would be to limit the number of images to a relatively low number like 20 or 30, and also limit the sum of the image sizes to something lower than 50 MB (40 MB seems reasonable, to keep a margin for the prompt).
I'm not in favor of sending a subset of the input images just to make the request to the service succeed. The user has no way of knowing that this happened or why the result is not pertinent.
I think failing is probably the better option, especially after seeing that some models can actually attempt to understand hundreds of images. I just tried a few questions on about 400 pictures, and Google's models could answer them. I'll probably just implement the official OpenAI limits and fail if more than that is given.
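For illustration, the fail-fast check could look roughly like the Python sketch below. It is not the code from this PR: the names `ImageLimitError` and `check_image_limits` are hypothetical, and it assumes that the 50 MB limit applies to the base64-encoded image data and that 1 MB means 2**20 bytes. The prompt itself is not counted.

```python
MAX_IMAGES = 500                      # image count limit from the linked requirements
MAX_PAYLOAD_BYTES = 50 * 1024 * 1024  # 50 MB limit; assumes 1 MB = 2**20 bytes


class ImageLimitError(ValueError):
    """Raised when the input images exceed the provider limits (hypothetical name)."""


def check_image_limits(
    raw_sizes: list[int],
    max_images: int = MAX_IMAGES,
    max_payload_bytes: int = MAX_PAYLOAD_BYTES,
) -> None:
    """Fail instead of silently dropping images.

    raw_sizes holds the on-disk size of each image in bytes, so the check
    can run before any file is read into memory.
    """
    if len(raw_sizes) > max_images:
        raise ImageLimitError(
            f"Too many images: {len(raw_sizes)} given, at most {max_images} allowed"
        )
    # Assumption: the limit is measured on the base64-encoded data,
    # where every 3 raw bytes become 4 encoded characters.
    encoded_total = sum(4 * ((size + 2) // 3) for size in raw_sizes)
    if encoded_total > max_payload_bytes:
        raise ImageLimitError(
            f"Images too large: {encoded_total} bytes encoded, "
            f"at most {max_payload_bytes} allowed"
        )
```

Raising an error with an explicit message keeps the user informed, which is the main argument above against sending only a subset of the images.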
Signed-off-by: Lukas Schaefer <[email protected]>
looks good!
Co-authored-by: Anupam Kumar <[email protected]>
Signed-off-by: Lukas Schaefer <[email protected]>
In my opinion, the name for this is not great. Some ideas I had: Image Question (current) or Picture Chat. This should probably also be added as a task type on the server.
Relevant server PR: nextcloud/server#53763