Vision Service: Improve caption prompt and allow submitting a custom prompt with the request data #11

Open
@lastzero

Description

This task includes improving the default caption generation prompt(s) and allowing a custom prompt to be submitted:

  • For the intended use, image captions should simply describe the content of the image. They should not start with "Here's a detailed description of the image:...", "This image...", or "An image of...", and should not be extremely long or Markdown-formatted (see screenshot below).
  • In addition, it should be possible to submit a custom prompt in the request data along with the model name and version, so that users can customize the prompt for the model they have chosen without touching the photoprism-vision service.
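
As a rough illustration of the second point, the service could fall back to its default prompt only when the request carries no usable custom prompt. This is a minimal sketch, not the actual photoprism-vision code: the `prompt` field name, the `resolve_prompt` helper, and the wording of `DEFAULT_CAPTION_PROMPT` are all assumptions made for the example.

```python
from typing import Any, Mapping

# Hypothetical default prompt; the wording is illustrative only and is not
# the prompt currently used by the service.
DEFAULT_CAPTION_PROMPT = (
    "Describe the content of this image in one or two plain sentences. "
    "Do not begin with phrases such as 'This image' or 'An image of', "
    "and do not use Markdown formatting."
)


def resolve_prompt(request_data: Mapping[str, Any]) -> str:
    """Return the custom prompt from the request data, or the default.

    Assumes the request data is a JSON object that may contain an optional
    "prompt" field next to the model name and version (assumed field name).
    """
    prompt = request_data.get("prompt")
    # Only accept a non-empty string; anything else falls back to the default.
    if isinstance(prompt, str) and prompt.strip():
        return prompt.strip()
    return DEFAULT_CAPTION_PROMPT
```

With this approach, existing clients that send only the model name and version would keep getting the default prompt, while clients that add a prompt could tune it for the model they selected.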

Examples of good/better image captions:

  • A cat sleeping with its head resting on the strings of an instrument.
  • A vibrant, full poppy flower in rich shades of red and pink.
  • A young woman, likely in her late 20s or early 30s. She is smiling broadly, appearing friendly and approachable.

Of course, the caption doesn't have to start with "A" and can be longer than these examples.
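
When comparing prompt variants, a small check like the one below might help flag captions that break the rules described above. It is only a sketch: the `caption_problems` helper, the list of unwanted openings, the Markdown heuristic, and the 300-character limit are assumptions chosen for illustration, not project requirements.

```python
import re

# Openings the issue text says captions should avoid (compared in lower case).
UNWANTED_OPENINGS = ("here's", "here is", "this image", "an image of")

# Rough heuristic for Markdown: bold markers, headings, list bullets, backticks.
MARKDOWN_PATTERN = re.compile(r"(\*\*|__|^#{1,6}\s|^\s*[-*]\s|`)", re.MULTILINE)


def caption_problems(caption: str, max_chars: int = 300) -> list[str]:
    """Return a list of rule violations found in a generated caption.

    max_chars is an arbitrary, assumed limit used only for this example.
    """
    problems = []
    # Normalize whitespace and curly apostrophes before checking the opening.
    text = caption.strip().replace("\u2019", "'")
    if text.lower().startswith(UNWANTED_OPENINGS):
        problems.append("unwanted opening")
    if MARKDOWN_PATTERN.search(text):
        problems.append("markdown formatting")
    if len(text) > max_chars:
        problems.append("too long")
    return problems
```

The three example captions listed above pass this check without problems, while a caption such as "This image shows a cat." is reported as having an unwanted opening.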

To illustrate, here is a screenshot of the captions generated by the gemma3 model with the current prompt:

Image

Related Issues:

Metadata

Labels

  • ai: Artificial Intelligence, Machine Learning (ML)
  • captions: Caption Generation
  • help wanted: Help with this would be much appreciated!
  • python 🐍: Python experience required

Projects

Status

Upcoming ⏳

Milestone

No milestone
