Conversation

@nikolay-banar
Contributor

Following the discussion #3339 (comment)

year = {2025},
}
""",
prompt={
Contributor

Are we sure we want an English prompt for a Dutch dataset?

Member

I think all our prompts in TaskMetadata are in English.

Contributor

But I don't think there is any need to force that. For SEB v2 I will probably rework them into their respective languages.

Contributor Author

Good question. However, it is usually model-dependent. As far as I remember, the e5 models are trained on English instructions. Also, I guess it should not be an issue for multilingual models, because they are usually trained on a large portion of English instructions.

Contributor

@KennethEnevoldsen KennethEnevoldsen Nov 4, 2025

@nikolay-banar Yeah, the model providers can always replace the prompt if, e.g., they only support English prompts. Ultimately I think it is a decision for you to make as the developer of the benchmark. We can definitely keep them in English if you prefer.

(Last time I checked, e5 actually performed slightly better on SEB if you used a Danish prompt.)

Contributor Author

@KennethEnevoldsen I will translate the prompts then.

Contributor Author

@KennethEnevoldsen What about multilingual datasets? Would it make sense to create Dutch versions with the corresponding prompts?

Contributor

> @KennethEnevoldsen What about multilingual datasets? Would it make sense to create Dutch versions with the corresponding prompts?

For multilingual datasets I would probably keep the prompts in English to avoid having two versions of the task (we could probably have different prompts for different subsets; feel free to create an issue on this).
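
The per-subset idea floated above could look roughly like the following. This is a minimal sketch in plain Python and not mteb's API: `PROMPTS`, `DEFAULT_LANG`, and `get_prompt` are hypothetical names, and the Dutch wording is an illustrative translation, not the prompt adopted in this PR.

```python
# Hypothetical sketch: per-language query prompts with an English fallback.
# None of these names come from mteb; they only illustrate the idea.
DEFAULT_LANG = "eng"

PROMPTS = {
    "eng": "Given a claim, find documents that refute the claim",
    # Illustrative Dutch translation (assumption, not taken from the PR).
    "nld": "Gegeven een bewering, vind documenten die de bewering weerleggen",
}


def get_prompt(lang: str, prompts: dict[str, str] = PROMPTS) -> str:
    """Return the prompt for `lang`, falling back to English if none is defined."""
    return prompts.get(lang, prompts[DEFAULT_LANG])
```

With a lookup like this, a Dutch subset would get the Dutch prompt, while any subset without a translation would keep the English one, so a single task definition could cover both cases.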

class ArguAnaNLv2(AbsTaskRetrieval):
    ignore_identical_ids = True

    metadata = TaskMetadata(
Contributor

We have to add to the description what has changed between the two versions.

@Samoed Samoed enabled auto-merge (squash) November 7, 2025 12:48
@Samoed Samoed merged commit 8f3f806 into embeddings-benchmark:main Nov 7, 2025
10 checks passed
@nikolay-banar nikolay-banar deleted the mteb-nl-prompt-fix branch November 13, 2025 15:43

3 participants