Enable URL and binary PDF for Mistral #2267

Merged
merged 3 commits into from
Jul 23, 2025

pintaf
Contributor

@pintaf pintaf commented Jul 21, 2025

I tried to use DocumentURLChunk, as was done for images, but unlike images, where ImageUrl is exported by the Mistral SDK, there is no export of DocumentUrl...

This has been tested with both kinds of PDFs (binary and URL).

Happy to change to [Mistral]DocumentURLChunk if you tell me how to make it work, because I did not succeed...
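
For readers following along, here is a minimal sketch of the idea: a PDF given by URL is passed through unchanged, while a binary PDF is wrapped in a base64 data URI. The helper name `pdf_to_document_url` and the plain-dict chunk shape are assumptions made for illustration; this is not the code from the PR, and the real SDK chunk class may differ.

```python
import base64
from typing import Dict, Optional


def pdf_to_document_url(
    data: Optional[bytes] = None, url: Optional[str] = None
) -> Dict[str, str]:
    """Hypothetical helper: build a document-URL style chunk as a plain dict.

    The dict layout is an assumption standing in for the SDK's chunk class.
    """
    # Exactly one of the two inputs must be provided.
    if (data is None) == (url is None):
        raise ValueError("pass exactly one of data or url")

    if data is not None:
        # Binary PDF: embed the bytes as a base64 data URI.
        encoded = base64.b64encode(data).decode("ascii")
        document_url = f"data:application/pdf;base64,{encoded}"
    else:
        # URL PDF: pass the address through unchanged.
        document_url = str(url)

    return {"type": "document_url", "document_url": document_url}
```

Calling `pdf_to_document_url(url=...)` or `pdf_to_document_url(data=...)` yields the same chunk shape, so the rest of the request-building code would not need to care which kind of PDF it received.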

@DouweM
Contributor

DouweM commented Jul 21, 2025

@pintaf Thank you! Looks like the mistralai package needs to have its version bumped to get DocumentUrlChunk: https://github.com/mistralai/client-python/blob/main/src/mistralai/models/documenturlchunk.py. It would also be good to have a new test for this. You can look at the tests for other models that support documents for inspiration.
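
A quick way to check whether an installed package version exposes a given name is sketched below. The helper `exports_name` is hypothetical and only uses the standard library; which module and attribute to pass for the Mistral SDK is an assumption to verify against the installed version.

```python
import importlib


def exports_name(module_name: str, attr: str) -> bool:
    """Return True if the module can be imported and exposes the attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Package missing entirely, or the module path does not exist.
        return False
    return hasattr(module, attr)
```

For example, `exports_name("json", "dumps")` is true on any Python install, and the same call with the SDK's module and class name would show whether a version bump is needed.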

pintaf and others added 2 commits July 22, 2025 12:44
update mistral deps

Update to use MistralDocumentUrlChunk

fix linting

added document_type

fix linting again

Update uv.lock
@pintaf pintaf force-pushed the feat/enable-pdf-mistral branch from ac34a6e to 4a40240 Compare July 22, 2025 11:26
@pintaf
Contributor Author

pintaf commented Jul 22, 2025

Hi.
I have squashed many useless commits to make the history more readable.
I have updated the mistral deps by changing pyproject.toml and running uv lock, but this is the first time I have used uv (I'm used to poetry), so let me know if there are any issues.

I added tests, but this is the first time I have written tests in Python, so I'm a complete n00b.
First, I had to deal with a Type "Literal[1704067200] | None" is not assignable to type "int" error, most probably due to the mistral package update. Earlier, MistralChatCompletionResponse might have allowed None values for created, but it seems this is no longer the case. I tried to fix that properly, but a second opinion is more than welcome.
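
To illustrate the kind of error described above, here is a small sketch of narrowing an optional timestamp to a plain int before passing it to a parameter typed as int. The helper `resolve_created` and the constant `DEFAULT_CREATED` are hypothetical names, and this is not necessarily the fix applied in the PR.

```python
from typing import Optional

# The literal that appears in the type-checker message; used here as a fallback.
DEFAULT_CREATED = 1704067200


def resolve_created(created: Optional[int], default: int = DEFAULT_CREATED) -> int:
    """Narrow Optional[int] to int so type checkers accept it where int is required."""
    # After this check the return type is always int, never None.
    return default if created is None else created
```

A test helper could then call `resolve_created(maybe_created)` before constructing the response object, which keeps the type checker satisfied without changing behavior when a value is supplied.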

To add the two tests for binary and URL PDFs, I simply copied the Mistral tests for binary and URL images.
I checked how tests were done for other models, for example Gemini, and I saw that they differ significantly.
@Kludex, since it seems you participated in writing both the Mistral and Gemini model tests, your view on how I designed the tests might be valuable.
I see the coverage is not 100%; could you tell me what to do to reach 100%?

For the binary PDF, I tried to find a really small binary PDF, and found something here

Looking forward to your remarks.

@pintaf pintaf requested a review from DouweM July 23, 2025 11:59
@DouweM DouweM merged commit f3ad3e6 into pydantic:main Jul 23, 2025
18 checks passed
@DouweM
Contributor

DouweM commented Jul 23, 2025

@pintaf Thanks Loïc!

2 participants