@netomi
Contributor

netomi commented Nov 4, 2025

This fixes #1346.

This adds a CharacterEncodingFilter that enforces UTF-8 encoding for all API requests.
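For context, the symptom in #1346 is what happens when UTF-8 bytes are decoded with a different charset. The sketch below uses only the JDK. It assumes the fallback charset is ISO-8859-1, which is the Servlet API default when no character encoding has been specified. The class name and sample text are illustrative and are not part of this PR.

```java
import java.nio.charset.StandardCharsets;

/**
 * Illustrates the symptom from #1346: UTF-8 bytes decoded with the wrong charset.
 * Assumption: the fallback charset is ISO-8859-1, the Servlet API default when
 * no character encoding has been specified.
 */
final class CharsetMismatchDemo {

    /** Encodes text as UTF-8 (as stored) but decodes it as ISO-8859-1 (as misread). */
    static String misread(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    /** Uses UTF-8 on both sides, which is what forcing UTF-8 guarantees. */
    static String readAsUtf8(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Copyright (c) 2025 Mueller" with a real copyright sign and u-umlaut, written as escapes.
        String license = "Copyright \u00A9 2025 M\u00FCller";
        // Each of these two-byte characters turns into two Latin-1 characters when misread.
        System.out.println(misread(license));
        System.out.println(readAsUtf8(license));
    }
}
```

A filter that forces UTF-8 keeps both sides of the exchange on the second path.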

@chrisguindon
Member

@amvanbaren - At your earliest convenience, could you please take a look at this MR?

@netomi
Contributor Author

netomi commented Nov 5, 2025

FYI: this is just one solution to the problem. I am happy to discuss other approaches, but IMHO we should definitely ensure UTF-8 encoding throughout the server.

@netomi
Contributor Author

netomi commented Nov 6, 2025

Another option would be to add this to application.yaml:

server:
  servlet:
    encoding:
      charset: UTF-8 # it's already the default; set here to make clear that this is what we want
      force: true

A third option would be to explicitly set the charset to UTF-8 on every response, but that is tedious and it is easy to miss some occurrences.

The downside of the configuration approach is that every instance must be configured this way, whereas the filter hardcodes the behavior in the application itself.

@amvanbaren
Contributor

> fyi: this is just one solution to the problem, I am happy to discuss other approaches but we should certainly ensure utf-8 encoding throughout the server imho.

Throughout or only /api?

@netomi
Contributor Author

netomi commented Nov 10, 2025

The change currently applies only to /api, as these routes are the most affected, but the whole app should probably default to UTF-8. I'm not sure why that isn't already the case; the Spring documentation on this is rather sparse.

Some claim that UTF-8 is the default, but I could not find official documentation confirming it. Perhaps UTF-8 is indeed the default charset and only the force parameter is not set, so it is not enforced.
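To make the question about defaults concrete, here is a stand-alone model of how, as far as I know, Spring Boot resolves the `server.servlet.encoding` force flags. It is written from the property documentation and is not Spring's actual class, so please verify it against the Spring Boot version in use. The charset itself defaults to UTF-8. Requests are forced by default, but responses are only forced when `force` or `force-response` is set. That would match the guess above.

```java
/**
 * Stand-alone model of the documented defaults for server.servlet.encoding.force,
 * force-request and force-response. Illustration only, not Spring Boot's own code.
 */
final class EncodingForceModel {

    enum Type { REQUEST, RESPONSE }

    /**
     * @param force         server.servlet.encoding.force (null if unset)
     * @param forceRequest  server.servlet.encoding.force-request (null if unset)
     * @param forceResponse server.servlet.encoding.force-response (null if unset)
     */
    static boolean shouldForce(Type type, Boolean force, Boolean forceRequest, Boolean forceResponse) {
        // A type-specific property wins if it is set.
        Boolean value = (type == Type.REQUEST) ? forceRequest : forceResponse;
        // Otherwise fall back to the general "force" property.
        if (value == null) {
            value = force;
        }
        // With nothing set, only requests are forced; responses are left untouched.
        if (value == null) {
            value = (type == Type.REQUEST);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(shouldForce(Type.REQUEST, null, null, null));  // nothing configured
        System.out.println(shouldForce(Type.RESPONSE, null, null, null)); // nothing configured
        System.out.println(shouldForce(Type.RESPONSE, true, null, null)); // force: true
    }
}
```

Under this model, `force: true` in the YAML above is what extends enforcement to responses.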

@amvanbaren
Contributor

This works for local storage, but not for cloud storage.

@netomi
Contributor Author

netomi commented Nov 20, 2025

I have not been able to test with cloud storage yet, and I feared it would not work.

Digging further into this topic, it turns out you can set properties on files stored as blobs: https://learn.microsoft.com/en-us/rest/api/storageservices/set-blob-properties?tabs=microsoft-entra-id

These properties include the content type (where the charset is declared), so we should change the existing storage provider to set the charset to UTF-8 by default.
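Here is a minimal sketch of how a storage provider could choose a Content-Type that declares UTF-8 before uploading a file. Several parts are assumptions: the helper name, the extension mapping, and treating extension-less files such as LICENSE as text. If the provider uses the Azure SDK for Java, the resulting value would be passed via `BlobHttpHeaders.setContentType`.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical helper a storage provider could call before uploading a file so that
 * text files are stored with an explicit UTF-8 charset. The mapping is an assumption.
 */
final class Utf8ContentTypes {

    private static final String UTF8_PARAM = "; charset=utf-8";

    // Assumed text-like extensions; anything else is treated as binary.
    private static final Map<String, String> TEXT_TYPES = Map.of(
            "txt", "text/plain",
            "md", "text/markdown",
            "json", "application/json",
            "xml", "application/xml",
            "html", "text/html");

    /** Returns the Content-Type to store for a file, declaring UTF-8 for text files. */
    static String forFileName(String path) {
        // Only look at the last path segment, so dots in directory names are ignored.
        String name = path.substring(path.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        if (dot < 0) {
            // Assumption: extension-less files such as LICENSE are UTF-8 text.
            return "text/plain" + UTF8_PARAM;
        }
        String type = TEXT_TYPES.get(name.substring(dot + 1).toLowerCase(Locale.ROOT));
        return type != null ? type + UTF8_PARAM : "application/octet-stream";
    }
}
```

Declaring the charset when the file is written means the blob is served correctly no matter which server or CDN delivers it.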

The open question is how to update existing files: there are currently 1.3M entries, of which several hundred thousand are text/JSON files that would need to be changed, AFAICT.
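Below is one possible shape for such a one-off migration, sketched with hypothetical types and no real SDK calls. It selects blobs whose Content-Type is text-like but has no charset, then groups them into fixed-size batches so the job can be throttled and resumed. It assumes existing blobs already carry a content type. If some were stored without one, detection would have to fall back to the file name.

Per the Set Blob Properties reference linked above, content properties omitted from the request are cleared. Each update should therefore resend the blob's existing headers along with the new Content-Type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Sketch of planning a charset migration over existing blobs.
 * BlobEntry is a hypothetical stand-in for whatever the storage listing returns.
 */
final class CharsetMigration {

    /** Hypothetical view of one stored blob: its name and current Content-Type (may be null). */
    static final class BlobEntry {
        final String name;
        final String contentType;

        BlobEntry(String name, String contentType) {
            this.name = name;
            this.contentType = contentType;
        }
    }

    /** True if the blob looks like text/JSON and does not declare a charset yet. */
    static boolean needsUpdate(BlobEntry blob) {
        String type = blob.contentType == null ? "" : blob.contentType.toLowerCase(Locale.ROOT);
        boolean textLike = type.startsWith("text/") || type.startsWith("application/json");
        return textLike && !type.contains("charset=");
    }

    /** Groups the blobs that need updating into batches of at most batchSize entries. */
    static List<List<BlobEntry>> planBatches(List<BlobEntry> blobs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<BlobEntry>> batches = new ArrayList<>();
        List<BlobEntry> current = new ArrayList<>();
        for (BlobEntry blob : blobs) {
            if (!needsUpdate(blob)) {
                continue; // binary files or blobs that already declare a charset
            }
            current.add(blob);
            if (current.size() == batchSize) {
                batches.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

Each batch would then be handed to the storage-specific update call. Keeping that call outside the planner makes the selection logic testable without touching real storage.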

@netomi netomi marked this pull request as draft December 2, 2025 09:35

Development

Successfully merging this pull request may close these issues.

License text in unicode isn't displayed correctly

3 participants