@refack
refack commented Apr 25, 2025

Fixes #44

@refack
Author

refack commented Apr 25, 2025

Left is the content of the released tarball.
Right is the output after this patch.
image
image

@PeterFeicht
Owner

Sorry for not responding earlier. I meant to check if this is needed elsewhere too, because I think there's more than one place that creates a parser.

@PeterFeicht
Owner

Please also change the occurrences in preprocess_cssless.py, index2ddg.py, test/test_preprocess.py, and test/test_preprocess_cssless.py.

It would also be great if you could add a test to test/test_preprocess.py, but that's not a blocker for me.

@refack
Author

refack commented Sep 23, 2025

> Please also change the occurrences in...

👍

> It would also be great if you could add a test to

I'm out of the flow, so I'm not sure how to write a test for this...
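
For readers wondering what such a test might look like, here is a minimal sketch. It assumes the behavior under test is "a UTF-8 file with non-ASCII characters is read back unchanged". The helper `read_source` and the class name `TestUtf8RoundTrip` are hypothetical stand-ins, not functions from this repository; a real test in test/test_preprocess.py would call the project's own preprocessing code instead.

```python
import os
import tempfile
import unittest


def read_source(src_path):
    # Hypothetical stand-in for the code under test: it reads a file
    # with an explicit UTF-8 encoding rather than the platform default.
    with open(src_path, "r", encoding="utf-8") as a_file:
        return a_file.read()


class TestUtf8RoundTrip(unittest.TestCase):
    def test_non_ascii_survives(self):
        expected = "<p>naïve ≤ café</p>"
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "page.html")
            # Write raw UTF-8 bytes so the fixture does not depend on
            # the locale of the machine running the test.
            with open(path, "wb") as f:
                f.write(expected.encode("utf-8"))
            self.assertEqual(read_source(path), expected)
```

Writing the fixture in binary mode keeps the test deterministic: only the read path depends on how the encoding is chosen.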

@refack
Author

refack commented Sep 23, 2025

I added a second commit that also opens the files explicitly as UTF-8:

with open(src_path, 'r', encoding='utf-8') as a_file:
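
To illustrate why the explicit `encoding='utf-8'` argument matters, the sketch below writes a small UTF-8 file and reads it back twice: once as UTF-8 and once as Latin-1. Latin-1 is used here only as a stand-in for a platform whose default encoding is not UTF-8; whether that matches the exact failure in the linked issue is an assumption. The names `SAMPLE`, `read_explicit`, `read_as`, and `demo` are illustrative and not taken from the patch.

```python
import os
import tempfile

SAMPLE = "operator≤ café"  # contains non-ASCII characters


def read_explicit(src_path):
    # Same pattern as the line quoted above: the encoding is stated.
    with open(src_path, "r", encoding="utf-8") as a_file:
        return a_file.read()


def read_as(src_path, encoding):
    # Simulates open() falling back to a non-UTF-8 default encoding.
    with open(src_path, "r", encoding=encoding) as a_file:
        return a_file.read()


def demo():
    fd, path = tempfile.mkstemp(suffix=".html")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write(SAMPLE)
        good = read_explicit(path)
        bad = read_as(path, "latin-1")
    finally:
        os.remove(path)
    return good, bad


good, bad = demo()
print(good == SAMPLE)  # the UTF-8 read round-trips the text
print(bad == SAMPLE)   # the Latin-1 read does not
```

Each multi-byte UTF-8 character is decoded as several single-byte characters in the second read, which is the usual source of garbled output when the encoding is left to the platform default.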

Development

Successfully merging this pull request may close these issues.

Incorrect handling of UTF-8 encoding during preprocessing

2 participants