Update .gitattributes for the wrongencoding files #13811

Open
wants to merge 1 commit into
base: master
from

Conversation

ianhi
Contributor

@ianhi ianhi commented Aug 4, 2025

Purpose

There was an existing .gitattributes for these files, but it wasn't working on my system (M2 Mac). I was unable to restore or stash these files, which made rebasing difficult. This approach fixed the git issues for me.

I was getting errors like:

error: failed to encode 'tests/roots/test-root/wrongenc.inc' from UTF-8 to latin-1
error: failed to encode 'tests/roots/test-warnings/wrongenc.inc' from UTF-8 to latin-1

whenever I tried to restore, stash, or rebase.
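For readers who hit the same message, here is a minimal Python sketch of one way a UTF-8 to Latin-1 conversion can fail. It is an illustration only, not a diagnosis of this PR: it assumes the content being converted is not valid UTF-8 (for example, raw Latin-1 bytes), and the sample text and the `reencode` helper are made up for the example. Git performs its own conversion, which this does not reproduce.

```python
def reencode(data: bytes, src: str = "utf-8", dst: str = "latin-1") -> bytes:
    """Decode bytes from src and re-encode them to dst (hypothetical helper)."""
    return data.decode(src).encode(dst)


# Hypothetical sample: "café" stored as raw Latin-1 bytes (0xE9 for "é").
latin1_bytes = "café".encode("latin-1")

# The same text stored as UTF-8 converts cleanly to Latin-1.
utf8_bytes = "café".encode("utf-8")
converted = reencode(utf8_bytes)

# Treating the Latin-1 bytes as UTF-8 fails, because a lone 0xE9 byte
# is not a valid UTF-8 sequence.
try:
    reencode(latin1_bytes)
    failed = False
except UnicodeDecodeError:
    failed = True
```

If the conversion step assumes UTF-8 input but receives Latin-1 bytes, it cannot proceed, which is at least consistent with files getting "stuck" once touched.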

Full disclosure - this was a level of git that was beyond me so this is an AI assisted PR. Claude summarizes the downsides as:

⚠️ Considerations:

  • Diff viewing - These files will show as "binary" in git diffs instead of text changes
  • Text editors - Some editors might treat them as binary files

Although I was able to open them in neovim without issue.

References

It looks like there was a recent similar change: 5cf62e5

@ianhi ianhi force-pushed the fix-encoding-test-files branch from 0b6ccf6 to bc8f923 Compare August 4, 2025 21:34
Mark wrongenc.inc files as binary in .gitattributes to prevent encoding
conversion issues on different platforms. These files contain intentional
Latin-1 encoded content for testing Sphinx's encoding handling and should
remain byte-for-byte identical.

Fixes git errors like:
- "failed to encode from UTF-8 to latin-1"
- "patch does not apply" in pre-commit hooks
- stash/unstash failures with these files

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Member

@AA-Turner AA-Turner left a comment

I don't think this is right; the files are text, not binary content. My attempt here was to properly mark the encoding as I understand that Git's internal index is stored in Unicode, whereas we have a few test files that we intentionally have in different encodings.

Does macOS support the Latin-1 codec in general? I imagine there's no support for Windows-1252? (the default Windows codepage)

A third option would be to remove these files from the repo and have the tests write them to disk each time, which might be somewhat clearer in the intent.

A
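The third option above could look roughly like the following sketch. It is not taken from the Sphinx test suite: the `write_wrongenc` helper, the placeholder file content, and the use of a temporary directory are all assumptions for illustration.

```python
import tempfile
from pathlib import Path


def write_wrongenc(directory: Path, name: str = "wrongenc.inc") -> Path:
    """Write a small Latin-1 encoded file and return its path (hypothetical helper)."""
    path = directory / name
    # Placeholder content; the real fixture text is not shown in this thread.
    text = "This file has umlauts: äöü.\n"
    # write_bytes avoids newline translation, so the bytes on disk are exact.
    path.write_bytes(text.encode("latin-1"))
    return path


with tempfile.TemporaryDirectory() as tmp:
    generated = write_wrongenc(Path(tmp))
    raw = generated.read_bytes()
```

Generating the file at test time would keep non-UTF-8 bytes out of the repository entirely, so no encoding entry in .gitattributes would be needed for it.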

@jfbu
Contributor

jfbu commented Aug 7, 2025

I am on macOS and would like to help, because I have encountered the issue too. The problem is that I do not know how to reproduce it. I did not encounter it while working on some PRs I merged in the last few days, but I definitely had the issue about five days ago, and it was a bit painful because I had to be careful not to run git commit -a. It seems to have gone away after I merged master into a PR of mine using the GitHub website interface and then fetched it back from my fork of sphinx-doc/sphinx to my computer.

@jfbu
Contributor

jfbu commented Aug 7, 2025

> Does macOS support the Latin-1 codec in general? I imagine there's no support for Windows-1252? (the default Windows codepage)

Emacs on macOS definitely supports the Latin-1 codec, and probably Windows-1252 as well, including the problem of EOLs. There are some Unicode issues when one copies files to external hard disks, where macOS might automatically change the Unicode normalization form, but this looks like something else.
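As a side note on the two codecs discussed here, the short Python sketch below shows where Latin-1 and Windows-1252 differ. It relies only on Python's built-in `latin-1` and `cp1252` codecs; it says nothing about which encoding names Git or the system conversion library accept on a given platform.

```python
# Byte 0x80 is a C1 control character in Latin-1 but the euro sign in Windows-1252.
sample = b"\x80"
as_latin1 = sample.decode("latin-1")
as_cp1252 = sample.decode("cp1252")

# Bytes 0xA0-0xFF decode to the same characters in both codecs.
high_range = bytes(range(0xA0, 0x100))
same_high_range = high_range.decode("latin-1") == high_range.decode("cp1252")
```

So a file that only uses accented letters from the 0xA0-0xFF range reads the same under either codec, and the two diverge only in the 0x80-0x9F range.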

@ianhi
Contributor Author

ianhi commented Aug 7, 2025

> Does macOS support the Latin-1 codec in general? I imagine there's no support for Windows-1252? (the default Windows codepage)

I thought yes

> A third option would be to remove these files from the repo and have the tests write them to disk each time, which might be somewhat clearer in the intent.

This seems easiest

Although I also cannot replicate it now :( which is extremely confusing. Maybe this was a temporary bug somewhere else (git, iterm, mac?)?

So maybe this is just resolved, and we can close this, and if it comes up again hopefully someone will find this and not feel despair.

@jfbu
Contributor

jfbu commented Aug 7, 2025

I am not 100% sure, but it does look as if it was 5cf62e5 that fixed the issue for me once I had updated my locale. However, when I try reverting it, I have so far failed to get Git to complain again about those wrongly encoded files.

@ianhi
Contributor Author

ianhi commented Aug 7, 2025

> I am not 100% sure but it does look as if it is 5cf62e5 which fixed the issue for me once I had updated my locale. But trying to revert it I fail so far to trigger Git into complaining again about those wrongly encoded files.

Confusingly, I only made my fork a few days ago, so I would certainly already have had that commit when I started seeing the issue.

@Ch3ri0ur
Contributor

I got the same error (Win11/WSL2). Once I touched those files they were "stuck" and could not be reverted/discarded anymore.

4 participants