Win32 UTF8 Support #68

Open
wants to merge 38 commits into
base: main

Conversation

KillerxDBr
Contributor

@KillerxDBr KillerxDBr commented May 24, 2025

This PR brings full UTF-8 support (I hope) to Windows by converting the regular char strings to wide strings and calling the W versions of the Windows API functions.

Thanks to @yhr0x43 (KillerxDBr#1), who did almost everything. I just corrected some mistakes, improved the error checking for the wide string conversions, improved the error reporting for the minirent functions, and added an fflush to the TODO and UNREACHABLE macros.
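For readers wondering why the fflush matters: the C standard leaves it implementation-defined whether abort() flushes open streams, so a diagnostic can be lost if the stream happens to be buffered. Below is a minimal sketch of the idea. The names `report_fatal`, `MY_TODO` and `MY_UNREACHABLE` are hypothetical and are not the macros from this PR.

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper (not from the PR): prints "file:line: KIND: message"
// and flushes the stream so the text is written out before abort() runs.
static void report_fatal(FILE *out, const char *file, int line,
                         const char *kind, const char *message)
{
    fprintf(out, "%s:%d: %s: %s\n", file, line, kind, message);
    fflush(out);
}

// Hypothetical macros in the style of TODO/UNREACHABLE: report, flush, abort.
#define MY_TODO(message) \
    do { report_fatal(stderr, __FILE__, __LINE__, "TODO", (message)); abort(); } while (0)

#define MY_UNREACHABLE(message) \
    do { report_fatal(stderr, __FILE__, __LINE__, "UNREACHABLE", (message)); abort(); } while (0)
```

Calling `MY_TODO("implement this")` prints the location and message to stderr, flushes, and then aborts.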

According to Microsoft's "Use UTF-8 code pages in Windows apps", you can use a manifest to make the ANSI WinAPI calls accept UTF-8 (supported on Windows 10 version 1903 and newer), but it's inconsistent (or maybe it only affects the inputs?), since FormatMessageA was returning ?????????? for yhr0x43. The proper solution is to do the conversion using the string APIs (MultiByteToWideChar and WideCharToMultiByte).
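To illustrate the conversion approach, here is a rough sketch of round-tripping between UTF-8 and UTF-16 with MultiByteToWideChar and WideCharToMultiByte, plus fetching an error message through FormatMessageW instead of FormatMessageA. The helper names (`utf8_to_wide`, `wide_to_utf8`, `win32_error_utf8`) are hypothetical and the code in this PR may be structured differently. It is Windows-only, hence the `#ifdef _WIN32` guard.

```c
#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <stdlib.h>

// Hypothetical helper: UTF-8 -> newly allocated UTF-16 string.
// Returns NULL on invalid input or allocation failure; release with free().
static wchar_t *utf8_to_wide(const char *utf8)
{
    // With an output size of 0 the call only reports the required length in
    // wchar_t units; the terminator is counted because the input length is -1.
    int len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, utf8, -1, NULL, 0);
    if (len <= 0) return NULL;
    wchar_t *wide = malloc((size_t)len * sizeof(*wide));
    if (wide == NULL) return NULL;
    if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, utf8, -1, wide, len) == 0) {
        free(wide);
        return NULL;
    }
    return wide;
}

// Hypothetical helper: UTF-16 -> newly allocated UTF-8 string.
// For CP_UTF8 the two trailing "default char" arguments must be NULL.
static char *wide_to_utf8(const wchar_t *wide)
{
    int len = WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS, wide, -1, NULL, 0, NULL, NULL);
    if (len <= 0) return NULL;
    char *utf8 = malloc((size_t)len);
    if (utf8 == NULL) return NULL;
    if (WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS, wide, -1, utf8, len, NULL, NULL) == 0) {
        free(utf8);
        return NULL;
    }
    return utf8;
}

// Hypothetical helper: system message for a Win32 error code, as UTF-8.
// Returns NULL if no message is available; release with free().
static char *win32_error_utf8(DWORD err)
{
    wchar_t *wide = NULL;
    // With FORMAT_MESSAGE_ALLOCATE_BUFFER the system allocates the buffer and
    // stores its address through the (cast) pointer; it is freed with LocalFree.
    DWORD n = FormatMessageW(
        FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
        NULL, err, 0, (LPWSTR)&wide, 0, NULL);
    if (n == 0) return NULL;
    char *utf8 = wide_to_utf8(wide);
    LocalFree(wide);
    return utf8;
}
#endif // _WIN32
```

Asking for the required size first and then converting costs a second call, but it avoids guessing a buffer size for arbitrary paths and messages.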

EDIT: I forgot to mention that, beyond the string conversion, nob_mkdir_if_not_exists and nob_file_exists were rewritten to use proper WinAPI calls.
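As a rough illustration of what such WinAPI-based versions could look like, here is a sketch built on GetFileAttributesW and CreateDirectoryW. The function names and the 1 / 0 / -1 return convention are assumptions for the example, not a copy of the code in this PR; the "0 when the file or directory is not found" case follows the commit note below. Windows-only, hence the `#ifdef _WIN32` guard.

```c
#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <stdbool.h>
#include <stdlib.h>

// Hypothetical helper: UTF-8 -> newly allocated UTF-16 string, NULL on failure.
static wchar_t *utf8_to_wide(const char *utf8)
{
    int len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, utf8, -1, NULL, 0);
    if (len <= 0) return NULL;
    wchar_t *wide = malloc((size_t)len * sizeof(*wide));
    if (wide == NULL) return NULL;
    if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, utf8, -1, wide, len) == 0) {
        free(wide);
        return NULL;
    }
    return wide;
}

// Assumed convention: 1 = exists, 0 = file or directory not found, -1 = other error.
static int file_exists_utf8(const char *path)
{
    wchar_t *wpath = utf8_to_wide(path);
    if (wpath == NULL) return -1;
    DWORD attrs = GetFileAttributesW(wpath);
    // Read the error code before free(), which may overwrite the last error.
    DWORD err = (attrs == INVALID_FILE_ATTRIBUTES) ? GetLastError() : ERROR_SUCCESS;
    free(wpath);
    if (attrs != INVALID_FILE_ATTRIBUTES) return 1;
    if (err == ERROR_FILE_NOT_FOUND || err == ERROR_PATH_NOT_FOUND) return 0;
    return -1;
}

// Returns true if the directory was created or something with that name
// already exists. ERROR_ALREADY_EXISTS is also reported when the existing
// entry is a regular file; this sketch does not distinguish the two cases.
static bool mkdir_if_not_exists_utf8(const char *path)
{
    wchar_t *wpath = utf8_to_wide(path);
    if (wpath == NULL) return false;
    BOOL ok = CreateDirectoryW(wpath, NULL);
    DWORD err = ok ? ERROR_SUCCESS : GetLastError();
    free(wpath);
    return ok || err == ERROR_ALREADY_EXISTS;
}
#endif // _WIN32
```

With these, `mkdir_if_not_exists_utf8("build")` followed by `file_exists_utf8("build")` should return 1 regardless of the process code page, since the path never goes through an ANSI function.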

KillerxDBr and others added 30 commits January 8, 2025 16:15
…xists, using ascii versions of some functions
returning zero if file OR dir not found
After some debate with myself, I am convinced that the wchar API functions should be used exclusively, for the following reasons:
- Good Windows programming practice: wchar should be used whenever possible.
- The rest of the I/O goes through the C standard library, which on Windows can often accept and convert UTF-8 strings automatically. So I'd argue that using the ANSI functions is incorrect, because library internals (like Nob_Cmd) are assumed to use UTF-8 strings, as a sane person would (and as Linux does).
For correctness's sake, the extra memory allocation and conversion are worth it.
Windows Unicode path support

2 participants