@mertcanaltin mertcanaltin commented Oct 5, 2025

Implements Windows file path handling as part of the WHATWG URL
spec change (whatwg/url#874).

Changes

  • Detects Windows drive letter paths (e.g., C:\path\file.txt)
  • Converts backslashes to forward slashes
  • Prefixes with file:/// for proper URL parsing
  • Updates WPT commit hash to latest test expectations with
    percent-encoded Unicode

Implementation

When a Windows file path is detected at the start of URL parsing:

  1. Check for drive letter pattern (C:) or UNC path (\\server)
  2. Convert all backslashes to forward slashes
  3. Prefix with file:/// (drive letters) or file: (UNC)
  4. Continue with standard URL parsing
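
A minimal sketch of the four steps above, for illustration only. The helper name `preprocessWindowsPath` and the exact regular expression are assumptions and are not taken from this PR; the sketch models this initial description, in which UNC paths are converted as well.

```javascript
// Hypothetical helper: rewrites a Windows path into a file: URL string
// before it is handed to the regular URL parser. Not the actual PR code.
function preprocessWindowsPath(input) {
  // Step 1: detect a drive letter path (assumed form: one ASCII letter,
  // ":", then a backslash) or a UNC path (two leading backslashes).
  const isDrivePath = /^[a-zA-Z]:\\/u.test(input);
  const isUNCPath = input.startsWith("\\\\");
  if (!isDrivePath && !isUNCPath) {
    return input; // Step 4 only: everything else is parsed unchanged.
  }

  // Step 2: convert all backslashes to forward slashes.
  const converted = input.replace(/\\/gu, "/");

  // Step 3: prefix with file:/// (drive letters) or file: (UNC).
  return isDrivePath ? `file:///${converted}` : `file:${converted}`;
}

console.log(preprocessWindowsPath("C:\\path\\file.txt"));
console.log(preprocessWindowsPath("\\\\server\\share"));
console.log(preprocessWindowsPath("https://example.com/"));
```

The returned string would then go through standard URL parsing (step 4), so the sketch deliberately does no validation of its own.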

@mertcanaltin
Author

mertcanaltin commented Oct 11, 2025

Per @annevk's guidance in whatwg/url#874, the implementation is now
complete:

Implementation Summary

Core Rules (Implemented):

  • Single ASCII letter + :\ → Windows file path (C:\path →
    file:///C:/path)
  • Invalid drive patterns → Failure (CC:\path → TypeError)
  • UNC paths → file:// URL (\\server\share →
    file://server/share)
  • Opaque paths support backslash (non-special:\\opaque is valid)

Test Results:

  • 5342/5381 tests passing (99.3%)
  • All core Windows path scenarios working
  • 38 edge case tests awaiting spec clarification

Changes Made:

  • lib/url-state-machine.js:536-574: Windows path pre-processing
    • Validates single ASCII letter drives
    • Rejects invalid drive patterns (multi-letter, non-alpha)
    • Converts backslashes to forward slashes
    • Handles UNC paths

WPT Tests: Updated in web-platform-tests/wpt#53459

Ready for review! 🚀

@mertcanaltin
Author

@annevk, I applied the latest changes in f48453a.

mertcanaltin added a commit to mertcanaltin/wpt that referenced this pull request Oct 18, 2025
Updated tests to reflect simplified Windows path handling:

1. Multi-letter drives (CC:\, ABC:\) are now parsed as normal URLs
   - CC:\path → scheme: cc, path: \path (not failure)
   - 1:\path → failure (schemes must start with ASCII letter)
   - @:\path → failure (@ not valid in scheme)

2. UNC paths without base URL should fail
   - \\server\share → failure (no special UNC handling)
   - UNC paths with file: base still work via relative parsing

This aligns with whatwg/url#874 guidance:
"Why would we not parse CC:\path as we do today?"

Only single ASCII letter + :\ should be treated as Windows file path.
Everything else uses normal URL parsing.

Related: whatwg/url#874, jsdom/whatwg-url#304
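
The "normal URL parsing" cases in this commit message can be probed with the built-in `URL` constructor. This is only a sketch: the `probe` helper is hypothetical, and the results depend on the runtime providing a spec-conformant `URL` global (Node.js is assumed here).

```javascript
// Hypothetical helper: summarizes how the runtime's URL parser treats an
// input, or returns the string "failure" if the constructor throws.
function probe(input, base) {
  try {
    const url = new URL(input, base);
    return { protocol: url.protocol, host: url.host, pathname: url.pathname };
  } catch {
    return "failure";
  }
}

const results = {
  multiLetter: probe("CC:\\path"),       // multi-letter prefix, parsed as a scheme
  digit: probe("1:\\path"),              // schemes must start with an ASCII letter
  atSign: probe("@:\\path"),             // "@" is not valid in a scheme
  uncNoBase: probe("\\\\server\\share"), // UNC path without a base URL
  uncFileBase: probe("\\\\server\\share", "file:///") // UNC path, file: base
};

console.log(results);
```

Per the list above, the first input should come back with scheme `cc` and path `\path`, the next three should fail, and the last should resolve against the `file:` base through relative parsing.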
// Only convert single ASCII letter + :\ pattern (e.g., C:\, D:\)
// Note: Only backslash (\), not forward slash (/)
// Everything else goes through normal URL parsing
if (!stateOverride && !this.url.scheme && /^[a-zA-Z]:\\/u.test(this.input)) {
Member

This doesn't match the spec text at https://github.com/whatwg/url/pull/874/files#diff-29243b3b9b716b55c6a61970b0c4864f464b139d397fb961a05bb6e1e2b97cabR2251 . Please translate the spec text directly.

Author

I've removed the !this.stateOverride check that wasn't in the spec. The implementation now translates the
spec text directly, thanks 🙏

// Everything else goes through normal URL parsing
if (!stateOverride && !this.url.scheme && /^[a-zA-Z]:\\/u.test(this.input)) {
const converted = this.input.replace(/\\/gu, "/");
this.input = `file:///${converted}`;
Member

Author

thanks, I edited

@mertcanaltin
Author

Fixed in latest commit

// 2. Press "y" on your keyboard to get a permalink
// 3. Copy the commit hash
-const commitHash = "40fc257a28faf7c378f59185235685ea8684e8f4";
+const commitHash = "072413fba2fef3c16877673af78215174ca8f7c2";
Member

Author

Thanks, I used the latest hash.

this.buffer += cStr.toLowerCase();
} else if (c === p(":")) {
// Windows drive letter
if (this.buffer.length === 1 && infra.isASCIIAlpha(this.buffer.codePointAt(0)) &&
Member

This doesn't follow the spec, as you've split up the else if and added another sub if. The spec has two subsequent else if conditions.

Author

I updated 🙏

this.buffer += cStr.toLowerCase();
} else if (c === p(":")) {
// Windows drive letter
if (this.buffer.length === 1 && infra.isASCIIAlpha(this.buffer.codePointAt(0)) &&
Member

This doesn't follow the spec, as it only tests the 0th code point of buffer, not all of buffer like the spec does.

Author

I updated 🙏

this.url.host = "";
this.buffer = "";
this.state = "path";
--this.pointer;
Member

This doesn't follow the spec, as it includes two pointer decrements which are not in the spec.

Author

I updated 🙏

this.state = "path";
--this.pointer;
--this.pointer;
return true;
Member

This doesn't follow the spec, as it includes a return statement which is not in the spec.

Author

I updated 🙏

@mertcanaltin
Author

@domenic I found the issue! The path state expects buffer to
contain the Windows drive letter (see existing code at line
1037):

if (this.url.scheme === "file" && this.url.path.length === 0 &&
    isWindowsDriveLetterString(this.buffer)) {
  this.buffer = `${this.buffer[0]}:`;
}

But the spec at line 2255 says "Set buffer to the empty
string", which loses the drive letter.

Should line 2255 instead be "Set buffer to buffer + U+003A (:)"?

This would give us buffer = "C:", which the path state can then
process correctly.

Currently getting: file:////path (C is lost)
Expected (per WPT): file:///C:/path
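
To make the buffer question concrete, here is a simplified model of the drive-letter check quoted above. It is a sketch under assumptions, not the whatwg-url source: `normalizeDriveLetterBuffer` is a hypothetical name, and the definition of `isWindowsDriveLetterString` (an ASCII letter followed by `:` or `|`) is assumed.

```javascript
// Returns true for the code points A-Z and a-z.
function isASCIIAlpha(codePoint) {
  return (codePoint >= 0x41 && codePoint <= 0x5a) ||
         (codePoint >= 0x61 && codePoint <= 0x7a);
}

// Assumed definition: exactly two code points, an ASCII letter followed
// by ":" or "|".
function isWindowsDriveLetterString(string) {
  return string.length === 2 &&
         isASCIIAlpha(string.codePointAt(0)) &&
         (string[1] === ":" || string[1] === "|");
}

// Hypothetical model of the quoted condition only: returns the buffer as it
// would look after the drive-letter normalization in the path state.
function normalizeDriveLetterBuffer(scheme, path, buffer) {
  if (scheme === "file" && path.length === 0 &&
      isWindowsDriveLetterString(buffer)) {
    return `${buffer[0]}:`;
  }
  return buffer;
}

console.log(normalizeDriveLetterBuffer("file", [], "C:"));
console.log(normalizeDriveLetterBuffer("file", [], "C|"));
console.log(normalizeDriveLetterBuffer("file", [], ""));
```

In this model an empty buffer never matches the drive-letter check, so nothing restores the `C:` segment once the buffer has been cleared.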

@domenic
Member

domenic commented Oct 22, 2025

I'm sorry, I can't really provide support for how to code or spec this. You need to figure that out on your own.

In general, I hope you can be more respectful of reviewers' time here. The review I performed above found issues which should have been obvious to anyone who opened the spec and the implementation side by side. Based on some of the work so far, I worry that you might be using AI coding agents, which often have this problem of not following instructions perfectly and thus wasting reviewer time.

Please do your best to follow the requested workflow, of producing an implementation change that follows the spec exactly, and then testing it against the new tests, and getting them passing. Doing that work is your responsibility, and if you do it correctly, review should be quick and not require much of my time.

@mertcanaltin
Author

> I'm sorry, I can't really provide support for how to code or spec this. You need to figure that out on your own.
>
> In general, I hope you can be more respectful of reviewers' time here. The review I performed above found issues which should have been obvious to anyone who opened the spec and the implementation side by side. Based on some of the work so far, I worry that you might be using AI coding agents, which often have this problem of not following instructions perfectly and thus wasting reviewer time.
>
> Please do your best to follow the requested workflow, of producing an implementation change that follows the spec exactly, and then testing it against the new tests, and getting them passing. Doing that work is your responsibility, and if you do it correctly, review should be quick and not require much of my time.

First of all, I apologize for taking up your time. I agree with what you said. I am still learning English, so I use AI tools to help me write my replies, which may have given you that impression. Other than that, I am making progress with my own solutions.

@mertcanaltin
Author

I will convert this PR to a draft and reopen it for review once I am sure about everything. This has been a great learning experience for me. Thank you, and I'm sorry for any problems I may have caused during this process 🙏

@mertcanaltin mertcanaltin marked this pull request as draft October 22, 2025 18:38
@mertcanaltin mertcanaltin force-pushed the feat/windows-file-path-handling branch from 07f0057 to a8c0290 Compare October 22, 2025 20:12
@mertcanaltin mertcanaltin marked this pull request as ready for review October 22, 2025 20:16
@mertcanaltin mertcanaltin marked this pull request as draft October 22, 2025 20:23
@mertcanaltin mertcanaltin force-pushed the feat/windows-file-path-handling branch 4 times, most recently from fd03587 to f6b3fa1 Compare October 26, 2025 16:11
@mertcanaltin mertcanaltin marked this pull request as ready for review October 26, 2025 17:00
@mertcanaltin
Author

@domenic, hello again. Thank you very much for your feedback. The tests were not synced with the master branch, so the existing IDNA tests were failing in the pipeline. Once I synchronized them, the issue was resolved. web-platform-tests/wpt#53459

@mertcanaltin mertcanaltin force-pushed the feat/windows-file-path-handling branch from e8e6ed6 to fe79067 Compare October 26, 2025 17:19