-
-
Notifications
You must be signed in to change notification settings - Fork 103
feat: implement Windows file path handling in URL parsing #304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: implement Windows file path handling in URL parsing #304
Conversation
|
Per @annevk's guidance in whatwg/url#874, the implementation is now Implementation SummaryCore Rules (Implemented):
Test Results:
Changes Made:
WPT Tests: Updated in web-platform-tests/wpt#53459 Ready for review! 🚀 |
Updated tests to reflect simplified Windows path handling: 1. Multi-letter drives (CC:\, ABC:\) are now parsed as normal URLs - CC:\path → scheme: cc, path: \path (not failure) - 1:\path → failure (schemes must start with ASCII letter) - @:\path → failure (@ not valid in scheme) 2. UNC paths without base URL should fail - \\server\share → failure (no special UNC handling) - UNC paths with file: base still work via relative parsing This aligns with whatwg/url#874 guidance: "Why would we not parse CC:\path as we do today?" Only single ASCII letter + :\ should be treated as Windows file path. Everything else uses normal URL parsing. Related: whatwg/url#874, jsdom/whatwg-url#304
lib/url-state-machine.js
Outdated
| // Only convert single ASCII letter + :\ pattern (e.g., C:\, D:\) | ||
| // Note: Only backslash (\), not forward slash (/) | ||
| // Everything else goes through normal URL parsing | ||
| if (!stateOverride && !this.url.scheme && /^[a-zA-Z]:\\/u.test(this.input)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't match the spec text at https://github.com/whatwg/url/pull/874/files#diff-29243b3b9b716b55c6a61970b0c4864f464b139d397fb961a05bb6e1e2b97cabR2251 . Please translate the spec text directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed the !this.stateOverride check that wasn't in the spec. The implementation now translates the
spec text directly, thanks 🙏
lib/url-state-machine.js
Outdated
| // Everything else goes through normal URL parsing | ||
| if (!stateOverride && !this.url.scheme && /^[a-zA-Z]:\\/u.test(this.input)) { | ||
| const converted = this.input.replace(/\\/gu, "/"); | ||
| this.input = `file:///${converted}`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes do not match the spec text at https://github.com/whatwg/url/pull/874/files#diff-29243b3b9b716b55c6a61970b0c4864f464b139d397fb961a05bb6e1e2b97cabR2256
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, I edited
|
Fixed in latest commit |
scripts/get-latest-platform-tests.js
Outdated
| // 2. Press "y" on your keyboard to get a permalink | ||
| // 3. Copy the commit hash | ||
| const commitHash = "40fc257a28faf7c378f59185235685ea8684e8f4"; | ||
| const commitHash = "072413fba2fef3c16877673af78215174ca8f7c2"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/web-platform-tests/wpt/tree/072413fba2fef3c16877673af78215174ca8f7c2/url/resources seems to be a 7-month old commit.
We need to instead run the tests against the updates made in web-platform-tests/wpt#53459 . The latest is https://github.com/mertcanaltin/wpt/blob/bbcb1c372c05e7ec573da9e4f295ecb0e70ca45b/url/resources/urltestdata.json it looks like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, I used last hash
lib/url-state-machine.js
Outdated
| this.buffer += cStr.toLowerCase(); | ||
| } else if (c === p(":")) { | ||
| // Windows drive letter | ||
| if (this.buffer.length === 1 && infra.isASCIIAlpha(this.buffer.codePointAt(0)) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't follow the spec, as you've split up the else if and added another sub if. The spec has two subsequent else if conditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated 🙏
lib/url-state-machine.js
Outdated
| this.buffer += cStr.toLowerCase(); | ||
| } else if (c === p(":")) { | ||
| // Windows drive letter | ||
| if (this.buffer.length === 1 && infra.isASCIIAlpha(this.buffer.codePointAt(0)) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't follow the spec, as it only tests the 0th code point of buffer, not all of buffer like the spec does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated 🙏
lib/url-state-machine.js
Outdated
| this.url.host = ""; | ||
| this.buffer = ""; | ||
| this.state = "path"; | ||
| --this.pointer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't follow the spec, as it includes two pointer decrements which are not in the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated 🙏
lib/url-state-machine.js
Outdated
| this.state = "path"; | ||
| --this.pointer; | ||
| --this.pointer; | ||
| return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't follow the spec, as it includes a return statement which is not in the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated 🙏
|
@domenic I found the issue! The path state expects buffer to if (this.url.scheme === "file" && this.url.path.length === 0 &&
isWindowsDriveLetterString(this.buffer)) {
this.buffer = `${this.buffer[0]}:`;
}But the spec at line 2255 says "Set buffer to the empty Should line 2255 instead be:Set buffer to buffer + U+003A (:) this would give us buffer = "C:", which path state can then Currently getting: file:////path (C is lost)Expected (per WPT): file:///C:/path |
|
I'm sorry, I can't really provide support for how to code or spec this. You need to figure that out on your own. In general, I hope you can be more respectful of reviewers' time here. The review I performed above found issues which should have been obvious to anyone who opened the spec and the implementation side by side. Based on some of the work so far, I worry that you might be using AI coding agents, which often have this problem of not following instructions perfectly and thus wasting reviewer time. Please do your best to follow the requested workflow, of producing an implementation change that follows the spec exactly, and then testing it against the new tests, and getting them passing. Doing that work is your responsibility, and if you do it correctly, review should be quick and not require much of my time. |
First of all, I apologize for the time issue I caused. I agree with what you said. I am in the process of learning English, so I am using AI tools to help me respond to messages. That may have created that impression for you. Other than that, I am making progress with my own solutions. |
|
I will take this PR as a draft and open it for review once I am sure about everything. This has been a great learning experience for me, thank you, I'm sorry for any problems I may have caused during this process 🙏 |
07f0057 to
a8c0290
Compare
fd03587 to
f6b3fa1
Compare
|
@domenic , hello again. Thank you very much for your feedback. The tests were not synced with the master branch, so the past IDNA tests were failing in the pipeline. Once I synchronized that part, the issue was resolved. web-platform-tests/wpt#53459 |
e8e6ed6 to
fe79067
Compare
Implements Windows file path handling as part of the WHATWG URL
spec change (whatwg/url#874).
Changes
percent-encoded Unicode
Implementation
When a Windows file path is detected at the start of URL parsing: