

@jdesrosiers (Member)

Since draft-07, the hostname format has specified that it includes IDNA2008 A-labels and that all valid hostnames are also valid idn-hostnames. That means it's not enough for hostnames to be valid Punycode; each A-label also needs to decode to a valid U-label, including all of the complex rules that involves. This PR adds an equivalent A-label test to the hostname format tests for each of the U-label tests in the idn-hostname format tests.
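To make "decode to a valid U-label" concrete, here is a minimal, partial sketch using only Python's standard library (the `punycode` codec and `unicodedata.is_normalized`, Python 3.8+). The helper name `is_plausible_a_label` is hypothetical and not part of the test suite. It checks only a subset of IDNA2008: Punycode decoding, that the result contains non-ASCII, the RFC 5891 hyphen restrictions, NFC, and the encode/decode round trip. It does not check the RFC 5892 code point categories, the CONTEXTJ/CONTEXTO rules, or the RFC 5893 bidi rules, which a full implementation (for example, the third-party `idna` package) would also need.

```python
import unicodedata


def is_plausible_a_label(label: str) -> bool:
    """Partial A-label check (hypothetical helper, a subset of IDNA2008 only).

    Not checked: RFC 5892 code point tables, CONTEXTJ/CONTEXTO, RFC 5893 bidi.
    """
    if not label.lower().startswith("xn--"):
        return False
    encoded = label[4:].lower()
    try:
        # Non-ASCII input or malformed Punycode raises a UnicodeError subclass.
        u_label = encoded.encode("ascii").decode("punycode")
    except UnicodeError:
        return False
    # A U-label must contain at least one non-ASCII code point.
    if u_label.isascii():
        return False
    # RFC 5891 hyphen restrictions: no "--" in positions 3-4,
    # and no leading or trailing hyphen.
    if u_label[2:4] == "--" or u_label.startswith("-") or u_label.endswith("-"):
        return False
    # U-labels must be in Unicode Normalization Form C.
    if not unicodedata.is_normalized("NFC", u_label):
        return False
    # An A-label must be exactly the Punycode encoding of its U-label.
    return u_label.encode("punycode").decode("ascii") == encoded
```

For example, `is_plausible_a_label("xn--mnchen-3ya")` accepts the encoding of "münchen", while `"xn--abc-"` is rejected because it decodes to the ASCII-only string "abc". Every label these checks reject is still valid LDH syntax, which is exactly why purely syntactic hostname tests miss them.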

I also did a little cleanup. I found several test issues, including a few duplicate tests, incorrect test descriptions, and a few other problems. I also reordered some of the tests so they are grouped logically. I did the cleanup in a separate commit to make review a little easier.

@jdesrosiers requested a review from a team as a code owner on October 8, 2025 01:27
@jviotti (Member) left a comment


I'm not an expert on hostnames in general, but at least what I see seems to make sense to me.

@karenetheridge (Member)

You've brought in restrictions from RFC5891 (e.g., a label that contains "--" in the 3rd and 4th positions is invalid), which I don't think are correct. The hostname format is defined as a union of the requirements from RFC1123 and RFC5891, not an intersection, and "--" is a valid substring under RFC1123 as far as I can tell?
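Under the union reading, a hostname only has to satisfy RFC 1123 label syntax. A minimal sketch of that syntax-only check in Python follows; the helper name `is_rfc1123_hostname` is hypothetical, and it assumes the usual LDH rule (each label is 1 to 63 letters, digits, or hyphens, not starting or ending with a hyphen) while ignoring the overall length limit and trailing dots. As the comment says, it accepts "--" anywhere inside a label, including in the 3rd and 4th positions.

```python
import re

# One RFC 1123 label: 1-63 LDH characters, no leading or trailing hyphen.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_rfc1123_hostname(hostname: str) -> bool:
    """Syntax-only hostname check (hypothetical helper, no IDNA processing)."""
    # fullmatch avoids the "$ matches before a trailing newline" pitfall.
    return all(LDH_LABEL.fullmatch(label) for label in hostname.split("."))
```

With this check alone, both `"ab--cd.example"` and an `xn--` label that is not a decodable A-label pass, since nothing here looks inside the label.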

@jdesrosiers (Member, Author)

@karenetheridge The spec says,

Note that all strings valid against the "hostname" attribute are also valid against the "idn-hostname" attribute. (Section 7.3.3)

That means that any label in a hostname that starts with xn-- must be a valid IDN A-label. Otherwise it wouldn't be a valid idn-hostname.

The hostname format is defined as a union of the requirements from RFC1123 and RFC5891, not an intersection

I don't think that interpretation makes sense. A-labels (labels produced using the Punycode algorithm) are always valid RFC 1123 labels, so the requirement that hostnames include A-labels would be meaningless if those A-labels didn't also have to be validated. That, plus the language saying all hostnames are idn-hostnames, is pretty convincing to me that hostnames need to be validated as ASCII-only idn-hostnames. I don't think it was a good idea to define it that way, but I think that's what the spec currently requires.
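The claim that A-labels always pass RFC 1123 syntax can be checked directly: Punycode output uses only `a-z`, `0-9`, and `-`, and when the input has a non-ASCII code point the output never ends in a hyphen. A small Python sketch (the helper `to_a_label` is hypothetical and does no IDNA2008 validation) encodes a few labels, including "ab--ü", which breaks the RFC 5891 hyphen rule, and tests each against the LDH pattern.

```python
import re

# One RFC 1123 label: 1-63 LDH characters, no leading or trailing hyphen.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def to_a_label(u_label: str) -> str:
    """Punycode-encode a label and add the "xn--" prefix (hypothetical helper).

    No IDNA2008 validation is done, so invalid U-labels are encoded too.
    """
    return "xn--" + u_label.encode("punycode").decode("ascii")


# "ab--ü" is not a valid U-label, yet its encoding is still valid LDH syntax.
samples = ["münchen", "日本語", "ab--ü"]
for u_label in samples:
    a_label = to_a_label(u_label)
    print(a_label, LDH_LABEL.fullmatch(a_label) is not None)
```

Every sample, valid or not, prints `True`, so a syntax-only check can never reject an `xn--` label produced this way; only decoding and validating the U-label can.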
