Skip to content

Conversation

zcorpan
Copy link
Member

@zcorpan zcorpan commented Sep 8, 2025

Meta: Preparation for source reformatting

  • Add data-noreformat attributes
  • Fix double spaces and incorrect wrapping around tags
  • Add a Check Formatting workflow
  • Update CONTRIBUTING.md

Meta: Source reformatting: switch to no line-breaking

This applies reformahtml 0.1.9. See https://github.com/zcorpan/reformahtml

Fixes #1059.


/acknowledgements.html ( diff )
/browsers.html ( diff )
/browsing-the-web.html ( diff )
/canvas.html ( diff )
/common-dom-interfaces.html ( diff )
/common-microsyntaxes.html ( diff )
/comms.html ( diff )
/custom-elements.html ( diff )
/dnd.html ( diff )
/document-lifecycle.html ( diff )
/document-sequences.html ( diff )
/dom.html ( diff )
/dynamic-markup-insertion.html ( diff )
/edits.html ( diff )
/embedded-content-other.html ( diff )
/embedded-content.html ( diff )
/form-control-infrastructure.html ( diff )
/form-elements.html ( diff )
/forms.html ( diff )
/grouping-content.html ( diff )
/iana.html ( diff )
/iframe-embed-object.html ( diff )
/image-maps.html ( diff )
/imagebitmap-and-animations.html ( diff )
/images.html ( diff )
/index.html ( diff )
/indices.html ( diff )
/infrastructure.html ( diff )
/input.html ( diff )
/interaction.html ( diff )
/interactive-elements.html ( diff )
/introduction.html ( diff )
/links.html ( diff )
/media.html ( diff )
/microdata.html ( diff )
/named-characters.html ( diff )
/nav-history-apis.html ( diff )
/obsolete.html ( diff )
/parsing.html ( diff )
/popover.html ( diff )
/references.html ( diff )
/rendering.html ( diff )
/scripting.html ( diff )
/sections.html ( diff )
/semantics-other.html ( diff )
/semantics.html ( diff )
/server-sent-events.html ( diff )
/speculative-loading.html ( diff )
/structured-data.html ( diff )
/syntax.html ( diff )
/system-state.html ( diff )
/tables.html ( diff )
/text-level-semantics.html ( diff )
/timers-and-user-prompts.html ( diff )
/urls-and-fetching.html ( diff )
/web-messaging.html ( diff )
/webappapis.html ( diff )
/webstorage.html ( diff )
/workers.html ( diff )
/worklets.html ( diff )
/xhtml.html ( diff )

@zcorpan
Copy link
Member Author

zcorpan commented Sep 8, 2025

The reformatting tool supports opting out for specific elements with a data-noreformat attribute. I haven't added such an attribute yet, but some things should probably have it, e.g. the Dependencies section.

@domenic
Copy link
Member

domenic commented Sep 9, 2025

Very cool!

The other major section to not reformat is the acknowledgements section.

Is there any way to do some sort of programmatic check that only whitespace has changed, to rule out any content-deleting bugs in the reformatter? Maybe just the htmldiff tool would work?

@jmdyck
Copy link
Contributor

jmdyck commented Sep 9, 2025

In the paragraph that begins "Here is an example of color space conversion", the status quo:
... <code data-x=\n "dom-....
is converted to:
... <code data-x= "dom-...
but that space after the = probably shouldn't be there.

(This is the only case where the status quo has a line-break after an attribute's =.)

@annevk
Copy link
Member

annevk commented Sep 9, 2025

Arguably that's a bug in the status quo since you're supposed to only wrap on where whitespace would "naturally" occur.

@zcorpan zcorpan force-pushed the apply-reformahtml branch 2 times, most recently from c7dbbc0 to d81a7bb Compare September 9, 2025 14:28
@zcorpan
Copy link
Member Author

zcorpan commented Sep 9, 2025

@jmdyck fixed the = issue, thanks!

@domenic

Very cool!

Thanks!

The other major section to not reformat is the acknowledgements section.

I've marked elements I found with custom indentation with data-noreformat now in a separate commit.

Is there any way to do some sort of programmatic check that only whitespace has changed, to rule out any content-deleting bugs in the reformatter? Maybe just the htmldiff tool would work?

file=source                
a=$(git show HEAD:"$file" | tr -d '[:space:]' | git hash-object --stdin)
b=$(tr -d '[:space:]' < "$file" | git hash-object --stdin)

if [ "$a" = "$b" ]; then
  echo "Only whitespace differences"
else
  echo "Real changes"
fi

Shows "Only whitespaces differences" after running the script.

@zcorpan zcorpan marked this pull request as ready for review September 9, 2025 15:07
@zcorpan
Copy link
Member Author

zcorpan commented Sep 9, 2025

I think I'm happy with the output now. LMK if there are more elements that should have data-noreformat or if there are bugs in reformahtml.py.

@jmdyck
Copy link
Contributor

jmdyck commented Sep 9, 2025

Arguably that's a bug in the status quo since you're supposed to only wrap on where whitespace would "naturally" occur.

Would you prefer a separate PR to fix such bugs?

@annevk
Copy link
Member

annevk commented Sep 10, 2025

As far as I can tell we only have one instance. We could fix it separately or incorporate it in a random PR. I don't think it matters much?

@zcorpan
Copy link
Member Author

zcorpan commented Sep 10, 2025

No need to fix = since the issue goes away with the reformatting.

@jmdyck
Copy link
Contributor

jmdyck commented Sep 10, 2025

As far as I can tell we only have one instance.

There's only one instance of 'newline after equals', yes, but there's more than one instance of 'newline where whitespace wouldn't "naturally" occur'.

@jmdyck
Copy link
Contributor

jmdyck commented Sep 10, 2025

In this PR, there are many cases where, if the first non-blank on a line is an element, it isn't joined up with the previous line, but presumably should be. (If you're looking for examples, searching for /^ *<\(var\|code\|ref\|span\)/ in vim will find a lot of them.)

@zcorpan
Copy link
Member Author

zcorpan commented Sep 19, 2025

@jmdyck, thanks, fixed now.

@annevk
Copy link
Member

annevk commented Sep 30, 2025

@zcorpan it seems this branch still has conflicts?

@zcorpan zcorpan added the do not merge yet Pull request must not be merged per rationale in comment label Sep 30, 2025
Copy link
Member

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rubber stamp, assuming this is going to be followed up with a "Meta:" commit to fix blame and hoping that this doesn't introduce new issues.

@jmdyck
Copy link
Contributor

jmdyck commented Sep 30, 2025

  • A lot of the <ref> tags are preceded by 2-10 spaces, where it should presumably be only 1.
  • Sometimes a <ref> isn't joined to the previous line (e.g. lines 3170, 6087), but probably should be.
  • Sometimes a </div> tag is joined to the previous line (e.g. line 1289), but should probably stay on its own line.
  • The treatment of comments seems inconsistent: sometimes joined up, sometimes not. Is there a rule?

@zcorpan
Copy link
Member Author

zcorpan commented Oct 2, 2025

  • A lot of the <ref> tags are preceded by 2-10 spaces, where it should presumably be only 1.

  • Sometimes a <ref> isn't joined to the previous line (e.g. lines 3170, 6087), but probably should be.

  • Sometimes a </div> tag is joined to the previous line (e.g. line 1289), but should probably stay on its own line.

Thanks, these should be fixed now.

  • The treatment of comments seems inconsistent: sometimes joined up, sometimes not. Is there a rule?

Yes, comments that are preceded with a newline and followed by a newline are considered structural and left alone (including internal linebreaks). Other comments are reflowed inline.

This means some comments still have internal linebreaks that could arguably have internal linebreaks removed, but I think we can leave them as-is.

@zcorpan zcorpan force-pushed the apply-reformahtml branch 3 times, most recently from ddabc39 to e3e813a Compare October 3, 2025 11:14
@zcorpan zcorpan removed the do not merge yet Pull request must not be merged per rationale in comment label Oct 3, 2025
@zcorpan
Copy link
Member Author

zcorpan commented Oct 3, 2025

I'm happy with the result now. I've looked through the source and added some more data-noreformat attributes. I've also added a workflow step to run the script so future changes use correct wrapping. I've also updated the documentation in CONTRIBUTING.md.

@zcorpan zcorpan force-pushed the apply-reformahtml branch from e3e813a to 2ac969d Compare October 3, 2025 11:31
@zcorpan zcorpan added the do not merge yet Pull request must not be merged per rationale in comment label Oct 3, 2025
@zcorpan
Copy link
Member Author

zcorpan commented Oct 3, 2025

I noticed multiple spaces before comments.

Edit: fixed.

@zcorpan zcorpan force-pushed the apply-reformahtml branch from 2ac969d to ebd8e22 Compare October 3, 2025 13:00
@zcorpan zcorpan removed the do not merge yet Pull request must not be merged per rationale in comment label Oct 3, 2025
<p>A <code>base</code> element that is the first <code>base</code> element with an <code
data-x="attr-base-href">href</code> content attribute <span>in a document tree</span> has a
```
You can run https://github.com/zcorpan/reformahtml locally to remove inter-paragraph line wrapping. This script is also run as a GitHub workflow.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
You can run https://github.com/zcorpan/reformahtml locally to remove inter-paragraph line wrapping. This script is also run as a GitHub workflow.
You can run https://github.com/zcorpan/reformahtml locally to remove intra-paragraph line wrapping. This script is also run as a GitHub workflow.

@jmdyck
Copy link
Contributor

jmdyck commented Oct 4, 2025

I'm puzzled about the semantics of data-noreformat.

E.g., in source at line 2449 (under "terms defined in URL"), there's a <ul> marked data-noreformat, so I expected that the whole of that element (down to line 2509) would be identical in the reformatted file. But there are 3 spots within the element where a linebreak was eliminated.

(Mind you, I think that's the only example where data-noreformat seems to have been disrespected.)

@jmdyck
Copy link
Contributor

jmdyck commented Oct 4, 2025

There are some spots where the reformatter has incorrectly dropped a space before a start-tag, e.g.

  • line 1324: sets of<span data-x="plugin">plugins
  • line 6292: The<dfn id="lazy-load-root-margin">lazy load scroll margin
  • line 8670: the document is a<span>cookie-averse

@jmdyck
Copy link
Contributor

jmdyck commented Oct 4, 2025

There are some spots where the reformatter leaves an end-tag on a line by itself, where it should probably append it to the previous line, e.g.:

  • </p> on lines 7163, 27872, 37167, 42547, 50845. (In some cases, maybe the original should be marked data-noreformat.)

  • </dd> on lines 30446, 30451, 98337, 107255+following.

  • </p></dd> on 52561, 73068.

  • </p></li> on 84068.

@jmdyck
Copy link
Contributor

jmdyck commented Oct 4, 2025

Yes, comments that are preceded with a newline and followed by a newline are considered structural and left alone (including internal linebreaks). Other comments are reflowed inline.

Thanks.

In the source, there are 4 cases where two comments are immediately adjacent. (Scan for --><!--). In each case, neither comment qualifies as structural (because the first isn't followed by a newline and the second isn't preceded by a newline), and so the reformatter reflows them. However, in the 2nd+3rd cases (in "The DataTransferItem Interface"), it would probably be better if they were left as is. So in those two cases, I suggest inserting a newline between the two comments in your 'preparation' commit.

@domfarolino
Copy link
Member

Sorry to jump in late, but I'm still not sure what the value add here is. Is it just lower friction for contributors to not have to worry about line breaks? What is the rule we're trying to enforce after this PR? Is it:

  • you can break wherever you want, and have arbitrarily long paragraphs; or
  • no line breaks within a paragraph are allowed

I think I still prefer the 100 character line break rule, and using a tool like specfmt to keep this specification's source more consistent with other WHATWG specs, whereas it seems this change drives more inconsistency across our specs.

@annevk
Copy link
Member

annevk commented Oct 7, 2025

The rules are not consistent though. Most other WHATWG specifications don't permit breaking on every possible space. Moving towards wrapping in your editor seems acceptable to me and if this works out we can port it to DOM, Fetch, etc.

@zcorpan
Copy link
Member Author

zcorpan commented Oct 7, 2025

Generally, no line breaks within a paragraph are allowed. Exceptions are "custom" linebreaks for readability and easier editing, e.g. HTML uses one line per name in the Acknowledgements paragraph. These must have a data-noreformat attribute.

The benefit of this approach is that contributors don't have to use a tool like specfmt, but instead just write the paragraph and then commit. The current line-wrapping rule for HTML also makes searching for terms annoying, which is fixed by this change.

@zcorpan zcorpan force-pushed the apply-reformahtml branch from ebd8e22 to 81cc2a5 Compare October 8, 2025 08:19
@zcorpan
Copy link
Member Author

zcorpan commented Oct 8, 2025

@jmdyck thank you for reviewing!

There are some spots where the reformatter has incorrectly dropped a space before a start-tag, e.g.

This was a bug in the script, now fixed.

There are some spots where the reformatter leaves an end-tag on a line by itself, where it should probably append it to the previous line, e.g.:

These were due to inconsistent formatting in source, now fixed (in the second commit, to be squashed into the first).

In the source, there are 4 cases where two comments are immediately adjacent. (Scan for --><!--). In each case, neither comment qualifies as structural (because the first isn't followed by a newline and the second isn't preceded by a newline), and so the reformatter reflows them. However, in the 2nd+3rd cases (in "The DataTransferItem Interface"), it would probably be better if they were left as is. So in those two cases, I suggest inserting a newline between the two comments in your 'preparation' commit.

Fixed.

@jmdyck
Copy link
Contributor

jmdyck commented Oct 10, 2025

I'm still puzzled about the semantics of data-noreformat (see #11640 (comment)).


At line 65870 of source, there's:

<p class="note">The requirement that <span data-x="data block">data blocks</span>
<!--non-normative-->must be denoted using a <span>valid MIME type string</span>

In the reformatted result (line 48421), these lines don't get joined up, but they probably should be.


Generally, the reformatter squashes runs of spaces down to a single space, but there are a few places where it doesn't:

  • line 10057: whitespace</span>. <!--<code>Text
  • line 44229: <!-- or "cell"? -->
  • line 62897: the <dfn

* Add data-noreformat attributes
* Fix double spaces and incorrect wrapping around tags
* Add a Check Formatting workflow
* Update CONTRIBUTING.md
@zcorpan
Copy link
Member Author

zcorpan commented Oct 14, 2025

@jmdyck thanks, I think those are also fixed now.

@jmdyck
Copy link
Contributor

jmdyck commented Oct 14, 2025

Okay, I think this is ready to go.

@domfarolino
Copy link
Member

The rules are not consistent though. Most other WHATWG specifications don't permit breaking on every possible space. Moving towards wrapping in your editor seems acceptable to me and if this works out we can port it to DOM, Fetch, etc.

What are some good, recommended editor settings to visually wrap existing long lines to ~100 characters, and preserve the indentation level? Specifically for vim, I can't find a good way to visually wrap long lines tighter than the vim window itself, which leaves the spec pretty unreadable to me. Does anyone know the trick in vim?

@domfarolino
Copy link
Member

Also, is there a way to preserve indentation among other things in GitHub's review panel? Or is the plan to just live with wonky indents like:
Screenshot 2025-10-14 at 6 09 17 PM

I worry about this degrading readability during reviews.

Also, I worry this would make PRs harder to review, since you can't anchor comments in the middle of a visually wrapped line on GitHub. It will be hard to reference specific text in the middle of a paragraph, since your comment will just be anchored at the first visual line of the whole block, which is really weird. Suggestions won't be harder, since you an make modifications in the middle of a block of text (albeit the block will be larger now), but I think the comment experience would be pretty annoying when trying to target very specific text.

@jmdyck
Copy link
Contributor

jmdyck commented Oct 15, 2025

What are some good, recommended editor settings to visually wrap existing long lines to ~100 characters, and preserve the indentation level? Specifically for vim, I can't find a good way to visually wrap long lines tighter than the vim window itself, which leaves the spec pretty unreadable to me. Does anyone know the trick in vim?

:set wrap linebreak breakindent breakindentopt=shift:1

will visually wrap long lines at word-boundaries and indent the wrapped lines at 1 more than the indent of the actual line. But the wrapping-limit is still defined by the width of the window -- I don't think you can visually wrap to a specified width, but I might have missed something.

@domfarolino
Copy link
Member

But the wrapping-limit is still defined by the width of the window -- I don't think you can visually wrap to a specified width, but I might have missed something.

Yeah, this is the problem I’m having.

@annevk
Copy link
Member

annevk commented Oct 15, 2025

@zcorpan
Copy link
Member Author

zcorpan commented Oct 15, 2025

For people who use VS Code, I found that you can set Editor: Word Wrap to wordWrapColumn and then set Editor: Word Wrap Column to the desired number of columns.

Also, is there a way to preserve indentation among other things in GitHub's review panel? Or is the plan to just live with wonky indents like:

I think we'd have to live with this for now, but can file a request to GitHub to improve the wrapping. It should be possible in CSS (but a bit ugly without full attr() support): https://software.hixie.ch/utilities/js/live-dom-viewer/saved/14231

Also, I worry this would make PRs harder to review, since you can't anchor comments in the middle of a visually wrapped line on GitHub. It will be hard to reference specific text in the middle of a paragraph, since your comment will just be anchored at the first visual line of the whole block, which is really weird.

I think this is true. A workaround could be to quote the part of the line you want to comment on. We can also file another request for GitHub to support review comments on part of a line. I also note that some specs e.g. https://github.com/whatwg/compression and https://github.com/whatwg/urlpattern already use no line breaks and it seems to me this hasn't been an issue for PR reviews.

@zcorpan zcorpan added the agenda+ To be discussed at a triage meeting label Oct 23, 2025
@zcorpan
Copy link
Member Author

zcorpan commented Oct 23, 2025

Pros of no-line-breaks:

  • No need for contributors to think about where to put line breaks, or to run a script to normalize line breaks.
  • Searching for text is easier, because text is not wrapped across multiple lines. (In VS Code, I need to use \n?\s+ between words. In some other editors, \s+ is enough.)
  • Diffs in GitHub should more reliably highlight only changed words. When wrapping to 100 cols, sometimes a whole block is marked as modified (without the changed words highlighted).

Cons:

  • Some editors might not have satisfactory soft wrapping support.
  • GitHub UI's soft wrapping doesn't indent wrapped lines. (This is fixable by GitHub.)
  • GitHub code review: it's harder to comment on a specific part of a paragraph. (Maybe this is fixable by GitHub.)

@domfarolino suggested running specfmt as part of html-build. I think this still introduces friction for contributors. Running html-build is often not needed (for small changes I usually don't build locally). specfmt also requires changes to be committed before running. The current build system doesn't modify the source file. If it's an opt-in to reformat, it seems not so different from the status quo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agenda+ To be discussed at a triage meeting spec tooling

Development

Successfully merging this pull request may close these issues.

Source formatting script

5 participants