Add --link-targets-dir argument to linkchecker #143883


Open: wants to merge 4 commits into base: master

pietroalbini
Member

@pietroalbini pietroalbini commented Jul 13, 2025

In my release notes API list tool (#143053) I want to check whether all links generated by the tool are actually valid, and using linkchecker seems to be the most sensible choice.

Linkchecker currently has a fairly big limitation though: it can only check a single directory, it checks all of the files within it, and link targets must point inside that same directory. This works great when checking the whole documentation package, but in my case I only need to check that one file contains valid links to the standard library docs.

To solve that, this PR adds a new --link-targets-dir flag to linkchecker. Directories passed to it will be valid link targets (with lower priority than the root being checked), but links within them will not be checked.
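The lookup order described above (root first, then the extra target directories) can be sketched roughly as follows. This is a simplified illustration, not the PR's actual code: the names `Resolved` and `resolve_link` are hypothetical, links are treated as already relative to a directory, and the existence check is passed in as a closure so the logic can be exercised without touching the filesystem.

```rust
use std::path::{Path, PathBuf};

/// Where a link ended up resolving. Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
enum Resolved {
    /// Found under the root being checked; files here also have their links checked.
    InRoot(PathBuf),
    /// Found under a `--link-targets-dir` directory; a valid target that is not crawled.
    InTargetDir(PathBuf),
    /// Not found anywhere, so the link is broken.
    Missing,
}

/// Resolve `relative` against the root first, then against each extra
/// target directory in order. The root wins when both contain the file.
fn resolve_link(
    root: &Path,
    link_target_dirs: &[PathBuf],
    relative: &Path,
    exists: impl Fn(&Path) -> bool,
) -> Resolved {
    let in_root = root.join(relative);
    if exists(&in_root) {
        return Resolved::InRoot(in_root);
    }
    for dir in link_target_dirs {
        let candidate = dir.join(relative);
        if exists(&candidate) {
            return Resolved::InTargetDir(candidate);
        }
    }
    Resolved::Missing
}
```

In a real checker the `exists` closure would presumably be something like `|p| p.is_file()`, and only files that resolve as `InRoot` would be queued for their own links to be checked.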

I'm not that happy with the name of the flag, happy for it to be bikeshedded.

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) labels Jul 13, 2025
@pietroalbini pietroalbini marked this pull request as ready for review July 13, 2025 11:00
@rustbot
Collaborator

rustbot commented Jul 13, 2025

r? @ehuss

rustbot has assigned @ehuss.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 13, 2025
@rustbot
Collaborator

rustbot commented Jul 13, 2025

These commits modify the Cargo.lock file. Unintentional changes to Cargo.lock can be introduced when switching branches and rebasing PRs.

If this was unintentional then you should revert the changes before this PR is merged.
Otherwise, you can ignore this comment.

@pietroalbini pietroalbini force-pushed the pa-linkchecker-extra-target branch 2 times, most recently from 739a113 to d51bfa1 Compare July 13, 2025 11:12
@pietroalbini pietroalbini force-pushed the pa-linkchecker-extra-target branch from d51bfa1 to 6831c80 Compare July 13, 2025 11:16
Contributor

@ehuss ehuss left a comment

Can you help me understand a little more about how your tool works? Is it generating some intermediate files with relative paths, and then translating them to absolute paths? What do the directory structures look like?

@@ -10,3 +10,4 @@ path = "main.rs"
[dependencies]
regex = "1"
html5ever = "0.29.0"
clap = { version = "4.5.40", features = ["derive"] }
Contributor

I'm a little uneasy adding this: the tool is used in many repositories, so it intentionally keeps its dependencies to a minimum.

Are there maybe some alternatives here? Does this really need a full-featured CLI parser? At a minimum, can this drop the default features? Perhaps use the builder API instead of derive?
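One dependency-free alternative to a CLI parser crate is a small hand-rolled loop over the arguments. The sketch below is only an illustration of that option, not what the PR implements: the `Config` struct and `parse_args` function are hypothetical, and it assumes a single positional directory to check plus a repeatable `--link-targets-dir <dir>` flag whose value is passed as a separate argument.

```rust
use std::path::PathBuf;

/// Parsed command line. Hypothetical shape, assumed for illustration.
#[derive(Debug, PartialEq)]
struct Config {
    /// The directory whose files have their links checked.
    root: PathBuf,
    /// Extra directories that are valid link targets but are not checked.
    link_target_dirs: Vec<PathBuf>,
}

/// Parse arguments (without the program name) using only the standard library.
fn parse_args(args: impl IntoIterator<Item = String>) -> Result<Config, String> {
    let mut args = args.into_iter();
    let mut root = None;
    let mut link_target_dirs = Vec::new();

    while let Some(arg) = args.next() {
        if arg == "--link-targets-dir" {
            // The flag consumes the following argument as its value.
            let dir = args.next().ok_or("--link-targets-dir requires a value")?;
            link_target_dirs.push(PathBuf::from(dir));
        } else if arg.starts_with("--") {
            return Err(format!("unknown flag: {arg}"));
        } else if root.is_none() {
            root = Some(PathBuf::from(arg));
        } else {
            return Err(format!("unexpected extra argument: {arg}"));
        }
    }

    let root = root.ok_or("missing the directory to check")?;
    Ok(Config { root, link_target_dirs })
}
```

A caller would typically pass `std::env::args().skip(1)`. The trade-off is that help text, `--flag=value` syntax, and error formatting all have to be written by hand.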

@pietroalbini
Member Author

> Can you help me understand a little more about how your tool works? Is it generating some intermediate files with relative paths, and then translating them to absolute paths? What do the directory structures look like?

The relnotes-api-list tool generates a JSON file with all stabilized APIs in the standard library, and their documentation URLs. These URLs look like std/option/enum.Option.html#method.unwrap for Option::unwrap.

I want to add a step to the tool verifying all those links are correct. To do so, my current implementation generates a temporary HTML file with an <a> tag for each link. I then generate the standard library docs, and I want to ensure all the links in the temporary file are valid.
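The temporary file described above could be produced along these lines. This is a minimal sketch rather than the tool's real code: `render_link_page` and `escape_attr` are hypothetical helpers, and the exact HTML the tool emits is not shown in this thread.

```rust
/// Escape characters that are unsafe inside an HTML attribute or text node.
fn escape_attr(raw: &str) -> String {
    // `&` must be replaced first so later replacements are not double-escaped.
    raw.replace('&', "&amp;")
        .replace('"', "&quot;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
}

/// Build a throwaway HTML page containing one `<a>` tag per documentation URL.
fn render_link_page(urls: &[&str]) -> String {
    let mut html = String::from("<!DOCTYPE html>\n<html><body>\n");
    for url in urls {
        let href = escape_attr(url);
        html.push_str(&format!("<a href=\"{href}\">{href}</a>\n"));
    }
    html.push_str("</body></html>\n");
    html
}
```

The resulting string would be written into a scratch directory, which is then the directory handed to linkchecker, with the generated docs supplied separately as link targets.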

In practice, this would be like placing my temporary file in build/host/doc, and running linkchecker on that directory. There are two downsides I found when doing that:

  • I either need to be extra careful that the temporary file is removed from build/host/doc, or I need to copy all of build/host/doc into a temporary directory and run linkchecker on that (which I guess won't be the fastest thing on Windows).
  • Pointing linkchecker to build/host/doc will also check the links in the standard library docs, which also requires generating the books and the pages the books point to.

That's why I decided to add the --link-targets-dir flag: passing --link-targets-dir build/host/doc avoids the need to copy the directory around, and prevents linkchecker from checking the links inside it (while still allowing links in the checked files to point into it).
