Member

@ranger-ross ranger-ross commented Sep 10, 2025

What does this PR try to resolve?

This PR reorganizes the build-dir file layout so that it is organized by "build unit" when -Zbuild-dir-new-layout is enabled.
See #15010 for the motivations and design discussions.

Below is the file structure generated for a foo crate with a single dependency on syn.

$ tree -a target
target
├── CACHEDIR.TAG
├── debug
│   ├── .cargo-lock
│   ├── examples
│   ├── foo
│   └── foo.d
├── .rustc_info.json
└── x86_64-unknown-linux-gnu
    ├── CACHEDIR.TAG
    └── debug
        ├── build
        │   ├── foo-9a570e8d4e3b9c02
        │   │   ├── deps
        │   │   │   ├── foo-9a570e8d4e3b9c02
        │   │   │   └── foo-9a570e8d4e3b9c02.d
        │   │   └── fingerprint
        │   │       ├── bin-foo
        │   │       ├── bin-foo.json
        │   │       ├── dep-bin-foo
        │   │       └── invoked.timestamp
        │   ├── proc-macro2-11d646d30b48934a
        │   │   ├── build-script-execution
        │   │   │   ├── invoked.timestamp
        │   │   │   ├── out
        │   │   │   ├── output
        │   │   │   ├── root-output
        │   │   │   └── stderr
        │   │   ├── deps
        │   │   └── fingerprint
        │   │       ├── run-build-script-build-script-build
        │   │       └── run-build-script-build-script-build.json
        │   ├── proc-macro2-c39225434fa6ebb9
        │   │   ├── build-script
        │   │   │   ├── build-script-build
        │   │   │   ├── build_script_build-c39225434fa6ebb9
        │   │   │   └── build_script_build-c39225434fa6ebb9.d
        │   │   ├── deps
        │   │   └── fingerprint
        │   │       ├── build-script-build-script-build
        │   │       ├── build-script-build-script-build.json
        │   │       ├── dep-build-script-build-script-build
        │   │       └── invoked.timestamp
        │   ├── proc-macro2-ee66340aaf816e44
        │   │   ├── deps
        │   │   │   ├── libproc_macro2-ee66340aaf816e44.rlib
        │   │   │   ├── libproc_macro2-ee66340aaf816e44.rmeta
        │   │   │   └── proc_macro2-ee66340aaf816e44.d
        │   │   └── fingerprint
        │   │       ├── dep-lib-proc_macro2
        │   │       ├── invoked.timestamp
        │   │       ├── lib-proc_macro2
        │   │       └── lib-proc_macro2.json
        │   ├── quote-60d1025cc5981bf9
        │   │   ├── deps
        │   │   │   ├── libquote-60d1025cc5981bf9.rlib
        │   │   │   ├── libquote-60d1025cc5981bf9.rmeta
        │   │   │   └── quote-60d1025cc5981bf9.d
        │   │   └── fingerprint
        │   │       ├── dep-lib-quote
        │   │       ├── invoked.timestamp
        │   │       ├── lib-quote
        │   │       └── lib-quote.json
        │   ├── syn-8bb86a6e710a28f8
        │   │   ├── deps
        │   │   │   ├── libsyn-8bb86a6e710a28f8.rlib
        │   │   │   ├── libsyn-8bb86a6e710a28f8.rmeta
        │   │   │   └── syn-8bb86a6e710a28f8.d
        │   │   └── fingerprint
        │   │       ├── dep-lib-syn
        │   │       ├── invoked.timestamp
        │   │       ├── lib-syn
        │   │       └── lib-syn.json
        │   └── unicode-ident-138e100e88d70a5a
        │       ├── deps
        │       │   ├── libunicode_ident-138e100e88d70a5a.rlib
        │       │   ├── libunicode_ident-138e100e88d70a5a.rmeta
        │       │   └── unicode_ident-138e100e88d70a5a.d
        │       └── fingerprint
        │           ├── dep-lib-unicode_ident
        │           ├── invoked.timestamp
        │           ├── lib-unicode_ident
        │           └── lib-unicode_ident.json
        ├── .cargo-lock
        ├── examples
        └── incremental
            └── foo-2h2dgfe3yi12y
                ├── s-hb1pgt29w3-0lknhgc-3etpflv3aav9943mg0t1wvtn2
                │   ├── 3tfn04wtjsm8vx5ycuvtcomcd.o
                │   ├── 9l4brhvtkja3n8ij1yvt5do9w.o
                │   ├── c36nxw76mzqhzlspm7b3ltxbl.o
                │   ├── c9nttna0krfbyozrk9hoy3wr4.o
                │   ├── dep-graph.bin
                │   ├── dsh75udcajb2zbok4s8yxa8un.o
                │   ├── e615jsbjhqsi7cyikfn2w3qnh.o
                │   ├── query-cache.bin
                │   └── work-products.bin
                └── s-hb1pgt29w3-0lknhgc.lock

34 directories, 64 files
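
As a rough illustration of how paths in the tree above compose: each build unit gets a `<package>-<hash>` directory under `build/`, and its artifacts live in fixed subdirectories such as `deps` and `fingerprint`. This is only a sketch, not Cargo's actual implementation; `unit_dir` and its parameters are hypothetical names chosen for the example.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: the directory that holds everything for one build unit.
fn unit_dir(build_dir: &Path, target: &str, profile: &str, pkg: &str, hash: &str) -> PathBuf {
    build_dir
        .join(target)
        .join(profile)
        .join("build")
        .join(format!("{pkg}-{hash}"))
}

fn main() {
    let unit = unit_dir(
        Path::new("target"),
        "x86_64-unknown-linux-gnu",
        "debug",
        "syn",
        "8bb86a6e710a28f8",
    );
    // Artifacts and fingerprints sit side by side under the unit directory.
    println!("{}", unit.join("deps").join("libsyn-8bb86a6e710a28f8.rlib").display());
    println!("{}", unit.join("fingerprint").join("lib-syn.json").display());
}
```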

How to test and review this PR?

This PR still needs to be more thoroughly tested. Thus far I have been testing on simple test crates.
Also see #15874 for potential test harness improvements that could be used by this PR.


Still have a good amount of testing + documenting to do before marking this PR as ready, but early feedback is welcome :D

@rustbot rustbot added labels Sep 10, 2025: A-build-execution (Area: anything dealing with executing the compiler), A-layout (Area: target output directory layout, naming, and organization), Command-clean
}

/// Directory where incremental output for the given unit should go.
pub fn incremental_dir(&self, unit: &Unit) -> PathBuf {
Member

Just raising awareness that this PR changes the incremental directory as well. See the relevant discussion: #15010 (comment).

We'll need to investigate the impact of this, or whether incremental compilation is still working.

Member

#t-compiler > Cargo switching to one `-C incremental` directory per crate

Just opened a discussion on Zulip.

This is not a blocker BTW, as we are still experimenting.

Member

We got a quick answer from Mark-Simulacrum. So there is no issue on the incremental artifact loading side.

simulacrum: AFAIK, rustc always loads incremental artifacts out of the directory only for the local crate - cross-crate state is always from rmeta

Weihang Lo: Ah nice. So it shouldn't be an issue, and Cargo doesn't need to add flock there because it already has one, right?

simulacrum: That sounds right to me.

Contributor

Thinking about this more, I think we should probably start off conservatively and only have one incremental/, like today (which may run into problems with #4282). We can experiment with it later as changing it should have little impact.

The incremental directory takes up a significant chunk of the build-dir size. If we make it unique by -Cextra-filename then we will end up with multiple of them in the build, ballooning the build-dir size.

It's unclear what the performance impact would be. Having a single directory while changing inputs to -Cextra-filename could mean faster rebuilds if it can reuse a lot. Or it could throw out a lot and thrash the caches, in which case unique incremental/ directories would help.

For CI, it's also a benefit to make incremental/ easy to clear, which keeps caching in CI simpler.

Member Author

Thanks, I think this sounds reasonable.
I was worried about a shared incremental/ being a point of lock contention when we introduce fine-grained locking.

But thinking about it a bit more, Cargo only enables incremental for workspace and path crates, so generally only a small subset would need to lock on this directory.
Also, since build-dir internals are not a public interface, we can change it in the future if we find another approach to be optimal.

Member Author

I have updated the implementation to return to a single incremental/ directory.
See the updated PR description for the file layout.

Contributor

epage commented Sep 10, 2025

CC @Kobzol due to your work on rust-lang/rust#145408. If reducing duplicate search paths speeds up builds, I wonder what the impact of having more, but pinpoint-focused, search paths will be.

Member

weihanglo commented Sep 10, 2025

If reducing duplicate search paths speeds up builds, I wonder what the impact of having more, but pinpoint-focused, search paths will be.

Probably doesn't matter much because you still need to go through all of them to find what a crate needs?

Edit: doesn't matter for primary crate, but for dependencies at the very root (like syn), it would be helpful.

Contributor

epage commented Sep 10, 2025

Probably doesn't matter much because you still need to go through all of them to find what a crate needs?

Currently, the search path includes each -Cextra-filename variant of a package's build. With this change, it will only see the variants relevant for this build.
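
To make the comparison concrete, here is a minimal sketch of what "more but focused" search paths could look like: one `-L dependency=<dir>` argument per dependency build unit, instead of one shared `deps` directory. It assumes the new layout passes each unit's `deps` directory separately; `search_path_args` and `dep_units` are hypothetical names, not Cargo's code.

```rust
use std::path::Path;

/// Build `-L dependency=<dir>` arguments, one per dependency build unit.
/// `dep_units` holds hypothetical `<package>-<hash>` directory names.
fn search_path_args(build_root: &Path, dep_units: &[&str]) -> Vec<String> {
    let mut args = Vec::new();
    for unit in dep_units {
        let deps = build_root.join(unit).join("deps");
        args.push("-L".to_string());
        args.push(format!("dependency={}", deps.display()));
    }
    args
}

fn main() {
    let root = Path::new("target/x86_64-unknown-linux-gnu/debug/build");
    // Only the units this build actually depends on are listed.
    let units = [
        "proc-macro2-ee66340aaf816e44",
        "quote-60d1025cc5981bf9",
        "unicode-ident-138e100e88d70a5a",
    ];
    for arg in search_path_args(root, &units) {
        println!("{arg}");
    }
}
```

The number of search paths grows with the number of dependency units, but each directory only contains the files for one `-Cextra-filename` variant.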

Member

Kobzol commented Sep 10, 2025

I would be a bit worried about perf. in large scenarios (e.g. 1000 crates, which is not that uncommon), as I suspect that rustc does a bunch of linear (hopefully not quadratic) searches through these directories and files in them. I would suggest benchmarking on https://github.com/zed-industries/zed 😆

@weihanglo
Member

Basically, files under a search directory are preloaded and sorted, and then rustc does a binary search on them, so it shouldn't be too bad? It may incur more opendir/readdir syscalls, though.

As epage mentioned, it also helps transitive dependencies load fewer files.

But yeah, it's worth some benchmarking on larger projects.
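
The "preload, sort, then binary search" idea can be sketched as follows. This is an illustration of the technique only, not rustc's actual code; `SearchDir` and `with_prefix` are hypothetical names, and the file names are shortened for the example.

```rust
/// Hypothetical stand-in for a preloaded search directory: file names sorted once.
struct SearchDir {
    files: Vec<String>,
}

impl SearchDir {
    fn new(mut files: Vec<String>) -> Self {
        files.sort();
        Self { files }
    }

    /// Return every file name starting with `prefix`.
    /// Both ends of the matching range are found by binary search.
    fn with_prefix(&self, prefix: &str) -> &[String] {
        // First index whose name is >= prefix.
        let start = self.files.partition_point(|f| f.as_str() < prefix);
        // In sorted order, names sharing the prefix are contiguous from `start`.
        let len = self.files[start..].partition_point(|f| f.starts_with(prefix));
        &self.files[start..start + len]
    }
}

fn main() {
    let dir = SearchDir::new(
        ["libsyn-8bb8.rmeta", "libquote-60d1.rlib", "libsyn-8bb8.rlib"]
            .into_iter()
            .map(String::from)
            .collect(),
    );
    for name in dir.with_prefix("libsyn-") {
        println!("{name}");
    }
}
```

Lookups stay logarithmic in the number of files per directory; the extra cost of more directories would mostly come from listing each one once.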

Member

Kobzol commented Sep 11, 2025

Ah, I forgot that we do binary search already. In that case it will probably be fine, yeah.

Kobzol added a commit to Kobzol/rust that referenced this pull request Sep 11, 2025
Member

Kobzol commented Sep 11, 2025

I tried it on Zed and didn't see any perf. difference vs master, neither for clean builds nor for incremental rebuilds.

Contributor

epage commented Sep 11, 2025

I tried it on Zed and didn't see any perf. difference vs master, neither for clean builds nor for incremental rebuilds.

If there is a difference, it will most likely appear if you have multiple unique versions of each package, e.g. from

  • cargo check
  • cargo clippy
  • cargo build
  • cargo doc
  • changing RUSTFLAGS
  • changing features

Member

Kobzol commented Sep 11, 2025

I didn't see the rebuild time getting higher for Zed when I added multiple versions from different cargo invocations.

Contributor

target/x86_64-unknown-linux-gnu/debug/build/proc-macro2-ee66340aaf816e44

So while this reduces the "max content per directory" (since proc-macro2-ee66340aaf816e44 will be a dir, rather than multiple files), we also have more flexibility for handling this.

Should we change from proc-macro2-ee66340aaf816e44 to proc/-macro2/ee66340aaf816e44?

Member

Should we change from proc-macro2-ee66340aaf816e44 to proc/-macro2/ee66340aaf816e44?

Could we expand a bit more on the benefit of the proposed change?

Contributor

@epage epage Sep 17, 2025

There can be performance issues on Windows when a directory has a lot of content. We do this prefix-directory stuff for the index and for the build-dir workspace hash. This would be extending it to the build units within the package dir.

As Ross brought up, we don't have guidance on how big is big, what the growth will look like, etc
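
For reference, the prefix-directory split suggested in this thread (`proc-macro2-ee66340aaf816e44` becoming `proc/-macro2/ee66340aaf816e44`) could be sketched like this. The function name, the `prefix_len` parameter, and the choice of 4 are assumptions made to reproduce the example path; the thread does not settle on a scheme.

```rust
use std::path::PathBuf;

/// Hypothetical sharding: `<first prefix_len chars>/<rest of name>/<hash>`.
/// Assumes ASCII package names, so splitting at a byte offset is safe.
fn sharded_unit_dir(pkg: &str, hash: &str, prefix_len: usize) -> PathBuf {
    // Clamp so that names shorter than the prefix do not panic.
    let split = prefix_len.min(pkg.len());
    let (head, tail) = pkg.split_at(split);
    let mut path = PathBuf::from(head);
    if !tail.is_empty() {
        path.push(tail);
    }
    path.push(hash);
    path
}

fn main() {
    println!("{}", sharded_unit_dir("proc-macro2", "ee66340aaf816e44", 4).display());
    println!("{}", sharded_unit_dir("syn", "8bb86a6e710a28f8", 4).display());
}
```

The split keeps the number of entries per directory smaller, at the cost of one or two extra path components per unit.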

Contributor

If we share this layout with the shared cache, then it will likely be more important.

Contributor

epage commented Sep 17, 2025

As a follow-up to this PR, we may want to remove -Cextra-filename where possible, since uniqueness is now guaranteed by the directory. Unsure if rustc relies on this for loading of rlibs or if we can only drop the hash from non-rlibs.

Contributor

epage commented Sep 17, 2025

Another reason we might want to remove -Cextra-filename where possible is to reduce the risk of hitting Windows path length issues.

Contributor

We'll need a test for cargo clean -p foo. I haven't looked at how that's implemented, but it might at least be a reason for name/hash rather than name-hash.
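
A small sketch of why name-hash needs care for `cargo clean -p foo`: a plain prefix match on `foo-` would also catch a package named `foo-bar`. This is not how `cargo clean` is implemented (that was not checked here); `belongs_to` is a hypothetical helper, `foo-bar-0123456789abcdef` is a made-up directory name, and the 16-hex-character hash length is assumed from the tree in the PR description.

```rust
/// Hypothetical check: does a `name-hash` directory belong to package `pkg`?
/// Assumes the hash is 16 hex characters, as in the tree in the PR description.
fn belongs_to(dir_name: &str, pkg: &str) -> bool {
    match dir_name.rsplit_once('-') {
        Some((name, hash)) => {
            name == pkg && hash.len() == 16 && hash.chars().all(|c| c.is_ascii_hexdigit())
        }
        None => false,
    }
}

fn main() {
    let dirs = [
        "foo-9a570e8d4e3b9c02",
        "foo-bar-0123456789abcdef",
        "syn-8bb86a6e710a28f8",
    ];
    for dir in dirs {
        // A naive prefix match would wrongly include `foo-bar-...` for package `foo`.
        println!(
            "{dir}: prefix={} exact={}",
            dir.starts_with("foo-"),
            belongs_to(dir, "foo")
        );
    }
}
```

With name/hash, the same operation would reduce to removing the single `foo/` directory, with no parsing of directory names.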

5 participants