-
Notifications
You must be signed in to change notification settings - Fork 2.7k
feat(build-dir): Reorganize build-dir layout #15947
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
} | ||
|
||
/// Directory where incremental output for the given unit should go. | ||
pub fn incremental_dir(&self, unit: &Unit) -> PathBuf { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just raise the awareness that this PR changed the incremental directory as well. See the relevant discussion: #15010 (comment).
We'll need to investigate the impact of this, or whether incremental compilation is still working.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#t-compiler > Cargo switching to one `-C incremental` directory per crate
Just opened a discussion on Zulip.
This is not a blocker BTW, as we are still experimenting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We got a quick answer from Mark-Simulacrum. So no issue on loading incremental artifacts side.
simulacrum: AFAIK, rustc always loads incremental artifacts out of the directory only for the local crate - cross-crate state is always from rmeta
Weihang Lo: Ah nice. So it shouldn't be an issue, and Cargo doesn't need to add flock there because it already has one, right?
simulacrum: That sounds right to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this more, I think we should probably start off conservatively and only have one incremental/
, like today (which may run into problems with #4282). We can experiment with it later as changing it should have little impact.
The incremental directory takes up a significant chunk of the build-dir size. If we make it unique by -Cextra-filename
then we will end up with multiple of them in the build, ballooning the build-dir size.
Its unclear what the performance impact would be. Having a single directory while changing inputs to -Cextra-filename
could mean faster rebuilds if it can reuse a lot. Or it throws out a lot and thrashes the caches and is benefited by unique incremental/
s.
For CI, its also a benefit to make it easy to clear to keep caching in CI easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I think this sounds reasonable.
I was worried about a shared incremental
being a point of lock contention when we introduce fine grain locking.
But thinking about a bit more, cargo only enables incremental for workspace and path crates so generally only a small subset would need to lock on this directory.
Also since build-dir
internals are not public interface, we can change it in the future if we find another approach to be optimal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have update the implementation to return to a single incremental/
directory.
See the updated PR description for the file layout.
CC @Kobzol due to your work on rust-lang/rust#145408. If reducing duplicate search paths speeds up builds, I wonder what the impact will be of having more but pin point focused search paths will be. |
Probably doesn't matter much because you still need to go through all of them to find what a crate need? Edit: doesn't matter for primary crate, but for dependencies at the very root (like syn), it would be helpful. |
Currently, the search path includes each |
I would be a bit worried about perf. in large scenarios (e.g. 1000 crates, which is not that uncommon), as I suspect that rustc does a bunch of linear (hopefully not quadratic) searches through these directories and files in them. I would suggest benchmarking on https://github.com/zed-industries/zed 😆 |
This comment was marked as off-topic.
This comment was marked as off-topic.
Basically files under a search directory are preloaded and sorted and then binary search on them, so shouldn't be too bad? It may incur more opendir/readdir syscall though. Like epage mentioned, it also help for transitive dependency loading less files. But yeah worth some benchmark for larger projects. |
Ah, I forgot that we do binary search already. In that case it will be probably fine, yeah. |
I tried it on Zed and didn't see any perf. difference vs master, neither for clean builds nor for incremental rebuilds. |
If there is a different, it will most likely appear if you have multiple unique versions for each package, e.g. from
|
I didn't see the rebuild time getting higher for Zed when I added multiple versions from different cargo invocations. |
854b259
to
2355753
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
target/x86_64-unknown-linux-gnu/debug/build/proc-macro2-ee66340aaf816e44
So while this reduces the "max content per directory" (since proc-macro2-ee66340aaf816e44
will be a dir, rather than multiple files), we also have more flexibility for handling this.
Should we change from proc-macro2-ee66340aaf816e44
to proc/-macro2/ee66340aaf816e44
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we change from proc-macro2-ee66340aaf816e44 to proc/-macro2/ee66340aaf816e44?
Could we expand a bit more on the benefit of the proposed change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There can be performance issues on Windows when a directory has a lot of content. We do this prefix-directory stuff for the index and for the build-dir workspace hash. This would be extending it to the build units within the package dir.
As Ross brought up, we don't have guidance on how big is big, what the growth will look like, etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we share this layout with the shared cache, then it will likely be more important.
As a follow up to this PR, we may want to remove |
Another reason we might want to remove |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need a test for cargo clean -p foo
. Haven't looked at how thats implemented but might at least be a reason for name/hash
rather than name-hash
What does this PR try to resolve?
This PR re-organizes the
build-dir
file layout structure to a layout organized by "build unit" when-Zbuild-dir-new-layout
is enabled.See #15010 for the motivations and design discussions.
Below is file structure generated for a
foo
crate with a single dependency onsyn
.How to test and review this PR?
This PR still needs to be more thoroughly tested. Thus far I have been testing on simple test crates.
Also see #15874 for potential test harness improvements that could be used by this PR.
Still have a good amount of testing + documenting to do before marking this PR as ready, but early feedback is welcome :D