-
Notifications
You must be signed in to change notification settings - Fork 112
[CI] Add design document for post submit testing #512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This patch adds the design document outlining how we plan on implementing post submit testing for the premerge configuration along with the motivation and alternatives considered.
premerge/post-submit-testing.md
Outdated
llvm-zorg repository under the `buildbot/` folder. These configurations are run | ||
on Buildbot workers that are hosted by the community. | ||
|
||
It is important that we test the premerge configuration postcommit as well. We |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-work this paragraph. Motivations should start by stating what the problem is you're trying to solve (sending false positive notifications to authors about builds/tests already failing at head). Once the problem is explained (including WHY it's important/impactful to solve this problem), then you explain how you plan to solve it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the phrase "for certain types of automation" is so vague as to be virtually useless. I would either add more details or omit it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reworked the paragraph to talk about what we want to do in the end at the top and then go into the how.
Removed the vague wording.
premerge/post-submit-testing.md
Outdated
postcommit testing is performed through the Buildbot infrastructure. The main | ||
Buildbot instance for LLVM has a web instance hosted at | ||
[lab.llvm.org](https:/lab.llvm.org). When a new commit lands in `main` the | ||
Buildbot instance (sometimes referred to as the Buildbot master) will trigger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should also mention that the current postcommit buildbots don't run separate instances for each commit -- they bundle multiple commits together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This depends upon the configuration and bundling commits is explicitly discouraged in https://llvm.org/docs/HowToAddABuilder.html. I've added some text to describe this.
premerge/post-submit-testing.md
Outdated
available (able to tolerate an entire cluster going down), similar to the | ||
premerge testing. The builders will be configured to use a script that will | ||
launch builds on each commit to `main` in a similar configuration to the one run | ||
premerge. The test configuration is intended to be close to the premerge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"a similar configuration to the one run premerge" => "the same configuration used for premerge testing (for PR commits); for direct git commit
commits, it will use that same configuration that would have been used with PR testing, if the commit had gone through a PR."
Replace the sentence "The test configuration is intended to be close to the premerge configuration but will be different in some key ways." with:
"The post submit testing is intended to be very close to the premerge testing, but will be different in some key ways."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the distinction between directly pushed commits/commits from PRs is useful here. I've changed up the wording to say the commit will be tested as if it is going through the premerge pipeline, with some minor differences.
Ack on the second point, changed up the wording.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments from just skimming it, I don't know too much about the implementation myself.
Really like that you're documenting all this, I see those incident reports too, great stuff.
|
||
## Background/Motivation | ||
|
||
LLVM has two types of testing upstream: premerge and postcommit. The premerge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default assumption for this document is going to be that "LLVM" refers to upstream llvm, so you can just say "testing" I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit redundant, but I'd prefer to keep it in there. There will probably be internal (to Google) audiences reading this who might not be as familiar with everything upstream and we have a lot of downstream CI, so keeping it explicit might make it more clear for someone and only be redundant for someone else.
premerge/post-submit-testing.md
Outdated
`main` is also important for certain kinds of automation, like the planned | ||
premerge testing advisor that will let contributors know if tests failing in | ||
their PR are already failing at `main` and that it should be safe to merge | ||
despite the given failures. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are some very large paragraphs. The content is fine but a line break between each "point" made would make it easier to read through.
premerge/post-submit-testing.md
Outdated
new cluster to efficiently utilize resources. To do this, we plan on having the | ||
AnnotatedBuilder script launch builds as kubernetes pods. This allows for | ||
kubernetes to assign the builds to nodes and also allows autoscaling through | ||
the same mechanism that Github autoscales. This allows for us to quickly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I think you mean the same mechanism that LLVM testing on GitHub uses, not GitHub the company specifically uses it.
premerge/post-submit-testing.md
Outdated
problematic only when combined. This means we need to test the premerge | ||
configuration postcommit as well so that we can determine the state of `main` | ||
(in terms of whether the build passed/failed and what tests failed, if any) at | ||
any given point in time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recommendation: Add a sentence at the end of this paragraph about the premerge advisor (which you reference in the Data Storage section as if it had been mentioned previously). Something like "We can then use this data to implement a premerge advisor, to help prevent sending false positive test failure notifications to users."
I think this is looking pretty good now; I've just left a few more minor comments inline. :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but please do not merge until others have had a chance to review and give feedback.
Does anybody else have any feedback for this design? If not, I will go ahead an approve it... |
premerge/post-submit-testing.md
Outdated
had dependencies that changed. We propose this for two main reasons. Firstly, | ||
Buildbot does not have support for heterogenous build configurations. This means | ||
that testing a different set of projects within a single builder depending upon | ||
the contents of the commit could easily cause problems. More notifications could |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What kind of problems?
From buildbot point of view, you would track the entire project (so no path-based-filtering), but the script could very well disable some tests based on the commit diff. Buildbot should be oblivious to this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarified it to be good support. We definitely could just run stuff like we do in the premerge pipeline on Github with path based filtering, but we run into problems described below.
supports a [REST API](https://docs.buildbot.net/latest/developer/rest.html) that | ||
would allow for easily querying the state of a commit in `main`. | ||
|
||
For the proposed premerge advisor that tells the user what tests/build failures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this refers to a "proposed premerge advisor" but that's the first time this is mentioned in the doc, and the casual reader having no context (me) will be lost. Is there a link to some proposal that should be added here?
This section feel like a bit out-of-place for a document describing the "post-merge testing" flow IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(this feature looks really nice otherwise)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's mentioned briefly in the last paragraph of the background/motivation section.
There is no proposal yet, but I'm hoping to write an RFC/document soon to share with the community once I've fleshed out the details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to me it would fit better the flow here to phrase it this way I think:
For the proposed premerge advisor that tells the user what tests/build failures | |
In the future, we may implement a "premerge advisor" that tells the user what tests/build failures |
Looks nice, thanks for the write-up! (and all the work that went behind setting up the infra) |
premerge/post-submit-testing.md
Outdated
Each pull request is tested as if it was merged into main, which means the | ||
commit underneath the PR is very recent. If a premerge run fails, the premerge | ||
advisor will find the commit from `main` the PR is being tested on. It will then | ||
query the Buildbot master using the REST API for the status of that commit. | ||
It can then report the appropriate status to the user. Having the status will | ||
let the premerge advisor avoid pestering LLVM developers with failures | ||
unrelated to their changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the PR build finished & fails before the main post-commit build finished? Is there a queue/poll mechanism for the advisor? Or will it report "don't know" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The plan is to just have it query for the preceding commits until it hits something. For the purpose of excluding test failures at head, that should be fine.
I'll flesh out the details more for this when I write up a document/RFC on the premerge advisor specifically.
supports a [REST API](https://docs.buildbot.net/latest/developer/rest.html) that | ||
would allow for easily querying the state of a commit in `main`. | ||
|
||
For the proposed premerge advisor that tells the user what tests/build failures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to me it would fit better the flow here to phrase it this way I think:
For the proposed premerge advisor that tells the user what tests/build failures | |
In the future, we may implement a "premerge advisor" that tells the user what tests/build failures |
This patch adds the design document outlining how we plan on implementing post submit testing for the premerge configuration along with the motivation and alternatives considered.