chore: Fix flaky deploy tests #565
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Background
[sc-123004]
The merge of #558 has introduced some flakiness into the deploy_test.go test file.
Investigation
The failures in our build pipeline seems to be very intermittent and would easily resolve be rerunning the action. Running the test locally to reproduce the action requires rerunning the test over and over until a failure is raised.
Go allows us to do this by setting the
countproperty when running the tests. The following command was used to raise the error:go test ./pkg/cmd/release/deploy/deploy_test.go -count=200 -failfast. Thefailfastproperty was used to stop the iteration when a failure occured. The result of this command generally looks as follows:In general the failures where always related to one of the following tests:
prompted variabletenant and tags in a definitely tenanted projecttenant and tags in a mabye tenanted project (choosing tenanted)prompt if feature toggle is on and a release has missing packagesFrom the test failure output it was displaying both the general deploy information as well as the additional options, where the test was only expecting the general information. This is surprising for two reasons:
PrintAdvancedSummary, that can be found here. This function is always called during the deployment process, so its information should always be displayed to the user.Findings
The fact that the additional options is always called leads me to believe that what is displayed to the screen is actually correct and should not result in a failure of the test. I believe that the addition of ephemeral environments to the deployment command and its subsequent refactor didn't introduce flakiness but merely surfaced a brittleness in the way the test was structured.
The comparison of the text between the general deploy information and the additional options is done differently in the tests. The general information checks for an exact match (using
assert.Equal), where the additional options only checks for a regular expression match (usingassert.Regexp). The test thus required the exact text to be displayed to the buffer at a specific time, it would then clear it before looking for more text. With the addition of the new code this meant that in some of the runs the timing would be off and more information would be displayed than was expected.In order to solve this, the PR changes the exact match to a contains, to check that the information it expects appears somewhere in the output. We are still checking for the same information to be displayed, but we do not care if there is additional options also contained in the buffer. I've also removed the clearing of the buffer further down in the test to account for the timing of additional checks to not be affected by a clearing of the display buffer.
Result
Running the command to reproduce the error locally now results in no error being displayed.