Invariant Signal Collection for Kubernetes Testing #5197

aojea · 2025-03-12T12:02:26Z

One-line PR description: Create and define a system for reporting and tracking invariant violations detected during Kubernetes testing.
Issue link: Invariant Signal Collection for Kubernetes Testing #5196
Other comments:

Change-Id: Ic5d1b1f8c0e33bf1ef9ad320f46982f1c5587451

k8s-ci-robot · 2025-03-12T12:02:34Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~keps/sig-testing/OWNERS~~ [aojea]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot · 2025-03-12T12:05:57Z

@aojea: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-enhancements-verify | `3d7c326` | link | true | `/test pull-enhancements-verify` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

pohly · 2025-03-12T12:06:36Z

keps/sig-testing/5196-invariants/README.md

+
+## Summary
+
+This proposal defines a system to gather and analyze invariant signals about the Kubernetes cluster during test execution. We want to see how key parts of the system are behaving in a way that's similar to how real users would see it. This will help us find problems that normal tests might miss, and give us a better picture of how stable and reliable Kubernetes is.


Nit: please wrap lines.

#5085

Commented in #5085.

You can set your browser width to control wrapping client side, or review the rendered markdown which will flow independently of any manual line wrap anyhow.

(This also allows you to control the width, and avoids having the lines awkwardly broken up if I choose a different width)

Viewing wasn't the reason - see #5085 (comment)

haha, I just opened the PR as a placeholder so I wouldn't forget and copy-pasted from my Google Doc, sorry about that.
My editor normally wraps the lines, I just didn't use it this time

to be clear, this is a mistake and I agree with Patrick on line wrapping

BenTheElder · 2025-03-13T22:29:53Z

keps/sig-testing/5196-invariants/README.md

+
+3. Storing and Processing Invariant Data:
+
+We will use existing tools to store the invariant data in a database that is easy to search.


I don't think that's sufficient. We can't tell people "go pay to run BigQuery to search for results" (and do so periodically).

This is why I'm still hesitant to abandon the approach of reporting a pass/fail, which we already disseminate in various forms including email alerts.

If we're going to commit to building a frontend for these results instead, we should be discussing that here.

reporting this through tests makes it everyone's problem, and when something is everyone's problem it is no one's problem ... if there is interest in this, I expect people to participate in creating and maintaining it; otherwise we'll add more complexity and flakiness to an under-maintained area

BenTheElder · 2025-03-13T22:40:09Z

keps/sig-testing/5196-invariants/README.md

+
+4. (Future) Looking at the Invariant Data:
+
+We will create dashboards to show the invariant data in a way that's easy to understand.


Which look like ..? And use what?

metrics/ is ~dead. There are no dashboards available for this. It lacked maintainers. There are ... some questionable JSON files, that nobody uses, that are not rendered anywhere.

and why would adding it to the tests not end up in the same situation?
I don't find it sustainable to add new failure modes to the CI without a plan for how to maintain them

BenTheElder

we should have alternatives considered, e.g. an opt-in "runs last" test that reports when the metric is tripped

there's a substantial tradeoff in latency of reporting this result, and complexity of implementing the system, versus attempting to silently gather from all jobs using barely maintained systems and having people poll for results
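
To make the "runs last" alternative above concrete, here is a minimal sketch of what such an opt-in check could look like. It is an illustration only, not the design proposed in this thread: the `Invariant` type, the `run_last` entry point, and the metric name `example_panics_total` are all hypothetical, and a real version would most likely live in the Go e2e framework rather than in Python.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Invariant:
    """A named condition over a metric sample gathered after the suite finished."""
    name: str
    metric: str                     # hypothetical metric name
    holds: Callable[[float], bool]  # returns True when the invariant is satisfied


def check_invariants(metrics: Dict[str, float], invariants: List[Invariant]) -> List[str]:
    """Return one failure message per violated invariant (empty list means pass)."""
    failures = []
    for inv in invariants:
        if inv.metric not in metrics:
            # Missing data is reported, so a broken scrape cannot pass silently.
            failures.append(f"{inv.name}: metric {inv.metric!r} not found")
            continue
        value = metrics[inv.metric]
        if not inv.holds(value):
            failures.append(f"{inv.name}: {inv.metric}={value} violates invariant")
    return failures


def run_last(metrics: Dict[str, float], invariants: List[Invariant],
             enabled: bool) -> Tuple[str, List[str]]:
    """Opt-in entry point: status is "skipped", "passed" or "failed"."""
    if not enabled:
        return "skipped", []
    failures = check_invariants(metrics, invariants)
    return ("failed" if failures else "passed"), failures
```

The result is an ordinary pass/fail with messages attached, so it could be disseminated through the same channels as any other test result. The cost is the one noted above: the signal only arrives once the job has finished.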

aojea · 2025-03-23T05:40:05Z

/close

@BenTheElder I'm going to close this and reassign to you, since you have a better proposal

k8s-ci-robot · 2025-03-23T05:40:10Z

@aojea: Closed this PR.

In response to this:

/close

@BenTheElder I'm going to close this and reassign to you, since you have a better proposal

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Invariant Signal Collection for Kubernetes Testing

3d7c326

Change-Id: Ic5d1b1f8c0e33bf1ef9ad320f46982f1c5587451

k8s-ci-robot added the `cncf-cla: yes` label Mar 12, 2025

k8s-ci-robot requested review from alvaroaleman and xmcqueen March 12, 2025 12:02

pohly reviewed Mar 12, 2025

BenTheElder assigned pohly and BenTheElder Mar 13, 2025

BenTheElder reviewed Mar 13, 2025

k8s-ci-robot closed this Mar 23, 2025
