Conversation

cnvergence
Member

@cnvergence cnvergence commented Jul 28, 2025

Summary

Introduce the kcp_logicalcluster_count metric, which reports the number of logical clusters currently running on the shard.

What Type of PR Is This?

/kind feature

Related Issue(s)

Fixes #3479

Release Notes

Add a metric for the logical cluster count

@kcp-ci-bot kcp-ci-bot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Indicates the PR's author has signed the DCO. labels Jul 28, 2025
@kcp-ci-bot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@kcp-ci-bot kcp-ci-bot added do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 28, 2025
@kcp-ci-bot kcp-ci-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Aug 8, 2025
@cnvergence cnvergence marked this pull request as ready for review August 8, 2025 15:01
@kcp-ci-bot kcp-ci-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 8, 2025
Signed-off-by: Karol Szwaj <[email protected]>

On-behalf-of: @SAP [email protected]
@kcp-ci-bot kcp-ci-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 8, 2025
return
}

if logicalCluster.Status.Phase == corev1alpha1.LogicalClusterPhaseReady {
Contributor

Can we add metrics for all states:

const (
	LogicalClusterPhaseScheduling   LogicalClusterPhaseType = "Scheduling"
	LogicalClusterPhaseInitializing LogicalClusterPhaseType = "Initializing"
	LogicalClusterPhaseReady        LogicalClusterPhaseType = "Ready"
	// LogicalClusterPhaseUnavailable phase is used to indicate that the logical cluster is unavailable to be used.
	// It will not be served via front-proxy when in this state.
	// Possible state transitions are from Ready to Unavailable and from Unavailable to Ready.
	// This should be used when we really can't serve the logical cluster content and not some
	// temporary flakes, like readiness probe failing.
	LogicalClusterPhaseUnavailable LogicalClusterPhaseType = "Unavailable"
)

Basically, if I have a bug where my logical cluster hangs in a failed or initializing state, I want to know.

Member Author

Valid point. A generic count can be helpful in this situation; let's keep it simpler as well.

Member

This doesn't look implemented, @cnvergence, or am I missing something?

Member Author

I dropped the readiness check, so we now just fetch the count of all logical clusters.
Or do we want separate logical cluster count metrics for Ready and for the other phases? 🙂

Member

I think that was the idea. So it would be a phase label on the metric:

kcp_logicalcluster_count{phase="Ready"} 5
kcp_logicalcluster_count{phase="Scheduling"} 3
kcp_logicalcluster_count{phase="Unavailable"} 0

Member Author

I will consider adding the phase label and processing it during updates. We could possibly also split the metrics.
This would align with how, for example, kube-state-metrics reports pod counts.
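
One way to picture "processing it during updates" is a counter that moves a cluster from its old phase to its new one whenever an update event arrives. The sketch below is an assumption about the general shape, not the merged implementation: phaseCounter and its onAdd, onUpdate, and onDelete methods are hypothetical names, and a plain map guarded by a mutex stands in for a labeled gauge.

```go
package main

import (
	"fmt"
	"sync"
)

// phaseCounter is a hypothetical, stdlib-only stand-in for a gauge with a "phase" label.
type phaseCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newPhaseCounter() *phaseCounter {
	return &phaseCounter{counts: map[string]int{}}
}

// onAdd records a newly observed logical cluster in the given phase.
func (c *phaseCounter) onAdd(phase string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[phase]++
}

// onUpdate moves one cluster from oldPhase to newPhase.
// It does nothing when the phase did not change.
func (c *phaseCounter) onUpdate(oldPhase, newPhase string) {
	if oldPhase == newPhase {
		return
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[oldPhase]--
	c.counts[newPhase]++
}

// onDelete removes a cluster that was last seen in the given phase.
func (c *phaseCounter) onDelete(phase string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[phase]--
}

// get returns the current count for a phase (0 if never seen).
func (c *phaseCounter) get(phase string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[phase]
}

func main() {
	c := newPhaseCounter()
	c.onAdd("Scheduling")
	c.onUpdate("Scheduling", "Initializing")
	c.onUpdate("Initializing", "Ready")
	fmt.Println("Ready:", c.get("Ready"), "Scheduling:", c.get("Scheduling"))
}
```

The alternative mentioned in the comment, splitting into separate metrics, would trade the single labeled series for one metric name per phase; the bookkeeping on add, update, and delete would stay the same.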

@embik
Member

embik commented Aug 13, 2025

/kind feature

@kcp-ci-bot kcp-ci-bot added kind/feature Categorizes issue or PR as related to a new feature. and removed do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Aug 13, 2025
@mjudeikis
Contributor

/lgtm
/approve

@kcp-ci-bot kcp-ci-bot added the lgtm Indicates that a PR is ready to be merged. label Aug 22, 2025
@kcp-ci-bot
Contributor

LGTM label has been added.

Git tree hash: f8834b926c94a3030f3347a5a60926207023dfb8

@kcp-ci-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mjudeikis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kcp-ci-bot kcp-ci-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 22, 2025
@cnvergence
Member Author

/retest

@cnvergence
Member Author

/retest

@kcp-ci-bot kcp-ci-bot removed the lgtm Indicates that a PR is ready to be merged. label Aug 26, 2025
@ntnn
Member

ntnn commented Aug 28, 2025

/retest

integration flake, it's been getting worse =/

@kcp-ci-bot kcp-ci-bot added the lgtm Indicates that a PR is ready to be merged. label Aug 28, 2025
@kcp-ci-bot
Contributor

LGTM label has been added.

Git tree hash: a4fb857ee54430d476313708e0f3142af893c4d7

@cnvergence
Member Author

/retest

@kcp-ci-bot kcp-ci-bot merged commit 5b85ea1 into kcp-dev:main Aug 28, 2025
14 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has signed the DCO. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feature: logical cluster count metrics per shard
5 participants