@fxiang1 fxiang1 commented Oct 16, 2025

https://issues.redhat.com/browse/ACM-24726

  • Add custom informer for ManagedClusterAddOns to reduce memory usage in large environments (2500 clusters)
  • Increase cache resync period to 10 mins

codecov bot commented Oct 16, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.84%. Comparing base (3267edb) to head (0e04278).
⚠️ Report is 7 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #73      +/-   ##
==========================================
- Coverage   65.86%   59.84%   -6.02%     
==========================================
  Files           2        3       +1     
  Lines         706      924     +218     
==========================================
+ Hits          465      553      +88     
- Misses        218      340     +122     
- Partials       23       31       +8     
Flag Coverage Δ
unit 59.84% <ø> (-6.02%) ⬇️

Signed-off-by: fxiang1 <[email protected]>
// Start the custom informer in a goroutine
go func() {
	if err := customInformer.Start(); err != nil {
		log.Log.Error(err, "Failed to start custom ManagedClusterAddOn informer")
Member

We might need to exit or panic here; otherwise ManagedClusterAddOns (MCAs) won't be watched, and the failure will be hard to detect afterwards.
Does it make sense to retry a few times and then exit?

Collaborator Author

Thanks Mike! Yes, the AI suggested adding retries:

Yes, it makes sense to add retry logic here! The informer's Start() method can fail if the cache doesn't sync, which could happen due to:
1. Temporary API server unavailability during startup
2. Network issues
3. The ManagedClusterAddOn CRD not being installed yet

// Call the reconcile function directly
_, err := r.Reconcile(ctx, req)
if err != nil {
	log.Error(err, "Failed to reconcile ClusterPermission", "name", cp.Name, "namespace", cp.Namespace)
Member

When Reconcile returns an error, the general pattern is to requeue. I don't see that here or on the caller side.

Collaborator Author

The AI said that since we are calling Reconcile manually from the custom informer event handler (not through the normal controller queue), there is nothing to requeue into, so we can only retry here. So the AI added retries for this as well 😅

Signed-off-by: fxiang1 <[email protected]>
Signed-off-by: fxiang1 <[email protected]>
fxiang1 added a commit to fxiang1/cluster-permission-io that referenced this pull request Oct 16, 2025
…r-management-io#73)

* Red Hat Konflux update cluster-permission-acm-214 Signed-off-by: red-hat-konflux <[email protected]>

* Bump Go to 1.23

Signed-off-by: fxiang1 <[email protected]>

* Add microdnf update -y

Signed-off-by: fxiang1 <[email protected]>

---------

Signed-off-by: fxiang1 <[email protected]>
Co-authored-by: red-hat-konflux <[email protected]>
Co-authored-by: fxiang1 <[email protected]>

@mikeshng mikeshng left a comment

/approve

/lgtm

openshift-ci bot commented Oct 16, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fxiang1, mikeshng

fxiang1 commented Oct 16, 2025

Merging manually, as Codecov does not account for the e2e test coverage.

@fxiang1 fxiang1 merged commit 49671db into open-cluster-management-io:main Oct 16, 2025
6 of 8 checks passed
@fxiang1 fxiang1 deleted the feng-informer branch October 16, 2025 19:11