Skip to content

Conversation

zhujian7
Copy link
Member

@zhujian7 zhujian7 commented Sep 3, 2025

The addon template controller was stopping addon managers immediately when ClusterManagementAddon was deleted, without waiting for pre-delete jobs to complete or ManagedClusterAddons to be cleaned up via owner reference cascading deletion.

This change implements the TODO at line 105 by checking if all ManagedClusterAddons are deleted before stopping the manager. The controller now uses field selectors to efficiently query for remaining ManagedClusterAddons and requeues after 10 seconds if any still exist, allowing time for proper cleanup.

🤖 Generated with Claude Code

Summary

Related issue(s)

Fixes #1156

Summary by CodeRabbit

  • Bug Fixes

    • Prevents add-on managers from stopping while dependent managed add-ons still exist; fixes a ConfigMap test variable typo.
  • New Features

    • Reduces unnecessary reconciliations by filtering to templates referenced by managed add-ons and uses indexed discovery for manager gating.
  • Tests

    • Adds unit tests for gated shutdown scenarios and e2e checks verifying deletion waits for propagated-resource cleanup (two identical e2e cases present).
  • Chores

    • Improved structured, context-aware informational logging.

Copy link

coderabbitai bot commented Sep 3, 2025

Walkthrough

Gates stopping of addon managers on presence of ManagedClusterAddOns using a name-based indexer; replaces label/list MCA discovery with index lookups, updates controller informer wiring and sync/run flows, adds unit tests for stopUnusedManagers, tweaks template-agent logging, and introduces duplicate e2e tests for cascade deletion cleanup.

Changes

Cohort / File(s) Summary
Controller logic & indexing
pkg/addon/controllers/addontemplate/controller.go
Add managedClusterAddonIndexer (cache.Indexer); wire it from ManagedClusterAddOns informer; switch MCA discovery from label/list to name-based index (addonindex.ManagedClusterAddonByName + ByIndex); update queuing/informer wiring to filtered/name-based queue keys; change MCA type refs to addonv1alpha1.ManagedClusterAddOn; adjust stopUnusedManagers to return error and gate stopping when MCAs exist; add logging around manager lifecycle.
Unit tests: manager stop behavior
pkg/addon/controllers/addontemplate/controller_test.go
Register ManagedClusterAddOn name indexer in tests; pass managedClusterAddonIndexer into test controller; add TestStopUnusedManagers (table-driven) covering no/one/multiple/no-manager scenarios; wrap stop to assert stop invocation and requeue behavior; update test stores to include indexer-populated MCAs.
E2E: cascade deletion & pre-delete cleanup
test/e2e/addonmanagement_test.go
Fix typo copyiedConfigcopiedConfig; add two identical It blocks asserting ClusterManagementAddon deletion waits for ManagedClusterAddons cleanup by creating propagated ConfigMap, deleting CMA, and verifying propagated/cleanup resources and MCA deletion.
Template agent logging tweaks
pkg/addon/templateagent/registration.go, pkg/addon/templateagent/template_agent.go
Replace utilruntime.HandleError calls with structured a.logger.Info in registration funcs; prefix log messages in GetAgentAddonOptions. Functional returns unchanged.
Imports & types
pkg/addon/controllers/..., tests
Add imports (k8s.io/client-go/tools/cache, addonindex, testinghelpers, etc.); update type usages to addonv1alpha1.ManagedClusterAddOn; add cache.Indexer field on controller struct.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

lgtm

Suggested reviewers

  • zhiweiyin318
  • xuezhaojun

Assessment against linked issues

Objective Addressed Explanation
Ensure ManagedClusterAddOns are removed when ClusterManagementAddOn is deleted, accounting for pre-delete job lifecycle (#1156) stopUnusedManagers now queries MCAs by name index and defers stopping addon managers while any MCA for the addon remains; informer wiring triggers reconciles on relevant MCA events.

Assessment against linked issues: Out-of-scope changes

Code Change Explanation
Duplicate e2e test case added (test/e2e/addonmanagement_test.go) Two identical It blocks appear duplicated and do not add new verification; likely accidental.
Template agent logging changes (pkg/addon/templateagent/*) Logging refactor is unrelated to MCA cleanup bugfix and could be split into a separate PR.

Pre-merge checks (2 passed, 3 warnings)

❌ Failed checks (3 warnings)
Check name Status Explanation Resolution
Out of Scope Changes Check ⚠️ Warning This PR includes unrelated modifications such as the ConfigMap variable name fix and duplicate E2E test cases in addonmanagement_test.go and logging message changes in templateagent code, which do not relate to the issue of ManagedClusterAddons removal on ClusterManagementAddon deletion. Extract the unrelated E2E test corrections and logging adjustments into separate PRs or branches so this change set focuses exclusively on the deletion gating and indexer-based reconciliation logic.
Description Check ⚠️ Warning The description does not populate the required “## Summary” section with the PR’s summary and leaves the “## Related issue(s)” section incomplete, instead repeating a top-level paragraph and including a raw URL rather than the shorthand issue reference. Populate the “## Summary” section with the concise description of the change (for example, moving the existing top paragraph under that heading) and update “## Related issue(s)” to reference the issue in shorthand format (e.g., “Fixes #1156”).
Docstring Coverage ⚠️ Warning Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title clearly summarizes the primary change by indicating the bug being fixed with ManagedClusterAddons removal when a ClusterManagementAddon is deleted, and it follows the repository conventions with the bug fix icon and concise wording.
Linked Issues Check ✅ Passed The PR implements the linked issue’s core requirement by delaying the shutdown of addon managers until all ManagedClusterAddons are confirmed deleted, using efficient index-based lookups, returning a requeue when addons remain, and adding tests to validate the stopUnusedManagers behavior across scenarios.

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

codecov bot commented Sep 3, 2025

Codecov Report

❌ Patch coverage is 62.71186% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.17%. Comparing base (8944768) to head (171ebbc).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
pkg/addon/controllers/addontemplate/controller.go 63.63% 17 Missing and 3 partials ⚠️
pkg/addon/templateagent/registration.go 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1160      +/-   ##
==========================================
- Coverage   58.17%   58.17%   -0.01%     
==========================================
  Files         211      211              
  Lines       20797    20819      +22     
==========================================
+ Hits        12099    12111      +12     
- Misses       7628     7637       +9     
- Partials     1070     1071       +1     
Flag Coverage Δ
unit 58.17% <62.71%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@zhujian7 zhujian7 force-pushed the addon-manager-delete branch from a6896e5 to 5bdd118 Compare September 3, 2025 09:06
@zhujian7 zhujian7 marked this pull request as ready for review September 3, 2025 09:57
@zhujian7
Copy link
Member Author

zhujian7 commented Sep 3, 2025

/cc @haoqing0110 @qiujian16

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (6)
test/e2e/addonmanagement_test.go (3)

852-862: Nit: fix “copyiedConfig” typo.

Rename for readability.

-			copyiedConfig, err := spoke.KubeClient.CoreV1().ConfigMaps(addonInstallNamespace).Get(
+			copiedConfig, err := spoke.KubeClient.CoreV1().ConfigMaps(addonInstallNamespace).Get(
 				context.Background(), configmap.Name, metav1.GetOptions{})
@@
-			if !equality.Semantic.DeepEqual(copyiedConfig.Data, configmap.Data) {
-				return fmt.Errorf("expected configmap is not correct, %v", copyiedConfig.Data)
+			if !equality.Semantic.DeepEqual(copiedConfig.Data, configmap.Data) {
+				return fmt.Errorf("expected configmap is not correct, %v", copiedConfig.Data)

869-907: Reduce duplication via small helpers to improve test maintainability.

Create tiny helpers like waitForConfigMapCopy(ns, name), waitForConfigMapDeletion(ns, name), waitForJobDeletion(ns, name) and reuse them here and in earlier tests that perform the same checks.


883-919: Optional: add explicit Eventually timeouts to reduce flakes.

Consider explicit timeouts/polling intervals (e.g., 2m, 5s) to avoid environment-dependent defaults.

pkg/addon/controllers/addontemplate/controller_test.go (1)

273-391: Solid unit coverage for stopUnusedManagers; consider two small additions.

  • Add a case to simulate list error (fake client reactor) and assert requeue-on-error (after you wire it in the controller, see separate comment).
  • Clarify behavior when no manager exists but MCAs do: do we want to requeue in that case? If not, add a case and assertion.
pkg/addon/controllers/addontemplate/controller.go (2)

109-113: Prefer fields.OneTermEqualSelector for field selector construction.

More robust than manual string concatenation.

-	listOptions := metav1.ListOptions{
-		FieldSelector: "metadata.name=" + addOnName,
-	}
+	listOptions := metav1.ListOptions{
+		FieldSelector: fields.OneTermEqualSelector("metadata.name", addOnName).String(),
+	}

Add import:

  • k8s.io/apimachinery/pkg/fields

119-124: Optional: use rate-limited requeue and a named constant.

  • Rate-limited requeue reduces fixed polling.
  • Extract 10s to a const for easier tuning and tests.

Example:

const waitForMCACleanup = 10 * time.Second
// ...
syncCtx.Queue().AddAfter(addOnName, waitForMCACleanup)
// or: syncCtx.Queue().AddRateLimited(addOnName)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e0476ee and 0dd5733.

📒 Files selected for processing (3)
  • pkg/addon/controllers/addontemplate/controller.go (1 hunks)
  • pkg/addon/controllers/addontemplate/controller_test.go (2 hunks)
  • test/e2e/addonmanagement_test.go (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: skeeey
PR: open-cluster-management-io/ocm#1053
File: pkg/server/services/addon/addon_test.go:208-225
Timestamp: 2025-06-27T09:46:24.579Z
Learning: In the Open Cluster Management addon service, deletion actions are not expected to occur. The OnDelete handler method is implemented as a no-op, and test coverage for deletion is not needed since this operation is not part of the service's expected functionality.
📚 Learning: 2025-06-27T09:46:24.579Z
Learnt from: skeeey
PR: open-cluster-management-io/ocm#1053
File: pkg/server/services/addon/addon_test.go:208-225
Timestamp: 2025-06-27T09:46:24.579Z
Learning: In the Open Cluster Management addon service, deletion actions are not expected to occur. The OnDelete handler method is implemented as a no-op, and test coverage for deletion is not needed since this operation is not part of the service's expected functionality.

Applied to files:

  • test/e2e/addonmanagement_test.go
🧬 Code graph analysis (1)
pkg/addon/controllers/addontemplate/controller_test.go (2)
pkg/registration/helpers/testing/testinghelpers.go (1)
  • NewManagedClusterAddons (520-529)
pkg/common/testing/fake_sync_context.go (1)
  • NewFakeSyncContext (21-27)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: integration
  • GitHub Check: e2e-singleton
  • GitHub Check: unit
  • GitHub Check: e2e
  • GitHub Check: build
  • GitHub Check: e2e-hosted
  • GitHub Check: verify
🔇 Additional comments (3)
test/e2e/addonmanagement_test.go (1)

834-868: Good end-to-end coverage for CMA-driven cascading deletion.

This validates the pre-delete job path and MCA cleanup before asserting CMA deletion. Matches the PR objective well.

pkg/addon/controllers/addontemplate/controller_test.go (1)

28-29: Import looks correct.

Importing testinghelpers to construct MCAs aligns with the new tests’ needs.

pkg/addon/controllers/addontemplate/controller.go (1)

127-133: LGTM: stop and remove manager after confirming zero MCAs.

Matches the intended gating and logs the action.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
pkg/addon/controllers/addontemplate/controller.go (1)

110-115: Requeue on List error to avoid leaking managers when CMA is gone.

If the List call fails while the CMA key is being processed, explicitly requeue to ensure we retry teardown instead of relying solely on controller-level error handling.

Apply:

-	mcaList, err := c.addonClient.AddonV1alpha1().ManagedClusterAddOns("").List(ctx, listOptions)
-	if err != nil {
-		return err
-	}
+	mcaList, err := c.addonClient.AddonV1alpha1().ManagedClusterAddOns("").List(ctx, listOptions)
+	if err != nil {
+		logger.Error(err, "Failed to list ManagedClusterAddOns", "addonName", addOnName)
+		// retry shortly on transient API errors
+		syncCtx.Queue().AddAfter(addOnName, 10*time.Second)
+		return nil
+	}
🧹 Nitpick comments (3)
pkg/addon/controllers/addontemplate/controller.go (3)

109-113: Build the FieldSelector via fields.OneTermEqualSelector.

Small safety/readability win vs hand-building the string.

Apply:

-	listOptions := metav1.ListOptions{
-		FieldSelector: "metadata.name=" + addOnName,
-	}
+	listOptions := metav1.ListOptions{
+		FieldSelector: fields.OneTermEqualSelector("metadata.name", addOnName).String(),
+	}

Add import (outside this hunk):

import (
  // ...
  "k8s.io/apimachinery/pkg/fields"
  // ...
)

118-124: LGTM; consider making the 10s delay a named const.

The requeue-on-existence behavior is correct. Exposing 10*time.Second as a named const (e.g., waitMCADeletion) makes tuning easier across environments.


130-133: Tweak log message for accuracy.

Log reflects the post-stop state; make it past tense.

-		logger.Info("Stopping the manager for addon", "addonName", addOnName)
+		logger.Info("Stopped the manager for addon", "addonName", addOnName)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0dd5733 and c0f61f1.

📒 Files selected for processing (2)
  • pkg/addon/controllers/addontemplate/controller.go (2 hunks)
  • pkg/addon/controllers/addontemplate/controller_test.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/addon/controllers/addontemplate/controller_test.go
🧰 Additional context used
🧬 Code graph analysis (1)
pkg/addon/controllers/addontemplate/controller.go (1)
pkg/addon/templateagent/common.go (1)
  • SupportAddOnTemplate (34-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: e2e-singleton
  • GitHub Check: e2e-hosted
  • GitHub Check: e2e
  • GitHub Check: unit
  • GitHub Check: verify
  • GitHub Check: integration
  • GitHub Check: build
🔇 Additional comments (2)
pkg/addon/controllers/addontemplate/controller.go (2)

104-106: Signature + context logger change looks good.

Returning error and using klog.FromContext for structured logging is appropriate here.


142-149: Confirm requeue semantics on error propagation.

You now return the error from stopUnusedManagers. Please confirm library-go’s factory.Controller requeues the same key on sync error; if not, add an explicit requeue in the error path to be safe.

Copy link
Member

@qiujian16 qiujian16 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another thing we need to confirm is when cma is deleting, does install controller still create mca?

listOptions := metav1.ListOptions{
FieldSelector: "metadata.name=" + addOnName,
}
mcaList, err := c.addonClient.AddonV1alpha1().ManagedClusterAddOns("").List(ctx, listOptions)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we list from lister instead calling api directly?

@zhujian7 zhujian7 force-pushed the addon-manager-delete branch from 11338ad to 0844710 Compare September 4, 2025 08:55
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/addon/controllers/addontemplate/controller.go (1)

22-23: Duplicate import alias (won’t compile).

The same package is imported twice under different aliases. Consolidate to a single alias and update the type reference.

-	addonapiv1alpha1 "open-cluster-management.io/api/addon/v1alpha1"
 	addonv1alpha1 "open-cluster-management.io/api/addon/v1alpha1"
-	getValuesClosure := func(cluster *clusterv1.ManagedCluster, addon *addonapiv1alpha1.ManagedClusterAddOn) (addonfactory.Values, error) {
+	getValuesClosure := func(cluster *clusterv1.ManagedCluster, addon *addonv1alpha1.ManagedClusterAddOn) (addonfactory.Values, error) {
 		return templateagent.GetAddOnRegistriesPrivateValuesFromClusterAnnotation(klog.FromContext(ctx), cluster, addon)
 	}

Also applies to: 222-224

🧹 Nitpick comments (4)
pkg/addon/controllers/addontemplate/controller.go (4)

36-39: Requeue delay: make configurable or add jitter.

10s is fine, but consider a flag/env or jitter to avoid alignment across many addons.

-const (
-	// requeueDelay is the delay for requeuing when ManagedClusterAddOns still exist
-	requeueDelay = 10 * time.Second
-)
+const requeueDelay = 10 * time.Second
+// TODO: consider making this configurable and/or apply jitter.

53-53: Ensure MCA informer is started/synced for lister reads.

Since stopUnusedManagers depends on mcaLister, include the MCA informer in WithBareInformers so the factory guarantees cache sync before sync runs.

 return factory.New().WithInformersQueueKeysFunc(
 	queue.QueueKeyByMetaNamespaceName,
 	addonInformers.Addon().V1alpha1().ClusterManagementAddOns().Informer()).
-		WithBareInformers(
+		WithBareInformers(
 			// do not need to queue, just make sure the controller reconciles after the addonTemplate cache is synced
 			// otherwise, there will be "xx-addon-template" not found" errors in the log as the controller uses the
 			// addonTemplate lister to get the template object
-			addonInformers.Addon().V1alpha1().AddOnTemplates().Informer()).
+			addonInformers.Addon().V1alpha1().AddOnTemplates().Informer(),
+			// ensure MCA cache is synced for mcaLister usage
+			addonInformers.Addon().V1alpha1().ManagedClusterAddOns().Informer()).

Also applies to: 83-84, 99-106


110-136: Gating logic looks good; simplify and short‑circuit on first match.

No need to build a slice; return as soon as one MCA for this addon is found. Also align param name with addonName used elsewhere.

-func (c *addonTemplateController) stopUnusedManagers(
-	ctx context.Context, syncCtx factory.SyncContext, addOnName string) error {
+func (c *addonTemplateController) stopUnusedManagers(
+	ctx context.Context, syncCtx factory.SyncContext, addonName string) error {
 	logger := klog.FromContext(ctx)

-	// Check if all managed cluster addon instances are deleted before stopping the manager
-	// This allows pre-delete jobs to complete
 	mcaList, err := c.mcaLister.List(labels.Everything())
 	if err != nil {
 		return err
 	}
 
-	// Filter by addon name
-	var matchingMCAs []*addonv1alpha1.ManagedClusterAddOn
-	for _, mca := range mcaList {
-		if mca.Name == addOnName {
-			matchingMCAs = append(matchingMCAs, mca)
-		}
-	}
-
-	// Check if there are still ManagedClusterAddOns for this addon
-	if len(matchingMCAs) > 0 {
-		logger.Info("ManagedClusterAddOn still exists, waiting for deletion",
-			"addonName", addOnName, "count", len(matchingMCAs))
-		// Requeue to check again later
-		syncCtx.Queue().AddAfter(addOnName, requeueDelay)
-		return nil
-	}
+	for _, mca := range mcaList {
+		if mca.Name == addonName {
+			logger.Info("ManagedClusterAddOn still exists, waiting for deletion",
+				"addonName", addonName)
+			syncCtx.Queue().AddAfter(addonName, requeueDelay)
+			return nil
+		}
+	}

257-261: Log errors with logger.Error for consistency.

Use structured error logging.

-	if err != nil {
-		logger.Info("Failed to list ManagedClusterAddOns", "error", err)
+	if err != nil {
+		logger.Error(err, "Failed to list ManagedClusterAddOns")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c0f61f1 and 0844710.

📒 Files selected for processing (3)
  • pkg/addon/controllers/addontemplate/controller.go (5 hunks)
  • pkg/addon/controllers/addontemplate/controller_test.go (2 hunks)
  • test/e2e/addonmanagement_test.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • test/e2e/addonmanagement_test.go
  • pkg/addon/controllers/addontemplate/controller_test.go
🧰 Additional context used
🧬 Code graph analysis (1)
pkg/addon/controllers/addontemplate/controller.go (1)
pkg/addon/templateagent/common.go (1)
  • SupportAddOnTemplate (34-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: e2e-singleton
  • GitHub Check: e2e-hosted
  • GitHub Check: e2e
  • GitHub Check: integration
  • GitHub Check: unit
  • GitHub Check: verify
🔇 Additional comments (2)
pkg/addon/controllers/addontemplate/controller.go (2)

142-144: LGTM: stop and remove manager only after MCAs are gone.


147-161: Good: propagate errors from stopUnusedManagers.

This prevents silent leaks and leverages controller requeue semantics.

@zhujian7
Copy link
Member Author

zhujian7 commented Sep 4, 2025

/hold

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (4)
test/e2e/addonmanagement_test.go (4)

865-865: Remove leftover commented sleep.
Dead code adds noise.

-        // time.Sleep(time.Minute * 10)

852-862: Fix typo: copyiedConfig → copiedConfig.
Minor readability polish.

-            copyiedConfig, err := spoke.KubeClient.CoreV1().ConfigMaps(addonInstallNamespace).Get(
+            copiedConfig, err := spoke.KubeClient.CoreV1().ConfigMaps(addonInstallNamespace).Get(
                 context.Background(), configmap.Name, metav1.GetOptions{})
             if err != nil {
                 return err
             }
 
-            if !equality.Semantic.DeepEqual(copyiedConfig.Data, configmap.Data) {
-                return fmt.Errorf("expected configmap is not correct, %v", copyiedConfig.Data)
+            if !equality.Semantic.DeepEqual(copiedConfig.Data, configmap.Data) {
+                return fmt.Errorf("expected configmap is not correct, %v", copiedConfig.Data)
             }

848-851: Add local cleanup for the hub-side ConfigMap to avoid test resource leaks.
Keeps the e2e environment clean across runs.

         _, err := hub.KubeClient.CoreV1().ConfigMaps(universalClusterName).Create(
             context.Background(), configmap, metav1.CreateOptions{})
         gomega.Expect(err).ToNot(gomega.HaveOccurred())
+        defer func() {
+            _ = hub.KubeClient.CoreV1().ConfigMaps(universalClusterName).Delete(
+                context.Background(), configmap.Name, metav1.DeleteOptions{})
+        }()

900-910: Harden Eventually timeout for MCA deletion to match the pre-delete window.
Avoids flakiness on slow clusters.

-        gomega.Eventually(func() error {
+        gomega.Eventually(func() error {
             _, err := hub.AddonClient.AddonV1alpha1().ManagedClusterAddOns(universalClusterName).Get(
                 context.TODO(), addOnName, metav1.GetOptions{})
             if err != nil {
                 if errors.IsNotFound(err) {
                     return nil
                 }
                 return err
             }
             return fmt.Errorf("the ManagedClusterAddon should be deleted")
-        }).ShouldNot(gomega.HaveOccurred())
+        }, time.Second*300, time.Second*3).ShouldNot(gomega.HaveOccurred())
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0844710 and 792eb46.

📒 Files selected for processing (1)
  • test/e2e/addonmanagement_test.go (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: unit
  • GitHub Check: e2e-hosted
  • GitHub Check: integration
  • GitHub Check: e2e
  • GitHub Check: e2e-singleton
  • GitHub Check: verify
  • GitHub Check: build
🔇 Additional comments (1)
test/e2e/addonmanagement_test.go (1)

8-8: LGTM: time import is required for explicit Eventually timeouts.

@zhujian7 zhujian7 force-pushed the addon-manager-delete branch 3 times, most recently from be2536c to 8dc16ef Compare September 4, 2025 11:16
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
test/e2e/addonmanagement_test.go (1)

77-77: Defocus the suite/spec: replace FDescribe/FIt with Describe/It.
Focused specs will skip the rest of the suite in CI.

Apply:

-var _ = ginkgo.FDescribe("Enable addon management feature gate", ginkgo.Ordered, ginkgo.Label("addon-manager"), func() {
+var _ = ginkgo.Describe("Enable addon management feature gate", ginkgo.Ordered, ginkgo.Label("addon-manager"), func() {
-ginkgo.FIt("ClusterManagementAddon deletion should wait for ManagedClusterAddons cleanup", func() {
+ginkgo.It("ClusterManagementAddon deletion should wait for ManagedClusterAddons cleanup", func() {

Run to ensure no other focused specs remain:

#!/bin/bash
rg -nP -g '!**/vendor/**' -C0 '\bF(?:Describe|Context|It|Specify|When|DescribeTable|Entry)\s*\('

Also applies to: 835-835

🧹 Nitpick comments (5)
test/e2e/addonmanagement_test.go (5)

853-861: Nit: fix variable name typo.
Use copiedConfig for clarity/consistency.

-            copyiedConfig, err := spoke.KubeClient.CoreV1().ConfigMaps(addonInstallNamespace).Get(
+            copiedConfig, err := spoke.KubeClient.CoreV1().ConfigMaps(addonInstallNamespace).Get(
                 context.Background(), configmap.Name, metav1.GetOptions{})
             if err != nil {
                 return err
             }
 
-            if !equality.Semantic.DeepEqual(copyiedConfig.Data, configmap.Data) {
-                return fmt.Errorf("expected configmap is not correct, %v", copyiedConfig.Data)
+            if !equality.Semantic.DeepEqual(copiedConfig.Data, configmap.Data) {
+                return fmt.Errorf("expected configmap is not correct, %v", copiedConfig.Data)
             }

917-921: Simplify TailLines pointer creation.
Avoid the inline lambda.

-                    logs, logErr := hub.KubeClient.CoreV1().Pods("open-cluster-management-hub").GetLogs(
-                        pod.Name, &corev1.PodLogOptions{
-                            TailLines: func(i int64) *int64 { return &i }(50),
-                        }).DoRaw(context.Background())
+                    tail := int64(50)
+                    logs, logErr := hub.KubeClient.CoreV1().Pods("open-cluster-management-hub").GetLogs(
+                        pod.Name, &corev1.PodLogOptions{TailLines: &tail},
+                    ).DoRaw(context.Background())

941-943: Nit: log field label is misleading.
It's ContainerStatuses, not a single Ready flag.

-                        klog.Infof("Job pod %s status: Phase=%s, Ready=%v, Conditions=%v",
-                            pod.Name, pod.Status.Phase, pod.Status.ContainerStatuses, pod.Status.Conditions)
+                        klog.Infof("Job pod %s status: Phase=%s, ContainerStatuses=%v, Conditions=%v",
+                            pod.Name, pod.Status.Phase, pod.Status.ContainerStatuses, pod.Status.Conditions)

946-954: Prefer typed PodPhase constants over string literals.
Reduces typo risk.

-                        if pod.Status.Phase == "Failed" || pod.Status.Phase == "Succeeded" {
+                        if pod.Status.Phase == corev1.PodFailed || pod.Status.Phase == corev1.PodSucceeded {
                             logs, logErr := spoke.KubeClient.CoreV1().Pods(addonInstallNamespace).GetLogs(
-                                pod.Name, &corev1.PodLogOptions{}).DoRaw(context.Background())
+                                pod.Name, &corev1.PodLogOptions{},
+                            ).DoRaw(context.Background())

970-970: Gomega v2 style: use WithTimeout/WithPolling (optional).
Improves readability and matches current idioms.

-        }, time.Second*300, time.Second*3).ShouldNot(gomega.HaveOccurred())
+        }).WithTimeout(5*time.Minute).WithPolling(3*time.Second).ShouldNot(gomega.HaveOccurred())
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 792eb46 and 8dc16ef.

📒 Files selected for processing (1)
  • test/e2e/addonmanagement_test.go (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: e2e-hosted
  • GitHub Check: e2e
  • GitHub Check: e2e-singleton
  • GitHub Check: unit
  • GitHub Check: build
  • GitHub Check: verify
  • GitHub Check: integration
🔇 Additional comments (1)
test/e2e/addonmanagement_test.go (1)

8-8: Import of time is appropriate.
Used for Eventually timeouts; no issues.

@zhujian7 zhujian7 force-pushed the addon-manager-delete branch 2 times, most recently from def832c to b3aeefc Compare September 4, 2025 13:29

// Check if all managed cluster addon instances are deleted before stopping the manager
// This allows pre-delete jobs to complete
mcaList, err := c.mcaLister.List(labels.Everything())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have an index addonindex.ManagedClusterAddonByName for this.

logger.Info("ManagedClusterAddOn still exists, waiting for deletion",
"addonName", addOnName, "count", len(matchingMCAs))
// Requeue to check again later
syncCtx.Queue().AddAfter(addOnName, requeueDelay)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to requeue? deletion of MCA will also trigger the reconcile, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, the controller only watches CMA

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we watch MCA? we could filter by configuration in status of mca

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/addon/controllers/addontemplate/controller.go (1)

99-109: Ensure MCA informer is started/synced before using its indexer.

stopUnusedManagers relies on the ManagedClusterAddOns indexer. Add the MCA informer to WithBareInformers so the factory starts and syncs it, avoiding empty index reads if the manager hasn't started yet.

 return factory.New().WithInformersQueueKeysFunc(
   queue.QueueKeyByMetaNamespaceName,
   addonInformers.Addon().V1alpha1().ClusterManagementAddOns().Informer()).
-  WithBareInformers(
+  WithBareInformers(
     // do not need to queue, just make sure the controller reconciles after the addonTemplate cache is synced
     // otherwise, there will be "xx-addon-template" not found" errors in the log as the controller uses the
     // addonTemplate lister to get the template object
-    addonInformers.Addon().V1alpha1().AddOnTemplates().Informer()).
+    addonInformers.Addon().V1alpha1().AddOnTemplates().Informer(),
+    addonInformers.Addon().V1alpha1().ManagedClusterAddOns().Informer()).
   WithSync(c.sync).
   ToController("addon-template-controller", recorder)
🧹 Nitpick comments (6)
pkg/addon/templateagent/registration.go (2)

94-96: Use error-level logging for template fetch failures.

Fetching the template failing is an error path; logging at Info may hide real issues during triage. Recommend logger.Error with addonName keyed.

- a.logger.Info("CSRConfigurations failed to get addon template", "addonName", a.addonName, "error", err)
+ a.logger.Error(err, "CSRConfigurations failed to get addon template", "addonName", a.addonName)

171-173: Align severity: approval check should log error on template fetch failure.

Same rationale as above; favor Error over Info.

- a.logger.Info("CSRApproveCheck failed to get addon template", "addonName", a.addonName, "error", err)
+ a.logger.Error(err, "CSRApproveCheck failed to get addon template", "addonName", a.addonName)
pkg/addon/controllers/addontemplate/controller_test.go (1)

410-414: Optionally assert requeue behavior.

You can assert the key was requeued by waiting briefly and checking syncContext.Queue().Len() > 0 or popping a single item; this makes the test intention explicit.

pkg/addon/controllers/addontemplate/controller.go (3)

37-41: Consider making requeueDelay configurable or rate-limited.

A fixed 10s works, but allowing configuration or using the controller’s rate-limited queue avoids hot loops under prolonged cleanup.


111-139: Nit: fix log typo and avoid requeue when no manager exists.

  • “Sart” -> “Start”.
  • If no manager exists, requeueing adds noise. Only requeue while a manager is present.
- if len(addons) > 0 {
-   logger.Info("ManagedClusterAddOn still exists, waiting for deletion",
-     "addonName", addOnName, "count", len(addons))
-   // Requeue to check again later
-   syncCtx.Queue().AddAfter(addOnName, requeueDelay)
-   return nil
- }
+ if len(addons) > 0 {
+   if _, exists := c.addonManagers[addOnName]; exists {
+     logger.Info("ManagedClusterAddOn still exists, waiting for deletion",
+       "addonName", addOnName, "count", len(addons))
+     // Requeue to check again later
+     syncCtx.Queue().AddAfter(addOnName, requeueDelay)
+   }
+   return nil
+ }
 ...
- logger.Info("Sart to stop the manager for addon", "addonName", addOnName)
+ logger.Info("Start to stop the manager for addon", "addonName", addOnName)

252-255: Log list/index failures at error level and include addonName.

This helps surface startup issues when warming initial triggers.

- logger.Info("Failed to list ManagedClusterAddOns by index", "error", err)
+ logger.Error(err, "Failed to list ManagedClusterAddOns by index", "addonName", addonName)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b3aeefc and 6bdb759.

📒 Files selected for processing (4)
  • pkg/addon/controllers/addontemplate/controller.go (6 hunks)
  • pkg/addon/controllers/addontemplate/controller_test.go (5 hunks)
  • pkg/addon/templateagent/registration.go (2 hunks)
  • pkg/addon/templateagent/template_agent.go (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • pkg/addon/templateagent/template_agent.go
🧰 Additional context used
🧬 Code graph analysis (2)
pkg/addon/controllers/addontemplate/controller.go (2)
pkg/addon/index/index.go (1)
  • ManagedClusterAddonByName (19-19)
pkg/addon/templateagent/common.go (1)
  • SupportAddOnTemplate (34-45)
pkg/addon/controllers/addontemplate/controller_test.go (3)
pkg/addon/index/index.go (2)
  • ManagedClusterAddonByName (19-19)
  • IndexManagedClusterAddonByName (42-50)
pkg/registration/helpers/testing/testinghelpers.go (1)
  • NewManagedClusterAddons (520-529)
pkg/common/testing/fake_sync_context.go (1)
  • NewFakeSyncContext (21-27)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: e2e
  • GitHub Check: e2e-singleton
  • GitHub Check: e2e-hosted
  • GitHub Check: unit
  • GitHub Check: integration
🔇 Additional comments (2)
pkg/addon/controllers/addontemplate/controller_test.go (2)

156-164: Good: index by addon name wired into tests.

Registering ManagedClusterAddonByName ensures the controller paths using ByIndex are exercised realistically.


263-273: Controller under test correctly receives the MCA indexer.

Wiring the indexer into the test controller mirrors production wiring and avoids false negatives.

@zhujian7 zhujian7 force-pushed the addon-manager-delete branch from 6bdb759 to de9f918 Compare September 5, 2025 02:06
@zhujian7 zhujian7 force-pushed the addon-manager-delete branch from 5b57be0 to 7055812 Compare September 8, 2025 07:53
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
test/e2e/addonmanagement_test.go (1)

895-906: Avoid hard-coded cleanup job name

Use the addOnName to derive the cleanup Job name for resilience across addons.

-			_, err := spoke.KubeClient.BatchV1().Jobs(addonInstallNamespace).Get(
-				context.Background(), "hello-template-cleanup-configmap", metav1.GetOptions{})
+			jobName := fmt.Sprintf("%s-cleanup-configmap", addOnName)
+			_, err := spoke.KubeClient.BatchV1().Jobs(addonInstallNamespace).Get(
+				context.Background(), jobName, metav1.GetOptions{})
pkg/addon/controllers/addontemplate/controller.go (2)

124-141: Add a safety requeue when MCAs still exist (defensive forward progress)

Relying solely on filtered MCA events is OK, but a light AddAfter fallback avoids stalls if predicates ever miss an event (e.g., status.ConfigReferences not populated).

  if len(addons) > 0 {
     logger.Info("ManagedClusterAddOn still exists, waiting for deletion",
       "addonName", addOnName, "count", len(addons))
-    // No need to requeue since we now watch ManagedClusterAddon events
-    return nil
+    // Defensive requeue to guarantee forward progress even if no MCA event fires.
+    syncCtx.Queue().AddAfter(addOnName, 10*time.Second)
+    return nil
  }

263-267: Log index lookup failures as errors (not info)

Index failures indicate misconfiguration; use Error to surface this.

-    logger.Info("Failed to list ManagedClusterAddOns by index", "error", err)
+    logger.Error(err, "Failed to list ManagedClusterAddOns by index", "addonName", addonName)
pkg/addon/controllers/addontemplate/controller_test.go (1)

292-315: Assert enqueue behavior via FakeSyncContext.Queue()
FakeSyncContext.Queue() returns the workqueue, so rename expectedRequeue to expectedEnqueued and add an assertion such as:

g.Expect(ctx.Queue().Len() > 0).To(Equal(expectedEnqueued))
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5b57be0 and 7055812.

📒 Files selected for processing (5)
  • pkg/addon/controllers/addontemplate/controller.go (9 hunks)
  • pkg/addon/controllers/addontemplate/controller_test.go (5 hunks)
  • pkg/addon/templateagent/registration.go (2 hunks)
  • pkg/addon/templateagent/template_agent.go (1 hunks)
  • test/e2e/addonmanagement_test.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/addon/templateagent/registration.go
  • pkg/addon/templateagent/template_agent.go
🧰 Additional context used
🧬 Code graph analysis (2)
pkg/addon/controllers/addontemplate/controller_test.go (3)
pkg/addon/index/index.go (2)
  • ManagedClusterAddonByName (19-19)
  • IndexManagedClusterAddonByName (42-50)
pkg/registration/helpers/testing/testinghelpers.go (1)
  • NewManagedClusterAddons (520-529)
pkg/common/testing/fake_sync_context.go (1)
  • NewFakeSyncContext (21-27)
pkg/addon/controllers/addontemplate/controller.go (3)
pkg/common/queue/queuekey.go (2)
  • QueueKeyByMetaNamespaceName (67-70)
  • QueueKeyByMetaName (57-60)
pkg/addon/index/index.go (1)
  • ManagedClusterAddonByName (19-19)
pkg/addon/templateagent/common.go (1)
  • SupportAddOnTemplate (34-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: e2e-singleton
  • GitHub Check: e2e
  • GitHub Check: unit
  • GitHub Check: e2e-hosted
  • GitHub Check: integration
  • GitHub Check: verify
  • GitHub Check: build
🔇 Additional comments (7)
test/e2e/addonmanagement_test.go (1)

258-269: ConfigMap propagation assertion — LGTM

Using copiedConfig.Data vs. configmap.Data is correct and fixes the prior compare bug.

pkg/addon/controllers/addontemplate/controller.go (3)

97-115: Filtered MCA watch wiring — LGTM

Watching ManagedClusterAddOn with a focused predicate and queueing by name cleanly unblocks stop logic via MCA delete events while limiting noise.


160-167: Stop path on CMA not found / unsupported — LGTM

Delegating to stopUnusedManagers on CMA absence or non-template CMA aligns with the new gating logic.


72-85: Index already registered globally
The ManagedClusterAddonByName index is set up in pkg/addon/manager.go when informers are initialized, so this controller doesn’t need to call AddIndexers here.

Likely an incorrect or invalid review comment.

pkg/addon/controllers/addontemplate/controller_test.go (3)

156-163: Indexer registration in tests — LGTM

Registering ManagedClusterAddonByName pre-start ensures ByIndex queries work in tests.


262-273: Controller under test wiring — LGTM

managing managedClusterAddonIndexer and shared informers mirrors production wiring.


354-363: Good fix: capture loop variables for stopFunc wrappers

Shadowing capturedName/capturedStopFunc prevents the closure-capture bug.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
pkg/addon/controllers/addontemplate/controller.go (2)

154-164: Use shared GVR constant to avoid drift

Leverage utils.AddOnTemplateGVR instead of hard-coded strings.

-  if configRef.ConfigGroupResource.Group == "addon.open-cluster-management.io" &&
-    configRef.ConfigGroupResource.Resource == "addontemplates" {
+  if configRef.ConfigGroupResource.Group == utils.AddOnTemplateGVR.Group &&
+    configRef.ConfigGroupResource.Resource == utils.AddOnTemplateGVR.Resource {
     templateRefs = append(templateRefs, configRef)
   }

306-314: Log index errors at error level and include addonName

Minor severity tweak for visibility.

-  logger.Info("Failed to list ManagedClusterAddOns by index", "error", err)
+  logger.Error(err, "Failed to list ManagedClusterAddOns by index", "addonName", addonName)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7055812 and 778aed5.

📒 Files selected for processing (1)
  • pkg/addon/controllers/addontemplate/controller.go (8 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
pkg/addon/controllers/addontemplate/controller.go (3)
pkg/common/queue/queuekey.go (1)
  • QueueKeyByMetaNamespaceName (67-70)
pkg/addon/index/index.go (1)
  • ManagedClusterAddonByName (19-19)
pkg/addon/templateagent/common.go (1)
  • SupportAddOnTemplate (34-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: integration
  • GitHub Check: build
  • GitHub Check: verify
  • GitHub Check: unit
  • GitHub Check: e2e-singleton
  • GitHub Check: e2e
  • GitHub Check: e2e-hosted
🔇 Additional comments (7)
pkg/addon/controllers/addontemplate/controller.go (7)

5-19: Imports and index alias look correct

New imports (reflect/cache/workqueue) and addonindex alias are appropriate for the new logic.

Also applies to: 33-36


45-57: Controller fields wiring is sound

Storing the indexer and queue on the controller enables the MCA event-driven requeues you rely on. Looks good.


75-92: SyncContext and queue plumbing LGTM

Using a named controller and persisting syncCtx.Queue() for custom handlers is correct.


101-113: Informer wiring is appropriate

Watching CMA for keys and using MCA/AddOnTemplate as bare informers matches the design. Nice.


185-193: Manager stop path LGTM

Clean shutdown with map cleanup and helpful logging.


199-209: Reconciliation gating is correct

Deferring stop on CMA NotFound/unsupported until MCAs are gone matches the intended fix.


167-183: No changes required — the ManagedClusterAddonByName index is registered in pkg/addon/manager.go (lines 113–118).

Comment on lines 115 to 150
// Add custom event handler for ManagedClusterAddon to filter configReference changes
// following the pattern from managedClusterSetController
_, err := addonInformers.Addon().V1alpha1().ManagedClusterAddOns().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
UpdateFunc: func(oldObj, newObj interface{}) {
// Only handle configReference updates for AddOnTemplates
oldMCA, ok := oldObj.(*addonv1alpha1.ManagedClusterAddOn)
if !ok {
return
}
newMCA, ok := newObj.(*addonv1alpha1.ManagedClusterAddOn)
if !ok {
return
}

// Extract AddOnTemplate configReferences from both old and new
oldTemplateRefs := extractAddOnTemplateConfigRefs(oldMCA)
newTemplateRefs := extractAddOnTemplateConfigRefs(newMCA)

// Only process if AddOnTemplate configReferences changed
if !reflect.DeepEqual(oldTemplateRefs, newTemplateRefs) {
// Queue the addon name to trigger reconciliation
c.queue.Add(newMCA.Name)
}
},
DeleteFunc: func(obj interface{}) {
mca, ok := obj.(*addonv1alpha1.ManagedClusterAddOn)
if !ok {
return
}
c.queue.Add(mca.Name)
},
})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Handle tombstones in DeleteFunc and make Update comparison order-insensitive

  • Delete events can arrive as cache.DeletedFinalStateUnknown; current cast may miss them.
  • reflect.DeepEqual on slices is order-sensitive; benign reordering will spuriously enqueue.

Apply:

@@
-    DeleteFunc: func(obj interface{}) {
-      mca, ok := obj.(*addonv1alpha1.ManagedClusterAddOn)
-      if !ok {
-        return
-      }
-      c.queue.Add(mca.Name)
-    },
+    DeleteFunc: func(obj interface{}) {
+      switch t := obj.(type) {
+      case *addonv1alpha1.ManagedClusterAddOn:
+        c.queue.Add(t.Name)
+      case cache.DeletedFinalStateUnknown:
+        if mca, ok := t.Obj.(*addonv1alpha1.ManagedClusterAddOn); ok {
+          c.queue.Add(mca.Name)
+        }
+      default:
+        return
+      }
+    },
@@
-      if !reflect.DeepEqual(oldTemplateRefs, newTemplateRefs) {
+      sort.Slice(oldTemplateRefs, func(i, j int) bool {
+        oi, oj := oldTemplateRefs[i], oldTemplateRefs[j]
+        return oi.UniqueName < oj.UniqueName
+      })
+      sort.Slice(newTemplateRefs, func(i, j int) bool {
+        ni, nj := newTemplateRefs[i], newTemplateRefs[j]
+        return ni.UniqueName < nj.UniqueName
+      })
+      if !reflect.DeepEqual(oldTemplateRefs, newTemplateRefs) {
         // Queue the addon name to trigger reconciliation
         c.queue.Add(newMCA.Name)
       }

Note: UniqueName is available on ConfigReference; if not, sort by Group/Resource/Name composite.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Add custom event handler for ManagedClusterAddon to filter configReference changes
// following the pattern from managedClusterSetController
_, err := addonInformers.Addon().V1alpha1().ManagedClusterAddOns().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
UpdateFunc: func(oldObj, newObj interface{}) {
// Only handle configReference updates for AddOnTemplates
oldMCA, ok := oldObj.(*addonv1alpha1.ManagedClusterAddOn)
if !ok {
return
}
newMCA, ok := newObj.(*addonv1alpha1.ManagedClusterAddOn)
if !ok {
return
}
// Extract AddOnTemplate configReferences from both old and new
oldTemplateRefs := extractAddOnTemplateConfigRefs(oldMCA)
newTemplateRefs := extractAddOnTemplateConfigRefs(newMCA)
// Only process if AddOnTemplate configReferences changed
if !reflect.DeepEqual(oldTemplateRefs, newTemplateRefs) {
// Queue the addon name to trigger reconciliation
c.queue.Add(newMCA.Name)
}
},
DeleteFunc: func(obj interface{}) {
mca, ok := obj.(*addonv1alpha1.ManagedClusterAddOn)
if !ok {
return
}
c.queue.Add(mca.Name)
},
})
// Add custom event handler for ManagedClusterAddon to filter configReference changes
// following the pattern from managedClusterSetController
_, err := addonInformers.Addon().V1alpha1().ManagedClusterAddOns().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
UpdateFunc: func(oldObj, newObj interface{}) {
// Only handle configReference updates for AddOnTemplates
oldMCA, ok := oldObj.(*addonv1alpha1.ManagedClusterAddOn)
if !ok {
return
}
newMCA, ok := newObj.(*addonv1alpha1.ManagedClusterAddOn)
if !ok {
return
}
// Extract AddOnTemplate configReferences from both old and new
oldTemplateRefs := extractAddOnTemplateConfigRefs(oldMCA)
newTemplateRefs := extractAddOnTemplateConfigRefs(newMCA)
// Sort the configReferences to make comparison order-insensitive
sort.Slice(oldTemplateRefs, func(i, j int) bool {
oi, oj := oldTemplateRefs[i], oldTemplateRefs[j]
return oi.UniqueName < oj.UniqueName
})
sort.Slice(newTemplateRefs, func(i, j int) bool {
ni, nj := newTemplateRefs[i], newTemplateRefs[j]
return ni.UniqueName < nj.UniqueName
})
// Only process if AddOnTemplate configReferences changed
if !reflect.DeepEqual(oldTemplateRefs, newTemplateRefs) {
// Queue the addon name to trigger reconciliation
c.queue.Add(newMCA.Name)
}
},
DeleteFunc: func(obj interface{}) {
switch t := obj.(type) {
case *addonv1alpha1.ManagedClusterAddOn:
c.queue.Add(t.Name)
case cache.DeletedFinalStateUnknown:
if mca, ok := t.Obj.(*addonv1alpha1.ManagedClusterAddOn); ok {
c.queue.Add(mca.Name)
}
default:
return
}
},
})
🤖 Prompt for AI Agents
In pkg/addon/controllers/addontemplate/controller.go around lines 115 to 146,
the DeleteFunc must handle cache.DeletedFinalStateUnknown tombstones and the
UpdateFunc comparison must be order-insensitive: modify DeleteFunc to detect if
obj is of type cache.DeletedFinalStateUnknown and, if so, extract the contained
object before casting to *addonv1alpha1.ManagedClusterAddOn, then enqueue its
Name; for UpdateFunc normalize the old/new AddOnTemplate configReference lists
into a deterministic order (either sort by ConfigReference.UniqueName if
present, or by a composite key of Group/Resource/Name) or convert them to a
set/map of unique keys and compare those instead of reflect.DeepEqual, ensuring
nil/empty slices are handled consistently so benign reordering doesn’t trigger
reconciliation.

@zhujian7 zhujian7 force-pushed the addon-manager-delete branch from 778aed5 to 159f919 Compare September 8, 2025 10:13
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
pkg/addon/controllers/addontemplate/controller.go (1)

115-146: Handle tombstones in DeleteFunc and make Update comparison order-insensitive (use UniqueName sort).

  • Delete events can arrive as cache.DeletedFinalStateUnknown; current cast may miss them.
  • reflect.DeepEqual on slices is order-sensitive; benign reordering spuriously enqueues.

Apply:

@@
-	_, err := addonInformers.Addon().V1alpha1().ManagedClusterAddOns().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
+	_, err := addonInformers.Addon().V1alpha1().ManagedClusterAddOns().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
 		UpdateFunc: func(oldObj, newObj interface{}) {
 			// Only handle configReference updates for AddOnTemplates
 			oldMCA, ok := oldObj.(*addonv1alpha1.ManagedClusterAddOn)
 			if !ok {
 				return
 			}
 			newMCA, ok := newObj.(*addonv1alpha1.ManagedClusterAddOn)
 			if !ok {
 				return
 			}
 
 			// Extract AddOnTemplate configReferences from both old and new
 			oldTemplateRefs := extractAddOnTemplateConfigRefs(oldMCA)
 			newTemplateRefs := extractAddOnTemplateConfigRefs(newMCA)
 
-			// Only process if AddOnTemplate configReferences changed
-			if !reflect.DeepEqual(oldTemplateRefs, newTemplateRefs) {
+			// Normalize order by UniqueName to avoid spurious enqueue on reordering
+			sort.Slice(oldTemplateRefs, func(i, j int) bool {
+				return oldTemplateRefs[i].UniqueName < oldTemplateRefs[j].UniqueName
+			})
+			sort.Slice(newTemplateRefs, func(i, j int) bool {
+				return newTemplateRefs[i].UniqueName < newTemplateRefs[j].UniqueName
+			})
+			// Only process if AddOnTemplate configReferences changed
+			if !reflect.DeepEqual(oldTemplateRefs, newTemplateRefs) {
 				// Queue the addon name to trigger reconciliation
 				c.queue.Add(newMCA.Name)
 			}
 		},
-		DeleteFunc: func(obj interface{}) {
-			mca, ok := obj.(*addonv1alpha1.ManagedClusterAddOn)
-			if !ok {
-				return
-			}
-			c.queue.Add(mca.Name)
-		},
+		DeleteFunc: func(obj interface{}) {
+			switch t := obj.(type) {
+			case *addonv1alpha1.ManagedClusterAddOn:
+				c.queue.Add(t.Name)
+			case cache.DeletedFinalStateUnknown:
+				if mca, ok := t.Obj.(*addonv1alpha1.ManagedClusterAddOn); ok {
+					c.queue.Add(mca.Name)
+				}
+			default:
+				return
+			}
+		},
 	})

Optional: also enqueue on Add when a MCA starts referencing AddOnTemplate to catch edge cases:

+		AddFunc: func(obj interface{}) {
+			if mca, ok := obj.(*addonv1alpha1.ManagedClusterAddOn); ok && len(extractAddOnTemplateConfigRefs(mca)) > 0 {
+				c.queue.Add(mca.Name)
+			}
+		},
🧹 Nitpick comments (3)
pkg/addon/controllers/addontemplate/controller.go (3)

3-21: Import sort for order-insensitive comparison.

Required by the UpdateFunc change above.

 import (
 	"context"
-	"reflect"
+	"reflect"
+	"sort"
 	"time"
@@
-	"k8s.io/client-go/tools/cache"
-	"k8s.io/client-go/util/workqueue"
+	"k8s.io/client-go/tools/cache"
+	"k8s.io/client-go/util/workqueue"
 )

153-163: Avoid hard-coded GVR strings; use utils.AddOnTemplateGVR for correctness and consistency.

This prevents drift if group/resource names change and matches existing helpers.

-func extractAddOnTemplateConfigRefs(mca *addonv1alpha1.ManagedClusterAddOn) []addonv1alpha1.ConfigReference {
+func extractAddOnTemplateConfigRefs(mca *addonv1alpha1.ManagedClusterAddOn) []addonv1alpha1.ConfigReference {
 	var templateRefs []addonv1alpha1.ConfigReference
 	for _, configRef := range mca.Status.ConfigReferences {
-		if configRef.ConfigGroupResource.Group == "addon.open-cluster-management.io" &&
-			configRef.ConfigGroupResource.Resource == "addontemplates" {
+		if configRef.ConfigGroupResource.Group == utils.AddOnTemplateGVR.Group &&
+			configRef.ConfigGroupResource.Resource == utils.AddOnTemplateGVR.Resource {
 			templateRefs = append(templateRefs, configRef)
 		}
 	}
 	return templateRefs
 }

305-313: Log index lookup failures as errors and include addonName context.

Improves signal when the index isn’t wired or cache isn’t synced yet.

-	mcas, err := c.managedClusterAddonIndexer.ByIndex(addonindex.ManagedClusterAddonByName, addonName)
+	mcas, err := c.managedClusterAddonIndexer.ByIndex(addonindex.ManagedClusterAddonByName, addonName)
 	if err != nil {
-		logger.Info("Failed to list ManagedClusterAddOns by index", "error", err)
+		logger.Error(err, "Failed to list ManagedClusterAddOns by index", "addonName", addonName)
 	} else {
 		for _, mca := range mcas {
 			addon, ok := mca.(*addonv1alpha1.ManagedClusterAddOn)
 			if ok {
 				mgr.Trigger(addon.Namespace, addonName)
 			}
 		}
 	}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 778aed5 and 159f919.

📒 Files selected for processing (1)
  • pkg/addon/controllers/addontemplate/controller.go (8 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
pkg/addon/controllers/addontemplate/controller.go (3)
pkg/common/queue/queuekey.go (1)
  • QueueKeyByMetaNamespaceName (67-70)
pkg/addon/index/index.go (1)
  • ManagedClusterAddonByName (19-19)
pkg/addon/templateagent/common.go (1)
  • SupportAddOnTemplate (34-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: e2e
  • GitHub Check: e2e-singleton
  • GitHub Check: e2e-hosted
  • GitHub Check: integration
  • GitHub Check: verify
  • GitHub Check: unit
  • GitHub Check: build
🔇 Additional comments (2)
pkg/addon/controllers/addontemplate/controller.go (2)

101-114: Wiring looks solid.

Using WithInformersQueueKeysFunc for CMA and WithBareInformers to ensure MCA/AddOnTemplate caches are synced before reconcile is correct for this controller’s model.


169-183: Index registration for ManagedClusterAddonByName confirmed
ManagedClusterAddOns informer in pkg/addon/manager.go registers the ManagedClusterAddonByName index, and IndexManagedClusterAddonByName correctly returns the addon’s Name.

@zhujian7
Copy link
Member Author

zhujian7 commented Sep 8, 2025

@qiujian16 PTAL

}

// Extract AddOnTemplate configReferences from both old and new
oldTemplateRefs := extractAddOnTemplateConfigRefs(oldMCA)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just filter the mca based on whether it is an addonTemplate based addon?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not only filter the addonTemplate-based addon, but also filter only the template that has changes.
Do you mean we do not need to compare whether the template has changed, we enqueue all update events as long as it is template-based?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, why do we only compare template change? If this is only for deleting, should we just filter the one that is template addon and is in deleting?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. PTAL

}
},
DeleteFunc: func(obj interface{}) {
mca, ok := obj.(*addonv1alpha1.ManagedClusterAddOn)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, we only need to care template based addon.

Copy link
Member Author

@zhujian7 zhujian7 Sep 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the below lines 145-148

zhujian7 and others added 11 commits September 8, 2025 22:27
…eleted

The addon template controller was stopping addon managers immediately when
ClusterManagementAddon was deleted, without waiting for pre-delete jobs
to complete or ManagedClusterAddons to be cleaned up via owner reference
cascading deletion.

This change implements the TODO at line 105 by checking if all
ManagedClusterAddons are deleted before stopping the manager. The controller
now uses field selectors to efficiently query for remaining ManagedClusterAddons
and requeues after 10 seconds if any still exist, allowing time for proper
cleanup.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: zhujian <[email protected]>
Signed-off-by: zhujian <[email protected]>
- Use lister instead of API client for better performance
- Add named constant for requeue delay
- Fix test cache synchronization issues
- Improve test coverage from 74.7% to 75.6%

Addresses review feedback from Qiujian16 and CodeRabbit.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: zhujian <[email protected]>
Add explicit 180s timeout for pre-delete job configmap cleanup.
The default 90s timeout was insufficient for the deletion workflow.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: zhujian <[email protected]>
- Replace utilruntime.HandleError with structured logging in CSR functions
- Add more context to error messages for better debugging
- Use logger.Info for template retrieval errors to provide better visibility

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: zhujian <[email protected]>
- Replace inefficient list-and-filter with indexed lookup
- Add managedClusterAddonIndexer field to controller struct
- Update comment to accurately describe functionality
- Fix unit tests to properly set up the required index

This addresses the PR review feedback to use the existing index
instead of listing all ManagedClusterAddOns and filtering by name.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: zhujian <[email protected]>
Since we now use managedClusterAddonIndexer for efficient lookup,
the mcaLister field is no longer needed. This cleanup reduces
memory usage and simplifies the controller structure.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: zhujian <[email protected]>
Use managedClusterAddonIndexer.ByIndex() instead of listing all ManagedClusterAddOns
and filtering by name. This provides O(1) indexed lookup instead of O(n) linear scan.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: zhujian <[email protected]>
- Fix closure capture bug in controller test by using captured variables
- Fix typo 'copyiedConfig' to 'copiedConfig' in e2e tests

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: zhujian <[email protected]>
Replace filtered event handling with custom event handlers that only trigger
reconciliation when AddOnTemplate configReferences actually change. This
reduces unnecessary reconciliation cycles by using reflect.DeepEqual to
compare config references between old and new objects.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: zhujian <[email protected]>
@zhujian7 zhujian7 force-pushed the addon-manager-delete branch from 159f919 to 4649d1b Compare September 9, 2025 02:27
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
pkg/addon/controllers/addontemplate/controller.go (1)

115-151: Harden MCA event handlers: handle tombstones and compare order-insensitively.

  • DeleteFunc ignores cache.DeletedFinalStateUnknown; deletes may be missed.
  • reflect.DeepEqual is order-sensitive; benign reordering will spuriously enqueue.

Apply:

@@
-    UpdateFunc: func(oldObj, newObj interface{}) {
+    UpdateFunc: func(oldObj, newObj interface{}) {
       // Only handle configReference updates for AddOnTemplates
       oldMCA, ok := oldObj.(*addonv1alpha1.ManagedClusterAddOn)
       if !ok {
         return
       }
       newMCA, ok := newObj.(*addonv1alpha1.ManagedClusterAddOn)
       if !ok {
         return
       }
 
       // Extract AddOnTemplate configReferences from both old and new
       oldTemplateRefs := extractAddOnTemplateConfigRefs(oldMCA)
       newTemplateRefs := extractAddOnTemplateConfigRefs(newMCA)
 
-      // Only process if AddOnTemplate configReferences changed
-      if !reflect.DeepEqual(oldTemplateRefs, newTemplateRefs) {
+      // Normalize order before comparison (avoid spurious diffs from reordering)
+      sort.Slice(oldTemplateRefs, func(i, j int) bool {
+        oi, oj := oldTemplateRefs[i], oldTemplateRefs[j]
+        if oi.ConfigReferent.Namespace != oj.ConfigReferent.Namespace {
+          return oi.ConfigReferent.Namespace < oj.ConfigReferent.Namespace
+        }
+        return oi.ConfigReferent.Name < oj.ConfigReferent.Name
+      })
+      sort.Slice(newTemplateRefs, func(i, j int) bool {
+        ni, nj := newTemplateRefs[i], newTemplateRefs[j]
+        if ni.ConfigReferent.Namespace != nj.ConfigReferent.Namespace {
+          return ni.ConfigReferent.Namespace < nj.ConfigReferent.Namespace
+        }
+        return ni.ConfigReferent.Name < nj.ConfigReferent.Name
+      })
+      if !reflect.DeepEqual(oldTemplateRefs, newTemplateRefs) {
         // Queue the addon name to trigger reconciliation
         c.queue.Add(newMCA.Name)
       }
     },
-    DeleteFunc: func(obj interface{}) {
-      mca, ok := obj.(*addonv1alpha1.ManagedClusterAddOn)
-      if !ok {
-        return
-      }
-
-      // Only process template-based addons
-      templateRefs := extractAddOnTemplateConfigRefs(mca)
-      if len(templateRefs) > 0 {
-        c.queue.Add(mca.Name)
-      }
-    },
+    DeleteFunc: func(obj interface{}) {
+      switch t := obj.(type) {
+      case *addonv1alpha1.ManagedClusterAddOn:
+        if len(extractAddOnTemplateConfigRefs(t)) > 0 {
+          c.queue.Add(t.Name)
+        }
+      case cache.DeletedFinalStateUnknown:
+        if mca, ok := t.Obj.(*addonv1alpha1.ManagedClusterAddOn); ok {
+          if len(extractAddOnTemplateConfigRefs(mca)) > 0 {
+            c.queue.Add(mca.Name)
+          }
+        }
+      default:
+        return
+      }
+    },

And import sort:

 import (
   "context"
-  "reflect"
+  "reflect"
+  "sort"
   "time"
🧹 Nitpick comments (3)
test/e2e/addonmanagement_test.go (1)

834-920: Avoid flakiness and clean up hub-side ConfigMap.

  • Pre-delete flows can be slow; add explicit Eventually timeouts/polling to reduce flakes.
  • The original ConfigMap in the hub namespace isn’t cleaned up by the job; delete it to avoid residue between tests.

Apply:

-        gomega.Eventually(func() error {
+        gomega.Eventually(func() error {
           _, err := spoke.KubeClient.CoreV1().ConfigMaps(addonInstallNamespace).Get(
             context.Background(), configmap.Name, metav1.GetOptions{})
           if err != nil {
             if errors.IsNotFound(err) {
               return nil
             }
             return err
           }
           return fmt.Errorf("the configmap should be deleted")
-        }).ShouldNot(gomega.HaveOccurred())
+        }).WithTimeout(3 * time.Minute).WithPolling(2 * time.Second).ShouldNot(gomega.HaveOccurred())
@@
-        gomega.Eventually(func() error {
+        gomega.Eventually(func() error {
           _, err := hub.AddonClient.AddonV1alpha1().ManagedClusterAddOns(universalClusterName).Get(
             context.TODO(), addOnName, metav1.GetOptions{})
           if err != nil {
             if errors.IsNotFound(err) {
               return nil
             }
             return err
           }
           return fmt.Errorf("the ManagedClusterAddon should be deleted")
-        }).ShouldNot(gomega.HaveOccurred())
+        }).WithTimeout(3 * time.Minute).WithPolling(2 * time.Second).ShouldNot(gomega.HaveOccurred())
@@
-        gomega.Eventually(func() error {
+        gomega.Eventually(func() error {
           _, err := spoke.KubeClient.BatchV1().Jobs(addonInstallNamespace).Get(
             context.Background(), "hello-template-cleanup-configmap", metav1.GetOptions{})
           if err != nil {
             if errors.IsNotFound(err) {
               return nil
             }
             return err
           }
           return fmt.Errorf("the job should be deleted")
-        }).ShouldNot(gomega.HaveOccurred())
+        }).WithTimeout(3 * time.Minute).WithPolling(2 * time.Second).ShouldNot(gomega.HaveOccurred())
+
+        // Best-effort cleanup of the original hub ConfigMap created for this test.
+        _ = hub.KubeClient.CoreV1().ConfigMaps(universalClusterName).
+          Delete(context.TODO(), configmap.Name, metav1.DeleteOptions{})
pkg/addon/controllers/addontemplate/controller.go (1)

311-313: Log errors as errors (and consider retry).

Use logger.Error for ByIndex failures; optional: requeue to retry transient index/list errors.

Apply:

-    logger.Info("Failed to list ManagedClusterAddOns by index", "error", err)
+    logger.Error(err, "Failed to list ManagedClusterAddOns by index")
+    // optionally: c.queue.AddAfter(addonName, 10*time.Second)
pkg/addon/controllers/addontemplate/controller_test.go (1)

389-417: Consider asserting queueing behavior explicitly.

If we keep “expectedRequeue”, we can assert enqueue by wiring a fake SyncContext or wrapping c.queue with a test queue to observe Add calls. Optional.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 159f919 and 4649d1b.

📒 Files selected for processing (5)
  • pkg/addon/controllers/addontemplate/controller.go (8 hunks)
  • pkg/addon/controllers/addontemplate/controller_test.go (5 hunks)
  • pkg/addon/templateagent/registration.go (2 hunks)
  • pkg/addon/templateagent/template_agent.go (1 hunks)
  • test/e2e/addonmanagement_test.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/addon/templateagent/template_agent.go
  • pkg/addon/templateagent/registration.go
🧰 Additional context used
🧬 Code graph analysis (2)
pkg/addon/controllers/addontemplate/controller_test.go (3)
pkg/addon/index/index.go (2)
  • ManagedClusterAddonByName (19-19)
  • IndexManagedClusterAddonByName (42-50)
pkg/registration/helpers/testing/testinghelpers.go (1)
  • NewManagedClusterAddons (520-529)
pkg/common/testing/fake_sync_context.go (1)
  • NewFakeSyncContext (21-27)
pkg/addon/controllers/addontemplate/controller.go (3)
pkg/common/queue/queuekey.go (1)
  • QueueKeyByMetaNamespaceName (67-70)
pkg/addon/index/index.go (1)
  • ManagedClusterAddonByName (19-19)
pkg/addon/templateagent/common.go (1)
  • SupportAddOnTemplate (34-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: build
  • GitHub Check: integration
  • GitHub Check: unit
  • GitHub Check: verify
  • GitHub Check: e2e-singleton
  • GitHub Check: e2e-hosted
  • GitHub Check: e2e
🔇 Additional comments (5)
test/e2e/addonmanagement_test.go (1)

259-266: Fix looks good — correct var and comparison.

Using copiedConfig and comparing Data fields directly is the right fix.

pkg/addon/controllers/addontemplate/controller.go (1)

171-187: Confirm watch-based requeue is sufficient when MCAs remain.

stopUnusedManagers returns without requeue by design; this relies entirely on MCA watch events. With tombstone handling fixed, that should be fine. If external index errors occur, consider AddAfter to avoid indefinite liveness.

Would you like me to add a tiny unit test that emits a DeletedFinalStateUnknown for MCA and asserts the CMA name is enqueued?

pkg/addon/controllers/addontemplate/controller_test.go (3)

156-164: Good: index registered for tests.

Registering ManagedClusterAddonByName ensures ByIndex works in tests.


352-363: Nice capture fix for closures.

Shadowing loop vars avoids the classic capture bug.


365-373: Redundant but fine: indexer registration here too.

Consistent with TestReconcile; keeps tests hermetic.

Comment on lines +84 to +85
managedClusterAddonIndexer: addonInformers.Addon().V1alpha1().ManagedClusterAddOns().Informer().GetIndexer(),
addonManagers: make(map[string]context.CancelFunc),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Register the ManagedClusterAddonByName index before using ByIndex.

ByIndex will error if the indexer wasn’t registered; in production this will cause stop/start paths to misbehave or leak managers. Tests add the indexer, but the controller doesn’t.

Apply:

 controller := factory.New().
   WithSyncContext(syncCtx).
@@
   WithSync(c.sync).
   ToController("addon-template-controller", recorder)

+ // Ensure the MCA by-name index is registered (required for ByIndex calls).
+ if err := addonInformers.Addon().V1alpha1().ManagedClusterAddOns().
+   Informer().AddIndexers(cache.Indexers{
+     addonindex.ManagedClusterAddonByName: addonindex.IndexManagedClusterAddonByName,
+   }); err != nil {
+   utilruntime.HandleError(err)
+ }

Also applies to: 101-109

🤖 Prompt for AI Agents
In pkg/addon/controllers/addontemplate/controller.go around lines 84-85 (and
similarly around 101-109), the ManagedClusterAddon indexer is used via ByIndex
without ensuring the index was registered; add code during controller
construction to call the indexer's AddIndexers to register the
ManagedClusterAddonByName indexer (provide a function that returns the addon
name key from the ManagedClusterAddOn object or its metadata), do this
registration before any call to ByIndex, and handle and log/return any error
from AddIndexers so the controller fails fast if registration fails.

… controller"

This reverts commit 4649d1b.

Signed-off-by: zhujian <[email protected]>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
pkg/addon/controllers/addontemplate/controller.go (2)

135-141: Consider a time-based requeue as a safety net.

Relying solely on MCA watch events risks leaking a manager if an event is missed; the PR summary mentioned a 10s requeue. Add a light requeue when MCAs still exist.

Apply:

-    logger.Info("ManagedClusterAddOn still exists, waiting for deletion",
-      "addonName", addOnName, "count", len(addons))
-    // No need to requeue since we now watch ManagedClusterAddon events
-    return nil
+    logger.Info("ManagedClusterAddOn still exists, waiting for deletion",
+      "addonName", addOnName, "count", len(addons))
+    syncCtx.Queue().AddAfter(addOnName, 10*time.Second)
+    return nil

Was dropping the timed requeue intentional given the PR description states a 10s requeue?


264-267: Log indexer failures as errors, not info.

Indexer errors indicate misconfiguration or cache issues; promote to Error for visibility.

Apply:

-  if err != nil {
-    logger.Info("Failed to list ManagedClusterAddOns by index", "error", err)
+  if err != nil {
+    logger.Error(err, "ByIndex ManagedClusterAddOns failed", "addonName", addonName)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4649d1b and 171ebbc.

📒 Files selected for processing (1)
  • pkg/addon/controllers/addontemplate/controller.go (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
pkg/addon/controllers/addontemplate/controller.go (3)
pkg/common/queue/queuekey.go (2)
  • QueueKeyByMetaNamespaceName (67-70)
  • QueueKeyByMetaName (57-60)
pkg/addon/index/index.go (1)
  • ManagedClusterAddonByName (19-19)
pkg/addon/templateagent/common.go (1)
  • SupportAddOnTemplate (34-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: integration
  • GitHub Check: unit
  • GitHub Check: verify
  • GitHub Check: build
  • GitHub Check: e2e-singleton
  • GitHub Check: e2e
  • GitHub Check: e2e-hosted
🔇 Additional comments (2)
pkg/addon/controllers/addontemplate/controller.go (2)

228-230: LGTM: type-correct getValuesClosure signature.

Switching addon param to *addonv1alpha1.ManagedClusterAddOn aligns with surrounding code.


87-96: No additional index registration needed in addontemplate controller
The ManagedClusterAddOns index is already registered once in pkg/addon/manager.go (lines 113–116) before any controllers run, so adding it again in controller.go is redundant.

Likely an incorrect or invalid review comment.

Comment on lines +97 to +114
WithFilteredEventsInformersQueueKeysFunc(
queue.QueueKeyByMetaName,
func(obj interface{}) bool {
mca, ok := obj.(*addonv1alpha1.ManagedClusterAddOn)
if !ok {
return false
}

// Only process ManagedClusterAddOns that reference AddOnTemplates
for _, configRef := range mca.Status.ConfigReferences {
if configRef.ConfigGroupResource.Group == "addon.open-cluster-management.io" &&
configRef.ConfigGroupResource.Resource == "addontemplates" {
return true
}
}
return false
},
addonInformers.Addon().V1alpha1().ManagedClusterAddOns().Informer()).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Handle tombstones in the filter predicate and use constants instead of hard-coded strings.

Delete events can arrive as cache.DeletedFinalStateUnknown; current cast drops those. Also prefer utils.AddOnTemplateGVR to avoid drift.

Apply:

-        queue.QueueKeyByMetaName,
-        func(obj interface{}) bool {
-          mca, ok := obj.(*addonv1alpha1.ManagedClusterAddOn)
-          if !ok {
-            return false
-          }
-
-          // Only process ManagedClusterAddOns that reference AddOnTemplates
-          for _, configRef := range mca.Status.ConfigReferences {
-            if configRef.ConfigGroupResource.Group == "addon.open-cluster-management.io" &&
-              configRef.ConfigGroupResource.Resource == "addontemplates" {
-              return true
-            }
-          }
-          return false
-        },
+        queue.QueueKeyByMetaName,
+        func(obj interface{}) bool {
+          // unwrap tombstone for delete events
+          if ts, ok := obj.(cache.DeletedFinalStateUnknown); ok {
+            obj = ts.Obj
+          }
+          mca, ok := obj.(*addonv1alpha1.ManagedClusterAddOn)
+          if !ok {
+            return false
+          }
+          // Only process MCAs that reference AddOnTemplates
+          for _, ref := range mca.Status.ConfigReferences {
+            if ref.ConfigGroupResource.Group == utils.AddOnTemplateGVR.Group &&
+               ref.ConfigGroupResource.Resource == utils.AddOnTemplateGVR.Resource {
+              return true
+            }
+          }
+          return false
+        },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
WithFilteredEventsInformersQueueKeysFunc(
queue.QueueKeyByMetaName,
func(obj interface{}) bool {
mca, ok := obj.(*addonv1alpha1.ManagedClusterAddOn)
if !ok {
return false
}
// Only process ManagedClusterAddOns that reference AddOnTemplates
for _, configRef := range mca.Status.ConfigReferences {
if configRef.ConfigGroupResource.Group == "addon.open-cluster-management.io" &&
configRef.ConfigGroupResource.Resource == "addontemplates" {
return true
}
}
return false
},
addonInformers.Addon().V1alpha1().ManagedClusterAddOns().Informer()).
WithFilteredEventsInformersQueueKeysFunc(
queue.QueueKeyByMetaName,
func(obj interface{}) bool {
// unwrap tombstone for delete events
if ts, ok := obj.(cache.DeletedFinalStateUnknown); ok {
obj = ts.Obj
}
mca, ok := obj.(*addonv1alpha1.ManagedClusterAddOn)
if !ok {
return false
}
// Only process MCAs that reference AddOnTemplates
for _, ref := range mca.Status.ConfigReferences {
if ref.ConfigGroupResource.Group == utils.AddOnTemplateGVR.Group &&
ref.ConfigGroupResource.Resource == utils.AddOnTemplateGVR.Resource {
return true
}
}
return false
},
addonInformers.Addon().V1alpha1().ManagedClusterAddOns().Informer()).
🤖 Prompt for AI Agents
In pkg/addon/controllers/addontemplate/controller.go around lines 97 to 114, the
filter predicate currently casts obj directly to
*addonv1alpha1.ManagedClusterAddOn which drops tombstone-wrapped delete events
(cache.DeletedFinalStateUnknown) and uses hard-coded group/resource strings;
update the predicate to unwrap cache.DeletedFinalStateUnknown by extracting the
tombstone.Obj when needed and then assert the target type, and replace the
literal "addon.open-cluster-management.io"/"addontemplates" checks with the
shared constant (e.g., utils.AddOnTemplateGVR or the project’s AddOnTemplateGVR)
comparing Group and Resource against those constant values so delete events are
handled and the GVR strings are not hard-coded.

@qiujian16
Copy link
Member

/approve

Copy link
Contributor

openshift-ci bot commented Sep 9, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: qiujian16, zhujian7

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@haoqing0110
Copy link
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Sep 10, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit b506d16 into open-cluster-management-io:main Sep 10, 2025
20 of 21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ManagedClusterAddons are not removed when ClusterManagementAddon is deleted
3 participants