@elgnay elgnay commented Sep 1, 2025

Summary

Related issue(s)

Fixes #

Summary by CodeRabbit

  • Bug Fixes

    • Safeguards shared addon namespace from accidental removal in multi-agent environments; it’s deleted only when no other agent namespaces remain.
    • Ensures the klusterlet’s own agent namespace is always deleted during cleanup.
    • Improves error handling when discovering agent namespaces, halting cleanup on discovery errors.
  • Tests

    • Added tests covering multiple/no-agent scenarios to verify cleanup decisions and shared-resource protection.

coderabbitai bot commented Sep 1, 2025

Walkthrough

Adds a helper to detect other active klusterlet agent namespaces and uses it in managed reconcile to conditionally skip deleting the shared addon namespace; also adds tests verifying the helper and cleanup behavior across multiple namespace/label scenarios.

Changes

Cohort: Managed reconcile logic update
File(s): pkg/operator/operators/klusterlet/controllers/klusterletcontroller/klusterlet_managed_reconcile.go
Summary: Added method hasActiveKlusterletAgentNamespaces(ctx, currentAgentNamespace) on managedReconcile; call it during cleanup to decide whether to include the shared addon namespace for deletion; propagate errors from namespace listing.

Cohort: Test coverage additions
File(s): pkg/operator/operators/klusterlet/controllers/klusterletcontroller/klusterlet_controller_test.go
Summary: Added TestHasActiveKlusterletAgentNamespaces (various namespace/label scenarios) and TestCleanWithMultipleKlusterletAgentNamespaces (verify addon and agent namespace deletion behavior using fake kube client and deletion timestamps).
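
The cleanup decision described in this walkthrough can be sketched roughly as follows. This is an illustration only, not the code in the PR: `namespacesToDelete` and the namespace names are hypothetical, and the boolean stands in for the result of hasActiveKlusterletAgentNamespaces.

```go
package main

import "fmt"

// namespacesToDelete is a hypothetical helper, not the PR's code. It returns
// the namespaces a klusterlet cleanup would delete: the agent namespace
// always, and the shared addon namespace only when no other agent
// namespaces remain.
func namespacesToDelete(agentNamespace, addonNamespace string, othersActive bool) []string {
	targets := []string{agentNamespace}
	if !othersActive {
		targets = append(targets, addonNamespace)
	}
	return targets
}

func main() {
	// The namespace names below are placeholders chosen for this example.
	fmt.Println(namespacesToDelete("agent-a", "shared-addon", true))
	fmt.Println(namespacesToDelete("agent-a", "shared-addon", false))
}
```

Keeping the decision separate from the namespace lookup means a lookup error can stop the cleanup before any deletion list is built, which matches the "propagate errors from namespace listing" behavior in the summary.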

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.2.2)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/product/migration-guide for migration instructions

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
pkg/operator/operators/klusterlet/controllers/klusterletcontroller/klusterlet_managed_reconcile.go (1)

188-197: Wrap the propagated error with %w for proper error chaining.

Use %w instead of %v to preserve the original error in the chain.

-            return klusterlet, reconcileStop, fmt.Errorf("failed to check for active klusterlet agent namespaces: %v", err)
+            return klusterlet, reconcileStop, fmt.Errorf("failed to check for active klusterlet agent namespaces: %w", err)
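
A small standalone sketch of why the verb matters. The sentinel error `errList` is a stand-in invented for this example; only the message text is taken from the diff above.

```go
package main

import (
	"errors"
	"fmt"
)

// errList is a stand-in for whatever error the namespace listing returns.
var errList = errors.New("list namespaces: connection refused")

// wrapV formats the cause with %v, which flattens it into text.
func wrapV(err error) error {
	return fmt.Errorf("failed to check for active klusterlet agent namespaces: %v", err)
}

// wrapW formats the cause with %w, which keeps it in the error chain.
func wrapW(err error) error {
	return fmt.Errorf("failed to check for active klusterlet agent namespaces: %w", err)
}

// chainContains reports whether target is reachable from err via Unwrap.
func chainContains(err, target error) bool {
	return errors.Is(err, target)
}

func main() {
	fmt.Println(chainContains(wrapV(errList), errList)) // false
	fmt.Println(chainContains(wrapW(errList), errList)) // true
}
```

Both wrappers produce the same message string; the difference only shows up when a caller inspects the chain with errors.Is or errors.As.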
pkg/operator/operators/klusterlet/controllers/klusterletcontroller/klusterlet_controller_test.go (1)

1625-1771: Add a case for “other namespace is terminating” and prefer early-fail assertions.

  • Include a test where the “other” klusterlet namespace has DeletionTimestamp set; expected: helper returns false.
  • Consider testify/require for shorter, fail-fast assertions.

I can push a test case snippet if you’d like.
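
For illustration, the terminating-namespace case could look roughly like the sketch below. It uses simplified stand-ins rather than the real types: `namespace` models only the fields the check would read from corev1.Namespace, `labelKey` is a placeholder because the value of klusterletNamespaceLabelKey is not shown in this conversation, and `hasOtherActive` is a hypothetical loop-based check, not the helper in the PR.

```go
package main

import (
	"fmt"
	"time"
)

// namespace is a simplified stand-in for corev1.Namespace (an assumption
// made for this sketch); it keeps only the fields the check reads.
type namespace struct {
	Name              string
	Labels            map[string]string
	DeletionTimestamp *time.Time
}

// labelKey is a placeholder, not the real klusterlet label key.
const labelKey = "example.test/klusterlet"

// activeNS builds a labeled namespace that is not being deleted.
func activeNS(name string) namespace {
	return namespace{Name: name, Labels: map[string]string{labelKey: "true"}}
}

// terminatingNS builds a labeled namespace with a deletion timestamp set.
func terminatingNS(name string) namespace {
	ns := activeNS(name)
	now := time.Now()
	ns.DeletionTimestamp = &now
	return ns
}

// hasOtherActive skips the current namespace, terminating namespaces, and
// namespaces without the label, and reports whether anything is left.
func hasOtherActive(items []namespace, current string) bool {
	for i := range items {
		ns := &items[i]
		if ns.Name == current || ns.DeletionTimestamp != nil {
			continue
		}
		if _, ok := ns.Labels[labelKey]; !ok {
			continue
		}
		return true
	}
	return false
}

func main() {
	items := []namespace{activeNS("agent-a"), terminatingNS("agent-b")}
	// agent-b is terminating, so it does not count as another active agent.
	fmt.Println(hasOtherActive(items, "agent-a")) // false
}
```

In the real test the same scenario would be expressed with the fake kube client and a namespace object whose DeletionTimestamp is set.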

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between d70dd30 and f55d55d.

📒 Files selected for processing (2)
  • pkg/operator/operators/klusterlet/controllers/klusterletcontroller/klusterlet_controller_test.go (1 hunks)
  • pkg/operator/operators/klusterlet/controllers/klusterletcontroller/klusterlet_managed_reconcile.go (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
pkg/operator/operators/klusterlet/controllers/klusterletcontroller/klusterlet_managed_reconcile.go (1)
pkg/operator/helpers/helpers.go (2)
  • KlusterletNamespace (603-610)
  • DefaultAddonNamespace (55-55)
pkg/operator/operators/klusterlet/controllers/klusterletcontroller/klusterlet_controller_test.go (2)
pkg/operator/helpers/helpers.go (1)
  • DefaultAddonNamespace (55-55)
pkg/common/testing/fake_sync_context.go (1)
  • NewFakeSyncContext (21-27)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: unit
  • GitHub Check: build
  • GitHub Check: e2e-hosted
  • GitHub Check: integration
  • GitHub Check: e2e-singleton
  • GitHub Check: e2e
🔇 Additional comments (1)
pkg/operator/operators/klusterlet/controllers/klusterletcontroller/klusterlet_controller_test.go (1)

1773-1891: LGTM: cleanup behavior covered for both branches.

Good verification that the addon namespace is preserved only when other agents exist and that the agent namespace is always deleted.

codecov bot commented Sep 1, 2025

Codecov Report

❌ Patch coverage is 70.00000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.23%. Comparing base (d70dd30) to head (7383e26).
⚠️ Report is 1 commit behind head on main.

Files with missing lines | Patch % | Lines
...usterletcontroller/klusterlet_managed_reconcile.go | 70.00% | 4 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1153      +/-   ##
==========================================
+ Coverage   57.94%   58.23%   +0.28%     
==========================================
  Files         211      211              
  Lines       20765    20809      +44     
==========================================
+ Hits        12033    12118      +85     
+ Misses       7670     7624      -46     
- Partials     1062     1067       +5     
Flag | Coverage Δ
unit | 58.23% <70.00%> (+0.28%) ⬆️

@elgnay elgnay force-pushed the fix-addon-ns-deletion branch from f55d55d to 0ae6d1f Compare September 1, 2025 06:09

elgnay commented Sep 1, 2025

/hold


// Count namespaces other than the one being deleted
activeCount := 0
for _, ns := range namespaces.Items {
Member

It seems the for loop is not needed. Should we check:
if len(namespaces.Items) > 1 || (len(namespaces.Items) == 1 && namespaces.Items[0].Name != currentAgentNamespace)

Contributor Author

Updated.
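
A standalone sketch of the length-based check discussed in this thread, written against a plain slice of names so it runs without a cluster. `hasOthersByLength` is a hypothetical name, and the sketch assumes the names are unique, as namespace names are.

```go
package main

import "fmt"

// hasOthersByLength takes the names returned by a label-filtered list and
// reports whether any of them is not the current namespace. The length is
// checked before names[0] is read, so an empty slice is safe.
func hasOthersByLength(names []string, current string) bool {
	return len(names) > 1 || (len(names) == 1 && names[0] != current)
}

func main() {
	fmt.Println(hasOthersByLength(nil, "agent-a"))                          // false
	fmt.Println(hasOthersByLength([]string{"agent-a"}, "agent-a"))            // false
	fmt.Println(hasOthersByLength([]string{"agent-b"}, "agent-a"))            // true
	fmt.Println(hasOthersByLength([]string{"agent-a", "agent-b"}, "agent-a")) // true
}
```

This form counts every listed namespace, so a namespace that is already terminating still makes it return true.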

@elgnay elgnay force-pushed the fix-addon-ns-deletion branch from 0ae6d1f to 7383e26 Compare September 1, 2025 09:38
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
pkg/operator/operators/klusterlet/controllers/klusterletcontroller/klusterlet_managed_reconcile.go (1)

242-261: Ignore terminating namespaces, locally verify labels (fake client), early-return, and wrap errors.

Current length-based check can treat terminating namespaces as “active,” and relying solely on server-side selectors can miscount under fake clients. Iterate, skip terminating/self, defensively check the label, and return early; also wrap errors.

 func (r *managedReconcile) hasActiveKlusterletAgentNamespaces(ctx context.Context, currentAgentNamespace string) (bool, error) {
-	// Look for namespaces with klusterlet labels
-	namespaces, err := r.managedClusterClients.kubeClient.CoreV1().Namespaces().List(ctx, metav1.ListOptions{
-		LabelSelector: klusterletNamespaceLabelKey,
-	})
-	if err != nil {
-		return false, fmt.Errorf("failed to list klusterlet agent namespaces: %v", err)
-	}
-
-	// check if there exist namespaces other than the one being deleted
-	if len(namespaces.Items) > 1 || len(namespaces.Items) == 1 && namespaces.Items[0].Name != currentAgentNamespace {
-		return true, nil
-	}
-
-	return false, nil
+	// Prefer server-side label filtering for real clusters; still locally verify for fake clients.
+	namespaces, err := r.managedClusterClients.kubeClient.CoreV1().Namespaces().List(ctx, metav1.ListOptions{
+		LabelSelector: klusterletNamespaceLabelKey, // exists(key) selector
+	})
+	if err != nil {
+		return false, fmt.Errorf("failed to list klusterlet agent namespaces: %w", err)
+	}
+	for i := range namespaces.Items {
+		ns := &namespaces.Items[i]
+		if ns.Name == currentAgentNamespace {
+			continue
+		}
+		if ns.DeletionTimestamp != nil {
+			continue // ignore terminating
+		}
+		// Defensive: fake clients may not enforce label selectors; ensure the label exists.
+		if ns.Labels == nil {
+			continue
+		}
+		if _, ok := ns.Labels[klusterletNamespaceLabelKey]; !ok {
+			continue
+		}
+		return true, nil
+	}
+	return false, nil
 }

Verification (optional):

#!/bin/bash
# Ensure tests cover terminating namespaces and label presence cases.
rg -nP -C3 --type=go 'hasActiveKlusterletAgentNamespaces\('
rg -nP -C2 --type=go 'DeletionTimestamp'
rg -nP -C2 --type=go 'Labels\[klusterletNamespaceLabelKey\]'
🧹 Nitpick comments (1)
pkg/operator/operators/klusterlet/controllers/klusterletcontroller/klusterlet_managed_reconcile.go (1)

188-197: Wrap the propagated error with %w.

Flow looks good. Use %w to preserve the original error for callers/tools.

-            return klusterlet, reconcileStop, fmt.Errorf("failed to check for active klusterlet agent namespaces: %v", err)
+            return klusterlet, reconcileStop, fmt.Errorf("failed to check for active klusterlet agent namespaces: %w", err)
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 0ae6d1f and 7383e26.

📒 Files selected for processing (2)
  • pkg/operator/operators/klusterlet/controllers/klusterletcontroller/klusterlet_controller_test.go (1 hunks)
  • pkg/operator/operators/klusterlet/controllers/klusterletcontroller/klusterlet_managed_reconcile.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/operator/operators/klusterlet/controllers/klusterletcontroller/klusterlet_controller_test.go
🧰 Additional context used
🧬 Code graph analysis (1)
pkg/operator/operators/klusterlet/controllers/klusterletcontroller/klusterlet_managed_reconcile.go (1)
pkg/operator/helpers/helpers.go (2)
  • KlusterletNamespace (603-610)
  • DefaultAddonNamespace (55-55)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: e2e-singleton
  • GitHub Check: e2e-hosted
  • GitHub Check: e2e
  • GitHub Check: integration
  • GitHub Check: verify
  • GitHub Check: unit
  • GitHub Check: build

@qiujian16
Member

/approve
/lgtm

openshift-ci bot commented Sep 1, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: elgnay, qiujian16

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Sep 1, 2025
@elgnay elgnay closed this Sep 2, 2025
@elgnay elgnay reopened this Sep 2, 2025
@qiujian16
Member

@elgnay are we ok to merge this?
