Skip to content

Conversation

@panslava
Copy link
Contributor

@panslava panslava commented Sep 3, 2025

  • Centralize informer lifecycle with InformerSet for simpler, shared wiring.
  • Harden ProviderConfigControllersManager:
    • Add NEG cleanup finalizer on start; roll back on failures.
    • Idempotent, concurrency-safe start; stop closes channel and removes finalizer using latest object.
    • Introduce startNEGController test seam; unexport controllerSet; wrap errors; add docs.
  • Add README explaining multiproject structure and flow.
  • Tests: pkg/multiproject/manager/manager_test.go covers finalizer add/rollback, idempotent start, stop behavior, and concurrent single-shot.
  • No happy-path behavior change; improves robustness and testability.

/assign @swetharepakula

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Sep 3, 2025
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Sep 3, 2025
@panslava panslava force-pushed the improve-multiproject branch 2 times, most recently from 69aec65 to 438d992 Compare September 3, 2025 13:21
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 5, 2025
@panslava panslava force-pushed the improve-multiproject branch from 72a34fc to 2eb739c Compare September 7, 2025 10:30
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 7, 2025
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 9, 2025
@panslava panslava force-pushed the improve-multiproject branch from 2eb739c to 3f6e8d2 Compare October 10, 2025 02:25
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 10, 2025
@panslava panslava force-pushed the improve-multiproject branch from 3f6e8d2 to 291e886 Compare October 20, 2025 11:47
pcLatest, err := pccm.providerConfigClient.CloudV1().ProviderConfigs().Get(context.Background(), pc.Name, metav1.GetOptions{})
if err != nil {
pcCopy := pc.DeepCopy()
pcCopy.Finalizers = append(pcCopy.Finalizers, finalizer.ProviderConfigNEGCleanupFinalizer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should optionally add this finalizer if it doesn't exist.
Is this to cover the case our cache hasn't updated to after us putting it on ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is a bit overengineering, but this function is only used to get PC on deletion

Idea is -- if by some case "get" failed (idk, some transient error), we don't want to leave PC hanging and still want to try to remove finalizer, so that's the logic besides this function

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should instead throw an error in this situation. Is there a reason we don't think the version in the cache is not up to date?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree actually, removed this, inlined getting providerconfig

@swetharepakula swetharepakula mentioned this pull request Nov 26, 2025
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 27, 2025
Replace factory-based informer creation with centralized InformerSet that:
- Creates base SharedIndexInformers directly without factories
- Manages lifecycle and synchronization in one place
- Provides filtered views for ProviderConfig-specific controllers
- Reduces boilerplate and improves maintainability

Using informers directly without factories simplifies the logic and
eliminates potential mistakes from unnecessary factory usage, such as
cidentally creating duplicate informers or incorrect factory scoping.
- Add manager_test.go covering:
  - Start adds NEG cleanup finalizer and is idempotent
  - Start failure and GCE client creation failure roll back finalizer
  - Stop closes controller channel and removes finalizer
  - Concurrent starts for same ProviderConfig are single-shot
- Improve manager.go:
  - Introduce test seam via package var startNEGController
  - Ensure finalizer is added before start; roll back on any startup error
  - Delete finalizer using latest ProviderConfig from API to avoid stale updates
  - Wrap errors with %w and add GoDoc comments
  - Rename exported ControllerSet to unexported controllerSet
- No expected behavior change in the happy path; robustness and testability improved.
@panslava panslava force-pushed the improve-multiproject branch from 291e886 to ea200f4 Compare December 2, 2025 04:11
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 2, 2025
- Fix bug: preserve pre-existing finalizer on startup failure
- Expand informerset tests to verify multiple informers sync
- Add tests for controller restart with existing finalizer
- Add comment explaining closeProvider bool in neg_test
@panslava panslava force-pushed the improve-multiproject branch from ea200f4 to ce5c037 Compare December 2, 2025 04:12
@panslava
Copy link
Contributor Author

panslava commented Dec 2, 2025

Ready for re-review @swetharepakula

pcLatest, err := pccm.providerConfigClient.CloudV1().ProviderConfigs().Get(context.Background(), pc.Name, metav1.GetOptions{})
if err != nil {
pcCopy := pc.DeepCopy()
pcCopy.Finalizers = append(pcCopy.Finalizers, finalizer.ProviderConfigNEGCleanupFinalizer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should instead throw an error in this situation. Is there a reason we don't think the version in the cache is not up to date?

// happens in that window, cleanup could be skipped. We roll this back on
// any subsequent startup error.
// any subsequent startup error only if we added it.
err := finalizer.EnsureProviderConfigNEGCleanupFinalizer(pc, pccm.providerConfigClient, logger)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this to right before we start the controller, so we ensure all clients are up. This way we don't have to remove because GCE client didn't initialize.

Copy link
Contributor Author

@panslava panslava Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

- Create GCE client before adding finalizer (no rollback needed on fail)
- Return errors from StopControllersForProviderConfig for requeue
- Remove silent fallback in latestPCWithCleanupFinalizer
@panslava
Copy link
Contributor Author

panslava commented Dec 4, 2025

Ready for re-review @swetharepakula

Copy link
Member

@swetharepakula swetharepakula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more nit and this PR lgtm. Thanks!

Copy link
Member

@swetharepakula swetharepakula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 5, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: panslava, swetharepakula

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [panslava,swetharepakula]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit acc1f4e into kubernetes:master Dec 5, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants