@levan-m (Contributor) commented Dec 26, 2025

What does this PR do?

The ultimate goal of this change is to move Reconcile and Cleanup into ComponentRegistry and remove them from the ComponentReconciler interface. The assumption is that these two functions should be almost the same for all components (except maybe the Agent). Two hooks are still necessary: 1) cleaning up DCA RBAC and 2) deleting the CLC when DCA is disabled. Both are added to the interface.
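
To make the intended shape concrete, here is a minimal sketch of a registry that owns the shared reconcile/cleanup flow while components only expose small hooks. All names (`component`, `registry`, `CleanupDependencies`, `fakeComponent`) are hypothetical stand-ins for illustration and are not the operator's actual types or signatures.

```go
package main

import "fmt"

// component is a hypothetical, trimmed-down stand-in for ComponentReconciler.
type component interface {
	Name() string
	IsEnabled() bool
	// CleanupDependencies is a hypothetical hook for component-specific
	// teardown (for example, removing RBAC resources).
	CleanupDependencies() error
}

// registry owns the common flow so components do not reimplement it.
type registry struct {
	components []component
	log        []string // records the actions taken, in order
}

// reconcileAll cleans up disabled components and reconciles enabled ones.
func (r *registry) reconcileAll() error {
	for _, c := range r.components {
		if !c.IsEnabled() {
			if err := r.cleanup(c); err != nil {
				return err
			}
			continue
		}
		r.log = append(r.log, "reconcile "+c.Name())
	}
	return nil
}

// cleanup is the shared cleanup path; it calls the component hook last.
func (r *registry) cleanup(c component) error {
	r.log = append(r.log, "cleanup "+c.Name())
	return c.CleanupDependencies()
}

// fakeComponent is a test double that counts hook invocations.
type fakeComponent struct {
	name      string
	enabled   bool
	hookCalls int
}

func (f *fakeComponent) Name() string    { return f.name }
func (f *fakeComponent) IsEnabled() bool { return f.enabled }
func (f *fakeComponent) CleanupDependencies() error {
	f.hookCalls++
	return nil
}

func main() {
	dca := &fakeComponent{name: "clusterAgent", enabled: false}
	ccr := &fakeComponent{name: "clusterChecksRunner", enabled: true}
	r := &registry{components: []component{dca, ccr}}
	if err := r.reconcileAll(); err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, entry := range r.log {
		fmt.Println(entry)
	}
}
```

The point of the sketch is only that the loop and the cleanup sequence live in one place, and the per-component difference is reduced to the hook.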

This is a ComponentReconciler/Registry refactor following the approach in #2380.

  • eb5b177 carries over the changes from Levan m/dca ccr reconciler refactor #2380. They mostly affect the DCA and CLC components; the commit also adds controller_reconcile_deployment_test.go to assert on existing behavior.
  • e66e910 adds DDAI support to controller_reconcile_deployment_test.go but leaves it disabled because the tests fail. There seems to be some variation between the DDA and DDAI components, so this change can't be a drop-in replacement for the DDAI components.
  • 3ea55d3 changes function signatures to align the DCA and CLC components.
  • b96dc88 should be the only functional change in the PR. It splits the cleanupV2*** functions into two parts: 1) deleting the deployment and 2) updating the status. The regular reconcile flow executes both; cleanupOld*** only deletes the deployment. The reasoning is that deleting old/stale deployments after a rename shouldn't clean up the status on the DDA.
  • 1f6f9e6 moves the deployment deletion part to controller_reconcile_v2_helpers.go, as it is no longer component-specific.
  • 04a98b6 adds function getters to the interface so they can be used in ComponentRegistry, plus some minor cleanup.
  • f9eb064 moves ComponentReconcilerReconcile() from the individual components to a common implementation in ComponentRegistry and removes the interface function.
  • 17063f8 does the same for Cleanup.
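
The split described for b96dc88 can be sketched roughly as follows. This is an in-memory illustration only: `cluster`, `status`, `deleteDeployment`, `clearStatus`, `cleanupComponent`, and `cleanupOldDeployment` are hypothetical names, not the functions in the PR.

```go
package main

import "fmt"

// status is a hypothetical stand-in for the component status kept on the DDA.
type status struct {
	ClusterChecksRunner *string
}

// cluster is a hypothetical in-memory stand-in for the API server.
type cluster struct {
	deployments map[string]bool
}

// deleteDeployment removes the named deployment and reports whether it existed.
func deleteDeployment(c *cluster, name string) bool {
	_, found := c.deployments[name]
	delete(c.deployments, name)
	return found
}

// clearStatus drops the component's entry from the status.
func clearStatus(s *status) {
	s.ClusterChecksRunner = nil
}

// cleanupComponent is the regular path: delete the deployment, then clear status.
func cleanupComponent(c *cluster, s *status, name string) {
	deleteDeployment(c, name)
	clearStatus(s)
}

// cleanupOldDeployment handles a stale deployment left behind by a rename:
// it deletes the deployment only and leaves the status untouched.
func cleanupOldDeployment(c *cluster, name string) {
	deleteDeployment(c, name)
}

func main() {
	running := "Running"
	c := &cluster{deployments: map[string]bool{"old-ccr": true, "new-ccr": true}}
	s := &status{ClusterChecksRunner: &running}

	cleanupOldDeployment(c, "old-ccr")
	fmt.Println("status kept after rename cleanup:", s.ClusterChecksRunner != nil)

	cleanupComponent(c, s, "new-ccr")
	fmt.Println("status kept after full cleanup:", s.ClusterChecksRunner != nil)
}
```

Keeping deletion and status update as separate steps lets the rename path reuse the deletion without wiping the status that the live deployment still reports.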

@levan-m modified the milestones: v1.23.0, v1.22.0 (Dec 26, 2025)
@levan-m force-pushed the levan-m/component-reconcile-refactor branch from 57dcb1e to 17063f8 (December 26, 2025 19:16)
@codecov-commenter commented Dec 26, 2025

Codecov Report

❌ Patch coverage is 73.61111% with 38 lines in your changes missing coverage. Please review.
✅ Project coverage is 38.10%. Comparing base (13a04a6) to head (0d40aad).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...al/controller/datadogagent/component_reconciler.go | 68.00% | 9 Missing and 7 partials ⚠️ |
| ...er/datadogagent/controller_reconcile_v2_helpers.go | 76.00% | 6 Missing and 6 partials ⚠️ |
| pkg/testutils/builder.go | 0.00% | 5 Missing ⚠️ |
| ...ller/datadogagent/component_clusterchecksrunner.go | 90.00% | 1 Missing and 1 partial ⚠️ |
| ...adogagentinternal/component_clusterchecksrunner.go | 0.00% | 2 Missing ⚠️ |
| ...adogagent/component/clusterchecksrunner/default.go | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2434      +/-   ##
==========================================
+ Coverage   38.06%   38.10%   +0.03%     
==========================================
  Files         290      290              
  Lines       24760    24750      -10     
==========================================
+ Hits         9425     9431       +6     
+ Misses      14620    14609      -11     
+ Partials      715      710       -5     

| Flag | Coverage Δ |
|---|---|
| unittests | 38.10% <73.61%> (+0.03%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| .../controller/datadogagent/component_clusteragent.go | 82.35% <100.00%> (+10.74%) ⬆️ |
| ...adogagent/component/clusterchecksrunner/default.go | 10.52% <0.00%> (ø) |
| ...ller/datadogagent/component_clusterchecksrunner.go | 93.75% <90.00%> (+15.37%) ⬆️ |
| ...adogagentinternal/component_clusterchecksrunner.go | 40.54% <0.00%> (ø) |
| pkg/testutils/builder.go | 0.00% <0.00%> (ø) |
| ...er/datadogagent/controller_reconcile_v2_helpers.go | 62.01% <76.00%> (+5.85%) ⬆️ |
| ...al/controller/datadogagent/component_reconciler.go | 71.95% <68.00%> (-5.25%) ⬇️ |

... and 1 file with indirect coverage changes


Δ = absolute <relative> (impact), ø = not affected, ? = missing data
@levan-m marked this pull request as ready for review (December 26, 2025 19:46)
@levan-m requested a review from a team as a code owner (December 26, 2025 19:46)

@tbavelier (Member) left a comment

The DDAI code path has its own implementation, which this PR does not change to follow the same pattern.
The bug I describe below applies only to the DDA code path.

  condition.UpdateDatadogAgentStatusConditions(
  params.Status,
- now,
+ metav1.NewTime(time.Now()),

Is this intentional, rather than reusing the function-level now variable?

We have an issue here: if the component override struct is non-empty but does not explicitly disable the component, we reconcile a component that should not be reconciled at all (CCR is not enabled by default).

  features:
    clusterChecks:
      enabled: true
      useClusterChecksRunners: false
  override:
    clusterChecksRunner:
      disabled: true
      containers:
        agent:
          resources:
            requests: 256m

Result: we get a CCR deployment. No pods are scheduled because dependencies are missing, but the component shouldn't be reconciled at all.

We could exit early instead of reconciling further:

	// Explicit override disable always wins (and may create a conflict condition if the component is otherwise enabled).
	if ok && apiutils.BoolValue(componentOverride.Disabled) {
		if componentEnabled {
			// The override supersedes what's set in requiredComponents; update status to reflect the conflict
			condition.UpdateDatadogAgentStatusConditions(
				params.Status,
				now,
				common.OverrideReconcileConflictConditionType,
				metav1.ConditionTrue,
				"OverrideConflict",
				fmt.Sprintf("%s component is set to disabled", component.Name()),
				true,
			)
		}
		return r.Cleanup(ctx, params, component)
	}

	// If the component isn't enabled, we should cleanup regardless of whether an override struct exists.
	// (Overrides should not implicitly enable a component.)
	if !componentEnabled {
		return r.Cleanup(ctx, params, component)
	}

	// Apply non-disable overrides.
	if ok {
		override.PodTemplateSpec(params.Logger, podManagers, componentOverride, component.Name(), params.DDA.Name)
		override.Deployment(deployment, componentOverride)
	}
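
The ordering proposed above can be isolated as a pure decision function, which makes the three cases easy to check. This is only a sketch: `decide`, `action`, and the trimmed-down `componentOverride` struct are hypothetical and stand in for the real types.

```go
package main

import "fmt"

type action string

const (
	actionReconcile action = "reconcile"
	actionCleanup   action = "cleanup"
)

// componentOverride is a hypothetical stand-in for the override struct;
// a nil Disabled means the user did not set the field.
type componentOverride struct {
	Disabled *bool
}

// decide mirrors the proposed order of checks:
//  1. an explicit override disable always wins (conflict if otherwise enabled),
//  2. a component that is not enabled is cleaned up even if an override exists,
//  3. otherwise the component is reconciled.
func decide(componentEnabled bool, o *componentOverride) (act action, conflict bool) {
	if o != nil && o.Disabled != nil && *o.Disabled {
		return actionCleanup, componentEnabled
	}
	if !componentEnabled {
		return actionCleanup, false
	}
	return actionReconcile, false
}

func main() {
	disabled := true
	cases := []struct {
		name    string
		enabled bool
		o       *componentOverride
	}{
		{"override present, component not enabled", false, &componentOverride{}},
		{"override disables an enabled component", true, &componentOverride{Disabled: &disabled}},
		{"enabled, no override", true, nil},
	}
	for _, tc := range cases {
		act, conflict := decide(tc.enabled, tc.o)
		fmt.Printf("%s: action=%s conflict=%t\n", tc.name, act, conflict)
	}
}
```

The first case is the one reported here: an override that only sets resources must not implicitly enable the component.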

podManagers := feature.NewPodTemplateManagers(&deployment.Spec.Template)

// Set Global setting on the default deployment
component.GetGlobalSettingsFunc()(params.Logger, podManagers, params.DDA.GetObjectMeta(), &params.DDA.Spec, params.ResourceManagers, params.RequiredComponents)

Suggested change
- component.GetGlobalSettingsFunc()(params.Logger, podManagers, params.DDA.GetObjectMeta(), &params.DDA.Spec, params.ResourceManagers, params.RequiredComponents)
+ component.GetGlobalSettingsFunc()(deploymentLogger, podManagers, params.DDA.GetObjectMeta(), &params.DDA.Spec, params.ResourceManagers, params.RequiredComponents)

continue
return r.Cleanup(ctx, params, component)
}
override.PodTemplateSpec(params.Logger, podManagers, componentOverride, component.Name(), params.DDA.Name)

Suggested change
- override.PodTemplateSpec(params.Logger, podManagers, componentOverride, component.Name(), params.DDA.Name)
+ override.PodTemplateSpec(deploymentLogger, podManagers, componentOverride, component.Name(), params.DDA.Name)
