@fanny-jiang
Contributor

@fanny-jiang fanny-jiang commented Jan 7, 2026

What does this PR do?

Adds a checksum hash of the DCA service clusterIP as an annotation on the helm-migrated daemonset. If the DCA service clusterIP changes (e.g. when the datadog helm release is uninstalled), the hash changes and triggers a rolling update of the daemonset, so that agent pods can communicate with the DCA at the new service's clusterIP.

Motivation

During a Helm -> Operator agent workload migration, one of the steps is to uninstall the datadog helm chart (after installing the operator chart).

Before migration: Helm manages all agent resources, including the DCA service, which is assigned a ClusterIP by K8s. At pod creation time, static environment variables are injected into the agent pods, instructing them to connect to this clusterIP to communicate with the DCA:

DATADOG_CLUSTER_AGENT_SERVICE_HOST=10.96.0.100
DATADOG_CLUSTER_AGENT_SERVICE_PORT=5005
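
To illustrate why these values go stale, here is a minimal Go sketch of a process resolving the DCA address from the injected variables. `clusterAgentEndpoint` is a hypothetical helper written for this example, not the agent's actual resolution code. The environment is fixed when the container starts, so a pod created against the old service keeps returning the old clusterIP until it is recreated.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// clusterAgentEndpoint is a hypothetical helper (not taken from the agent
// codebase). It builds a host:port address from the service environment
// variables that Kubernetes injects when the pod is created.
func clusterAgentEndpoint(getenv func(string) string) (string, error) {
	host := getenv("DATADOG_CLUSTER_AGENT_SERVICE_HOST")
	port := getenv("DATADOG_CLUSTER_AGENT_SERVICE_PORT")
	if host == "" || port == "" {
		return "", fmt.Errorf("cluster agent service environment variables are not set")
	}
	// The values are static for the lifetime of the container: if the
	// service is deleted and recreated with a new clusterIP, this still
	// returns the old address until the pod itself is recreated.
	return net.JoinHostPort(host, port), nil
}

func main() {
	endpoint, err := clusterAgentEndpoint(os.Getenv)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("cluster agent endpoint:", endpoint)
}
```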

After migration and uninstalling the Datadog chart: the Helm-managed DCA service is deleted, but the agent daemonset is preserved and agent pods still reference the old clusterIP of the Helm-managed service.

After migration, once the new operator pod assumes the leader role: the operator creates a new DCA service and K8s assigns it a new ClusterIP, but agent pods still try to connect to the clusterIP of the old, deleted DCA service.

With a checksum hash of the DCA service's clusterIP set on the DS/EDS, a change in the clusterIP triggers a rollout so that agent pods pick up the new service's clusterIP.
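
A minimal Go sketch of the idea, under stated assumptions: the annotation key is the one named in the test plan, but the helper names (`clusterIPHash`, `setClusterIPHash`) and the choice of SHA-256 are illustrative and may differ from what this PR implements. It also assumes the annotation is written to the pod template, since a change to the pod template is what causes Kubernetes to roll a DaemonSet.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Annotation key as named in the test plan of this PR.
const clusterIPHashAnnotation = "agent.datadoghq.com/dca-service-clusterip-hash"

// clusterIPHash returns a hex-encoded SHA-256 digest of the clusterIP.
// SHA-256 is an assumption made for this sketch.
func clusterIPHash(clusterIP string) string {
	sum := sha256.Sum256([]byte(clusterIP))
	return hex.EncodeToString(sum[:])
}

// setClusterIPHash (hypothetical helper) stores the hash in the given
// annotations and reports whether the stored value changed. A changed
// value on the pod template is what would trigger a rollout.
func setClusterIPHash(annotations map[string]string, clusterIP string) (map[string]string, bool) {
	if annotations == nil {
		annotations = map[string]string{}
	}
	newHash := clusterIPHash(clusterIP)
	changed := annotations[clusterIPHashAnnotation] != newHash
	annotations[clusterIPHashAnnotation] = newHash
	return annotations, changed
}

func main() {
	podTemplateAnnotations := map[string]string{}

	// First reconcile after migration: the annotation is added.
	podTemplateAnnotations, changed := setClusterIPHash(podTemplateAnnotations, "10.96.0.100")
	fmt.Println("annotation added, rollout needed:", changed) // true

	// Same clusterIP on the next reconcile: nothing changes.
	podTemplateAnnotations, changed = setClusterIPHash(podTemplateAnnotations, "10.96.0.100")
	fmt.Println("same clusterIP, rollout needed:", changed) // false

	// Service recreated with a different (example) clusterIP: hash changes.
	_, changed = setClusterIPHash(podTemplateAnnotations, "10.96.0.217")
	fmt.Println("new clusterIP, rollout needed:", changed) // true
}
```

Hashing the value rather than storing the clusterIP directly keeps the annotation a fixed length and opaque. The rollout behaviour depends only on the annotation value changing.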

Additional Notes

This change adds one extra rollingUpdate to the migration process. The rollouts are:

  1. When the DS is first migrated from helm-managed to operator-managed and the new clusterIP hash annotation is added
  2. Additional rollingUpdate: when the datadog helm release is uninstalled, which deletes the DCA service

Minimum Agent Versions

Are there minimum versions of the Datadog Agent and/or Cluster Agent required?

  • Agent: vX.Y.Z
  • Cluster Agent: vX.Y.Z

Describe your test plan

  • Deploy operator with this change and a DDA
  • Note the ClusterIP on the current cluster-agent's service: kubectl get service datadog-cluster-agent
  • Check that the current daemonset does NOT have the annotation key agent.datadoghq.com/dca-service-clusterip-hash
  • Add the annotation agent.datadoghq.com/helm-migration: "true" to the DDA under metadata.annotations and deploy the DDA (this simulates a migration-enabled DDA)
  • Check that the daemonset is now updated with the annotation key agent.datadoghq.com/dca-service-clusterip-hash. There should now be a rollingUpdate triggered on the agent pods.
  • Delete the cluster agent service (kubectl delete service datadog-cluster-agent)
  • After the operator recreates the cluster agent service, check the service's ClusterIP and the checksum hash on the DS. Both should differ from their previous values
  • The agent pods should now be performing another rollingUpdate
  • Pods should reach healthy status with no restarts after the rollingUpdate is complete

Checklist

  • PR has at least one valid label: bug, enhancement, refactoring, documentation, tooling, and/or dependencies
  • PR has a milestone or the qa/skip-qa label
  • All commits are signed (see: signing commits)

@fanny-jiang fanny-jiang added this to the v1.22.0 milestone Jan 7, 2026
@fanny-jiang fanny-jiang requested a review from a team as a code owner January 7, 2026 15:41
@fanny-jiang fanny-jiang added the enhancement New feature or request label Jan 7, 2026
@fanny-jiang fanny-jiang changed the title Add service clusterIP checksum hash to migrated daemonsets Add DCA service clusterIP checksum annotation to migrated daemonsets Jan 7, 2026
@codecov-commenter

codecov-commenter commented Jan 7, 2026

Codecov Report

❌ Patch coverage is 81.81818% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 37.30%. Comparing base (517b3d1) to head (2f70e18).
⚠️ Report is 8 commits behind head on main.

Files with missing lines                                 Patch %   Lines
...troller/datadogagent/controller_reconcile_agent.go    80.00%    3 Missing and 1 partial ⚠️
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2467      +/-   ##
==========================================
+ Coverage   37.28%   37.30%   +0.02%     
==========================================
  Files         290      290              
  Lines       24708    25085     +377     
==========================================
+ Hits         9212     9358     +146     
- Misses      14783    14999     +216     
- Partials      713      728      +15     
Flag Coverage Δ
unittests 37.30% <81.81%> (+0.02%) ⬆️


Files with missing lines                                 Coverage Δ
...er/datadogagent/controller_reconcile_v2_helpers.go    55.81% <100.00%> (ø)
...troller/datadogagent/controller_reconcile_agent.go    63.57% <80.00%> (+1.16%) ⬆️

... and 6 files with indirect coverage changes


Δ = absolute <relative> (impact), ø = not affected, ? = missing data
@fanny-jiang fanny-jiang closed this Jan 7, 2026

3 participants