Skip to content

Conversation

@mohit-pal-kohli
Copy link

@mohit-pal-kohli mohit-pal-kohli commented Nov 18, 2025

Part 1 of implementing explicit wildcard support for xds-control-plane.

Context

We leverage the Snapshot Cache in go-control-plane, where delta for a node ID snapshot is proliferated to Envoys with that node ID when there is a change. This is acceptable in the current state where Fabric Proxy declares the destinations (clusters) it needs at start time in YAML.

The introduction of ODCDS makes the current state unpalatable, even though it is functional. The issue is that we do not want a mass proliferation of new clusters to all Envoys with node ID X when one pod with that node ID X requests a resource. Not solving this could create significant load on our control plane, especially for Fabric Proxy users with hundreds or thousands of destinations.

To solve this, we should add support for explicit wildcard support for control plane. By default, a snapshot resource will be considered wildcard and proliferated to all Envoys with the relevant node ID. By contrast, we will allow certain resources to not be wildcard, allowing ODCDS clusters to only be proliferated on demand.

Implementation details

This PR aims to make changes in a way that are fully backward compatible for Delta, SoTW, and legacyWildcardEnabled.

Exported functions

NewSnapshotWithExplicitWildcard: creates a Snapshot where each resource explicitly specifies whether it should be returned as part of wildcard requests.

This allows the snapshot to contain wildcard metadata for every resource. E.g.

cache.NewSnapshotWithExplicitWildcard("v1", map[rsrc.Type][]cache.SnapshotResource{
    rsrc.ClusterType: {
        {Resource: types.ResourceWithTTL{Resource: serviceA}, Wildcard: true},
        {Resource: types.ResourceWithTTL{Resource: serviceB}, Wildcard: true},
        {Resource: types.ResourceWithTTL{Resource: analytics}, Wildcard: false},  // On-demand
        {Resource: types.ResourceWithTTL{Resource: logging}, Wildcard: false},    // On-demand
    },
})

The following are public accessors for snapshots to specifically get resources marked as wildcard:

  • GetWildcardResources
  • GetWildcardResourcesAndTTL

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nil check above prevents sending empty responses when filtering returns no matching
resources for non-wildcard subscriptions, preserving the xDS "watch stays open" behavior.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should likely be changed to "if non-wildcard" rather than "if has resources"

@@ -598,10 +608,10 @@ func TestAvertPanicForWatchOnNonExistentSnapshot(t *testing.T) {
// Create watch.
req := &cache.Request{
Node: &core.Node{Id: "test"},
ResourceNames: []string{"rtds"},
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed test bug: resource name in request didn't match snapshot ("rtds" vs "one-second").
New filtering logic correctly returns nil for non-existent resources, exposing the mismatch.

@mohit-pal-kohli mohit-pal-kohli marked this pull request as ready for review November 20, 2025 23:17
@mohit-pal-kohli mohit-pal-kohli self-assigned this Nov 20, 2025
@mohit-pal-kohli mohit-pal-kohli requested review from a team November 20, 2025 23:18
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure why these tests needed this mock struct. The test can be accomplished without, though I was hesitant to make an out-of-scope change

Copy link
Collaborator

@valerian-roche valerian-roche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe overall the code might need a bit more refactor on the respond vs. createResponse functions, as well as within createDeltaResponse at this stage
The alternative would be to fully redesign as the linear cache but might be out of scope here

@@ -54,6 +72,47 @@ func NewSnapshot(version string, resources map[resource.Type][]types.Resource) (
return &out, nil
}

// NewSnapshotWithExplicitWildcard creates a Snapshot where each resource
// explicitly specifies whether it should be returned as part of wildcard requests.
func NewSnapshotWithExplicitWildcard(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: given the functional implication, I'm wondering if we should rename this "default" instead
Wildcard is the technical name within the xDS specification, but it does not necessarily mean much in the context of users

@@ -54,6 +72,47 @@ func NewSnapshot(version string, resources map[resource.Type][]types.Resource) (
return &out, nil
}

// NewSnapshotWithExplicitWildcard creates a Snapshot where each resource
// explicitly specifies whether it should be returned as part of wildcard requests.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you specify that it is mainly for OdCDS currently?

out.Resources[index] = NewResourcesWithTTL(version, rawResources)

// Only store wildcard map entry if there are resources of this type
// An empty set means no resources are wildcard for this type
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the functional interpretation of this? Is it that the type uses all resources for wildcard or none?


// Only store wildcard map entry if there are resources of this type
// An empty set means no resources are wildcard for this type
if len(snapshotResources) > 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this check needed? If no resources exist it's fine to set an empty set

// IsResourceWildcard checks if a specific resource should be returned for wildcard requests.
// If wildcardMap is nil, all resources are considered wildcard-eligible (backward compatibility).
// If wildcardMap is not nil, only resources explicitly marked as wildcard are eligible.
func (s *Snapshot) IsResourceWildcard(typeURL resource.Type, resourceName string) bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: maybe ShouldReturnOnWildcard or WatchedByDefault

@@ -195,7 +248,7 @@ func (cache *snapshotCache) sendHeartbeats(ctx context.Context, node string) {
for id, watch := range info.watches {
// Respond with the current version regardless of whether the version has changed.
version := snapshot.GetVersion(watch.Request.GetTypeUrl())
resources := snapshot.GetResourcesAndTTL(watch.Request.GetTypeUrl())
resources := getResourcesAndTTLForSubscription(snapshot, watch.Request.GetTypeUrl(), watch.subscription.SubscribedResources(), watch.subscription.IsWildcard())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: why not just pass in the watch or subscription directly?

// for instance if the subscription is newly wildcard.
for r := range snapshot.GetResources(request.GetTypeUrl()) {
for r := range snapshot.GetWildcardResources(request.GetTypeUrl()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must also assess non-wildcard resources as above.
Likely would better be rewritten as always checking "subscribed" and also checking wildcard when applicable (instead of if/else)
Also shows there might be a missing test

@@ -485,6 +538,10 @@ func (cache *snapshotCache) cancelWatch(nodeID string, watchID int64) func() {
// Respond to a watch with the snapshot value. The value channel should have capacity not to block.
// TODO(kuat) do not respond always, see issue https://github.com/envoyproxy/go-control-plane/issues/46
func (cache *snapshotCache) respond(ctx context.Context, watch ResponseWatch, resources map[string]types.ResourceWithTTL, version string, heartbeat bool) error {
if resources == nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not exit on initial query for wildcard, even if nothing is to be returned

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should likely be changed to "if non-wildcard" rather than "if has resources"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wildcard support should be added here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants