
feat(flowcontrol): Implement ShardProcessor engine #1203


LukeAVanDrie
Contributor

This pull request introduces the ShardProcessor, the core data plane engine of the FlowController. It is the heart of the flow control system, responsible for processing, queuing, and dispatching requests.

This PR is part of a stack.

  1. refactor(flowcontrol): Enable behavioral mocking (#1202)
  2. This PR: Implement ShardProcessor

Incremental diff against #1202: LukeAVanDrie/gateway-api-inference-extension@refactor/flow-control-mocks...feat/flow-control-controller-shard-processor

Each ShardProcessor is paired one-to-one with a contracts.RegistryShard and is responsible for all request lifecycle operations on that shard. It acts as the concurrent worker that executes against the state provided by the shard, orchestrating policies and managing requests from initial enqueue to a final outcome (dispatch, eviction, or rejection).

Key Changes

  • ShardProcessor Implementation:

    • Introduces the main Run loop, which uses a select statement to interleave enqueueing new requests with dispatching existing ones. This balances responsiveness to new arrivals with draining the existing backlog (a minimal sketch of this loop follows this list).
    • Implements the core dispatchCycle, which iterates through priority bands and applies policies to select an item for dispatch.
    • Adds a background runExpiryCleanup goroutine that periodically scans all queues for expired items (due to TTL or context cancellation).
  • Robust Concurrency and State Management:

    • The processor's core logic is built on a single-goroutine ownership model. All queue write operations (enqueue) are funneled through a single channel, which makes the critical "check-then-act" capacity logic safe from race conditions without requiring complex locking.
    • The internal flowItem uses sync.Once for its finalize method. This guarantees idempotent, exactly-once finalization and deterministically resolves the critical race condition between the dispatch and expiry cleanup loops.
  • Resilient Error Handling:

    • A two-tiered error handling strategy is introduced using the errInterFlow and errIntraFlow sentinel errors.
    • This isolates failures and acts as a circuit breaker: a failure to select a queue (errInterFlow) aborts the current priority band but allows the processor to continue to lower-priority bands. A failure after a queue is selected (errIntraFlow) aborts the band for the entire cycle to prevent tight-loop error conditions (see the second sketch after this list).
  • BandFilter Abstraction:

    • Introduces the BandFilter function type, which acts as a pre-policy gate. This decouples the logic of determining request viability (e.g., is the system saturated?) from the logic of selection (e.g., which flow is the fairest to pick next?).
    • This significantly simplifies the mental model for policy authors, who can focus solely on their selection logic. A NewSaturationFilter is provided as the default implementation.
  • New Contracts and Mocks:

    • Defines the contracts.SaturationDetector interface, which acts as a service port for the engine to query for system load.
    • Adds a high-fidelity, stateful MockManagedQueue to enable deterministic testing of the processor's concurrent logic.
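
To make the single-goroutine ownership model above concrete, here is a minimal, illustrative sketch; the names here (processor, flowItem, enqueue, dispatchCycle) are simplified stand-ins rather than the actual implementation:

```go
package sketch

import (
	"context"
	"runtime"
	"sync"
)

// flowItem stands in for the real item type. Both the dispatch path and the
// expiry-cleanup goroutine may try to finalize an item; sync.Once makes the
// outcome idempotent and exactly-once.
type flowItem struct {
	once    sync.Once
	outcome string
}

func (fi *flowItem) finalize(outcome string) {
	fi.once.Do(func() { fi.outcome = outcome })
}

// processor illustrates the ownership model: only the Run goroutine ever adds
// items to queues, so the "check-then-act" capacity logic needs no locks.
type processor struct {
	enqueueChan chan *flowItem
}

func (p *processor) Run(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		case item := <-p.enqueueChan:
			p.enqueue(item) // capacity check + add, safe: this goroutine is the only writer
		default:
			if !p.dispatchCycle(ctx) {
				runtime.Gosched() // idle: yield instead of busy-looping
			}
		}
	}
}

// Placeholders for the real logic: a hierarchical capacity check plus queue
// add, and one pass over the priority bands.
func (p *processor) enqueue(item *flowItem)                 {}
func (p *processor) dispatchCycle(ctx context.Context) bool { return false }
```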
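
Similarly, a hedged sketch of the two-tiered error handling and the BandFilter gate. Only errInterFlow, errIntraFlow, and BandFilter are names from this PR; newNoopSaturationFilter and dispatchOneCycle are hypothetical simplifications:

```go
package sketch

import (
	"context"
	"errors"
	"log"
)

// The two sentinel errors described above. In this sketch, either one causes
// the current priority band to be skipped for the rest of the cycle; the
// sentinel records which stage failed (picking a queue vs. everything after a
// queue was picked).
var (
	errInterFlow = errors.New("inter-flow stage failure")
	errIntraFlow = errors.New("intra-flow stage failure")
)

// BandFilter is the pre-policy gate. Returning (nil, false) is the fast path:
// do not pause the band and do not restrict the set of flows.
type BandFilter func(ctx context.Context, priority int) (allowedFlows map[string]struct{}, pause bool)

// newNoopSaturationFilter is a hypothetical stand-in for the PR's
// NewSaturationFilter; it only shows the shape of the contract.
func newNoopSaturationFilter() BandFilter {
	return func(ctx context.Context, priority int) (map[string]struct{}, bool) {
		return nil, false
	}
}

// dispatchOneCycle walks bands in strict priority order, applying the filter
// before any selection policy runs. dispatchBand stands in for the
// "select a flow, select an item, dispatch it" pipeline.
func dispatchOneCycle(ctx context.Context, priorities []int, filter BandFilter,
	dispatchBand func(priority int, allowed map[string]struct{}) error) bool {
	didWork := false
	for _, priority := range priorities {
		allowed, pause := filter(ctx, priority)
		if pause {
			continue // e.g., system saturated: leave this band queued for now
		}
		if err := dispatchBand(priority, allowed); err != nil {
			// Skip the band for this cycle so a persistent error cannot cause a
			// tight loop; the sentinel only affects how the failure is classified.
			log.Printf("skipping band %d this cycle (interFlow=%v): %v", priority, errors.Is(err, errInterFlow), err)
			continue
		}
		didWork = true
	}
	return didWork
}
```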

Testing Strategy

The ShardProcessor is a complex, concurrent orchestrator. To test it reliably, this PR introduces an extensive, high-fidelity testHarness in processor_test.go.

  • Deterministic Simulation: The harness uses stateful mocks with function-based overrides to pause execution at critical moments. This allows us to deterministically simulate and verify the processor's behavior during race conditions (e.g., the dispatch vs. expiry race), which would be impossible with concrete implementations. A minimal sketch of the override pattern follows this section.
  • Failure Mode Injection: We can trigger on-demand errors from any dependency to verify the processor's resilience and complex error-handling logic, such as the errIntraFlow circuit breaker.
  • Isolation: The tests verify the processor's orchestration logic in complete isolation, ensuring that the tests are not affected by confounding bugs in its dependencies.

A detailed explanation of this testing philosophy is included in the comments at the top of processor_test.go.
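
For flavor, here is a minimal sketch of the function-override pattern the mocks use; mockQueue and AddFunc are hypothetical names, and the real MockManagedQueue is considerably richer:

```go
package sketch

import (
	"errors"
	"sync"
	"testing"
)

// mockQueue behaves like a real queue unless a test injects an override.
type mockQueue struct {
	mu      sync.Mutex
	items   []string
	AddFunc func(item string) error // override hook; nil means "use the default behavior"
}

func (q *mockQueue) Add(item string) error {
	if q.AddFunc != nil {
		// Tests use this hook for failure-mode injection or to pause execution
		// at a critical moment (e.g., to force the dispatch vs. expiry race).
		return q.AddFunc(item)
	}
	q.mu.Lock()
	defer q.mu.Unlock()
	q.items = append(q.items, item)
	return nil
}

// Example usage in a _test.go file: inject a failure and assert it surfaces.
func TestAddFailureIsSurfaced(t *testing.T) {
	q := &mockQueue{AddFunc: func(string) error { return errors.New("injected failure") }}
	if err := q.Add("req-1"); err == nil {
		t.Fatal("expected the injected failure to be returned")
	}
}
```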


netlify bot commented Jul 22, 2025

Deploy Preview for gateway-api-inference-extension ready!

🔨 Latest commit: 30fbecb
🔍 Latest deploy log: https://app.netlify.com/projects/gateway-api-inference-extension/deploys/68827a0c2810bf0008404039
😎 Deploy Preview: https://deploy-preview-1203--gateway-api-inference-extension.netlify.app

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 22, 2025
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 22, 2025
@k8s-ci-robot
Contributor

Hi @LukeAVanDrie. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jul 22, 2025
@LukeAVanDrie
Contributor Author

/assign @ahg-g

/assign @kfswain

@ahg-g
Contributor

ahg-g commented Jul 22, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 22, 2025
@LukeAVanDrie LukeAVanDrie force-pushed the feat/flow-control-controller-shard-processor branch 2 times, most recently from 05b21f8 to 7176d6c on July 22, 2025 at 16:58
// `dispatchCycle` methods). New requests are funneled to this goroutine via the `enqueueChan`.
//
// This design choice is critical for correctness and performance. The `Run` method's goroutine has exclusive ownership
// of all operations that *add* items to the queues (`enqueue`). This prevents the most dangerous type of race
Contributor

This design materially increased the complexity; I don't follow why a multi-writer/single-reader queue is not sufficient here.

Contributor Author

That's a great question. You're right, a standard multi-writer queue would absolutely be simpler under normal circumstances.

The main reason for this single-writer approach is the complexity of our capacity check. The system needs to validate capacity not just for an individual flow's queue, but hierarchically against the entire priority band and the overall shard.

In a multi-writer model, that "check-then-act" sequence would require a coarse, shard-wide lock to prevent race conditions, which would likely create a contention bottleneck. This channel-based design serializes the additions, making that multi-level capacity check atomic without locks. The ShardProcessor's core design is built around this to ensure correctness and performance.

This model also gives us a big advantage for the future. We plan to add displacement policies (e.g., a high-priority item evicting lower-priority ones), which involves a complex, multi-step atomic transaction. Serializing all state-mutating operations through this single goroutine will make that much safer to implement.

This is a key design point, and I realize the rationale could be clearer. I'll add a more detailed comment to the ShardProcessor struct to explain these trade-offs for future contributors. Does this reasoning make sense?

Contributor

@ahg-g ahg-g Jul 24, 2025

Enqueueing the requests is being done sequentially anyway by the shard processor, so why would a shard-wide lock for the writers be worse here? Also, I think we can be approximate when checking for capacity; we don't need to be exact, so perhaps a lock for capacity checking is not even necessary?

Not asking to change, just trying to think through the rationale behind this choice.

Contributor Author

Thank you for the follow-up and for pushing on this; these are exactly the right questions to be asking to make sure the design is robust. You've hit on the core trade-offs I was weighing, and I'd like to walk you through my reasoning.

  1. Why the Buffered Channel is Better Than a Lock Here

You're right, the processor does its actual enqueue work sequentially in either model. The key difference is that the buffered channel serves a critical dual purpose: it decouples the caller and it provides a high-fidelity backpressure signal.

  • Decoupling: With a lock, the FlowController's main goroutine would block every time it calls Enqueue on a busy shard, stalling the entire distribution process. The channel's non-blocking send frees the controller instantly, which is critical for system throughput.
  • Backpressure Signal: This is just as important. The state of the buffered channel is the backpressure signal. If a send fails because the buffer is full, the FlowController receives an immediate, unambiguous signal that this specific shard is saturated. This is invaluable data for its distribution algorithm, allowing the system to be more intelligent in its load balancing, a capability a simple lock can't offer. (A minimal snippet of this handoff is included at the end of this comment.)
  2. Why Strict Capacity is Better Than Approximate

This leads to the next logical question: could we simplify even further by removing the need for strict synchronization and allowing approximate capacity checks?

This is a valid architectural path, but it has two significant technical downsides in our specific context:

  • It offers a performance gain we don't need. The current non-blocking send to a buffered channel is already extremely fast. A lock-free atomic model would add non-deterministic behavior without solving a real-world performance bottleneck.
  • It would make our future roadmap significantly harder to implement. This is the most critical point. The roadmap includes features like displacement.
  3. The Displacement Case

You could argue that for a complex, multi-step transaction like displacement (add one item, remove one or more others), a single coarse-grained lock would make ensuring atomicity "easier".

However, the true cost of that lock becomes apparent when you consider its impact on the FlowController. A displacement transaction could be long.

  • With a lock, the FlowController's goroutine would be blocked for the entire duration of that long transaction. The central distributor would be stalled, unable to process other requests, even those destined for different, non-busy shards.
  • With the channel, the FlowController performs its non-blocking handoff and is immediately free. While the ShardProcessor is busy with the long displacement, the FlowController can continue its work, processing other requests and distributing them to other available shards.

The channel model contains the performance cost of a slow operation to the single ShardProcessor's goroutine, preventing it from bubbling up and halting the entire FlowController. The lock-based model would cause that performance cost to block the central distributor.


In summary, my hope is that while the ShardProcessor's internals may seem complex, this design results in a system that is higher-throughput and provides a robust, extensible foundation for the more complex features we know are coming.

I've made sure this full rationale is captured clearly in the code comments (I migrated it to the package comment and refer to it from the ShardProcessor go-doc comment now). Thank you for taking the time to dig into this with me.
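
For concreteness, the non-blocking handoff described in point 1 looks roughly like this (an illustrative snippet only, not the exact Enqueue implementation):

```go
package sketch

type flowItem struct{ id string }

// tryEnqueue shows the non-blocking handoff: a full buffer is itself the
// backpressure signal the distributor can act on (e.g., pick another shard).
func tryEnqueue(enqueueChan chan *flowItem, item *flowItem) bool {
	select {
	case enqueueChan <- item:
		return true // handed off; the Run goroutine will do the capacity check
	default:
		return false // buffer full: this shard is saturated right now
	}
}
```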

Contributor

Thanks for the elaborate reply, this is an interesting design choice indeed!

Decoupling: With a lock, the FlowController's main goroutine would block every time it calls Enqueue on a busy shard, stalling the entire distribution process. The channel's non-blocking send frees the controller instantly, which is critical for system throughput.

We don't have everything wired up yet so I might be wrong, but in my mind it is not the FlowController's main goroutine that should block but the request goroutine; I assumed that the request goroutine would call Enqueue and block until it gets its chance to add the request to the queue.

@ahg-g
Contributor

ahg-g commented Jul 24, 2025

This looks good to me, please address the two suggestions on code change to move forward.

@LukeAVanDrie LukeAVanDrie force-pushed the feat/flow-control-controller-shard-processor branch from 7176d6c to d668378 on July 24, 2025 at 17:25
@LukeAVanDrie
Contributor Author

This looks good to me, please address the two suggestions on code change to move forward.

Thank you for the thorough review! I've addressed the suggestions and pushed up the changes.

@kfswain
Collaborator

kfswain commented Jul 24, 2025

Just FYI: I plan on taking a look through this this afternoon; I want to make sure I'm up to speed here. Sorry for the delays on my end.

Introduces the `ShardProcessor`, the core data plane worker for the
`FlowController`. Each processor is paired with a `RegistryShard` and is
responsible for all request lifecycle operations on that shard,
including enqueueing, dispatching, and background cleanup.

The processor's design is centered on correctness and performance in a
highly concurrent environment. It uses a single-goroutine ownership
model for its main `Run` loop, which serializes all queue write
operations (enqueues) through a channel. This makes the critical
"check-then-act" capacity logic safe without complex locking. The
`flowItem` uses a `sync.Once` to guarantee idempotent finalization,
deterministically resolving the race between dispatch and expiry.

Resilience is achieved through a two-tiered error handling strategy that
distinguishes between inter-flow and intra-flow policy failures. This
isolates faults, prevents tight-loop error conditions, and ensures a
failure in one priority band does not halt progress in others.

To simplify policy logic, this change also introduces the `BandFilter`
abstraction. This acts as a pre-policy gate, decoupling the logic of
*viability* (e.g., is a flow backpressured?) from the logic of
*selection* (e.g., which flow is next?).

A comprehensive, high-fidelity test suite with stateful mocks is
included to reliably verify the processor's complex concurrent behavior,
race conditions, and failure modes.
@LukeAVanDrie LukeAVanDrie force-pushed the feat/flow-control-controller-shard-processor branch from d668378 to 30fbecb on July 24, 2025 at 18:23
@LukeAVanDrie
Contributor Author

LukeAVanDrie commented Jul 24, 2025

@kfswain and @ahg-g

You might notice a seemingly unrelated change in besthead_test.go. I've included this fix here because my refactoring of the framework mocks to support behavioral testing in #1202 exposed a latent, flaky test.

The original test for the besthead policy was non-deterministic because it iterated over a map to build its test cases. This meant the test would sometimes fail to trigger a necessary panic, causing intermittent CI failures.

The fix was to change the test setup to use a slice instead of a map, guaranteeing a deterministic iteration order and making the test reliable. I've included this fix in this PR to ensure the CI remains green.

If this PR plans to stay open for much longer, I will put this fix into its own isolated PR to resolve flaky CI for others contributing to the repo today.

Edit: split off into #1231 to unblock main

// The effectiveness of the sharded model hinges on a critical assumption: while the system as a whole manages a
// heterogeneous set of flows, the traffic *within a single logical flow* is assumed to be roughly homogeneous in its
// characteristics. A logical flow is intended to represent a single workload or tenant; therefore, the most
// unpredictable variables (affecting decode behavior) are expected to be statistically similar *within* that flow.
Collaborator

'the most unpredictable variables': since this seems to be the core of the critical assumption, it may be good to list these variables.

// This model's true power is that it provides a robust foundation for future features like **displacement** (a
// high-priority item evicting lower-priority ones). This is an "all-or-nothing" atomic transaction that is
// exceptionally difficult to implement correctly in a lock-free or coarse-grained locking model without significant
// performance penalties. The single-writer model contains the performance cost of such a potentially long transaction
Collaborator

'performance penalties': did we do some rudimentary benchmarking to compare limitations, or is there literature on these approaches?

// Phase 2 (Future): This is where per-flow saturation logic would go.
// It would iterate `band`, call `IsSaturated(ctx, flowID)`, and build a filtered map of allowed flows.
// For now, no per-flow filtering is done. Return nil to signal the fast path.
return nil, false // Do not pause, and do not filter any flows.
Collaborator

What would per-flow saturation look like? Is this a full expenditure of 'fairness' budget?


if err := sp.dispatchItem(item, dispatchPriority, logger); err != nil {
// All errors from dispatchItem are considered intra-flow and unrecoverable for this band in this cycle.
logger.Error(err, "Failed to dispatch item, skipping priority band for this cycle")
Collaborator

what sort of errors could happen from dispatch?

Jumping an entire priority band (and subsequently potentially dispatching a lower priority band if the failure is intermittent) might be a surprising behavior.

// FUTURE EXTENSION POINT: The iteration over priority bands is currently a simple, strict-priority loop.
// This could be abstracted into a third policy tier (e.g., an `InterBandDispatchPolicy`) if more complex scheduling
// between bands, such as Weighted Fair Queuing (WFQ), is ever required. For now, strict priority is sufficient.
for _, priority := range sp.shard.AllOrderedPriorityLevels() {
Collaborator

What happens if the Critical band fills up faster than this can clear it (and the pool has capacity for all requests)? I'm wondering how likely that is to happen; latency seems variable because the saturation detection logic is run per flow (eventually), so this seems plausible as the saturation detection logic could grow sophisticated/expensive.

Would that mean that lower criticality requests do not get dispatched, even with capacity available?

if !sp.dispatchCycle(ctx) {
// If no work was done, yield to other goroutines to prevent a tight, busy-loop when idle, but allow for
// immediate rescheduling.
runtime.Gosched()
Collaborator

musing: We should probably look into our GOMAXPROCS if we start to hit scaling limits.

// enqueueChannelBufferSize sets the size of the buffered channel that accepts incoming requests for the shard
// processor. This buffer acts as a "shock absorber," decoupling the upstream distributor from the processor's main
// loop and allowing the system to handle short, intense bursts of traffic without blocking the distributor.
enqueueChannelBufferSize = 100
Collaborator

may need to make this configurable


// Enqueue sends a new flow item to the processor's internal channel for asynchronous processing by its main `Run` loop.
// If the processor is shutting down, it immediately finalizes the item with a shutdown error.
func (sp *ShardProcessor) Enqueue(item *flowItem) {
Collaborator

Where is this called?

// This avoids holding locks on the shard while processing, and allows us to determine the optimal number of workers.
var queuesToProcess []framework.FlowQueueAccessor
for _, priority := range sp.shard.AllOrderedPriorityLevels() {
band, err := sp.shard.PriorityBandAccessor(priority)
Collaborator

A few questions:

Where is PriorityBandAccessor defined?

What is the lifecycle of subsetPriorityBandAccessor?

How are the allowed flows determined?

@kfswain
Collaborator

kfswain commented Jul 25, 2025

Left some comments; overall it seems reasonable! I'm curious to see how the single-goroutine ShardProcessor works in practice. How is the ShardProcessor sharded? Is the current implementation just a single shard?

@ahg-g
Contributor

ahg-g commented Jul 25, 2025

/lgtm

leaving approve to Kellen

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 25, 2025
@kfswain
Collaborator

kfswain commented Jul 25, 2025

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kfswain, LukeAVanDrie

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 25, 2025
@k8s-ci-robot k8s-ci-robot merged commit fb2dfd8 into kubernetes-sigs:main Jul 25, 2025
9 checks passed
kfswain pushed a commit to kfswain/llm-instance-gateway that referenced this pull request Jul 31, 2025