
(feat) initial types and interfaces for pluggable data layer #1154


Merged: 6 commits into kubernetes-sigs:main from datalayer_types on Jul 17, 2025

Conversation

@elevran (Contributor) commented Jul 13, 2025

This PR includes the types and interfaces needed to support a pluggable data source.
It is the first of several PRs aimed at refactoring the current pkg/epp/backend along the design described in #1023.

Specifically:

  • Endpoint replaces PodMetrics and includes PodInfo, Metrics, and an AttributeMap (an extensible set of attributes to be used by pluggable data sources).
  • DataSource encapsulates the scraping loop, with Extractors collecting specific items from the raw object being scraped (e.g., /metrics). A concrete implementation matching the current backend/metrics will be added under datalayer/metrics (see the sketch after this list).
  • The Endpoint interface currently includes management of the refresh loop. This will be removed and placed under the control of the DataSource in a later PR. Similarly, MetricClient and Factory might become redundant.
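
For orientation, a rough sketch of how these pieces relate (illustrative only; the exact method sets are defined in the PR, but OutputType, ExpectedType, and Extract all come up in the review threads below):

// Illustrative sketch; exact signatures live in the datalayer package.
type DataSource interface {
	// Name identifies the data source (e.g., the metrics scraper).
	Name() string
	// OutputType is the type of the raw object produced by each collection cycle.
	OutputType() reflect.Type
	// AddExtractor registers an extractor whose ExpectedType must match OutputType.
	AddExtractor(ex Extractor) error
}

// Extractor collects specific items from the raw scraped object and stores
// them on an endpoint (e.g., in its AttributeMap).
type Extractor interface {
	Name() string
	ExpectedType() reflect.Type
	Extract(data interface{}, ep Endpoint)
}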

Next PRs

  • find/replace to migrate code to the new types (e.g., backend.Pod => datalayer.PodInfo across the datastore, tests, etc.). This will also introduce mocks for tests.
  • implement the base metrics collector and extractor
  • move the extractor to use backend/metrics.MetricsSpec, etc.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 13, 2025
@k8s-ci-robot k8s-ci-robot requested review from kfswain and robscott July 13, 2025 13:11
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 13, 2025
@k8s-ci-robot (Contributor):

Hi @elevran. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jul 13, 2025
netlify bot commented Jul 13, 2025

Deploy Preview for gateway-api-inference-extension ready!

Latest commit: a3f98c8
Latest deploy log: https://app.netlify.com/projects/gateway-api-inference-extension/deploys/6878ef7ed5571e00081ab019
Deploy Preview: https://deploy-preview-1154--gateway-api-inference-extension.netlify.app

Comment on lines 48 to 50
// TODO: the following is useful for a data source that operates on
// endpoints and might not be relevant for "global/system" collectors which
// might not need the concept of an endpoint. This can be split, if needed,
// to a separate interface in the future.

// AddEndpoint adds an endpoint to collect from.
AddEndpoint(ep Endpoint) error

// RemoveEndpoint removes an endpoint from collection.
RemoveEndpoint(ep Endpoint) error
Contributor:

As you note yourself in the TODO, it might make sense to have a DataSource interface without these functions, and an additional EndpointDataSource interface that embeds DataSource and defines them.
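
For illustration, the suggested split might look like this (sketch only; EndpointDataSource is the hypothetical name from the comment above):

// Hypothetical split: a minimal DataSource plus an endpoint-aware variant.
type EndpointDataSource interface {
	DataSource // embeds the endpoint-agnostic interface

	// AddEndpoint adds an endpoint to collect from.
	AddEndpoint(ep Endpoint) error

	// RemoveEndpoint removes an endpoint from collection.
	RemoveEndpoint(ep Endpoint) error
}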

Contributor Author:

Yes, that's a possibility.
As part of #1023 we agreed to push the non-endpoint data source out until we have concrete requirements, so I think we can keep it as is for now (i.e., we don't have enough to go on at this time to rationalize what is and isn't shared between the per-endpoint and non-endpoint data sources).

Contributor Author:

Leaving the note in the code for now to ensure feedback from @kfswain and @ahg-g (will remove once approved, before merging).

Collaborator:

I'm fine with an explicit, tight scope for now, expanding only as needed.

The future will bring more information about how we want to expand our interface, so if we can postpone decisions and make more informed ones later, that is fantastic.

Comment on lines 64 to 59
// ExpectedType defines the type expected by the extractor. It must match
// the DataSource.OutputType() the extractor registers for.
ExpectedType() reflect.Type
Contributor:

Is this also used only internally?

Contributor Author:

I don't think so - it is needed by the DataSource to validate that the extractor can process its output.
I'd prefer to validate that a data source and an extractor are compatible at registration time, so at least one of them needs to expose the type as part of its interface.
But if you have an idea on how to implement the type checks differently, I'd be happy to remove it from here as well.
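
A registration-time check along these lines might look like the following (sketch; the baseDataSource receiver and extractors field are hypothetical, and "fmt" is assumed imported):

// AddExtractor validates compatibility once, at registration time,
// instead of failing on every collection cycle.
func (ds *baseDataSource) AddExtractor(ex Extractor) error {
	if ex.ExpectedType() != ds.OutputType() {
		return fmt.Errorf("extractor %s expects %v, but data source %s outputs %v",
			ex.Name(), ex.ExpectedType(), ds.Name(), ds.OutputType())
	}
	ds.extractors = append(ds.extractors, ex)
	return nil
}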

Collaborator:

I'd prefer to validate that a data source and an extractor are compatible at registration time, so at least one of them needs to expose the type as part of its interface.

Agreed, we should have some 'pre-flight checks' to make sure everything is buttoned up before marking EPP ready.

I see that this function exists below, and have commented there


var (
	// DefaultDataSources is the system default data source registry.
	DefaultDataSources = DataSourceRegistry{}
)
Contributor:

I thought this was initialized at bootstrap and remains static. In that case we don't need locking, and it could be implemented the same way as the plugins registry (just as a map[string]DataSource).

Contributor Author:

We need to guarantee that concurrent additions are either safe or not expected/allowed.
I'm fine with assuming that all data sources are added before any reads happen, and from the same (e.g., main) goroutine.

Collaborator:

I'm fine with assuming that all data sources are added before any reads happen, and from the same (e.g., main) goroutine.

Yeah, I think that's a safe assumption. Personally, I'm cautious about allowing any dynamic reconfiguration of our pluggable systems; that would allow for some pretty wild and hard-to-defend behavior, imo.

Contributor Author:

Personally, I'm cautious about allowing any dynamic reconfiguration of our pluggable systems; that would allow for some pretty wild and hard-to-defend behavior, imo.

Does it make sense to make it explicit (e.g., introduce an explicit "freeze data sources" call so that they can't be changed afterwards)?
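
Such a freeze could be a flag checked on registration; a hypothetical sketch (not what was ultimately merged; "errors" and "sync" are assumed imported):

type DataSourceRegistry struct {
	mu      sync.Mutex
	frozen  bool
	sources map[string]DataSource
}

// Freeze disallows any further registrations.
func (r *DataSourceRegistry) Freeze() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.frozen = true
}

// Register adds a data source unless the registry has been frozen.
func (r *DataSourceRegistry) Register(src DataSource) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.frozen {
		return errors.New("data source registry is frozen")
	}
	if r.sources == nil {
		r.sources = map[string]DataSource{}
	}
	r.sources[src.Name()] = src
	return nil
}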

Contributor Author:

Changed here to use sync.Map as well.
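
With sync.Map, registration and lookup need no explicit locking; a sketch with illustrative method names ("fmt" and "sync" assumed imported):

type DataSourceRegistry struct {
	sources sync.Map // data source name -> DataSource
}

// Register stores a data source, rejecting duplicate names.
func (r *DataSourceRegistry) Register(src DataSource) error {
	if _, loaded := r.sources.LoadOrStore(src.Name(), src); loaded {
		return fmt.Errorf("data source %s already registered", src.Name())
	}
	return nil
}

// GetNamedSource returns the data source registered under name, if any.
func (r *DataSourceRegistry) GetNamedSource(name string) (DataSource, bool) {
	v, ok := r.sources.Load(name)
	if !ok {
		return nil, false
	}
	return v.(DataSource), true
}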


// Addressable supports getting an IP address and a namespaced name.
type Addressable interface {
Contributor:

Why is Addressable in the metrics file?

Contributor Author (@elevran, Jul 13, 2025):

I wanted to keep it closest to where it's used (the metrics client).
It seems too small to have its own file, but I'm open to suggestions on a better-fitting location.
Another option is to move it next to PodInfo (which implements Addressable).
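
For reference, an interface matching the doc comment above might look like this (method names are illustrative, not necessarily those in the PR; NamespacedName is from k8s.io/apimachinery/pkg/types):

// Addressable supports getting an IP address and a namespaced name.
type Addressable interface {
	GetIPAddress() string
	GetNamespacedName() types.NamespacedName
}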

@nirrozenbaum (Contributor):

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 13, 2025
@kfswain (Collaborator) left a comment:

Left some comments!

I have some higher-level questions around the extractor model and how it's intended to be used. If our v1 of the pluggable data layer is scoped specifically to endpoints, I'm also wondering why we aren't feeding into the datastore representation of the endpoints.

Nothing hard-blocking, but probably better to discuss earlier rather than later.

Thanks Etai!



// DataSource is an interface required from all datalayer data collection
// sources.
type DataSource interface {
Collaborator:

This implies that each data source will have its own list of endpoints that all need to be kept up to date, rather than feeding into a centralized list of endpoints, just with its custom data included.

Out of curiosity, why did we choose this path?

Contributor Author (@elevran, Jul 15, 2025):

There are two possible design options. I'm fine with either, so if there's a strong preference one way or the other, I'm happy to change it.

  1. Data sources keep references to endpoints (e.g., in the case of /metrics, the source needs the IP address and a way to set metrics on the endpoint itself). This was the rationale I had in mind for Add/RemoveEndpoint.
  2. Endpoints keep references (directly or indirectly) to data sources (see the bullets below).

I was thinking of the first option as more flexible (e.g., a data source that records the node name for each endpoint does not need to be continuously called for that endpoint... the data source would not need to create a goroutine for the endpoint at all), but engineering-wise both work.

If we prefer a model where there is explicitly one goroutine per endpoint (possible downsides: head-of-line blocking by "slow/failing" data sources impacts collection by others, all data sources are applicable to all endpoints, etc.), the design would change as follows:

  • the data store owns the goroutines and calls start/stop when endpoints are added and removed. There is a "collection context/state" stored in the datastore per endpoint, and there should not be start/stop RefreshLoop methods defined on endpoints - they're "pure data structures".
  • on every "tick", the goroutine calls all data sources via CollectFor(ep). Internally a data source can still call multiple extractors, so the main change is the removal of the Start/Stop methods and the addition of CollectFor to the data source.

WDYT? The second option (a goroutine per endpoint calling all data sources, passing each the endpoint) is possibly closer to how it is done in GIE today, so we can continue along that path.

Contributor Author (@elevran, Jul 15, 2025):

@kfswain @nirrozenbaum this is a schematic snippet of what the second option may look like:

type Collector interface {
	Name() string
	CollectFor(Endpoint) error
	AddExtractor(Extractor)
}

type Extractor interface {
	Name() string
	ExpectedType() reflect.Type
	Extract(interface{}, Endpoint)
}

// When the data store is notified of a new Pod, it starts a goroutine for it
// (with a scraping context that holds a reference to the Endpoint,
// cancellation contexts, etc.).
func (dlc *CollectionState) CollectionCycle() {
	ep := dlc.Endpoint()
	for _, c := range datalayer.Collectors() { // or store collectors on the collection state directly
		if err := c.CollectFor(ep); err != nil {
			// handle failures - e.g., exponential backoff, disabling collectors after repeated errors...
		}
		// on success, the collector has called its extractors with ep for updating
	}
}

Let me know if this seems more "natural/aligned".

Contributor Author:

Changed so that the DataSource is no longer informed of, and no longer tracks, endpoints.

Collaborator (@kfswain, Jul 17, 2025):

I'm with you here. The second option, from my perspective, separates concerns more cleanly:

  • the datastore focuses on the actual collection mechanism. It does potentially create head-of-line blocking, agreed, but this seems acceptable with proper timeouts/backoff.
  • the Pod controller remains the single source of truth w.r.t. endpoints and just needs to update the datastore's endpoint list
  • custom data sources just focus on what data they care about and how to get it




@nirrozenbaum (Contributor):

This LGTM.
I left just a few nits in case you want to address them.
Will leave the final stamp to @kfswain.

/approve

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 16, 2025
@elevran (Contributor Author) commented Jul 16, 2025

I'm leaning towards the second option outlined in the discussion above, where Endpoints are not pushed into the DataSource but instead managed exclusively in the data store, which calls collectors with the endpoint as a parameter (i.e., the data store manages the collection goroutines).
If acceptable, I'd like to push a final version later today, integrating all feedback.
FYI: @kfswain @nirrozenbaum

elevran added 3 commits July 17, 2025 14:42
- DataSource is no longer tracking Endpoints.
- Removed unused interfaces.
- Changes to exported scope.
- Move Addressable to podinfo.go.

Signed-off-by: Etai Lev Ran <[email protected]>
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 17, 2025
@elevran elevran requested review from nirrozenbaum and kfswain July 17, 2025 12:39
Signed-off-by: Etai Lev Ran <[email protected]>
@kfswain (Collaborator) commented Jul 17, 2025

Sorry for the delay @elevran. I checked this PR a few days ago and it didn't look like you had responded to all comments. Either I looked just before you commented, or I'm going blind :)

/lgtm
/approve

Thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 17, 2025
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: elevran, kfswain, nirrozenbaum

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [kfswain,nirrozenbaum]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit e9c11d8 into kubernetes-sigs:main Jul 17, 2025
9 checks passed
@elevran elevran deleted the datalayer_types branch July 20, 2025 14:11