Skip to content

Conversation

mohammedabdulwahhab
Copy link

No description provided.

@mohammedabdulwahhab mohammedabdulwahhab changed the title Add document on ETCD alternatives for Dynamo in K8s ETCD-less Dynamo in K8s Sep 4, 2025
Clarified trade-offs and Kubernetes concepts in etcd-k8s documentation.

Signed-off-by: mohammedabdulwahhab <[email protected]>
This update introduces a new approach for managing DynamoEndpoint resources, combining pod lifecycle management with controller oversight. It details the server, controller, and client-side interactions for improved service discovery and resource management.

Signed-off-by: mohammedabdulwahhab <[email protected]>
Signed-off-by: mohammedabdulwahhab <[email protected]>
Signed-off-by: mohammedabdulwahhab <[email protected]>
Signed-off-by: mohammedabdulwahhab <[email protected]>
Clarified wording and improved structure in the EndpointSlice based discovery section.

Signed-off-by: mohammedabdulwahhab <[email protected]>
Signed-off-by: mohammedabdulwahhab <[email protected]>
Signed-off-by: mohammedabdulwahhab <[email protected]>
Signed-off-by: mohammedabdulwahhab <[email protected]>
deps/etcd-k8s.md Outdated
- Lists all instances that match the given namespace and component.
- `get_metadata`(namespace: str, component: str, instance_id: str) -> Instance
- Returns the metadata for an instance.
- `subscribe`(namespace: str, component: str) -> EventStream
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe to distinguish this from event mechanisms - we can call this watch

and instead of eventstream we have InstanceAdded, InstanceRemoved

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, makes sense to me.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be the hardest one to support. For example NATS doesn't have a watch mechanism, so we'd have to poll. Maybe we should split this out to the notification mechanism.

Copy link
Author

@mohammedabdulwahhab mohammedabdulwahhab Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • @nnshah1 Fixed to watch. I do think the return value of this will be an EventStream that can yield an InstanceAddedEvent and InstanceRemovedEvent
  • @grahamking Potentially, I'm thinking of this as an analog to the kv_get_and_watch_prefix which is implemented with ETCD right now.

deps/etcd-k8s.md Outdated
- Creates an instance and persists associated immutable metadata. Does not mark the instance as ready.
- `list_instances`(namespace: str, component: str) -> List[Instance]
- Lists all instances that match the given namespace and component.
- `get_metadata`(namespace: str, component: str, instance_id: str) -> Instance
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While not critical here - we can make this an object model where the Instance has a metadata property instead of get and set on the ServiceDiscovery interface.

I'm thinking this is all part of the runtime object model

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I think this makes sense.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

deps/etcd-k8s.md Outdated
- Returns the metadata for an instance.
- `subscribe`(namespace: str, component: str) -> EventStream
- Subscribes to events for the set of instances that match (namespace, component). Returns a stream of events giving signals for when instances that match the subscription are created, deleted, or readiness status changes.
- `set_instance_status`(instance: Instance, status: str)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not clear to me immediately why we need a separate step for setting status - could we have "add" and "remove" - so if an instance is ready it is added, if it is not ready it is 'removed'.

  • just a thought in comparison to what we have today - we have add and remove but readiness is implied.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pending discussion

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also in which scenario are we going to call this function ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, it's detailed below in the doc

deps/etcd-k8s.md Outdated
#### Frontend Discovers Decode Workers

- Decode calls `create_instance("dynamo", "decode", metadata)` and `set_instance_status(decode_worker, "ready")` to register itself with the service discovery backend. Its metadata includes the model it serves.
- Frontend calls `list_instances("dynamo", "decode")` and `get_metadata` to get all decode workers and the associated model information to bootstrap its cache.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this implies frontend knows at launch that it is in namespace 'dynamo' and is looking for 'decode' - just wanting to confirm here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this extrapolate to other workers, like "planner"? Does Frontend know what workers it has and how many of each to expect?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If i understand your question correctly, any component can list and watch any tuple of (namespace, component)

deps/etcd-k8s.md Outdated

#### Frontend Needs to Know Some Model Specifics (like Tokenizer)

- Decode writes a Model Card using the `write_kv` method. Frontend can simply read this entry.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be in the metadata - because if not this also creates a race condition for readiness - i.e. is it ready if the model card is not written?

I'm in favor of putting all required information for interacting with this instance into the metadata - this decouples the instances from each other as well. No shared state is required between instances.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, that this might be sensible.

deps/etcd-k8s.md Outdated

```python
# Context: decode worker is registering the MDC before it marks itself as ready
service_discovery.write_kv("mdc/Qwen3-32B", mdc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still unclear to me why the mdc has to live outside the instance metadata - we can discuss

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe because this mdc would be duplicated in the case of multiple instance of the same worker ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the motivation was to not duplicate the MDC, but I believe we reached the consensus that it's okay for it to be duplicated as long as some conditions are checked.

deps/etcd-k8s.md Outdated

## ServiceDiscovery Interface

To de-couple Dynamo from ETCD, we define a minimal `ServiceDiscovery` interface that can be satisfied by different backends (etcd, kubernetes, etc). In Kubernetes environments, we will use the Kubernetes APIs to implement this interface.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a single node, non K8s environment, can this discovery be backed by static configuration? Or local file system with file watchers?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't put too much thought into it, but it should be the case. Could be useful for greg clark's use case and the NIM team.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a section to cover this case - or add a TODO - need to have the local deployment case covered as well / sketeched.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I think it might be a useful exercise to do to see if this interface generalizes across different backends.

Co-authored-by: Neelay Shah <[email protected]>
Signed-off-by: mohammedabdulwahhab <[email protected]>
- Writes data at a given key. Data not associated with any instance.
- `read_kv`(key: str)
- Reads data at a given key. Data not associated with any instance.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Figure out if there is a compelling reason to keep a separate entity for MDC

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if each instance has MDC embedded, there are consistency and race conditions to watch for

deps/etcd-k8s.md Outdated

## ServiceDiscovery Interface

To de-couple Dynamo from ETCD, we define a minimal `ServiceDiscovery` interface that can be satisfied by different backends (etcd, kubernetes, etc). In Kubernetes environments, we will use the Kubernetes APIs to implement this interface.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't put too much thought into it, but it should be the case. Could be useful for greg clark's use case and the NIM team.

deps/etcd-k8s.md Outdated
- Creates an instance and persists associated immutable metadata. Does not mark the instance as ready.
- `list_instances`(namespace: str, component: str) -> List[Instance]
- Lists all instances that match the given namespace and component.
- `get_metadata`(namespace: str, component: str, instance_id: str) -> Instance
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I think this makes sense.

deps/etcd-k8s.md Outdated
- Lists all instances that match the given namespace and component.
- `get_metadata`(namespace: str, component: str, instance_id: str) -> Instance
- Returns the metadata for an instance.
- `subscribe`(namespace: str, component: str) -> EventStream
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, makes sense to me.

deps/etcd-k8s.md Outdated
- Returns the metadata for an instance.
- `subscribe`(namespace: str, component: str) -> EventStream
- Subscribes to events for the set of instances that match (namespace, component). Returns a stream of events giving signals for when instances that match the subscription are created, deleted, or readiness status changes.
- `set_instance_status`(instance: Instance, status: str)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pending discussion

deps/etcd-k8s.md Outdated
#### Frontend Discovers Decode Workers

- Decode calls `create_instance("dynamo", "decode", metadata)` and `set_instance_status(decode_worker, "ready")` to register itself with the service discovery backend. Its metadata includes the model it serves.
- Frontend calls `list_instances("dynamo", "decode")` and `get_metadata` to get all decode workers and the associated model information to bootstrap its cache.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right

deps/etcd-k8s.md Outdated

#### Frontend Needs to Know Some Model Specifics (like Tokenizer)

- Decode writes a Model Card using the `write_kv` method. Frontend can simply read this entry.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, that this might be sensible.

deps/etcd-k8s.md Outdated

## ServiceDiscovery Interface

To de-couple Dynamo from ETCD, we define a minimal `ServiceDiscovery` interface that can be satisfied by different backends (etcd, kubernetes, etc). In Kubernetes environments, we will use the Kubernetes APIs to implement this interface.
Copy link

@grahamking grahamking Sep 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who is implementing this interface and who is using it? I'm not clear on where this sits, and everything afterwards depends on that.

  • Is it provided by the frontend? Is the frontend instead calling into this interface?
  • By a new discovery component?
  • Is it provided by dynamo to the worker, so it's the worker calling it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC - this is provided by the runtime - @mohammedabdulwahhab is that correct?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deps/etcd-k8s.md Outdated
To de-couple Dynamo from ETCD, we define a minimal `ServiceDiscovery` interface that can be satisfied by different backends (etcd, kubernetes, etc). In Kubernetes environments, we will use the Kubernetes APIs to implement this interface.

### Methods
- `create_instance`(namespace: str, component: str, metadata: dict) -> Instance

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be add_instance? Is this how we tell the frontend that it has a new instance to route to? Or is it how the worker tells the discovery implementation to advertise itself?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed this to be register_instance. It returns an instance identifier but does not mark it as ready until set_ready is called.

deps/etcd-k8s.md Outdated
- `write_kv`(key: str, value: str)
- Writes data at a given key. Data not associated with any instance.
- `read_kv`(key: str)
- Reads data at a given key. Data not associated with any instance.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we want to provide generic key/value features. I think this should always be in the domain specific language of dynamo instances.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

# Fetch the associated metadata for the instance to register the model in-memory model registry
metadata = service_discovery.get_metadata("dynamo", "decode", decode_worker.instance_id)
model = metadata["Model"]
# map the instance to the model in the in-memory model registry ...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The examples are all in Python, but the implementation will need to be in Rust. That's fine, Python is clearer to illustrate the mechanics.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this will be written in Rust and consumed in rust

@@ -0,0 +1,314 @@
# Service and Model Discovery Interface

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noting what we have right now. This will likely be the integration point.

deps/etcd-k8s.md Outdated
#### Frontend Discovers Decode Workers

- Decode calls `create_instance("dynamo", "decode", metadata)` and `set_instance_status(decode_worker, "ready")` to register itself with the service discovery backend. Its metadata includes the model it serves.
- Frontend calls `list_instances("dynamo", "decode")` and `get_metadata` to get all decode workers and the associated model information to bootstrap its cache.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this extrapolate to other workers, like "planner"? Does Frontend know what workers it has and how many of each to expect?

deps/etcd-k8s.md Outdated
Comment on lines 73 to 75
#### Decode Worker Needs to Know how to Reach Prefill Workers

- Decode worker does the exact same thing as the frontend, instead listing and subscribing to the "prefill" component.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will prefill workers be doing both frontend + decode worker work? e.g. create_instance("dynamo", "prefill", metadata) and set_instance_status(prefill_worker, "ready") and also list_instances("dynamo", "decode") and get_metadata? Will its instance status be dependent on the decode worker(s)' statuses?

Comment on lines +77 to +78

<!-- ### Simplifications and Assumptions -->

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some diagrams/examples showing some more complex flows? Like Planner + Prometheus disagg?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is metadata propagated + kept in sync for these components?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hhzhang16 Added some more details about the way metadata is exposed and read.

deps/etcd-k8s.md Outdated
```

#### Kubernetes Reference Impl
- The readiness probe of the pod will proxy the result of `get_instance_status`. When the instance will be ready for traffic, the readiness probe will return 200 and EndpointSlices will be updated to include the instance.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There could be a need for customizing readiness probes if instance readiness is more complex than a simple HTTP health check (e.g. waiting for a model to finish loading or for some other aspect to be set up)

deps/etcd-k8s.md Outdated
### Discovery Flow
1. **Instance Registration**: When `create_instance()` is called, the pod creates a ConfigMap with instance metadata

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are ConfigMaps immutable per instance or recreated? What happens if metadata changes/gets updated?

name: dynamo-decode-service
spec:
selector:
nvidia.com/dynamo-namespace: dynamo

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How/when are you avoiding namespace collisions?


## Kubernetes EndpointSlice Discovery Mechanism

This section provides a detailed explanation of how the ServiceDiscovery interface is implemented using Kubernetes EndpointSlices for service discovery.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would love to see a diagram for this!! create_instance --> Pod/ConfigMap --> Service --> EndpointSlice --> client subscribe

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a mermaid diagram incoming

Signed-off-by: mohammedabdulwahhab <[email protected]>
Signed-off-by: mohammedabdulwahhab <[email protected]>
Signed-off-by: mohammedabdulwahhab <[email protected]>
Removed outdated implementation details and restructured the breakdown of the implementation section.

Signed-off-by: mohammedabdulwahhab <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants