-
Notifications
You must be signed in to change notification settings - Fork 5
ETCD-less Dynamo in K8s #37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: mohammedabdulwahhab <[email protected]>
Signed-off-by: mohammedabdulwahhab <[email protected]>
Signed-off-by: mohammedabdulwahhab <[email protected]>
Signed-off-by: mohammedabdulwahhab <[email protected]>
Clarified trade-offs and Kubernetes concepts in etcd-k8s documentation. Signed-off-by: mohammedabdulwahhab <[email protected]>
This update introduces a new approach for managing DynamoEndpoint resources, combining pod lifecycle management with controller oversight. It details the server, controller, and client-side interactions for improved service discovery and resource management. Signed-off-by: mohammedabdulwahhab <[email protected]>
Signed-off-by: mohammedabdulwahhab <[email protected]>
Signed-off-by: mohammedabdulwahhab <[email protected]>
Signed-off-by: mohammedabdulwahhab <[email protected]>
Clarified wording and improved structure in the EndpointSlice based discovery section. Signed-off-by: mohammedabdulwahhab <[email protected]>
Signed-off-by: mohammedabdulwahhab <[email protected]>
Signed-off-by: mohammedabdulwahhab <[email protected]>
Signed-off-by: mohammedabdulwahhab <[email protected]>
deps/etcd-k8s.md
Outdated
- Lists all instances that match the given namespace and component. | ||
- `get_metadata`(namespace: str, component: str, instance_id: str) -> Instance | ||
- Returns the metadata for an instance. | ||
- `subscribe`(namespace: str, component: str) -> EventStream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe to distinguish this from event mechanisms - we can call this watch
and instead of eventstream
we have InstanceAdded
, InstanceRemoved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, makes sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be the hardest one to support. For example NATS doesn't have a watch mechanism, so we'd have to poll. Maybe we should split this out to the notification mechanism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- @nnshah1 Fixed to
watch
. I do think the return value of this will be an EventStream that can yield an InstanceAddedEvent and InstanceRemovedEvent - @grahamking Potentially, I'm thinking of this as an analog to the
kv_get_and_watch_prefix
which is implemented with ETCD right now.
deps/etcd-k8s.md
Outdated
- Creates an instance and persists associated immutable metadata. Does not mark the instance as ready. | ||
- `list_instances`(namespace: str, component: str) -> List[Instance] | ||
- Lists all instances that match the given namespace and component. | ||
- `get_metadata`(namespace: str, component: str, instance_id: str) -> Instance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While not critical here - we can make this an object model where the Instance
has a metadata
property instead of get and set on the ServiceDiscovery
interface.
I'm thinking this is all part of the runtime
object model
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I think this makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
deps/etcd-k8s.md
Outdated
- Returns the metadata for an instance. | ||
- `subscribe`(namespace: str, component: str) -> EventStream | ||
- Subscribes to events for the set of instances that match (namespace, component). Returns a stream of events giving signals for when instances that match the subscription are created, deleted, or readiness status changes. | ||
- `set_instance_status`(instance: Instance, status: str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not clear to me immediately why we need a separate step for setting status - could we have "add" and "remove" - so if an instance is ready it is added, if it is not ready it is 'removed'.
- just a thought in comparison to what we have today - we have add and remove but readiness is implied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pending discussion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also in which scenario are we going to call this function ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvm, it's detailed below in the doc
deps/etcd-k8s.md
Outdated
#### Frontend Discovers Decode Workers | ||
|
||
- Decode calls `create_instance("dynamo", "decode", metadata)` and `set_instance_status(decode_worker, "ready")` to register itself with the service discovery backend. Its metadata includes the model it serves. | ||
- Frontend calls `list_instances("dynamo", "decode")` and `get_metadata` to get all decode workers and the associated model information to bootstrap its cache. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this implies frontend knows at launch that it is in namespace 'dynamo' and is looking for 'decode' - just wanting to confirm here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this extrapolate to other workers, like "planner"? Does Frontend know what workers it has and how many of each to expect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If i understand your question correctly, any component can list and watch any tuple of (namespace, component)
deps/etcd-k8s.md
Outdated
|
||
#### Frontend Needs to Know Some Model Specifics (like Tokenizer) | ||
|
||
- Decode writes a Model Card using the `write_kv` method. Frontend can simply read this entry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be in the metadata - because if not this also creates a race condition for readiness - i.e. is it ready if the model card is not written?
I'm in favor of putting all required information for interacting with this instance into the metadata - this decouples the instances from each other as well. No shared state is required between instances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, that this might be sensible.
deps/etcd-k8s.md
Outdated
|
||
```python | ||
# Context: decode worker is registering the MDC before it marks itself as ready | ||
service_discovery.write_kv("mdc/Qwen3-32B", mdc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still unclear to me why the mdc has to live outside the instance metadata - we can discuss
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe because this mdc would be duplicated in the case of multiple instance of the same worker ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the motivation was to not duplicate the MDC, but I believe we reached the consensus that it's okay for it to be duplicated as long as some conditions are checked.
deps/etcd-k8s.md
Outdated
|
||
## ServiceDiscovery Interface | ||
|
||
To de-couple Dynamo from ETCD, we define a minimal `ServiceDiscovery` interface that can be satisfied by different backends (etcd, kubernetes, etc). In Kubernetes environments, we will use the Kubernetes APIs to implement this interface. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a single node, non K8s environment, can this discovery be backed by static configuration? Or local file system with file watchers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't put too much thought into it, but it should be the case. Could be useful for greg clark's use case and the NIM team.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add a section to cover this case - or add a TODO - need to have the local deployment case covered as well / sketeched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I think it might be a useful exercise to do to see if this interface generalizes across different backends.
Co-authored-by: Neelay Shah <[email protected]> Signed-off-by: mohammedabdulwahhab <[email protected]>
- Writes data at a given key. Data not associated with any instance. | ||
- `read_kv`(key: str) | ||
- Reads data at a given key. Data not associated with any instance. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: Figure out if there is a compelling reason to keep a separate entity for MDC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if each instance has MDC embedded, there are consistency and race conditions to watch for
deps/etcd-k8s.md
Outdated
|
||
## ServiceDiscovery Interface | ||
|
||
To de-couple Dynamo from ETCD, we define a minimal `ServiceDiscovery` interface that can be satisfied by different backends (etcd, kubernetes, etc). In Kubernetes environments, we will use the Kubernetes APIs to implement this interface. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't put too much thought into it, but it should be the case. Could be useful for greg clark's use case and the NIM team.
deps/etcd-k8s.md
Outdated
- Creates an instance and persists associated immutable metadata. Does not mark the instance as ready. | ||
- `list_instances`(namespace: str, component: str) -> List[Instance] | ||
- Lists all instances that match the given namespace and component. | ||
- `get_metadata`(namespace: str, component: str, instance_id: str) -> Instance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I think this makes sense.
deps/etcd-k8s.md
Outdated
- Lists all instances that match the given namespace and component. | ||
- `get_metadata`(namespace: str, component: str, instance_id: str) -> Instance | ||
- Returns the metadata for an instance. | ||
- `subscribe`(namespace: str, component: str) -> EventStream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, makes sense to me.
deps/etcd-k8s.md
Outdated
- Returns the metadata for an instance. | ||
- `subscribe`(namespace: str, component: str) -> EventStream | ||
- Subscribes to events for the set of instances that match (namespace, component). Returns a stream of events giving signals for when instances that match the subscription are created, deleted, or readiness status changes. | ||
- `set_instance_status`(instance: Instance, status: str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pending discussion
deps/etcd-k8s.md
Outdated
#### Frontend Discovers Decode Workers | ||
|
||
- Decode calls `create_instance("dynamo", "decode", metadata)` and `set_instance_status(decode_worker, "ready")` to register itself with the service discovery backend. Its metadata includes the model it serves. | ||
- Frontend calls `list_instances("dynamo", "decode")` and `get_metadata` to get all decode workers and the associated model information to bootstrap its cache. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right
deps/etcd-k8s.md
Outdated
|
||
#### Frontend Needs to Know Some Model Specifics (like Tokenizer) | ||
|
||
- Decode writes a Model Card using the `write_kv` method. Frontend can simply read this entry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, that this might be sensible.
deps/etcd-k8s.md
Outdated
|
||
## ServiceDiscovery Interface | ||
|
||
To de-couple Dynamo from ETCD, we define a minimal `ServiceDiscovery` interface that can be satisfied by different backends (etcd, kubernetes, etc). In Kubernetes environments, we will use the Kubernetes APIs to implement this interface. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who is implementing this interface and who is using it? I'm not clear on where this sits, and everything afterwards depends on that.
- Is it provided by the frontend? Is the frontend instead calling into this interface?
- By a new discovery component?
- Is it provided by dynamo to the worker, so it's the worker calling it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC - this is provided by the runtime
- @mohammedabdulwahhab is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@grahamking I've added a section about this here: https://github.com/ai-dynamo/enhancements/blob/df9ddfac245e189bb5e01a4d3355b25da166a828/deps/etcd-k8s.md#where-will-these-apis-be-used
deps/etcd-k8s.md
Outdated
To de-couple Dynamo from ETCD, we define a minimal `ServiceDiscovery` interface that can be satisfied by different backends (etcd, kubernetes, etc). In Kubernetes environments, we will use the Kubernetes APIs to implement this interface. | ||
|
||
### Methods | ||
- `create_instance`(namespace: str, component: str, metadata: dict) -> Instance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be add_instance
? Is this how we tell the frontend that it has a new instance to route to? Or is it how the worker tells the discovery implementation to advertise itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed this to be register_instance
. It returns an instance identifier but does not mark it as ready until set_ready
is called.
deps/etcd-k8s.md
Outdated
- `write_kv`(key: str, value: str) | ||
- Writes data at a given key. Data not associated with any instance. | ||
- `read_kv`(key: str) | ||
- Reads data at a given key. Data not associated with any instance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we want to provide generic key/value features. I think this should always be in the domain specific language of dynamo instances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
# Fetch the associated metadata for the instance to register the model in-memory model registry | ||
metadata = service_discovery.get_metadata("dynamo", "decode", decode_worker.instance_id) | ||
model = metadata["Model"] | ||
# map the instance to the model in the in-memory model registry ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The examples are all in Python, but the implementation will need to be in Rust. That's fine, Python is clearer to illustrate the mechanics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this will be written in Rust and consumed in rust
@@ -0,0 +1,314 @@ | |||
# Service and Model Discovery Interface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noting what we have right now. This will likely be the integration point.
KeyValueStore
interface: https://github.com/ai-dynamo/dynamo/blob/main/lib/runtime/src/storage/key_value_store.rs#L39 which manages keys in KeyValueBucket's.KeyValueBucket
interface https://github.com/ai-dynamo/dynamo/blob/main/lib/runtime/src/storage/key_value_store.rs#L199
deps/etcd-k8s.md
Outdated
#### Frontend Discovers Decode Workers | ||
|
||
- Decode calls `create_instance("dynamo", "decode", metadata)` and `set_instance_status(decode_worker, "ready")` to register itself with the service discovery backend. Its metadata includes the model it serves. | ||
- Frontend calls `list_instances("dynamo", "decode")` and `get_metadata` to get all decode workers and the associated model information to bootstrap its cache. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this extrapolate to other workers, like "planner"? Does Frontend know what workers it has and how many of each to expect?
deps/etcd-k8s.md
Outdated
#### Decode Worker Needs to Know how to Reach Prefill Workers | ||
|
||
- Decode worker does the exact same thing as the frontend, instead listing and subscribing to the "prefill" component. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will prefill workers be doing both frontend + decode worker work? e.g. create_instance("dynamo", "prefill", metadata)
and set_instance_status(prefill_worker, "ready")
and also list_instances("dynamo", "decode")
and get_metadata
? Will its instance status be dependent on the decode worker(s)' statuses?
|
||
<!-- ### Simplifications and Assumptions --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add some diagrams/examples showing some more complex flows? Like Planner + Prometheus disagg?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is metadata propagated + kept in sync for these components?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hhzhang16 Added some more details about the way metadata is exposed and read.
deps/etcd-k8s.md
Outdated
``` | ||
|
||
#### Kubernetes Reference Impl | ||
- The readiness probe of the pod will proxy the result of `get_instance_status`. When the instance will be ready for traffic, the readiness probe will return 200 and EndpointSlices will be updated to include the instance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There could be a need for customizing readiness probes if instance readiness is more complex than a simple HTTP health check (e.g. waiting for a model to finish loading or for some other aspect to be set up)
deps/etcd-k8s.md
Outdated
### Discovery Flow | ||
1. **Instance Registration**: When `create_instance()` is called, the pod creates a ConfigMap with instance metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are ConfigMaps immutable per instance or recreated? What happens if metadata changes/gets updated?
name: dynamo-decode-service | ||
spec: | ||
selector: | ||
nvidia.com/dynamo-namespace: dynamo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How/when are you avoiding namespace collisions?
|
||
## Kubernetes EndpointSlice Discovery Mechanism | ||
|
||
This section provides a detailed explanation of how the ServiceDiscovery interface is implemented using Kubernetes EndpointSlices for service discovery. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would love to see a diagram for this!! create_instance --> Pod/ConfigMap --> Service --> EndpointSlice --> client subscribe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, a mermaid diagram incoming
Signed-off-by: mohammedabdulwahhab <[email protected]>
Signed-off-by: mohammedabdulwahhab <[email protected]>
Signed-off-by: mohammedabdulwahhab <[email protected]>
Signed-off-by: mohammedabdulwahhab <[email protected]>
Removed outdated implementation details and restructured the breakdown of the implementation section. Signed-off-by: mohammedabdulwahhab <[email protected]>
No description provided.