[Feature] Support for Mooncake integration #74

@Syspretor

Description

Motivation

Mooncake is a distributed KVCache storage engine designed specifically for large language model (LLM) inference and built on Transfer Engine. It is a central component of the KVCache-centric distributed architecture. Its goal is to store reusable KV caches at various locations within the inference cluster.

This issue proposes integrating Mooncake as a role in the RBG-deployed SGLang inference service, giving that service KVCache offload capabilities.

Related resources

sglang-integration
