Create a SpeculativeCacheManager #6

@nasheedyasin

Is your feature request related to a problem? Please describe.
I find it hard to keep track of all the KV cache updates and backtracks between the draft_subnet and the verify_subnet. We need a consistent way to perform these actions, especially for batches where a different number of draft tokens is accepted for each prompt.

Describe the solution you'd like
A SpeculativeCacheManager that will abstract away all of the KV Cache updates and backtracks.
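
To make the request more concrete, here is a minimal sketch of the bookkeeping such a manager might own. It is only an illustration under stated assumptions, not the project's API: the method names (`record_draft`, `record_verify`, `backtrack`, `valid_mask`) and the attributes are hypothetical, the cache contents are reduced to per-sequence lengths instead of real key/value tensors, and the bonus token that a verifier usually emits is left out.

```python
from typing import List


class SpeculativeCacheManager:
    """Hypothetical sketch: tracks how many KV cache positions are valid
    for each sequence in a batch, for both the draft and verify subnets."""

    def __init__(self, prompt_lens: List[int]) -> None:
        # Positions confirmed by the verifier, one entry per sequence.
        self.committed = list(prompt_lens)
        # Positions currently written in each subnet's cache.
        self.draft_len = list(prompt_lens)
        self.verify_len = list(prompt_lens)

    def record_draft(self, num_tokens: int) -> None:
        """The draft subnet appended num_tokens entries to every sequence."""
        self.draft_len = [n + num_tokens for n in self.draft_len]

    def record_verify(self, num_tokens: int) -> None:
        """The verify subnet appended num_tokens entries to every sequence."""
        self.verify_len = [n + num_tokens for n in self.verify_len]

    def backtrack(self, accepted: List[int]) -> None:
        """Commit accepted[i] draft tokens for sequence i, discard the rest."""
        if len(accepted) != len(self.committed):
            raise ValueError("need one accepted count per sequence")
        # Validate everything first so a bad input leaves the state untouched.
        for base, drafted_to, n_acc in zip(self.committed, self.draft_len, accepted):
            if not 0 <= n_acc <= drafted_to - base:
                raise ValueError("accepted count exceeds drafted tokens")
        self.committed = [b + a for b, a in zip(self.committed, accepted)]
        # Both caches roll back to the committed length of each sequence.
        self.draft_len = list(self.committed)
        self.verify_len = list(self.committed)

    def valid_mask(self) -> List[List[bool]]:
        """Per-sequence mask over cache positions, padded to the longest one."""
        max_len = max(self.committed, default=0)
        return [[pos < n for pos in range(max_len)] for n in self.committed]


# Two prompts of length 5 and 3; 4 tokens are drafted and verified,
# then 4 are accepted for the first prompt and only 1 for the second.
manager = SpeculativeCacheManager(prompt_lens=[5, 3])
manager.record_draft(4)
manager.record_verify(4)
manager.backtrack([4, 1])
print(manager.committed)  # [9, 4]
```

In a real implementation, `backtrack` would also have to truncate or mask the actual key/value tensors per sequence. Keeping the lengths in one place is what would let the model class stay free of that logic.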

Describe alternatives you've considered
Adding classmethods to the CleavedAutoModelForCausalLM class. This seemed to violate the single-responsibility principle and made the class very verbose.

Additional context
None atm.
