Description
Is your feature request related to a problem? Please describe.
I find it hard to keep track of all the KV cache updates and backtracks between the `draft_subnet` and the `verify_subnet`. I need a consistent way to perform these actions, especially for batches where a different number of draft tokens is accepted for each prompt.
Describe the solution you'd like
A `SpeculativeCacheManager` that will abstract away all of the KV cache updates and backtracks.
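To make the request more concrete, here is a rough sketch of the bookkeeping such a manager could own. Everything in it is an assumption for illustration: the method names (`prefill`, `append_draft`, `commit`, `valid_mask`), the right-padded cache layout, and the choice to track only per-row lengths rather than real tensors. It is not existing code in this repo.

```python
from typing import List


class SpeculativeCacheManager:
    """Bookkeeping-only sketch (hypothetical API, no real tensors).

    Tracks how many KV cache positions are valid for each batch row,
    assuming a right-padded cache shared by the draft and verify passes.
    """

    def __init__(self, batch_size: int) -> None:
        self.committed: List[int] = [0] * batch_size  # verified positions per row
        self.pending: int = 0  # draft positions written since the last commit

    def prefill(self, prompt_lengths: List[int]) -> None:
        """Record the prompt length of every row after the initial forward pass."""
        if len(prompt_lengths) != len(self.committed):
            raise ValueError("one prompt length per batch row is required")
        self.committed = list(prompt_lengths)
        self.pending = 0

    def append_draft(self, num_tokens: int) -> None:
        """Note that the draft pass wrote `num_tokens` new positions to every row."""
        if num_tokens < 0:
            raise ValueError("num_tokens must be non-negative")
        self.pending += num_tokens

    def physical_lengths(self) -> List[int]:
        """Positions currently occupied per row, including unverified drafts."""
        return [n + self.pending for n in self.committed]

    def commit(self, accepted: List[int]) -> None:
        """Keep `accepted[i]` draft tokens for row i and backtrack the rest."""
        if len(accepted) != len(self.committed):
            raise ValueError("one accepted count per batch row is required")
        if any(n < 0 or n > self.pending for n in accepted):
            raise ValueError("accepted counts must lie in [0, pending]")
        self.committed = [c + n for c, n in zip(self.committed, accepted)]
        self.pending = 0

    def valid_mask(self) -> List[List[bool]]:
        """Per-row mask over the padded cache: True where a position is still valid."""
        width = max(self.committed, default=0)
        return [[pos < n for pos in range(width)] for n in self.committed]


# Two prompts, four draft tokens each, uneven acceptance.
manager = SpeculativeCacheManager(batch_size=2)
manager.prefill([3, 5])
manager.append_draft(4)
manager.commit([1, 4])
print(manager.committed)  # [4, 9]
```

With this shape, the uneven-acceptance case reduces to a single `commit` call, and the resulting mask could be used to crop or mask the actual cache tensors in whichever layout the model uses.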
Describe alternatives you've considered
Adding classmethods to the `CleavedAutoModelForCausalLM` class; this seemed to violate the single-responsibility principle and made the class very verbose.
Additional context
None at the moment.