Support `StaticCache` in assisted generation

Looking for contributions!

Assisted generation (or speculative decoding) is a strategy to speed up generation. Using `StaticCache` and `torch.compile` is another strategy to speed up generation. Currently, the two are not compatible. It would be nice to be able to use both at the same time, for maximum speed 😎 

In a nutshell, assisted generation has to clear the models' caches for the tokens that were rejected. `StaticCache` does not yet implement the functions needed to do that.
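
To make the missing piece concrete, here is a minimal pure-Python sketch of the rollback that assisted generation needs from a fixed-size cache. Everything in it is an assumption made for illustration: `ToyStaticCache`, its `update` and `crop` methods, and the `seq_len` counter are hypothetical names, not the actual `StaticCache` API in `transformers`, and plain lists stand in for the key/value tensors.

```python
class ToyStaticCache:
    """Hypothetical fixed-size cache, for illustration only."""

    def __init__(self, max_len: int):
        # The buffer is allocated once and never resized. A fixed shape is
        # what makes a static cache a good fit for torch.compile.
        self.buffer = [0.0] * max_len
        self.seq_len = 0  # number of positions that hold valid entries

    def update(self, values: list) -> None:
        """Write new entries right after the last valid position."""
        end = self.seq_len + len(values)
        if end > len(self.buffer):
            raise ValueError("cache is full")
        self.buffer[self.seq_len:end] = values
        self.seq_len = end

    def crop(self, new_len: int) -> None:
        """Discard every entry at position >= new_len, keeping the buffer size."""
        if not 0 <= new_len <= self.seq_len:
            raise ValueError("new_len must be between 0 and the current length")
        # Reset the rejected slots in place instead of shrinking the buffer.
        for i in range(new_len, self.seq_len):
            self.buffer[i] = 0.0
        self.seq_len = new_len


cache = ToyStaticCache(max_len=8)
cache.update([1.0, 2.0, 3.0])  # entries for a 3-token prompt
cache.update([4.0, 5.0, 6.0])  # entries for 3 candidate tokens

num_accepted = 1  # assume the main model accepted only the first candidate
cache.crop(3 + num_accepted)  # roll back the 2 rejected candidates
```

The point of the sketch is that the rollback resets slots in place rather than slicing the buffer, so the buffer shape stays constant. A real implementation would have to do the equivalent for every layer's key and value tensors.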

Support `StaticCache` in assisted generation #32946
