Assisted generation (or speculative decoding) is a strategy to speed up generation. Using StaticCache and torch.compile is another strategy to speed up generation. Currently, the two are not compatible. It would be nice to be able to use both at the same time, for maximum speed 😎
In a nutshell, assisted generation has to remove the cache entries for rejected tokens from both models' caches. StaticCache doesn't implement the functions needed to do that yet.
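To make the missing piece concrete, here is a minimal sketch of what cropping a fixed-size cache could look like. `ToyStaticCache`, `update`, and `crop` are hypothetical names made up for illustration; this is plain Python, not the actual StaticCache implementation or the transformers API. The assumption is that rejected positions can be zeroed in place so the pre-allocated buffers never change size, which is the property torch.compile relies on.

```python
class ToyStaticCache:
    """Hypothetical fixed-size key/value cache for one layer (illustration only)."""

    def __init__(self, max_len: int):
        # Buffers are allocated once and never resized, as in a static cache.
        self.keys = [0.0] * max_len
        self.values = [0.0] * max_len
        self.seq_len = 0  # number of valid positions

    def update(self, new_keys, new_values):
        # Write new entries in place, starting at the first free slot.
        end = self.seq_len + len(new_keys)
        if end > len(self.keys):
            raise ValueError("cache capacity exceeded")
        self.keys[self.seq_len:end] = new_keys
        self.values[self.seq_len:end] = new_values
        self.seq_len = end

    def crop(self, new_len: int):
        # Drop rejected positions by zeroing them in place; buffer size is unchanged.
        if not 0 <= new_len <= self.seq_len:
            raise ValueError("new_len must be between 0 and the current length")
        for i in range(new_len, self.seq_len):
            self.keys[i] = 0.0
            self.values[i] = 0.0
        self.seq_len = new_len


cache = ToyStaticCache(max_len=8)
cache.update([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])  # prompt tokens
cache.update([4.0, 5.0], [4.0, 5.0])            # two candidate tokens from the assistant
cache.crop(4)                                   # the second candidate was rejected
```

After the crop, the cache holds four valid positions and the buffers still have their original length of 8. A real implementation would do the same on the per-layer key and value tensors, for both the main model and the assistant model.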