Polish Eagle3 config and model definitions #79
base: main
Conversation
Pull Request Overview
This PR refactors the Eagle3 model implementation to reduce code duplication and streamline configuration. It introduces a shared base class for transformer layer configuration and removes the current vocabulary mapping implementation.
- Extract shared transformer layer configuration into a `TransformerLayerConfigMixin` base class (a rough sketch of such a mixin follows below)
- Remove redundant configuration parameters and vocabulary mapping logic from Eagle3
- Clean up the Eagle3 model initialization and forward pass implementation
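As a rough illustration only, the shared-config idea could look something like the sketch below; the field names, defaults, and the `Eagle3ConfigSketch` class are assumptions for illustration, not the actual definitions in `src/speculators/models/eagle.py` or `eagle3.py`.

```python
from pydantic import BaseModel, Field


class TransformerLayerConfigMixin(BaseModel):
    """Hypothetical sketch of a shared base carrying the transformer-layer
    fields that the Eagle and Eagle3 configs would otherwise each duplicate."""

    # Field names and defaults are illustrative, not the real eagle.py definitions.
    hidden_size: int = Field(default=4096, description="Draft layer hidden size")
    intermediate_size: int = Field(default=11008, description="MLP intermediate size")
    num_attention_heads: int = Field(default=32, description="Number of attention heads")
    num_key_value_heads: int = Field(default=8, description="KV heads for grouped-query attention")
    rms_norm_eps: float = Field(default=1e-5, description="RMSNorm epsilon")


class Eagle3ConfigSketch(TransformerLayerConfigMixin):
    """Eagle3-specific settings layered on top of the shared mixin."""

    draft_vocab_size: int = Field(default=32000, description="Reduced draft vocabulary size")
```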
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/speculators/models/eagle3.py | Simplified Eagle3 configuration and model by removing vocab mapping and using the mixin for shared config |
| src/speculators/models/eagle.py | Extracted transformer layer config logic into a reusable TransformerLayerConfigMixin class |
| src/speculators/convert/eagle/eagle3_converter.py | Updated the Eagle3 converter to remove deprecated config parameters |
Co-authored-by: Copilot <[email protected]>
📦 Build Artifacts Available
We should be testing conversion both with and without the reduced vocab, plus a dummy forward pass based on known shapes, before making these changes. Because our diffs have already been merged into vLLM, I'm afraid we will break `vllm serve <speculators-model>` without testing.
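One way to get part of that coverage without touching vLLM would be a shape-only smoke test along these lines; the dimensions, the fused `fc` projection, and the offset-style `d2t` mapping below are assumptions for illustration, not the converter's real API.

```python
import torch
import torch.nn as nn


def dummy_forward_shape_check(
    target_hidden_size: int = 512,    # verifier hidden size (illustrative)
    hidden_size: int = 256,           # draft hidden size (illustrative)
    target_vocab_size: int = 32_000,  # illustrative
    draft_vocab_size: int = 8_000,    # illustrative
    reduced_vocab: bool = True,
    batch: int = 2,
    seq_len: int = 8,
) -> None:
    """Shape-only sanity check: random verifier hidden states of known shape
    go through a fused fc projection and an lm_head; with a reduced vocab,
    draft token ids are mapped back to target-vocab ids via a d2t offset."""
    # Eagle3 concatenates hidden states from three verifier layers, so the
    # fc input width is 3 * target_hidden_size (see the fc thread below).
    fc = nn.Linear(3 * target_hidden_size, hidden_size)
    hidden = fc(torch.randn(batch, seq_len, 3 * target_hidden_size))
    assert hidden.shape == (batch, seq_len, hidden_size)

    vocab = draft_vocab_size if reduced_vocab else target_vocab_size
    lm_head = nn.Linear(hidden_size, vocab, bias=False)
    draft_tokens = lm_head(hidden).argmax(dim=-1)

    if reduced_vocab:
        # d2t stores, per draft token id, the offset to the corresponding target id.
        d2t = torch.zeros(draft_vocab_size, dtype=torch.long)
        target_tokens = draft_tokens + d2t[draft_tokens]
        assert target_tokens.shape == (batch, seq_len)


dummy_forward_shape_check(reduced_vocab=True)
dummy_forward_shape_check(reduced_vocab=False)
```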
```diff
 self.fc = nn.Linear(
-    3 * self.target_hidden_size,  # Use target model's hidden size
+    3 * self.hidden_size,  # Use target model's hidden size
```
@markurtz since the hidden states are coming from the verifier, shouldn't this be `3 * self.target_hidden_size`?
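To make the concern concrete, here is a small standalone repro under the assumption that the new code uses `3 * self.hidden_size` for the input width while the verifier hands over states of width `3 * target_hidden_size` (the sizes below are made up):

```python
import torch
import torch.nn as nn

target_hidden_size, hidden_size = 1024, 512  # illustrative: verifier wider than draft

fc = nn.Linear(3 * hidden_size, hidden_size)  # input width as in the diff above
verifier_states = torch.randn(1, 4, 3 * target_hidden_size)  # what the verifier emits

try:
    fc(verifier_states)
except RuntimeError as err:
    # Fails whenever target_hidden_size != hidden_size: fc expects 3 * hidden_size
    # features but receives 3 * target_hidden_size.
    print(err)
```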
```python
self.register_buffer(  # type: ignore[attr-defined]
    "d2t",
    torch.zeros(self.draft_vocab_size, dtype=torch.long),
)
self.register_buffer(  # type: ignore[attr-defined]
    "t2d",
    torch.zeros(self.target_vocab_size, dtype=torch.bool),
)
```
Why remove these? Wouldn't this break the conversion of existing checkpoints? I think it would be nice to have their presence reflected in the config and to initialize these based on that arg; something like `reduced_vocab: bool`. I would also argue that if this arg is `True`, we should keep `target_vocab_size` along with `vocab_size`.
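A rough sketch of that suggestion, purely for illustration (the `reduced_vocab` flag and the class below are hypothetical, not existing speculators code):

```python
import torch
from torch import nn


class Eagle3DraftSketch(nn.Module):
    """Illustration of the suggestion above: keep the d2t/t2d buffers, but
    register them only when the config says the draft vocab is reduced."""

    def __init__(
        self,
        hidden_size: int,
        vocab_size: int,          # draft vocab size
        target_vocab_size: int,   # retained in the config when reduced_vocab=True
        reduced_vocab: bool = False,
    ) -> None:
        super().__init__()
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
        if reduced_vocab:
            # d2t: per-draft-token offset into the target vocabulary.
            self.register_buffer("d2t", torch.zeros(vocab_size, dtype=torch.long))
            # t2d: boolean mask of target tokens that exist in the draft vocab.
            self.register_buffer("t2d", torch.zeros(target_vocab_size, dtype=torch.bool))
```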
Description