
Conversation

@markurtz (Collaborator) commented Aug 5, 2025

Description

  • Share code between the Eagle and Eagle3 implementations through a base class to minimize redundancy (a minimal sketch of the idea follows this list)
  • Remove redundant params and source as much information as possible from the transformers config
  • Remove the current implementation for token mappings/head pruning; more work is needed to solidify the direction and implementation there
  • Minor cleanup of the eagle3 module
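For readers unfamiliar with the base-class idea above, the rough shape is a small mixin that both the Eagle and Eagle3 configs can inherit, pulling sizing information from the wrapped transformers config instead of duplicating it as constructor parameters. This is only a hedged sketch: apart from the TransformerLayerConfigMixin name, which appears in the review below, the class, attribute, and property names here are assumptions, not the actual API.

```python
# Hypothetical sketch of the shared base-class idea; everything except the
# TransformerLayerConfigMixin name is illustrative, not the repository's API.
from transformers import PretrainedConfig


class TransformerLayerConfigMixin:
    """Shared logic for deriving layer/vocab sizes from a transformers config,
    intended to be reused by both the Eagle and Eagle3 config classes."""

    transformer_layer_config: PretrainedConfig

    @property
    def hidden_size(self) -> int:
        # Source sizing info from the transformers config rather than
        # carrying it as a separate, redundant parameter.
        return self.transformer_layer_config.hidden_size

    @property
    def vocab_size(self) -> int:
        return self.transformer_layer_config.vocab_size


class Eagle3SpeculatorConfig(TransformerLayerConfigMixin):  # assumed name
    def __init__(self, transformer_layer_config: PretrainedConfig):
        self.transformer_layer_config = transformer_layer_config


# Minimal usage example with made-up sizes.
cfg = Eagle3SpeculatorConfig(PretrainedConfig(hidden_size=2048, vocab_size=32000))
assert cfg.hidden_size == 2048
```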

@Copilot Copilot AI (Contributor) left a comment

Pull Request Overview

This PR refactors the Eagle3 model implementation to reduce code duplication and streamline configuration. It introduces a shared base class for transformer layer configuration and removes the current vocabulary mapping implementation.

  • Extract shared transformer layer configuration into TransformerLayerConfigMixin base class
  • Remove redundant configuration parameters and vocabulary mapping logic from Eagle3
  • Clean up Eagle3 model initialization and forward pass implementation

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/speculators/models/eagle3.py | Simplified Eagle3 configuration and model by removing vocab mapping, using mixin for shared config |
| src/speculators/models/eagle.py | Extracted transformer layer config logic into reusable TransformerLayerConfigMixin class |
| src/speculators/convert/eagle/eagle3_converter.py | Updated Eagle3 converter to remove deprecated config parameters |


github-actions bot commented Aug 5, 2025

📦 Build Artifacts Available
The build artifacts (`.whl` and `.tar.gz`) have been successfully generated and are available for download: https://github.com/neuralmagic/speculators/actions/runs/16760456544/artifacts/3694860958.
They will be retained for up to 30 days.
Commit: 653f164

@rahul-tuli rahul-tuli (Collaborator) left a comment

We should test conversion both with and without a reduced vocab, plus a dummy forward pass with known shapes, before making these changes. Because our diffs have already merged into vLLM, I'm afraid we will break `vllm serve <speculators-model>` without that testing.
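As a rough illustration of the kind of check being asked for, a dummy forward pass with known shapes might look like the sketch below. The model's forward signature, the shapes, and the helper name are assumptions here, not the repository's actual test API.

```python
# Hypothetical smoke test; the forward signature and shapes are assumptions.
import torch


def dummy_forward_smoke_test(model, hidden_size: int, vocab_size: int) -> None:
    """Run a tiny forward pass and check the output shapes are sane."""
    batch, seq_len = 2, 8
    input_ids = torch.randint(0, vocab_size, (batch, seq_len))
    # Eagle3 consumes hidden states from three verifier layers, concatenated
    # along the feature dimension (hence 3 * hidden_size here).
    verifier_hidden = torch.randn(batch, seq_len, 3 * hidden_size)

    with torch.no_grad():
        logits = model(input_ids=input_ids, hidden_states=verifier_hidden)

    # With a reduced vocab the last dimension would be the draft vocab size
    # instead; both paths should be exercised per the comment above.
    assert logits.shape == (batch, seq_len, vocab_size)
```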


self.fc = nn.Linear(
3 * self.target_hidden_size, # Use target model's hidden size
3 * self.hidden_size, # Use target model's hidden size
A Collaborator left a comment

@markurtz since the hidden states are coming from the verifier, shouldn't this be 3 * self.target_hidden_size?
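For context on the dimension question, here is a minimal shape sketch with made-up sizes. The input width is the point the comment above is making (the concatenated states come from the verifier); the output width simply mirrors the quoted snippet and is part of the same open question, so treat both as assumptions rather than the settled implementation.

```python
# Illustrative shape check only; sizes are assumed values.
import torch
import torch.nn as nn

target_hidden_size = 4096  # verifier (target model) width, assumed
hidden_size = 2048         # draft model width, assumed

# Hidden states from three verifier layers are concatenated on the feature
# dimension, so the projection's input width is 3 * target_hidden_size.
fc = nn.Linear(3 * target_hidden_size, 3 * hidden_size, bias=False)

batch, seq_len = 2, 8
verifier_states = torch.randn(batch, seq_len, 3 * target_hidden_size)
assert fc(verifier_states).shape == (batch, seq_len, 3 * hidden_size)
```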

Comment on lines -405 to -412
self.register_buffer( # type: ignore[attr-defined]
"d2t",
torch.zeros(self.draft_vocab_size, dtype=torch.long),
)
self.register_buffer( # type: ignore[attr-defined]
"t2d",
torch.zeros(self.target_vocab_size, dtype=torch.bool),
)
A Collaborator left a comment

Why remove these? Wouldn't this break the conversion of existing checkpoints? I think it would be nice to have their presence reflected in the config and to initialize these buffers based on that arg, something like `reduced_vocab: bool`. I would also argue that if this arg is True we should keep `target_vocab_size` along with `vocab_size`.
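As a rough illustration of that suggestion, conditional buffer registration might look like the following. The `reduced_vocab` flag and the constructor signature are taken from the comment above as a proposal, not from existing code.

```python
# Hedged sketch of the reviewer's suggestion; `reduced_vocab` and the
# surrounding wiring are hypothetical, not the actual implementation.
import torch
from torch import nn


class Eagle3DraftHead(nn.Module):  # assumed class name
    def __init__(
        self,
        vocab_size: int,
        target_vocab_size: int,
        reduced_vocab: bool = False,  # proposed config flag
    ):
        super().__init__()
        if reduced_vocab:
            # d2t maps draft-vocab token ids back to target-vocab ids;
            # t2d marks which target-vocab tokens exist in the draft vocab.
            self.register_buffer("d2t", torch.zeros(vocab_size, dtype=torch.long))
            self.register_buffer(
                "t2d", torch.zeros(target_vocab_size, dtype=torch.bool)
            )
```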
