
Delete hidden_size and num_attention_heads modification in a config #198


Open

titaiwangms wants to merge 2 commits into main

Conversation

titaiwangms (Collaborator)

Prior to this PR, the config was shrunk (though not significantly) to speed up model runs/exports in the benchmark. However, LLMs impose a hard requirement that "embed_dim must be divisible by num_heads" (e.g.: https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt2/modeling_gpt2.py#L175-L178).
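
For reference, the check in transformers reads roughly like this (a paraphrased sketch of the GPT-2 attention constructor linked above, not a verbatim copy):

```python
# Sketch of the head-dimension check in GPT2Attention.__init__ (paraphrased).
embed_dim = config.hidden_size
num_heads = config.num_attention_heads
head_dim = embed_dim // num_heads
if head_dim * num_heads != embed_dim:
    raise ValueError(
        f"`embed_dim` must be divisible by num_heads "
        f"(got `embed_dim`: {embed_dim} and `num_heads`: {num_heads})."
    )
```

Shrinking hidden_size or num_attention_heads independently can easily break this invariant, which is what motivates the change below.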

This PR deletes the lines of code that modify those attributes, because (1) reducing num_layers should be enough to speed up model runs/exports, and (2) conditionally adjusting these attributes might affect model performance (deviating from its design).
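
A minimal sketch of the remaining size reduction, using a GPT-2 config as an illustration (attribute names follow the standard transformers config API; the actual benchmark code may differ):

```python
from transformers import GPT2Config

config = GPT2Config()

# Shrink only the depth; hidden_size and num_attention_heads are left
# untouched, so the "embed_dim divisible by num_heads" invariant that the
# original config already satisfies is preserved by construction.
config.num_hidden_layers = 2

assert config.hidden_size % config.num_attention_heads == 0
```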

titaiwangms requested a review from xadupre on August 8, 2025 22:03
titaiwangms requested a review from sdpython on August 8, 2025 22:14

gramalingam left a comment:


LGTM
