
Commit f062d48

[llama4] Change expert_bias and tokens_per_expert to non-persistent buffer (#1403)
As titled. Tested on the llama4 debugging model (dp=8, ep=2); screenshot of the test run: https://github.com/user-attachments/assets/24a1bf87-b038-481e-b40b-96e2123c96fc
1 parent: 9a8cb98
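
For context: PyTorch's nn.Module.register_buffer takes a persistent flag that defaults to True. Persistent buffers are included in the module's state_dict and therefore land in checkpoints; non-persistent buffers (persistent=False) still live on the module and move with .to()/.cuda(), but are skipped when the state dict is saved or loaded. Below is a minimal sketch of these semantics; the Gate class is hypothetical and not the torchtitan code.

import torch
import torch.nn as nn

class Gate(nn.Module):
    """Hypothetical module illustrating buffer persistence (not torchtitan code)."""

    def __init__(self, num_experts: int = 4):
        super().__init__()
        # Persistent (the default): saved into state_dict() and checkpoints.
        self.register_buffer(
            "expert_bias", torch.zeros(num_experts, dtype=torch.float32)
        )
        # Non-persistent: kept on the module and moved by .to()/.cuda(),
        # but excluded from state_dict(), so checkpoints never contain it.
        self.register_buffer(
            "tokens_per_expert",
            torch.zeros(num_experts, dtype=torch.float32),
            persistent=False,
        )

gate = Gate()
print(list(gate.state_dict().keys()))  # ['expert_bias'] -- only the persistent buffer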

File tree

2 files changed: +0 -4 lines

  • torchtitan
    • experiments/llama4/model/moe.py
    • models/deepseek_v3/model/moe.py


torchtitan/experiments/llama4/model/moe.py

Lines changed: 0 additions & 2 deletions
@@ -249,12 +249,10 @@ def __init__(self, model_args: TransformerModelArgs):
             self.register_buffer(
                 "expert_bias",
                 torch.zeros(num_experts, dtype=torch.float32),
-                persistent=True,
             )
             self.register_buffer(
                 "tokens_per_expert",
                 torch.zeros(num_experts, dtype=torch.float32),
-                persistent=True,
             )
         else:
             self.expert_bias = None

torchtitan/models/deepseek_v3/model/moe.py

Lines changed: 0 additions & 2 deletions
@@ -290,12 +290,10 @@ def __init__(self, model_args: DeepSeekV3ModelArgs):
             self.register_buffer(
                 "expert_bias",
                 torch.zeros(num_experts, dtype=torch.float32),
-                persistent=True,
             )
             self.register_buffer(
                 "tokens_per_expert",
                 torch.zeros(num_experts, dtype=torch.float32),
-                persistent=True,
             )
         else:
             self.expert_bias = None
