[DSV3] Adding 16B model training config, Enable FSDP and AC on DSV3-16B model (#1330)
## Context
1. Introduced a basic DSV3-16B model training config
2. Enabled FSDP/HSDP on DSV3-16B model training
## Performance
The current profiler trace looks like this: the `to_copy` op takes too long and needs
to be optimized. The copy comes from the dtype conversion in class MoE():
```python
routed_output = (routed_output.to(torch.float32) * top_scores.unsqueeze(-1)).to(x.dtype)
```
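To give a rough sense of why the two casts in that line can show up prominently in a trace, here is a back-of-the-envelope sketch of the memory traffic involved. It is an estimate, not a profiler measurement. It assumes `x.dtype` is a 2-byte type such as bfloat16, `top_scores` is float32, and the three ops run unfused in eager mode. None of this is stated in the commit, and the function names and tensor sizes are hypothetical.

```python
# Rough memory-traffic estimate for: upcast -> multiply by scores -> downcast.
# Assumptions (not from the commit): x.dtype is bfloat16, top_scores is float32,
# and each op runs as its own unfused kernel. All names here are hypothetical.

BYTES_PER_ELEMENT = {"bfloat16": 2, "float32": 4}


def cast_traffic(num_elements: int, src: str, dst: str) -> int:
    """Bytes read plus bytes written by one elementwise dtype-conversion copy."""
    return num_elements * (BYTES_PER_ELEMENT[src] + BYTES_PER_ELEMENT[dst])


def scaling_traffic(num_rows: int, dim: int, low: str = "bfloat16") -> dict:
    """Estimated bytes moved per op for a (num_rows, dim) routed_output tensor."""
    n = num_rows * dim
    fp32 = BYTES_PER_ELEMENT["float32"]
    upcast = cast_traffic(n, low, "float32")        # routed_output.to(torch.float32)
    # Multiply reads the fp32 tensor and one score per row, writes an fp32 result.
    multiply = n * fp32 + num_rows * fp32 + n * fp32
    downcast = cast_traffic(n, "float32", low)      # (...).to(x.dtype)
    total = upcast + multiply + downcast
    return {
        "upcast": upcast,
        "multiply": multiply,
        "downcast": downcast,
        "total": total,
        "copy_fraction": (upcast + downcast) / total,
    }


if __name__ == "__main__":
    # Hypothetical sizes, chosen only for illustration.
    stats = scaling_traffic(num_rows=8192, dim=2048)
    for name, value in stats.items():
        print(f"{name}: {value}")
```

Under these assumptions the two conversion copies account for more than half of the bytes moved by this line, which is consistent with `to_copy` standing out in the trace, though the actual cost depends on the real shapes, dtypes, and whether the ops are fused.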
With FSDP only:
<img width="1544" alt="Screenshot 2025-06-23 at 2 10 20 PM" src="https://github.com/user-attachments/assets/bcd698dc-3899-46e0-ae53-e7f8b0db13fc" />