feat: Implement Muon Optimizer Grafting #227

Vishal-sys-code · 2025-08-07T20:58:29Z

This PR introduces Muon Optimizer Grafting, a variant of the Shampoo optimizer that incorporates Momentum-SGD for grafting. This approach is beneficial for training large-scale models efficiently, and follows the optimization direction mentioned in the open issue #203.
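
For readers unfamiliar with grafting, here is a minimal sketch, assuming the usual layer-wise formulation: one update supplies the direction and another supplies the step size (its norm). It is illustrative only and is not the code in this PR or in DistributedShampoo; the function names and the example values are hypothetical.

```python
import math


def frobenius_norm(matrix):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


def graft(direction_update, magnitude_update, eps=1e-12):
    """Rescale direction_update so its norm matches magnitude_update.

    direction_update: per-layer update whose direction is kept
                      (e.g. from a Shampoo-style preconditioner).
    magnitude_update: per-layer update whose norm is kept
                      (e.g. from the grafting optimizer).
    eps: small constant (assumed) that guards against division by zero.
    """
    scale = frobenius_norm(magnitude_update) / (frobenius_norm(direction_update) + eps)
    return [[scale * x for x in row] for row in direction_update]


# Hypothetical per-layer updates, chosen only to make the norms easy to read.
shampoo_update = [[3.0, 0.0], [0.0, 4.0]]    # Frobenius norm 5
grafting_update = [[0.6, 0.0], [0.0, 0.8]]   # Frobenius norm 1
grafted = graft(shampoo_update, grafting_update)
```

The grafted update points the same way as `shampoo_update` but has the norm of `grafting_update`, which is the property a grafting test would typically check.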

While not formally assigned to this task, I came across the discussion in issue #203, where there was a mention of Muon support being a potential addition in the future. The comment:

"Thanks for your kind words, and I am not familiar with K-FAC and TNT, but this repo is currently focusing on Shampoo-like algorithms, i.e., Distributed Shampoo, SOAP, Muon (coming soon) for now."

...inspired me to implement Muon grafting, contributing to the community and the project. I hope this can serve as a helpful starting point or even be merged directly, if appropriate.

What is included:

MuonGraftingConfig: A new configuration class to explicitly enable and control Muon grafting behavior.
DistributedShampoo enhancements: Updated to support and integrate the new Muon grafting mechanism cleanly.
Test coverage: Added test_muon_grafting to ensure the implementation behaves as expected and is compatible with the existing optimizer framework.

The design aligns with the architecture and practices observed in the repo and other optimizer implementations. Care has been taken to ensure maintainability, modularity, and minimal disruption to existing components.

Note: I understand the Muon optimizer was on the roadmap, and I hope this early implementation is helpful. Open to feedback and happy to revise or improve the implementation based on your guidance or plans for Muon support.

Thank you for your time and for maintaining this excellent repository.

tsunghsienlee · 2025-08-10T00:30:36Z

Hi @Vishal-sys-code,

Thanks for your PR. Are you interested in Muon itself, or in using Muon for grafting?

Muon itself was added a few weeks ago; see https://github.com/facebookresearch/optimizers/tree/main/distributed_shampoo#example-6-muon for instructions on how to use it.

Using Muon for grafting is not supported yet. I am working on it by merging GraftingConfig into PreconditionerConfig, so that the Muon implementation above can be used for grafting as well.

Please let me know what you need so we can see how to do this together.

Vishal-sys-code · 2025-08-11T08:34:42Z

Hi @tsunghsienlee, thanks for the pointer and the review!

My goal with this PR was not to add a separate Muon optimizer, but to let Shampoo use Muon for grafting. I noticed Muon is already in the upstream repo, awesome.

If you’re planning to merge GraftingConfig into PreconditionerConfig so the existing Muon implementation can be reused for grafting, I’m happy to adapt my changes to that design instead of duplicating code. Specifically, I can:

Rebase and update this PR to use the upstream Muon preconditioner (or a small adapter) for grafting.
Convert MuonGraftingConfig into a thin adapter/alias that maps to the unified config.
Expand test_muon_grafting to check for consistent behavior between “Muon as preconditioner” and “Muon used for grafting.”
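
As a rough illustration of the "thin adapter" idea above, here is a sketch under stated assumptions. Every class, field, and function name below is hypothetical and prefixed with `Sketch`/`sketch_` to make clear that none of it is the actual DistributedShampoo API or the planned unified config.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchPreconditionerConfig:
    """Hypothetical base type: usable for preconditioning or for grafting."""


@dataclass(frozen=True)
class SketchMuonConfig(SketchPreconditionerConfig):
    """Hypothetical Muon-style config; the momentum field is an assumption."""

    momentum: float = 0.9


@dataclass(frozen=True)
class SketchShampooConfig(SketchPreconditionerConfig):
    """Hypothetical Shampoo-style config that accepts any config for grafting."""

    grafting_config: Optional[SketchPreconditionerConfig] = None


def sketch_muon_grafting_config(momentum: float = 0.9) -> SketchShampooConfig:
    """Thin adapter: expresses "Muon for grafting" through the unified type."""
    return SketchShampooConfig(grafting_config=SketchMuonConfig(momentum=momentum))


config = sketch_muon_grafting_config(momentum=0.95)
```

Because the grafting slot takes the same base type as the preconditioner, a single Muon config could serve both roles, which is what would make a consistency test between the two uses straightforward to write.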

Which approach would you prefer?

I can rework the PR now to match the PreconditionerConfig merge, or
I can keep this PR as a short-term, standalone Muon-grafting implementation and switch it over after your refactor lands.

I’m flexible; tell me which you’d like and I’ll update the branch. Thanks again!

tsunghsienlee · 2025-08-16T16:54:35Z

Hi @Vishal-sys-code,

Sorry for the late reply. I have been working on merging GraftingConfig into PreconditionerConfig so that the same optimizers can be used for both grafting and preconditioning. The problem is that there are some internal dependencies arising from how PreconditionerConfig and GraftingConfig are typed, so it won't be easy for OSS contributors to work on.

For now, would you mind reviewing my PR when I finish it? I would be glad to credit you as a co-author of that PR, since you had the same idea.

Vishal-sys-code · 2025-08-17T16:49:02Z

Hi @tsunghsienlee no worries at all, and thanks for clarifying!

That sounds like a solid plan. I’d be glad to review your PR once it’s ready, and appreciate your kind offer to credit me as a co-author. Honestly, I’m just happy if my work helped spark or support the direction you’re already moving in.

Looking forward to your changes. Please tag me when the PR is up, and I’ll give it a careful review. Thanks again for taking the time to explain the design decisions!

tsunghsienlee · 2025-08-28T23:07:43Z

Closing this one because #242 covers it.

Vishal-sys-code added 2 commits August 8, 2025 02:08

feat: Implement Muon Optimizer Grafting

d07ad6d

Merge branch 'facebookresearch:main' into main

3897317

meta-cla bot added the CLA Signed label Aug 7, 2025

tsunghsienlee mentioned this pull request Aug 28, 2025

Merge GraftingConfig into PreconditionerConfig #242

Closed

tsunghsienlee closed this Aug 28, 2025
