@fegin fegin commented Oct 29, 2025

Stack from ghstack (oldest at bottom):

This is the recommended way to get the nD mesh now that DeviceMesh has _concatenate().

The Squash and Merge button won't work for this PR. I'll merge it myself.

fegin added 2 commits October 28, 2025 17:36
[ghstack-poisoned]
[ghstack-poisoned]
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Oct 29, 2025
@tianyu-l tianyu-l left a comment

fegin added 2 commits October 28, 2025 22:28
[ghstack-poisoned]
[ghstack-poisoned]
ruisizhang123 pushed a commit that referenced this pull request Oct 29, 2025
Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.12.0)
(oldest at bottom):
* #1960
* #1959
* __->__ #1963

pytorch/pytorch#166130 changes the configs and
this PR adopts the new configs

Squash and Merge button won't work for this PR. I'll merge by myself.
fegin added 2 commits October 29, 2025 10:14
[ghstack-poisoned]
[ghstack-poisoned]
fegin added a commit that referenced this pull request Oct 29, 2025
Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.12.0)
(oldest at bottom):
* #1960
* #1959
* __->__ #1965

**Squash and Merge button won't work for this PR. I'll merge by
myself.**

#1963 was accidentally merged
with the Squash and Merge button. This is a reland PR.

@lw lw left a comment

Nice!

assert inner_mesh.mesh_dim_names is not None
submesh_names = outer_mesh.mesh_dim_names + inner_mesh.mesh_dim_names
spanned_mesh = outer_global_mesh[submesh_names]
spanned_mesh = DeviceMesh._concatenate((outer_mesh, inner_mesh))

Nice!

One nit: isn't the argument supposed to be a list, and not a tuple?

If so, how come there is no type checking or other linting to catch this?

Note that we've already observed TorchTitan being somewhat incorrect with types, e.g., it passes lists to init_device_mesh instead of tuples.

@fegin (Contributor, Author) replied:

Good point. I believe we have not enabled type checking for TorchTitan, which we should do.
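To illustrate the list-versus-tuple nit in this thread, here is a minimal sketch of why such a mismatch goes unnoticed without a static type checker. It assumes, as the reviewer suggests, that the parameter is annotated as a list. The helpers `join_dim_names_strict` and `join_dim_names_flexible` are hypothetical stand-ins and not part of PyTorch or TorchTitan.

```python
from typing import Sequence


def join_dim_names_strict(names: list[str]) -> str:
    """Hypothetical helper whose parameter is annotated as a list."""
    return "+".join(names)


def join_dim_names_flexible(names: Sequence[str]) -> str:
    """Same helper, annotated to accept any sequence (list or tuple)."""
    return "+".join(names)


# Annotations are not enforced at runtime, so the tuple call below runs fine.
# A static checker such as mypy or pyright would flag it as an incompatible
# argument type, because a tuple is not a list.
from_tuple = join_dim_names_strict(("dp", "tp"))
from_list = join_dim_names_strict(["dp", "tp"])

# With Sequence[str], both a list and a tuple satisfy the type checker.
flexible = join_dim_names_flexible(("dp", "tp"))

print(from_tuple, from_list, flexible)
```

Because the tuple call succeeds at runtime, only a type checker in CI would surface this kind of mismatch, which is consistent with the point that type checking is not yet enabled for TorchTitan.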

[ghstack-poisoned]
@fegin fegin changed the base branch from gh/fegin/19/base to main October 30, 2025 17:22
@fegin fegin merged commit bc3021e into main Oct 30, 2025
5 checks passed
fegin added a commit that referenced this pull request Oct 30, 2025
Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.12.0)
(oldest at bottom):
* __->__ #1960
* #1959

As title, no logic change.

**Squash and Merge button won't work for this PR. I'll merge by
myself.**