Enable Changing the # of shards for CW resharding #3188

Closed
wants to merge 1 commit into from

isururanawaka
Contributor

Summary:
Currently, Dynamic Sharding assumes that the # of shards per embedding table stays the same:

E.g.

  • table_0 is originally sharded on ranks [0, 1].
  • The Reshard API currently supports moving the table_0 shards to ranks [1, 2].
    • The shard on rank 0 moves to rank 1, and the shard on rank 1 moves to rank 2.

We want to support changing the # of shards:

  • e.g. table_0 originally on ranks: [0, 1] --> reshard to [0]
  • Or reshard to [0, 1, 2, 3]
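
To make the difference concrete, here is a minimal pure-Python sketch of which slices would have to move when the set of ranks changes. It is not TorchRec's implementation: `even_shards` and `transfer_plan` are hypothetical helpers, and the sketch assumes each table is cut into equal contiguous slices along a single split dimension of size `length`, with distinct ranks.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # half-open [start, end) along the split dimension


def even_shards(length: int, ranks: List[int]) -> Dict[int, Range]:
    """Assign equal contiguous slices of `length` to `ranks`, in order.

    Hypothetical helper; assumes the ranks are distinct.
    """
    if length % len(ranks) != 0:
        raise ValueError(f"{length} is not divisible by {len(ranks)} shards")
    size = length // len(ranks)
    return {rank: (i * size, (i + 1) * size) for i, rank in enumerate(ranks)}


def transfer_plan(
    length: int, old_ranks: List[int], new_ranks: List[int]
) -> List[Tuple[int, int, Range]]:
    """List (src_rank, dst_rank, slice) for every overlap of old and new shards."""
    old = even_shards(length, old_ranks)
    new = even_shards(length, new_ranks)
    plan = []
    for src, (s0, s1) in old.items():
        for dst, (d0, d1) in new.items():
            lo, hi = max(s0, d0), min(s1, d1)
            if lo < hi:  # the old and new slices overlap
                plan.append((src, dst, (lo, hi)))
    return plan


# Same shard count: each shard moves as a whole.
print(transfer_plan(4, [0, 1], [1, 2]))
# Fewer shards: both old shards end up on rank 0.
print(transfer_plan(4, [0, 1], [0]))
# More shards: each old shard is split across two ranks.
print(transfer_plan(4, [0, 1], [0, 1, 2, 3]))
```

With the same shard count every transfer is a whole shard. Once the count changes, a destination rank may receive slices from several sources, or a source shard may be split across several destinations.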

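The dynamic sharding unit test (torchrec/distributed/tests/test_dynamic_sharding.py) can be modified to check whether your use case passes: change the new sharding plan to use a different # of ranks than the original sharding plan.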

Note: the new total number of ranks for each embedding table must be a factor of dimension 0 of that embedding table.

  • e.g. for an emb_table of size [4, 8], the table can only be sharded across 1, 2, or 4 ranks, not 3.
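
The constraint in the note can be checked before building a new plan. The following is a small sketch under that stated rule only; `valid_shard_counts` and `can_reshard` are hypothetical helper names, not part of the TorchRec API.

```python
from typing import List, Sequence


def valid_shard_counts(dim0: int) -> List[int]:
    """Return every shard count that evenly divides dimension 0."""
    return [n for n in range(1, dim0 + 1) if dim0 % n == 0]


def can_reshard(table_shape: Sequence[int], new_ranks: Sequence[int]) -> bool:
    """Check the note's rule: the number of target ranks must divide dim 0."""
    num_shards = len(set(new_ranks))
    return num_shards > 0 and table_shape[0] % num_shards == 0


print(valid_shard_counts(4))              # [1, 2, 4]
print(can_reshard([4, 8], [0]))           # True
print(can_reshard([4, 8], [0, 1, 2]))     # False
print(can_reshard([4, 8], [0, 1, 2, 3]))  # True
```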

Differential Revision: D78291717

facebook-github-bot added the CLA Signed label Jul 14, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D78291717

isururanawaka added a commit to isururanawaka/torchrec that referenced this pull request Jul 15, 2025
Summary:

Currently Dynamic Sharding assumes the # of shards per embedding table stays the same:
- https://www.internalfb.com/code/fbsource/[6d270632037a1e8bca7f63500dd07fd0b213e572]/fbcode/torchrec/distributed/sharding/dynamic_sharding.py?lines=140

E.g.
- `table_0` originally sharded on ranks: [0, 1]
- Reshard API currently supports moving `table_0` shards to ranks [1, 2].
    - The shard on rank 0 moves to rank 1, and the shard on rank 1 moves to rank 2

We want to support changing the # of shards:
- e.g. `table_0` originally on ranks: [0, 1] --> reshard to [0]
- Or reshard to [0, 1, 2, 3]

Here's the unit test you can modify to check if your usecase passes:
- https://www.internalfb.com/code/fbsource/[4d0d74b9f3c441e7aa35ce7102200fa0ca8c95cf]/fbcode/torchrec/distributed/tests/test_dynamic_sharding.py?lines=453-459
- Change the new sharding plan to use a different # of ranks than the original sharding plan.

Note: the new total number of ranks for each embedding table must be a factor of dimension 0 of that embedding table
- e.g. for an emb_table of size [4, 8], the table can only be sharded across 1, 2, or 4 ranks, not 3

Differential Revision: D78291717
isururanawaka added a commit to isururanawaka/torchrec that referenced this pull request Jul 16, 2025
Labels
CLA Signed, fb-exported