[core][compiled graph] Support one-to-all collective ops (e.g. broadcast) #49325

@jeffreyjeffreywang

Description

As part of the effort to support collective communication ops in compiled graphs (meta-issue: #47983), we need to support one-to-all patterns such as broadcast. As discussed in the RFC, we will pass the sender worker handle to the collective call as follows:

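import ray
from ray.dag import InputNode, MultiOutputNode

# Proposed API (per the RFC). `Worker` is assumed to be a Ray actor class
# that defines a `fwd` method.
workers = [Worker.options(num_gpus=1).remote() for _ in range(3)]
nccl_group_handle = ray.collective.NcclGroup(workers)
# One-to-all pattern: workers[0] produces `result`, which is broadcast to `workers`.
with InputNode() as inp:
    result = workers[0].fwd.bind(inp)
    results = ray.collective.broadcast.bind(
        result, workers,
        transport=nccl_group_handle)
    dag = MultiOutputNode(results)

# Errors if `broadcast` sender is not part of the group.
dag = dag.experimental_compile()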
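
The membership rule in the comment above (compilation errors if the `broadcast` sender is not part of the group) can be illustrated without Ray. The following is a minimal pure-Python sketch, not Ray's implementation: `FakeWorker`, `check_sender_in_group`, and `simulate_broadcast` are hypothetical names introduced only for illustration, and the choice of `ValueError` is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class FakeWorker:
    """Hypothetical stand-in for an actor handle; not a Ray type."""
    name: str


def check_sender_in_group(sender: FakeWorker, group: List[FakeWorker]) -> None:
    # Mirrors the rule stated in the issue: the sender must belong to the group.
    # The exception type is an assumption made for this sketch.
    if sender not in group:
        raise ValueError(f"broadcast sender {sender.name!r} is not in the group")


def simulate_broadcast(
    value: Any,
    sender: FakeWorker,
    receivers: List[FakeWorker],
    group: List[FakeWorker],
) -> Dict[str, Any]:
    """One-to-all: every receiver ends up with the sender's value."""
    check_sender_in_group(sender, group)
    for receiver in receivers:
        if receiver not in group:
            raise ValueError(f"receiver {receiver.name!r} is not in the group")
    # Each receiver gets its own entry holding the same broadcast value.
    return {receiver.name: value for receiver in receivers}


if __name__ == "__main__":
    workers = [FakeWorker(f"worker-{i}") for i in range(3)]
    print(simulate_broadcast("tensor", workers[0], workers, workers))
    try:
        simulate_broadcast("tensor", FakeWorker("outsider"), workers, workers)
    except ValueError as err:
        print("rejected:", err)
```

In this sketch the check runs eagerly at call time; in the proposal above, the equivalent validation would happen when `experimental_compile()` is called.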

Use case

No response

Metadata

Assignees

No one assigned

    Labels

    P1 (Issue that should be fixed within a few weeks), community-backlog, compiled-graphs, core (Issues that should be addressed in Ray Core), enhancement (Request for new feature and/or capability)
