@@ -470,9 +470,10 @@ Communication Backends

One of the most elegant aspects of ``torch.distributed`` is its ability
to abstract and build on top of different backends. As mentioned before,
- there are multiple backends implemented in PyTorch.
- Some of the most popular ones are Gloo, NCCL, and MPI.
- They each have different specifications and tradeoffs, depending
+ there are multiple backends implemented in PyTorch. These backends can be easily selected
+ using the `Accelerator API <https://pytorch.org/docs/stable/torch.html#accelerators>`__,
+ which provides a unified interface for working with different accelerator types.
+ Some of the most popular backends are Gloo, NCCL, and MPI. They each have
+ different specifications and tradeoffs, depending
on the desired use case. A comparative table of supported functions can
be found
`here <https://pytorch.org/docs/stable/distributed.html#module-torch.distributed>`__.
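
For instance, a minimal sketch of how the accelerator query can drive backend
selection might look like the following (the ``init_backend`` helper and the
device-type-to-backend mapping are illustrative assumptions, not an official
API):

.. code:: python

    import torch
    import torch.distributed as dist

    def init_backend(rank, world_size):
        # Query the active accelerator; returns a torch.device, or None on CPU-only builds.
        acc = torch.accelerator.current_accelerator()
        device_type = acc.type if acc is not None else "cpu"

        # Illustrative mapping only; use whichever backends your build supports.
        backend = {"cuda": "nccl", "xpu": "xccl"}.get(device_type, "gloo")
        dist.init_process_group(backend=backend, rank=rank, world_size=world_size)
        return device_type
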
@@ -492,12 +493,13 @@ distributed SGD example does not work if you put ``model`` on the GPU.

In order to use multiple GPUs, let us also make the following
modifications:

- 1. Use ``device = torch.device("cuda:{}".format(rank))``
- 2. ``model = Net()`` :math:`\rightarrow` ``model = Net().to(device)``
- 3. Use ``data, target = data.to(device), target.to(device)``
+ 1. Use the Accelerator API: ``device_type = torch.accelerator.current_accelerator()``
+ 2. Use ``device = torch.device(f"{device_type}:{rank}")``
+ 3. ``model = Net()`` :math:`\rightarrow` ``model = Net().to(device)``
+ 4. Use ``data, target = data.to(device), target.to(device)``

- With the above modifications, our model is now training on two GPUs and
- you can monitor their utilization with ``watch nvidia-smi``.
+ With these modifications, your model will now train across two GPUs,
+ as sketched below. You can monitor GPU utilization with ``watch nvidia-smi``
+ if you are running on NVIDIA hardware.
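
Put together, the relevant part of the training loop might look roughly like
this sketch (``Net``, ``train_set``, ``average_gradients``, and ``rank`` are
assumed to be the definitions from the distributed SGD example above):

.. code:: python

    import torch

    device_type = torch.accelerator.current_accelerator()   # 1. query the accelerator
    device = torch.device(f"{device_type}:{rank}")           # 2. one device per rank

    model = Net().to(device)                                  # 3. move the model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)

    for data, target in train_set:
        data, target = data.to(device), target.to(device)     # 4. move each batch
        optimizer.zero_grad()
        loss = torch.nn.functional.nll_loss(model(data), target)
        loss.backward()
        average_gradients(model)   # gradient all-reduce from the earlier example
        optimizer.step()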

**MPI Backend**

@@ -553,6 +555,7 @@ more <https://www.open-mpi.org/faq/?category=running#mpirun-hostfile>`__)
Doing so, you should obtain the same familiar output as with the other
communication backends.

**NCCL Backend**

The `NCCL backend <https://github.com/nvidia/nccl>`__ provides an
@@ -561,6 +564,14 @@ tensors. If you only use CUDA tensors for your collective operations,
consider using this backend for the best in class performance. The
NCCL backend is included in the pre-built binaries with CUDA support.

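As a brief sketch (assuming ``MASTER_ADDR`` and ``MASTER_PORT`` are set as
earlier in the tutorial, and that ``rank`` and ``size`` are the per-process
values), switching to NCCL only changes the backend string, with every tensor
involved in a collective placed on a CUDA device:

.. code:: python

    import torch
    import torch.distributed as dist

    dist.init_process_group(backend="nccl", rank=rank, world_size=size)

    # NCCL collectives operate directly on CUDA tensors.
    t = torch.ones(1, device=f"cuda:{rank}")
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
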
+ **XCCL Backend**
+
+ The XCCL backend offers an optimized implementation of collective operations
+ for XPU tensors. If your workload uses only XPU tensors for collective
+ operations, this backend provides best-in-class performance. The XCCL
+ backend is included in the pre-built binaries with XPU support.
+
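
The usage pattern mirrors the NCCL sketch above; as an illustrative assumption
for builds that ship with XPU support, only the backend string and device type
change:

.. code:: python

    import torch
    import torch.distributed as dist

    dist.init_process_group(backend="xccl", rank=rank, world_size=size)

    # XCCL collectives expect XPU tensors.
    t = torch.ones(1, device=f"xpu:{rank}")
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
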
Initialization Methods
~~~~~~~~~~~~~~~~~~~~~~
