Commit 2c22314

Merge branch 'main' into regional-aot
2 parents: ad48868 + e5bea03

File tree: 2 files changed, +23 -12 lines

intermediate_source/dist_tuto.rst

Lines changed: 19 additions & 8 deletions
@@ -470,9 +470,10 @@ Communication Backends

 One of the most elegant aspects of ``torch.distributed`` is its ability
 to abstract and build on top of different backends. As mentioned before,
-there are multiple backends implemented in PyTorch.
-Some of the most popular ones are Gloo, NCCL, and MPI.
-They each have different specifications and tradeoffs, depending
+there are multiple backends implemented in PyTorch. These backends can be easily selected
+using the `Accelerator API <https://pytorch.org/docs/stable/torch.html#accelerators>`__,
+which provides an interface for working with different accelerator types.
+Some of the most popular backends are Gloo, NCCL, and MPI. They each have different specifications and tradeoffs, depending
 on the desired use case. A comparative table of supported functions can
 be found
 `here <https://pytorch.org/docs/stable/distributed.html#module-torch.distributed>`__.
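A minimal sketch of how the Accelerator API mentioned in this hunk might drive backend selection; the device-to-backend mapping and the environment-based rank/world size are illustrative assumptions, not part of the change:

    import os
    import torch
    import torch.distributed as dist

    def init_backend_for_accelerator():
        # Derive the device type via the Accelerator API described above.
        if torch.accelerator.is_available():
            device_type = torch.accelerator.current_accelerator().type
        else:
            device_type = "cpu"
        # Illustrative device-to-backend mapping (an assumption, not from the diff).
        backend = {"cuda": "nccl", "xpu": "xccl"}.get(device_type, "gloo")
        # Rank and world size are assumed to be provided via the environment here.
        dist.init_process_group(backend=backend,
                                rank=int(os.environ["RANK"]),
                                world_size=int(os.environ["WORLD_SIZE"]))
        return backend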
@@ -492,12 +493,13 @@ distributed SGD example does not work if you put ``model`` on the GPU.
 In order to use multiple GPUs, let us also make the following
 modifications:

-1. Use ``device = torch.device("cuda:{}".format(rank))``
-2. ``model = Net()`` :math:`\rightarrow` ``model = Net().to(device)``
-3. Use ``data, target = data.to(device), target.to(device)``
+1. Use the Accelerator API: ``device_type = torch.accelerator.current_accelerator()``
+2. Use ``torch.device(f"{device_type}:{rank}")``
+3. ``model = Net()`` :math:`\rightarrow` ``model = Net().to(device)``
+4. Use ``data, target = data.to(device), target.to(device)``

-With the above modifications, our model is now training on two GPUs and
-you can monitor their utilization with ``watch nvidia-smi``.
+With these modifications, your model will now train across two GPUs.
+You can monitor GPU utilization using ``watch nvidia-smi`` if you are running on NVIDIA hardware.

 **MPI Backend**

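Applied together, the four steps in this hunk might look like the following sketch; ``Net`` is the tutorial's model class, and ``rank``, ``data``, and ``target`` come from its training loop:

    import torch

    def place_on_rank_device(rank, model, data, target):
        # Step 1: query the current accelerator (e.g. cuda or xpu).
        device_type = torch.accelerator.current_accelerator()
        # Step 2: build a per-rank device from it.
        device = torch.device(f"{device_type}:{rank}")
        # Steps 3-4: move the model and the current batch to that device.
        return model.to(device), data.to(device), target.to(device)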
@@ -553,6 +555,7 @@ more <https://www.open-mpi.org/faq/?category=running#mpirun-hostfile>`__)
 Doing so, you should obtain the same familiar output as with the other
 communication backends.

+
 **NCCL Backend**

 The `NCCL backend <https://github.com/nvidia/nccl>`__ provides an
@@ -561,6 +564,14 @@ tensors. If you only use CUDA tensors for your collective operations,
 consider using this backend for the best in class performance. The
 NCCL backend is included in the pre-built binaries with CUDA support.

+**XCCL Backend**
+
+The XCCL backend offers an optimized implementation of collective operations for XPU tensors.
+If your workload uses only XPU tensors for collective operations,
+this backend provides best-in-class performance.
+The XCCL backend is included in the pre-built binaries with XPU support.
+
 Initialization Methods
 ~~~~~~~~~~~~~~~~~~~~~~

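A rough usage sketch for the new XCCL paragraph, assuming an XPU-enabled build and the usual ``MASTER_ADDR``/``MASTER_PORT`` setup from earlier in the tutorial:

    import torch
    import torch.distributed as dist

    def run_xccl(rank, world_size):
        # "xccl" is the backend name for XPU collectives described above.
        dist.init_process_group("xccl", rank=rank, world_size=world_size)
        t = torch.ones(4, device=f"xpu:{rank}")
        dist.all_reduce(t, op=dist.ReduceOp.SUM)  # each rank ends with world_size in every element
        dist.destroy_process_group()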
intermediate_source/torchvision_tutorial.py

Lines changed: 4 additions & 4 deletions
@@ -406,14 +406,14 @@ def get_transform(train):


 ######################################################################
-# Let’s now write the main function which performs the training and the
-# validation:
+# We want to be able to train our model on an `accelerator <https://pytorch.org/docs/stable/torch.html#accelerators>`_
+# such as CUDA, MPS, MTIA, or XPU. Let’s now write the main function which performs the training and the validation:


 from engine import train_one_epoch, evaluate

-# train on the GPU or on the CPU, if a GPU is not available
-device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
+# train on the accelerator or on the CPU, if an accelerator is not available
+device = torch.accelerator.current_accelerator() if torch.accelerator.is_available() else torch.device('cpu')

 # our dataset has two classes only - background and person
 num_classes = 2
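Downstream in the same tutorial, the selected ``device`` is what the training helpers consume; roughly (``model``, ``optimizer``, and the data loaders are constructed earlier in the file, and ``num_epochs`` is illustrative):

    from engine import train_one_epoch, evaluate

    def train(model, optimizer, data_loader, data_loader_test, device, num_epochs=2):
        # The device selected above works for both model placement and the helpers.
        model.to(device)
        for epoch in range(num_epochs):
            train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
            evaluate(model, data_loader_test, device=device)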
