
Commit efe11eb

advanced/rpc_ddp_tutorial translation (#382)

* advanced/rpc_ddp_tutorial-translation: translate rpc_ddp_tutorial.rst and main.py
* Update rpc_ddp_tutorial.rst: typo correction from line 133
* Update rpc_ddp_tutorial.rst: modified the translation based on reviews
* Update main.py: incomplete; considering how to handle the 'optimizer'
* Update rpc_ddp_tutorial.rst: incomplete
* Update main.py: fix translation errors
* Update rpc_ddp_tutorial.rst: fix translation errors
* Update main.py: fix a translation error
* Update rpc_ddp_tutorial.rst: fix a translation error
1 parent 0c32b3d commit efe11eb

2 files changed: 144 additions, 161 deletions

rpc_ddp_tutorial.rst: 108 additions, 124 deletions

Combining Distributed DataParallel with Distributed RPC Framework
=================================================================
**Authors**: `Pritam Damania <https://github.com/pritamdamania87>`__ and `Yi Wang <https://github.com/SciPioneer>`__

**Translator**: `박다정 <https://github.com/dajeongPark-dev>`_


This tutorial uses a simple example to demonstrate how you can combine
`DistributedDataParallel <https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel>`__ (DDP)
with the `Distributed RPC framework <https://pytorch.org/docs/master/rpc.html>`__
to combine distributed data parallelism with distributed model parallelism to
train a simple model. Source code of the example can be found `here <https://github.com/pytorch/examples/tree/master/distributed/rpc/ddp_rpc>`__.

Previous tutorials,
`Getting Started With Distributed Data Parallel <https://tutorials.pytorch.kr/intermediate/ddp_tutorial.html>`__
and `Getting Started with Distributed RPC Framework <https://tutorials.pytorch.kr/intermediate/rpc_tutorial.html>`__,
described how to perform distributed data parallel and distributed model
parallel training respectively. However, there are several training paradigms
where you might want to combine these two techniques. For example:

1) If we have a model with a sparse part (a large embedding table) and a dense
   part (FC layers), we might want to put the embedding table on a parameter
   server and replicate the FC layers across multiple trainers using
   `DistributedDataParallel <https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel>`__.
   The `Distributed RPC framework <https://pytorch.org/docs/master/rpc.html>`__
   can be used to perform embedding lookups on the parameter server.
2) Enable hybrid parallelism as described in the `PipeDream <https://arxiv.org/abs/1806.03377>`__ paper.
   We can use the `Distributed RPC framework <https://pytorch.org/docs/master/rpc.html>`__
   to pipeline stages of the model across multiple workers and replicate each
   stage (if needed) using `DistributedDataParallel <https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel>`__.


|
In this tutorial we will cover case 1 mentioned above. We have a total of 4
workers in our setup, as follows:

1) 1 Master, which is responsible for creating an embedding table
   (nn.EmbeddingBag) on the parameter server. The master also drives the
   training loop on the two trainers.
2) 1 Parameter Server, which holds the embedding table in memory and
   responds to RPCs from the master and trainers.
3) 2 Trainers, which store an FC layer (nn.Linear) that is replicated amongst
   themselves using `DistributedDataParallel <https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel>`__.
   The trainers are also responsible for executing the forward pass, backward
   pass and optimizer step.


|
The entire training process is executed as follows:

1) The master creates a `RemoteModule <https://pytorch.org/docs/master/rpc.html#remotemodule>`__
   that holds an embedding table on the parameter server.
2) The master then kicks off the training loop on the trainers and passes the
   remote module to the trainers.
3) The trainers create a ``HybridModel`` which first performs an embedding lookup
   using the remote module provided by the master and then executes the
   FC layer which is wrapped inside DDP.
4) The trainer executes the forward pass of the model and uses the loss to
   execute the backward pass using `Distributed Autograd <https://pytorch.org/docs/master/rpc.html#distributed-autograd-framework>`__.
5) As part of the backward pass, the gradients for the FC layer are computed
   first and synced to all trainers via allreduce in DDP.
6) Next, Distributed Autograd propagates the gradients to the parameter server,
   where the gradients for the embedding table are updated.
7) Finally, the `Distributed Optimizer <https://pytorch.org/docs/master/rpc.html#module-torch.distributed.optim>`__ is used to update all the parameters.


.. attention::

    You should always use `Distributed Autograd <https://pytorch.org/docs/master/rpc.html#distributed-autograd-framework>`__
    for the backward pass if you're combining DDP and RPC.


Now, let's go through each part in detail. First, we need to set up all of our
workers before we can perform any training. We create 4 processes such that
ranks 0 and 1 are our trainers, rank 2 is the master and rank 3 is the
parameter server.

We initialize the RPC framework on all 4 workers using the TCP init_method.
Once RPC initialization is done, the master creates a remote module that holds an
`EmbeddingBag <https://pytorch.org/docs/master/generated/torch.nn.EmbeddingBag.html>`__
layer on the parameter server using `RemoteModule <https://pytorch.org/docs/master/rpc.html#torch.distributed.nn.api.remote_module.RemoteModule>`__.
The master then loops through each trainer and kicks off the training loop by
calling ``_run_trainer`` on each trainer using `rpc_async <https://pytorch.org/docs/master/rpc.html#torch.distributed.rpc.rpc_async>`__.
Finally, the master waits for all training to finish before exiting.

The trainers first initialize a ``ProcessGroup`` for DDP with world_size=2
(for the two trainers) using `init_process_group <https://pytorch.org/docs/stable/distributed.html#torch.distributed.init_process_group>`__.
Next, they initialize the RPC framework using the TCP init_method. Note that
the ports used for RPC initialization and ProcessGroup initialization are different.
This is to avoid port conflicts between the initialization of the two frameworks.
Once the initialization is done, the trainers simply wait for the ``_run_trainer``
RPC from the master.

The parameter server just initializes the RPC framework and waits for RPCs from
the trainers and master.


.. literalinclude:: ../advanced_source/rpc_ddp_tutorial/main.py
   :language: py
   :start-after: BEGIN run_worker
   :end-before: END run_worker
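
The ``literalinclude`` above pulls the actual ``run_worker`` implementation from ``main.py``,
which is not rendered inline here. As a rough, assumption-based sketch of that setup
(worker names, ports, and the ``gloo`` backend are illustrative placeholders, not the
exact values used in the example), the rank-based dispatch and the use of separate
ports for RPC and the DDP ``ProcessGroup`` look roughly like this:

.. code:: python

    import torch.distributed as dist
    import torch.distributed.rpc as rpc
    import torch.multiprocessing as mp


    def run_worker(rank, world_size=4):
        # One port for the RPC rendezvous, a different one for the DDP
        # ProcessGroup rendezvous, so the two initializations do not conflict.
        rpc_opts = rpc.TensorPipeRpcBackendOptions(init_method="tcp://localhost:29500")

        if rank == 2:  # master
            rpc.init_rpc("master", rank=rank, world_size=world_size,
                         rpc_backend_options=rpc_opts)
            # ... create the RemoteModule on the parameter server and kick off
            # _run_trainer on both trainers via rpc_async ...
        elif rank <= 1:  # trainers
            dist.init_process_group(backend="gloo", rank=rank, world_size=2,
                                    init_method="tcp://localhost:29501")
            rpc.init_rpc(f"trainer{rank}", rank=rank, world_size=world_size,
                         rpc_backend_options=rpc_opts)
        else:  # parameter server
            rpc.init_rpc("ps", rank=rank, world_size=world_size,
                         rpc_backend_options=rpc_opts)

        rpc.shutdown()  # block until RPC work on all workers has finished


    if __name__ == "__main__":
        mp.spawn(run_worker, nprocs=4, join=True)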

Before we discuss details of the trainer, let's introduce the ``HybridModel`` that
the trainer uses. As described below, the ``HybridModel`` is initialized using a
remote module that holds an embedding table (``remote_emb_module``) on the parameter
server and the ``device`` to use for DDP. The initialization of the model wraps an
`nn.Linear <https://pytorch.org/docs/master/generated/torch.nn.Linear.html>`__
layer inside DDP to replicate and synchronize this layer across all trainers.

The forward method of the model is pretty straightforward. It performs an
embedding lookup on the parameter server using RemoteModule's ``forward``
and passes its output on to the FC layer.


.. literalinclude:: ../advanced_source/rpc_ddp_tutorial/main.py
   :language: py
   :start-after: BEGIN hybrid_model
   :end-before: END hybrid_model
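
Since the ``HybridModel`` source itself is included from ``main.py`` and not shown inline
here, the following is a minimal sketch of the idea (the layer sizes and device handling
are illustrative assumptions, not the exact code from the example):

.. code:: python

    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP


    class HybridModel(nn.Module):
        """Remote embedding lookup on the parameter server + DDP-replicated FC layer."""

        def __init__(self, remote_emb_module, device):
            super().__init__()
            self.remote_emb_module = remote_emb_module  # RemoteModule living on the PS
            # Only the local dense part is wrapped in DDP, so its gradients are
            # synchronized across the two trainers via allreduce.
            self.fc = DDP(nn.Linear(16, 8).cuda(device), device_ids=[device])
            self.device = device

        def forward(self, indices, offsets):
            # The lookup runs on the parameter server via RPC and returns the embeddings.
            emb_lookup = self.remote_emb_module.forward(indices, offsets)
            return self.fc(emb_lookup.cuda(self.device))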

Next, let's look at the setup on the trainer. The trainer first creates the
``HybridModel`` described above using a remote module that holds the embedding table
on the parameter server and its own rank.

Now, we need to retrieve a list of RRefs to all the parameters that we would
like to optimize with `DistributedOptimizer <https://pytorch.org/docs/master/rpc.html#module-torch.distributed.optim>`__.
To retrieve the parameters for the embedding table from the parameter server,
we can call RemoteModule's `remote_parameters <https://pytorch.org/docs/master/rpc.html#torch.distributed.nn.api.remote_module.RemoteModule.remote_parameters>`__,
which walks through all the parameters of the embedding table and returns
a list of RRefs. The trainer calls this method on the parameter server via RPC
to receive a list of RRefs to the desired parameters. Since the
DistributedOptimizer always takes a list of RRefs to the parameters that need to
be optimized, we need to create RRefs even for the local parameters of our
FC layers. This is done by walking ``model.fc.parameters()``, creating an RRef for
each parameter and appending it to the list returned from ``remote_parameters()``.
Note that we cannot use ``model.parameters()``,
because it would recursively call ``model.remote_emb_module.parameters()``,
which is not supported by ``RemoteModule``.

Finally, we create our DistributedOptimizer using all the RRefs and define a
CrossEntropyLoss function.


.. literalinclude:: ../advanced_source/rpc_ddp_tutorial/main.py
   :language: py
   :start-after: BEGIN setup_trainer
   :end-before: END setup_trainer
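
A condensed, illustrative sketch of that parameter-gathering step follows; the helper
name ``build_optimizer`` and the learning rate are assumptions, and the real
``_run_trainer`` in ``main.py`` also constructs the model and runs the training loop:

.. code:: python

    from torch import optim
    from torch.distributed.optim import DistributedOptimizer
    import torch.distributed.rpc as rpc


    def build_optimizer(model, remote_emb_module, lr=0.05):
        # RRefs to the embedding table parameters that live on the parameter server.
        model_parameter_rrefs = remote_emb_module.remote_parameters()

        # model.parameters() would recurse into model.remote_emb_module.parameters(),
        # which RemoteModule does not support, so the local FC parameters are wrapped
        # in RRefs explicitly and appended to the same list.
        for param in model.fc.parameters():
            model_parameter_rrefs.append(rpc.RRef(param))

        # A single DistributedOptimizer updates both remote and local parameters.
        return DistributedOptimizer(optim.SGD, model_parameter_rrefs, lr=lr)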
141126
142-
Now we're ready to introduce the main training loop that is run on each trainer.
143-
``get_next_batch`` is just a helper function to generate random inputs and
144-
targets for training. We run the training loop for multiple epochs and for each
145-
batch:
127+
์ด์ œ ๊ฐ ํŠธ๋ ˆ์ด๋„ˆ์—์„œ ์‹คํ–‰๋˜๋Š” ๊ธฐ๋ณธ ํ•™์Šต ๋ฃจํ”„๋ฅผ ์†Œ๊ฐœํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.
128+
``get_next_batch``๋Š” ํ•™์Šต์„ ์œ„ํ•œ ์ž„์˜์˜ ์ž…๋ ฅ๊ณผ ๋Œ€์ƒ์„ ์ƒ์„ฑํ•˜๋Š” ๊ฒƒ์„ ๋„์™€์ฃผ๋Š” ํ•จ์ˆ˜์ผ ๋ฟ์ž…๋‹ˆ๋‹ค.
129+
์—ฌ๋Ÿฌ ์—ํญ(epoch)๊ณผ ๊ฐ ๋ฐฐ์น˜(batch)์— ๋Œ€ํ•ด ํ•™์Šต ๋ฃจํ”„๋ฅผ ์‹คํ–‰ํ•ฉ๋‹ˆ๋‹ค:
146130
147-
1) Setup a `Distributed Autograd Context <https://pytorch.org/docs/master/rpc.html#torch.distributed.autograd.context>`__
148-
for Distributed Autograd.
149-
2) Run the forward pass of the model and retrieve its output.
150-
3) Compute the loss based on our outputs and targets using the loss function.
151-
4) Use Distributed Autograd to execute a distributed backward pass using the loss.
152-
5) Finally, run a Distributed Optimizer step to optimize all the parameters.
131+
1) ๋จผ์ € ๋ถ„์‚ฐ Autograd์— ๋Œ€ํ•ด
132+
`๋ถ„์‚ฐ Autograd Context <https://pytorch.org/docs/master/rpc.html#torch.distributed.autograd.context>`__๋ฅผ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค.
133+
2) ๋ชจ๋ธ์˜ ์ˆœ๋ฐฉํ–ฅ ์ „๋‹ฌ์„ ์‹คํ–‰ํ•˜๊ณ  ํ•ด๋‹น ์ถœ๋ ฅ์„ ๊ฒ€์ƒ‰(retrieve)ํ•ฉ๋‹ˆ๋‹ค.
134+
3) ์†์‹ค ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ถœ๋ ฅ๊ณผ ๋ชฉํ‘œ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ์†์‹ค์„ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค.
135+
4) ๋ถ„์‚ฐ Autograd๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์†์‹ค์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ถ„์‚ฐ ์—ญ๋ฐฉํ–ฅ ์ „๋‹ฌ์„ ์‹คํ–‰ํ•ฉ๋‹ˆ๋‹ค.
136+
5) ๋งˆ์ง€๋ง‰์œผ๋กœ ๋ถ„์‚ฐ ์˜ตํ‹ฐ๋งˆ์ด์ € ๋‹จ๊ณ„๋ฅผ ์‹คํ–‰ํ•˜์—ฌ ๋ชจ๋“  ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ์ตœ์ ํ™”ํ•ฉ๋‹ˆ๋‹ค.
153137
154138
.. literalinclude:: ../advanced_source/rpc_ddp_tutorial/main.py
155139
:language: py
156140
:start-after: BEGIN run_trainer
157141
:end-before: END run_trainer
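
Sketched with the hypothetical pieces from above (``model``, ``opt``, ``criterion`` and
the ``get_next_batch`` helper are assumed to come from the trainer setup), one epoch of
that loop has roughly this shape:

.. code:: python

    import torch.distributed.autograd as dist_autograd


    def train_one_epoch(rank, model, opt, criterion):
        for indices, offsets, target in get_next_batch(rank):
            # 1) Each iteration runs inside its own distributed autograd context.
            with dist_autograd.context() as context_id:
                # 2) Forward pass: remote embedding lookup + local DDP FC layer.
                output = model(indices, offsets)
                # 3) Loss between the outputs and the targets for this batch.
                loss = criterion(output, target)
                # 4) Distributed backward pass driven by the loss.
                dist_autograd.backward(context_id, [loss])
                # 5) Distributed optimizer step over all local and remote parameters.
                opt.step(context_id)
                # No zero_grad() is needed: gradients are scoped to the per-iteration
                # distributed autograd context.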

Source code for the entire example can be found `here <https://github.com/pytorch/examples/tree/master/distributed/rpc/ddp_rpc>`__.
