
Commit fc9c09b

Tom-Zheng authored and nv-kkudrynski committed
[ResNet/Paddle] Add CUDNNv8 ResUnit fusion
1 parent 2693c63 commit fc9c09b

File tree: 5 files changed (+13, -6 lines)


PaddlePaddle/Classification/RN50v1.5/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-ARG FROM_IMAGE_NAME=nvcr.io/nvidia/paddlepaddle:23.02-py3
+ARG FROM_IMAGE_NAME=nvcr.io/nvidia/paddlepaddle:23.06-py3
 FROM ${FROM_IMAGE_NAME}
 
 ADD requirements.txt /workspace/
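
The base-image bump means the container has to be rebuilt before the new flag is usable. A typical build from the RN50v1.5 directory (a sketch; the image tag is illustrative, not mandated by the repository):

# Rebuild against the 23.06 PaddlePaddle base image.
# "nvidia_rn50_paddle" is an illustrative local tag; pick any name.
docker build . -t nvidia_rn50_paddle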

PaddlePaddle/Classification/RN50v1.5/README.md

Lines changed: 4 additions & 3 deletions
@@ -303,7 +303,7 @@ Example:
 bash scripts/training/train_resnet50_TF32_90E_DGXA100.sh
 
 # For AMP and 8 GPUs training in 90 epochs
-bash scripts/training/train_resnet50_TF32_90E_DGXA100.sh
+bash scripts/training/train_resnet50_AMP_90E_DGXA100.sh
 ```
 
 Or you can manually launch training by `paddle.distributed.launch`. `paddle.distributed.launch` is a built-in module in PaddlePaddle that spawns up multiple distributed training processes on each of the training nodes.
@@ -497,6 +497,7 @@ Advanced Training:
 --use-dynamic-loss-scaling
                       Enable dynamic loss scaling in AMP training, only be applied when --amp is set. (default: False)
 --use-pure-fp16       Enable pure FP16 training, only be applied when --amp is set. (default: False)
+--fuse-resunit        Enable CUDNNv8 ResUnit fusion, only be applied when --amp is set. (default: False)
 --asp                 Enable automatic sparse training (ASP). (default: False)
 --prune-model         Prune model to 2:4 sparse pattern, only be applied when --asp is set. (default: False)
 --mask-algo {mask_1d,mask_2d_greedy,mask_2d_best}
@@ -827,8 +828,8 @@ To achieve these same results, follow the steps in the [Quick Start Guide](#quic
 
 | **GPUs** | **Throughput - TF32** | **Throughput - mixed precision** | **Throughput speedup (TF32 to mixed precision)** | **TF32 Scaling** | **Mixed Precision Scaling** | **Mixed Precision Training Time (90E)** | **TF32 Training Time (90E)** |
 |:--------:|:------------:|:-------------:|:------------:|:------:|:--------:|:--------:|:--------:|
-| 1 | 993 img/s | 2711 img/s | 2.73 x | 1.0 x | 1.0 x | ~13 hours| ~40 hours|
-| 8 | 7955 img/s | 20267 img/s | 2.54 x | 8.01 x | 7.47 x | ~2 hours | ~4 hours |
+| 1 | 1024 img/s | 2897 img/s | 2.83 x | 1.0 x | 1.0 x | ~13 hours| ~40 hours|
+| 8 | 8013 img/s | 23874 img/s | 2.98 x | 7.83 x | 8.24 x | ~2 hours | ~4 hours |
 
 ##### Training performance of Automatic SParsity: NVIDIA DGX A100 (8x A100 80GB)
 | **GPUs** | **Throughput - mixed precision** | **Throughput - mixed precision+ASP** | **Overhead** |
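
Combining the README's manual-launch note with the updated AMP script, an 8-GPU run with the fusion enabled looks roughly like this (a sketch; flags mirror train_resnet50_AMP_90E_DGXA100.sh, with data paths and other options left at their defaults):

# Manual 8-GPU AMP launch with cuDNN v8 ResUnit fusion.
# --fuse-resunit only takes effect together with --amp.
python -m paddle.distributed.launch --gpus=0,1,2,3,4,5,6,7 train.py \
    --amp \
    --scale-loss 128.0 \
    --use-dynamic-loss-scaling \
    --data-layout NHWC \
    --fuse-resunit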

PaddlePaddle/Classification/RN50v1.5/program.py

Lines changed: 2 additions & 0 deletions
@@ -143,6 +143,8 @@ def create_strategy(args, is_train=True):
     build_strategy.fuse_elewise_add_act_ops = True
     build_strategy.fuse_bn_add_act_ops = True
     build_strategy.enable_addto = True
+    if args.fuse_resunit and is_train:
+        build_strategy.fuse_resunit = True
 
     return build_strategy, exec_strategy
 

PaddlePaddle/Classification/RN50v1.5/scripts/training/train_resnet50_AMP_90E_DGXA100.sh

Lines changed: 2 additions & 1 deletion
@@ -17,4 +17,5 @@ python -m paddle.distributed.launch --gpus=0,1,2,3,4,5,6,7 train.py \
     --amp \
     --scale-loss 128.0 \
     --use-dynamic-loss-scaling \
-    --data-layout NHWC
+    --data-layout NHWC \
+    --fuse-resunit

PaddlePaddle/Classification/RN50v1.5/utils/config.py

Lines changed: 4 additions & 1 deletion
@@ -276,7 +276,10 @@ def add_advance_args(parser):
         '--use-pure-fp16',
         action='store_true',
         help='Enable pure FP16 training, only be applied when --amp is set.')
-
+    group.add_argument(
+        '--fuse-resunit',
+        action='store_true',
+        help='Enable CUDNNv8 ResUnit fusion, only be applied when --amp is set.')
     # ASP
     group.add_argument(
         '--asp',
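
A quick sanity check that the option is registered (assuming train.py builds its parser through add_advance_args, as the surrounding code suggests):

# Inside the container: the new flag should appear in the argparse help.
python train.py --help | grep -A 1 fuse-resunit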
