Qwen3 30A3B DPO sequence parallel eval_loop error #6848

@zfj1998

Describe the bug
What the bug is and how to reproduce it, preferably with screenshots

NPROC_PER_NODE=8 \
swift rlhf \
    --rlhf_type dpo \
    --model ${MODEL} \
    --train_type full \
    --dataset ${dataset} \
    --load_from_cache_file true \
    --split_dataset_ratio 0.01 \
    --torch_dtype bfloat16 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-5 \
    --gradient_accumulation_steps 1 \
    --eval_steps 160 \
    --save_steps 160 \
    --logging_steps 5 \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 8 \
    --dataset_num_proc 8 \
    --save_total_limit 10 \
    --save_only_model true \
    --output_dir ${SAVE} \
    --deepspeed zero3 \
    --attn_impl flash_attn \
    --max_length 131072 \
    --use_liger_kernel true \
    --sequence_parallel_size 2

Training with the script above, the train loop runs fine, but the eval loop raises the following error:

[rank0]:   File "/cpfs02/user/zhengyuxiang.zyx/ms-swift/swift/cli/rlhf.py", line 7, in <module>
[rank0]:     rlhf_main()
[rank0]:   File "/cpfs02/user/zhengyuxiang.zyx/ms-swift/swift/llm/train/rlhf.py", line 233, in rlhf_main
[rank0]:     return SwiftRLHF(args).main()
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/cpfs02/user/zhengyuxiang.zyx/ms-swift/swift/llm/base.py", line 49, in main
[rank0]:     result = self.run()
[rank0]:              ^^^^^^^^^^
[rank0]:   File "/cpfs02/user/zhengyuxiang.zyx/ms-swift/swift/ray/base.py", line 170, in wrapper
[rank0]:     return func(self, *args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/cpfs02/user/zhengyuxiang.zyx/ms-swift/swift/llm/train/sft.py", line 206, in run
[rank0]:     return self.train(trainer)
[rank0]:            ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/cpfs02/user/zhengyuxiang.zyx/ms-swift/swift/llm/train/sft.py", line 254, in train
[rank0]:     trainer.train(trainer.args.resume_from_checkpoint)
[rank0]:   File "/cpfs02/user/zhengyuxiang.zyx/ms-swift/swift/trainers/mixin.py", line 815, in train
[rank0]:     res = super().train(*args, **kwargs)
[rank0]:           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/site-packages/transformers/trainer.py", line 2325, in train
[rank0]:     return inner_training_loop(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/site-packages/transformers/trainer.py", line 2756, in _inner_training_loop
[rank0]:     self._maybe_log_save_evaluate(
[rank0]:   File "/cpfs02/user/zhengyuxiang.zyx/ms-swift/swift/trainers/mixin.py", line 888, in _maybe_log_save_evaluate
[rank0]:     super()._maybe_log_save_evaluate(tr_loss, *args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.11/site-packages/transformers/trainer.py", line 3221, in _maybe_log_save_evaluate
[rank0]:     metrics = self._evaluate(trial, ignore_keys_for_eval)
[rank0]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/site-packages/transformers/trainer.py", line 3170, in _evaluate
[rank0]:     metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
[rank0]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/site-packages/transformers/trainer.py", line 4489, in evaluate
[rank0]:     output = eval_loop(
[rank0]:              ^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py", line 1919, in evaluation_loop
[rank0]:     initial_output = super().evaluation_loop(
[rank0]:                      ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/site-packages/transformers/trainer.py", line 4685, in evaluation_loop
[rank0]:     losses, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
[rank0]:                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/cpfs02/user/zhengyuxiang.zyx/ms-swift/swift/trainers/rlhf_trainer/dpo_trainer.py", line 185, in prediction_step
[rank0]:     return super().prediction_step(model, inputs, *args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py", line 1846, in prediction_step
[rank0]:     loss, metrics = self.get_batch_loss_metrics(model, inputs, train_eval="eval")
[rank0]:                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py", line 1684, in get_batch_loss_metrics
[rank0]:     model_output = self.concatenated_forward(model, batch)
[rank0]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/cpfs02/user/zhengyuxiang.zyx/ms-swift/swift/trainers/rlhf_trainer/dpo_trainer.py", line 109, in concatenated_forward
[rank0]:     per_token_logps, mean_all_logits, loss_mask = self.get_per_token_logps(
[rank0]:                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/cpfs02/user/zhengyuxiang.zyx/ms-swift/swift/trainers/rlhf_trainer/rlhf_mixin.py", line 139, in get_per_token_logps
[rank0]:     raise ValueError(f'Logits (batch and sequence length dim) {logits.shape[:-1]}'
[rank0]: ValueError: Logits (batch and sequence length dim) torch.Size([4, 2654])and labels must have the same shape {labels.shape}
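Side note on the message itself: it ends with the literal text `{labels.shape}` instead of the actual labels shape. One likely explanation, judging only from the traceback line shown above, is that the error string is built from two adjacent string literals and only the first carries the `f` prefix. The sketch below reproduces that formatting behavior in plain Python. The function name and the labels shape `(4, 5308)` are hypothetical and chosen only for illustration; the real labels shape is not visible in the log.

```python
def shape_mismatch_message(logits_shape, labels_shape, prefix_both):
    """Build the error text two ways to show the effect of a missing f-prefix.

    Hypothetical helper for illustration; not part of ms-swift.
    """
    if prefix_both:
        # Both literals are f-strings, so both placeholders are interpolated.
        return (f'Logits (batch and sequence length dim) {logits_shape}'
                f' and labels must have the same shape {labels_shape}')
    # Only the first literal is an f-string; the second is emitted verbatim,
    # which leaves the placeholder text unexpanded in the final message.
    return (f'Logits (batch and sequence length dim) {logits_shape}'
            'and labels must have the same shape {labels_shape}')


# (4, 5308) is an assumed value, not taken from the log.
broken = shape_mismatch_message((4, 2654), (4, 5308), prefix_both=False)
fixed = shape_mismatch_message((4, 2654), (4, 5308), prefix_both=True)
print(broken)
print(fixed)
```

This only concerns how the message is rendered. It does not explain why the logits and labels shapes differ in the eval loop, which is the actual problem reported here.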

Your hardware and system info
Write your system info here, such as CUDA version, system, GPU model, and torch version

CUDA 12.2, Python 3.11, GPU H20, torch 2.8, ms-swift 3.11.0.dev0

Additional context
Add any other context about the problem here
