Qwen3 30A3B DPO sequence parallel eval_loop error #6848

**Describe the bug**
What the bug is and how to reproduce it, preferably with screenshots
```
NPROC_PER_NODE=8 \
swift rlhf \
    --rlhf_type dpo \
    --model ${MODEL} \
    --train_type full \
    --dataset ${dataset} \
    --load_from_cache_file true \
    --split_dataset_ratio 0.01 \
    --torch_dtype bfloat16 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-5 \
    --gradient_accumulation_steps 1 \
    --eval_steps 160 \
    --save_steps 160 \
    --logging_steps 5 \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 8 \
    --dataset_num_proc 8 \
    --save_total_limit 10 \
    --save_only_model true \
    --output_dir ${SAVE} \
    --deepspeed zero3 \
    --attn_impl flash_attn \
    --max_length 131072 \
    --use_liger_kernel true \
    --sequence_parallel_size 2
```

Training with the script above, the train loop runs without problems, but the eval loop fails with the following error:

```
[rank0]:   File "/cpfs02/user/zhengyuxiang.zyx/ms-swift/swift/cli/rlhf.py", line 7, in <module>
[rank0]:     rlhf_main()
[rank0]:   File "/cpfs02/user/zhengyuxiang.zyx/ms-swift/swift/llm/train/rlhf.py", line 233, in rlhf_main
[rank0]:     return SwiftRLHF(args).main()
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/cpfs02/user/zhengyuxiang.zyx/ms-swift/swift/llm/base.py", line 49, in main
[rank0]:     result = self.run()
[rank0]:              ^^^^^^^^^^
[rank0]:   File "/cpfs02/user/zhengyuxiang.zyx/ms-swift/swift/ray/base.py", line 170, in wrapper
[rank0]:     return func(self, *args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/cpfs02/user/zhengyuxiang.zyx/ms-swift/swift/llm/train/sft.py", line 206, in run
[rank0]:     return self.train(trainer)
[rank0]:            ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/cpfs02/user/zhengyuxiang.zyx/ms-swift/swift/llm/train/sft.py", line 254, in train
[rank0]:     trainer.train(trainer.args.resume_from_checkpoint)
[rank0]:   File "/cpfs02/user/zhengyuxiang.zyx/ms-swift/swift/trainers/mixin.py", line 815, in train
[rank0]:     res = super().train(*args, **kwargs)
[rank0]:           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/site-packages/transformers/trainer.py", line 2325, in train
[rank0]:     return inner_training_loop(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/site-packages/transformers/trainer.py", line 2756, in _inner_training_loop
[rank0]:     self._maybe_log_save_evaluate(
[rank0]:   File "/cpfs02/user/zhengyuxiang.zyx/ms-swift/swift/trainers/mixin.py", line 888, in _maybe_log_save_evaluate
[rank0]:     super()._maybe_log_save_evaluate(tr_loss, *args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.11/site-packages/transformers/trainer.py", line 3221, in _maybe_log_save_evaluate
[rank0]:     metrics = self._evaluate(trial, ignore_keys_for_eval)
[rank0]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/site-packages/transformers/trainer.py", line 3170, in _evaluate
[rank0]:     metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
[rank0]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/site-packages/transformers/trainer.py", line 4489, in evaluate
[rank0]:     output = eval_loop(
[rank0]:              ^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py", line 1919, in evaluation_loop
[rank0]:     initial_output = super().evaluation_loop(
[rank0]:                      ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/site-packages/transformers/trainer.py", line 4685, in evaluation_loop
[rank0]:     losses, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
[rank0]:                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/cpfs02/user/zhengyuxiang.zyx/ms-swift/swift/trainers/rlhf_trainer/dpo_trainer.py", line 185, in prediction_step
[rank0]:     return super().prediction_step(model, inputs, *args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py", line 1846, in prediction_step
[rank0]:     loss, metrics = self.get_batch_loss_metrics(model, inputs, train_eval="eval")
[rank0]:                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py", line 1684, in get_batch_loss_metrics
[rank0]:     model_output = self.concatenated_forward(model, batch)
[rank0]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/cpfs02/user/zhengyuxiang.zyx/ms-swift/swift/trainers/rlhf_trainer/dpo_trainer.py", line 109, in concatenated_forward
[rank0]:     per_token_logps, mean_all_logits, loss_mask = self.get_per_token_logps(
[rank0]:                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/cpfs02/user/zhengyuxiang.zyx/ms-swift/swift/trainers/rlhf_trainer/rlhf_mixin.py", line 139, in get_per_token_logps
[rank0]:     raise ValueError(f'Logits (batch and sequence length dim) {logits.shape[:-1]}'
[rank0]: ValueError: Logits (batch and sequence length dim) torch.Size([4, 2654])and labels must have the same shape {labels.shape}
```
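A side note on the last line of the traceback: the message prints `{labels.shape}` literally, so the actual shape of the labels never shows up in the log. That suggests the second string literal in the `raise` statement is missing its `f` prefix (and a separating space). This is an inference from the log output only, not verified against the ms-swift source. A minimal standalone sketch of the same check with both parts formatted is below; `check_shapes`, the labels shape, and the vocab size are hypothetical and use plain tuples instead of tensors.

```python
def check_shapes(logits_shape, labels_shape):
    """Raise if logits (minus the vocab dim) and labels disagree in shape."""
    if tuple(logits_shape[:-1]) != tuple(labels_shape):
        # Both literals carry the f prefix, and a space separates them,
        # so the labels shape is actually interpolated into the message.
        raise ValueError(
            f'Logits (batch and sequence length dim) {tuple(logits_shape[:-1])} '
            f'and labels must have the same shape {tuple(labels_shape)}')


# Hypothetical shapes: the logits dims (4, 2654) come from the traceback;
# the labels shape and the vocab size are made up, since the log hides them.
message = None
try:
    check_shapes((4, 2654, 8), (4, 5308))
except ValueError as e:
    message = str(e)
    print(message)
```

With the message formatted this way, the eval-time log would show which of the two tensors has the unexpected sequence length under `--sequence_parallel_size 2`.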

**Your hardware and system info**
Write your system info here, such as CUDA version, OS, GPU model, and torch version

CUDA 12.2, Python 3.11, GPU H20, torch 2.8, ms-swift 3.11.0.dev0


**Additional context**
Add any other context about the problem here

