
Commit 21209f9

Support ray (#6323)
1 parent eacc0a1 commit 21209f9


43 files changed: +1200 -765 lines

README.md

Lines changed: 3 additions & 1 deletion
@@ -75,8 +75,10 @@ You can contact us and communicate with us by adding our group:
 
 
 ## 🎉 News
+- 🎁 2025.10.28: Ray is now supported; see the documentation [here](docs/source_en/Instruction/Ray.md).
+- 🎁 2025.10.28: Support [using yaml](examples/yaml) to configure command-line parameters.
 - 🎁 2025.09.29: Support padding_free for embedding/reranker/seq_cls tasks, use `--padding_free true --task_type embedding/reranker/generative_reranker/seq_cls` to begin!
-- 🎁 2025.09.07: Added support for CHORD training algorithm. See the [documentation](./docs/source_en/Instruction/GRPO/AdvancedResearch/CHORD.md)
+- 🎁 2025.09.07: Added support for the CHORD training algorithm. See the [documentation](./docs/source_en/Instruction/GRPO/AdvancedResearch/CHORD.md).
 - 🎁 2025.09.06: Ulysses can now be used with ring-attention, allowing sequences to be sharded into any number of chunks (no longer limited by the number of heads). The argument remains `--sequence_parallel_size N`.
 - 🎁 2025.09.02: Megatron-SWIFT now supports multimodal model training. Documentation can be found [here](./docs/source_en/Megatron-SWIFT/Multimodal-Model.md).
 - 🎁 2025.08.12: Support [Dynamic Fine-Tuning](https://arxiv.org/abs/2508.05629) (DFT) in SFT training, use parameter `--enable_dft_loss true`. Training scripts can be found [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/full/dft.sh).

README_CN.md

Lines changed: 2 additions & 0 deletions
@@ -71,6 +71,8 @@
 - **Model quantization**: supports quantized export with AWQ, GPTQ, FP8, and BNB; the exported models support accelerated inference with vLLM/SGLang/LmDeploy and further training.
 
 ## 🎉 News
+- 🎁 2025.10.28: Ray is [now supported](docs/source/Instruction/ray的支持.md).
+- 🎁 2025.10.28: Support [using yaml](examples/yaml) to configure command-line parameters.
 - 🎁 2025.09.29: Support the padding_free argument for embedding/reranker/seq_cls tasks; use `--padding_free true --task_type embedding/reranker/generative_reranker/seq_cls` to start training!
 - 🎁 2025.09.07: Support the CHORD training algorithm; see the [documentation](docs/source/Instruction/GRPO/AdvancedResearch/CHORD.md).
 - 🎁 2025.09.06: Ulysses can now be combined with ring-attention, allowing input sequences to be split into any number of chunks (no longer limited by num_heads); the argument remains `--sequence_parallel_size N`.
docs/source/Instruction/ray的支持.md

Lines changed: 150 additions & 0 deletions
@@ -0,0 +1,150 @@
# Ray support

SWIFT supports using ray for multi-GPU and multi-node training. Ray support across existing features is as follows:
| Feature  | Ray supported | Example                                                                         | Assignable roles |
|----------|---------------|---------------------------------------------------------------------------------|------------------|
| pt/sft   | ✅            | https://github.com/modelscope/ms-swift/tree/main/examples/train/multi-node/ray | default          |
| dpo      | ❌            |                                                                                 |                  |
| grpo     | ❌            |                                                                                 |                  |
| ppo      | ❌            |                                                                                 |                  |
| megatron | ❌            |                                                                                 |                  |
| sampling | ✅            | https://github.com/modelscope/ms-swift/tree/main/examples/sampler/sample       | sampler/prm/orm  |
| distill  | ✅            | https://github.com/modelscope/ms-swift/tree/main/examples/sampler/distill      | sampler/prm/orm  |

## Technical details

Before describing the parameters, some technical details are worth explaining. Because SWIFT currently relies on a large amount of existing transformers and trl code, decomposing it into separate ray roles, as veRL or ROLL do, is impractical; such a decomposition would also make the codebase ray-centric and weaken support for non-ray scenarios.
SWIFT therefore takes a decorator-based approach that defines roles at the function level; how these roles are used is then specified in the parameters. Consider the following example:

```python
from torch.utils.data import DataLoader

from swift.ray import RayHelper


@RayHelper.worker(group=['model1', 'model2'])
class MyTrainer:

    def __init__(self, args):
        self._prepare_model1()
        self._prepare_model2()
        self._prepare_datasets()

    @RayHelper.function(group='model1')
    def _prepare_model1(self):
        ...

    @RayHelper.function(group='model2')
    def _prepare_model2(self):
        ...

    @RayHelper.function(group='model1')
    def rollout(self, inputs):
        return self.model1.generate(inputs)

    @RayHelper.function(group='model2')
    def forward_model2(self, inputs):
        loss = self.model2.forward(inputs)
        loss.backward()

    def _prepare_datasets(self):
        self.dataset = ...

    def train(self):
        for batch in DataLoader(self.dataset):
            generated = self.rollout(batch)
            self.forward_model2(generated)
            ...


if __name__ == '__main__':
    ...
    MyTrainer(args).train()
```

RayHelper assigns the decorated methods to different hardware clusters; local calls are transparently turned into remote calls on the ray cluster. The split can also be organized around classes:

```python
@RayHelper.worker(group=['model1'])
class Model1:
    ...

    @RayHelper.function(group='model1')
    def rollout(self):
        ...


@RayHelper.worker(group=['model2'])
class Model2:
    ...

    @RayHelper.function(group='model2')
    def forward_and_optimize(self):
        ...


class Trainer:
    ...
```

SWIFT's ray support is essentially the combined use of the @worker and @function decorators: worker specifies the roles in the ray cluster, and function specifies how data is dispatched.

The function decorator takes several additional parameters:

```python
@staticmethod
def function(group: str,
             dispatch: Union[Literal['slice', 'all'], Callable] = 'all',
             execute: Literal['first', 'all'] = 'all',
             collect: Union[Literal['none', 'flatten'], Callable] = 'none'):
```

- dispatch: how the input arguments of a call are distributed
  - slice: split the inputs across workers, i.e. load-balanced execution
  - all: every worker receives identical inputs
  - Callable: a custom split, of the form:
    ```python
    def my_custom_slice(n, i, data):
        # n is the number of workers, i is the current worker's index,
        # and data is the original input; return the input for worker i.
        ...
    ```
- execute: how the call is executed
  - first: only rank0 executes; slice and Callable dispatch have no effect in this case
  - all: all workers execute
- collect: how the returned data is collected
  - none: return as-is, a list of each worker's return value
  - flatten: flatten the workers' results; tuples are flattened as well
  - Callable: a custom collect, of the form:
    ```python
    def my_custom_collect(result):
        # result is the list of the workers' return values;
        # return it in whatever format you need.
        ...
    ```
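
Putting dispatch and collect together, here is a minimal sketch of a custom pair; the helper names (`shard_rows`, `concat_results`, `Generator`) and the slicing strategy are illustrative assumptions, not SWIFT API:

```python
def shard_rows(n, i, data):
    # n = number of workers in the group, i = this worker's index;
    # worker i receives every n-th element of the batch.
    return data[i::n]


def concat_results(result):
    # result is the list of per-worker return values; merge into one flat list.
    merged = []
    for part in result:
        merged.extend(part)
    return merged


@RayHelper.worker(group=['model1'])
class Generator:

    @RayHelper.function(group='model1', dispatch=shard_rows, collect=concat_results)
    def rollout(self, inputs):
        # Each worker generates on its shard; the shards are merged on return.
        return self.model1.generate(inputs)
```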
## Parameter settings

With the technical details covered, we can move on to parameter configuration. Developers can set up different hardware arrangements according to the list of roles in a given workflow. For example, the sampling feature has three roles, sampler, prm, and orm, which can be configured like this:

```yaml
device_groups:
  nproc_per_node: 4
  sample_group:
    device: GPU
    ranks: list(range(0, 2))
    workers:
      - sampler
  rm_group:
    device: GPU
    ranks: list(range(2, 4))
    workers:
      - prm
      - orm
```

- nproc_per_node: the minimum number of devices per node required in the ray cluster.
- xxx_group: the name of each ray group; any name may be used.
  - device: the device type; currently GPU/CPU etc. are supported.
  - ranks: which ranks the group is assigned to. For CPU, ranks can only be an integer indicating the total number of processes needed; for GPU, formats such as `[0,1,2,3]`, `4`, or `list(range(0, 4))` are accepted.
  - workers: which roles are assigned to this group.

All available roles are listed in the table at the top of this document.

When using the command line, device_groups can also be passed as `--device_groups xxx`, where xxx is a JSON string. For simpler configuration, we strongly recommend using the yaml approach together with ray.
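
A minimal sketch of building that JSON string, assuming it mirrors the yaml layout above one-to-one (with ranks written as explicit lists):

```python
import json

# Assumed to mirror the yaml layout above one-to-one.
device_groups = {
    'nproc_per_node': 4,
    'sample_group': {'device': 'GPU', 'ranks': [0, 1], 'workers': ['sampler']},
    'rm_group': {'device': 'GPU', 'ranks': [2, 3], 'workers': ['prm', 'orm']},
}
# Pass the printed string via --device_groups '...'
print(json.dumps(device_groups))
```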

docs/source/Instruction/命令行参数.md

Lines changed: 31 additions & 13 deletions
@@ -142,6 +142,34 @@
 - bnb_4bit_use_double_quant: whether to use double quantization; defaults to `True`.
 - bnb_4bit_quant_storage: the bnb quantization storage type; defaults to None.
 
+### RAY arguments
+
+- use_ray: boolean. Whether to use ray; defaults to `False`.
+- ray_exp_name: the ray experiment name; this field is used as the prefix for cluster and worker names and may be left empty.
+- device_groups: string (JSON string). This field must be configured when using ray; see the [ray documentation](ray的支持.md) for details.
+
+### yaml support
+
+- config: a config file can be used in place of command-line arguments, for example:
+
+```shell
+swift sft --config demo.yaml
+```
+
+The content of demo.yaml is the concrete command-line configuration:
+
+```yaml
+# Model args
+model: Qwen/Qwen2.5-7B-Instruct
+dataset: swift/self-cognition
+...
+
+# Train args
+output_dir: xxx/xxx
+gradient_checkpointing: true
+...
+```
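
For illustration, a minimal sketch of this equivalence, assuming each top-level yaml key maps to a single `--flag value` pair:

```python
import yaml  # requires pyyaml

# Read a config like demo.yaml (with the elisions filled in) and render it
# as the equivalent command line.
with open('demo.yaml') as f:
    cfg = yaml.safe_load(f)

flags = ' '.join(f'--{key} {value}' for key, value in cfg.items())
print(f'swift sft {flags}')
```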
 
 ## Atomic arguments
 

@@ -681,13 +709,13 @@ App arguments inherit from [deployment arguments](#部署参数) and [Web-UI arguments](#Web-UI参数)
 - prm_model: the type of the process reward model; either a model id (launched in pt mode) or a prm key defined in the plugin (for a custom inference process).
 - orm_model: the type of the outcome reward model, usually a wildcard or test cases, generally defined in the plugin.
-- sampler_type: the sampling type; currently supports sample and mcts, with dvts to be supported in the future.
+- sampler_type: the sampling type; currently supports sample and distill.
 - sampler_engine: the inference engine of the sampling model; supports `pt`, `lmdeploy`, `vllm`, `client`, `no`; defaults to `pt`.
 - output_dir: the output directory; defaults to `sample_output`.
 - output_file: the output file name; defaults to `None`, which uses a timestamp as the file name. Pass only a file name without a directory; only the jsonl format is supported.
 - override_exist_file: whether to overwrite `output_file` if it already exists.
-- num_sampling_per_gpu_batch_size: the batch size of each sampling pass.
-- num_sampling_per_gpu_batches: the total number of batches to sample.
+- num_sampling_batch_size: the batch size of each sampling pass.
+- num_sampling_batches: the total number of batches to sample.
 - n_best_to_keep: how many best sequences to return.
 - data_range: the shard of the dataset this sampling run processes. The format is `2 3`, meaning the dataset is split into 3 shards (which usually implies three `swift sample` processes running in parallel) and this instance is processing the 3rd shard.
 - temperature: defaults to 1.0 here.
@@ -698,16 +726,6 @@ App arguments inherit from [deployment arguments](#部署参数) and [Web-UI arguments](#Web-UI参数)
 - cache_files: to avoid OOM from loading the prm and the generator at the same time, sampling can be done in two steps. In the first step, set prm and orm to `None`; all results are then written to a file. In the second run, set sampler_engine to `no` and pass `--cache_files` with the output file of the previous run; the previous results are then used for prm and orm evaluation to produce the final output.
   - Note: when using cache_files, `--dataset` still needs to be passed, because the id in cache_files is an md5 computed from the original data and the two pieces of information must be combined.
 
-#### MCTS
-- rollout_depth: the maximum depth during rollout; defaults to `5`.
-- rollout_start_depth: the depth at which rollout starts; nodes below this depth only undergo expand operations; defaults to `3`.
-- max_iterations: the maximum number of mcts iterations; defaults to `100`.
-- process_reward_rate: the proportion of the process reward when computing value during select; defaults to `0.0`, i.e. PRM is not used.
-- exploration_rate: the exploration parameter in the UCT algorithm; larger values favor nodes with fewer visits; defaults to `0.5`.
-- api_key: required when using client as the inference engine; defaults to `EMPTY`.
-- base_url: required when using client as the inference engine; defaults to 'https://dashscope.aliyuncs.com/compatible-mode/v1'.
-
 ## Specific model arguments
 Besides the arguments above, some models support additional model-specific arguments, whose meanings can usually be found in the model's official repo or its inference code. **ms-swift introduces these arguments to ensure that the trained model aligns with the official inference code.**
 - Specific model arguments can be set via `--model_kwargs` or environment variables, e.g. `--model_kwargs '{"fps_max_frames": 12}'` or `FPS_MAX_FRAMES=12`.

docs/source/Instruction/强化微调.md

Lines changed: 1 addition & 11 deletions
@@ -66,11 +66,7 @@ DeepSeek-R1 used the GRPO algorithm to make a base model develop CoT capability from scratch; the method requires
 
 SWIFT provides the sample command, which is used for model sampling. The currently supported sampling methods are:
 
-- do_sample: samples the model via the sample method; supports sampling open-source models, with model distillation to follow
-  - the sample method will later support URL sampling for large-model distillation
-
-- mcts: Monte Carlo sampling; currently in a PR, to be supported later
-- dvts: under investigation
+- sample: samples the model in generate mode
 
 We currently provide a fairly general [RFT script](https://github.com/modelscope/ms-swift/tree/main/examples/train/rft/rft.py). The script targets self-improvement training, supports dynamically adjusting hyperparameters such as the sampling temperature and the PRM threshold, and allows flexible training modes (fine-tuning, DPO, etc.; each iteration can retrain the original model or continue from the previous iteration's model, or even load all training states of the previous iteration). Developers can add further data filtering to the script (in the generated dataset, rows with the same id come from the same query), such as diversity or language checks.
 
@@ -95,9 +91,3 @@ SWIFT provides the sample command, which is used for model sampling. The current
 | Qwen2.5_math_7b_instruct | 92.8 | 91.6 |
 
 As can be seen, the gsm8k metric changes little after RFT training; the score drop described above does not occur.
-
-## Future plans
-
-1. More sampling methods, such as MCTS
-2. Distillation training for very large models
-3. On-policy training centered on PPO

docs/source_en/Instruction/Command-line-parameters.md

Lines changed: 32 additions & 12 deletions
@@ -144,6 +144,35 @@ The following are parameters for quantizing models upon loading. See the [quanti
 - bnb_4bit_use_double_quant: Whether to use double quantization. Default is `True`.
 - bnb_4bit_quant_storage: Data type used to store quantized weights. Default is `None`.
 
+### RAY Arguments
+
+- use_ray: Boolean type. Whether to use ray; defaults to `False`.
+- ray_exp_name: Ray experiment name. This field is used as the prefix for cluster and worker names and may be left empty.
+- device_groups: String (JSON string) type. This field must be configured when using ray. For details, please refer to the [ray documentation](Ray.md).
+
+### YAML Arguments
+
+- config: You can use a config file instead of command-line arguments, for example:
+
+```shell
+swift sft --config demo.yaml
+```
+
+The content of demo.yaml is the concrete command-line configuration:
+
+```yaml
+# Model args
+model: Qwen/Qwen2.5-7B-Instruct
+dataset: swift/self-cognition
+...
+
+# Train args
+output_dir: xxx/xxx
+gradient_checkpointing: true
+...
+```
 
 ## Atomic Arguments
 
 ### Seq2SeqTrainer Arguments
@@ -698,13 +727,13 @@ Export Arguments include the [basic arguments](#base-arguments) and [merge argum
 - prm_model: The type of process reward model. It can be a model ID (triggered using `pt`) or a `prm` key defined in a plugin (for custom inference processes).
 - orm_model: The type of outcome reward model, typically a wildcard or test case, usually defined in a plugin.
-- sampler_type: The type of sampling. Currently supports `sample` (using `do_sample` method). Future support will include `mcts` and `dvts`.
+- sampler_type: The type of sampling. Currently supports `sample` and `distill`.
 - sampler_engine: Supports `pt`, `lmdeploy`, `vllm`, `no`. Defaults to `pt`. Specifies the inference engine for the sampling model.
 - output_dir: The output directory. Defaults to `sample_output`.
 - output_file: The name of the output file. Defaults to `None`, which uses a timestamp as the filename. When provided, only the filename should be passed, without a directory; only the JSONL format is supported.
 - override_exist_file: Whether to overwrite `output_file` if it already exists.
-- num_sampling_per_gpu_batch_size: The batch size for each sampling operation.
-- num_sampling_per_gpu_batches: The total number of batches to sample.
+- num_sampling_batch_size: The batch size for each sampling operation.
+- num_sampling_batches: The total number of batches to sample.
 - n_best_to_keep: The number of best sequences to return.
 - data_range: The partition of the dataset being processed by this sampling run. The format is `2 3`, meaning the dataset is divided into 3 partitions and this instance is processing the 3rd one (which typically implies three `swift sample` processes running in parallel).
 - temperature: Defaults to `1.0`.
@@ -715,15 +744,6 @@ Export Arguments include the [basic arguments](#base-arguments) and [merge argum
 - cache_files: To avoid loading both `prm` and `generator` simultaneously and causing GPU memory OOM, sampling can be done in two steps. In the first step, set `prm` and `orm` to `None`; all results are then written to a file. In the second run, set `sampler_engine` to `no` and pass `--cache_files` with the output file from the first run. The results of the first run are then used for `prm` and `orm` evaluation to produce the final output.
   - Note: When using `cache_files`, `--dataset` still needs to be provided, because the ID in `cache_files` is an MD5 computed from the original data; both pieces of information must be used together.
 
-#### MCTS
-- rollout_depth: The maximum depth during rollouts. Default is `5`.
-- rollout_start_depth: The depth at which rollouts begin; nodes below this depth only undergo expand operations. Default is `3`.
-- max_iterations: The maximum number of MCTS iterations. Default is `100`.
-- process_reward_rate: The proportion of the process reward used when computing value during selection. Default is `0.0`, meaning PRM is not used.
-- exploration_rate: A parameter in the UCT algorithm balancing exploration; higher values give more weight to less-visited nodes. Default is `0.5`.
-- api_key: Required when using the client as an inference engine. Default is `EMPTY`.
-- base_url: Required when using the client as an inference engine. Default is 'https://dashscope.aliyuncs.com/compatible-mode/v1'.
-
 ## Specific Model Arguments
 
 In addition to the parameters listed above, some models support additional model-specific arguments. The meanings of these parameters can usually be found in the corresponding model's official repository or its inference code. **MS-Swift includes these parameters to ensure that the trained model aligns with the behavior of the official inference implementation**.
