**README.md** (3 additions, 1 deletion)
```diff
@@ -75,8 +75,10 @@ You can contact us and communicate with us by adding our group:

 ## 🎉 News
+- 🎁 2025.10.28: Support Ray. Documentation can be found [here](docs/source_en/Instruction/Ray.md).
+- 🎁 2025.10.28: Support [using YAML](examples/yaml) to configure command-line parameters.
 - 🎁 2025.09.29: Support padding_free for embedding/reranker/seq_cls tasks, use `--padding_free true --task_type embedding/reranker/generative_reranker/seq_cls` to begin!
-- 🎁 2025.09.07: Added support for the CHORD training algorithm. See the [documentation](./docs/source_en/Instruction/GRPO/AdvancedResearch/CHORD.md)
+- 🎁 2025.09.07: Added support for the CHORD training algorithm. See the [documentation](./docs/source_en/Instruction/GRPO/AdvancedResearch/CHORD.md).
 - 🎁 2025.09.06: Ulysses can now be used with ring-attention, allowing sequences to be sharded into any number of chunks (no longer limited by the number of heads). The argument remains `--sequence_parallel_size N`.
 - 🎁 2025.09.02: Megatron-SWIFT now supports multimodal model training. Documentation can be found [here](./docs/source_en/Megatron-SWIFT/Multimodal-Model.md).
 - 🎁 2025.08.12: Support [Dynamic Fine-Tuning](https://arxiv.org/abs/2508.05629) (DFT) in SFT training, use parameter `--enable_dft_loss true`. Training scripts can be found [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/full/dft.sh).
```
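The padding_free news item above names concrete flags; as a minimal sketch of how they combine (the model name is a placeholder, and the command is only printed here, not executed):

```shell
# Sketch of the padding_free flags from the news item above; the model name is a
# placeholder and the command is printed rather than run.
CMD="swift sft --model Qwen/Qwen2.5-7B-Instruct --padding_free true --task_type seq_cls"
echo "$CMD"
```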
**docs/source_en/Instruction/Command-line-parameters.md** (32 additions, 12 deletions)
````diff
@@ -144,6 +144,35 @@ The following are parameters for quantizing models upon loading. See the [quanti
 - bnb_4bit_use_double_quant: Whether to use double quantization. Default is `True`.
 - bnb_4bit_quant_storage: Data type used to store quantized weights. Default is `None`.

+### RAY Arguments
+
+- use_ray: Boolean type. Whether to use Ray. Defaults to `False`.
+- ray_exp_name: Ray experiment name, used as the prefix for cluster and worker names; may be empty.
+- device_groups: String (JSON string) type. Required when using Ray. For details, please refer to the [Ray documentation](Ray.md).
+
+### YAML Arguments
+
+- config: You can use a config file instead of command-line arguments, for example:
+
+```shell
+swift sft --config demo.yaml
+```
+
+The content of demo.yaml consists of the other command-line options:
+
+```yaml
+# Model args
+model: Qwen/Qwen2.5-7B-Instruct
+dataset: swift/self-cognition
+...
+
+# Train args
+output_dir: xxx/xxx
+gradient_checkpointing: true
+...
+```
+
 ## Atomic Arguments

 ### Seq2SeqTrainer Arguments
````
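The `--config` flow added above can be sketched end to end; a minimal example, assuming only the YAML keys shown in the snippet (the `output_dir` value is a placeholder, and the final `swift sft` call is left commented out since it requires ms-swift and a model):

```shell
# Write a minimal config equivalent to passing --model/--dataset on the command line.
cat > demo.yaml <<'EOF'
# Model args
model: Qwen/Qwen2.5-7B-Instruct
dataset: swift/self-cognition

# Train args
output_dir: output/demo   # placeholder path
gradient_checkpointing: true
EOF

# Then (not run here): swift sft --config demo.yaml
```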
```diff
@@ -698,13 +727,13 @@ Export Arguments include the [basic arguments](#base-arguments) and [merge argum

 - prm_model: The type of process reward model. It can be a model ID (triggered using `pt`) or a `prm` key defined in a plugin (for custom inference processes).
 - orm_model: The type of outcome reward model, typically a wildcard or test case, usually defined in a plugin.
-- sampler_type: The type of sampling. Currently supports `sample` (using the `do_sample` method). Future support will include `mcts` and `dvts`.
+- sampler_type: The type of sampling. Currently supports `sample` and `distill`.
 - sampler_engine: Supports `pt`, `lmdeploy`, `vllm`, `no`. Defaults to `pt`. Specifies the inference engine for the sampling model.
 - output_dir: The output directory. Defaults to `sample_output`.
 - output_file: The name of the output file. Defaults to `None`, which uses a timestamp as the filename. When provided, only the filename should be passed, without the directory, and only JSONL format is supported.
 - override_exist_file: Whether to overwrite if `output_file` already exists.
-- num_sampling_per_gpu_batch_size: The batch size for each sampling operation.
-- num_sampling_per_gpu_batches: The total number of batches to sample.
+- num_sampling_batch_size: The batch size for each sampling operation.
+- num_sampling_batches: The total number of batches to sample.
 - n_best_to_keep: The number of best sequences to return.
 - data_range: The partition of the dataset processed by this sampling run. The format is `2 3`, meaning the dataset is divided into 3 parts and this instance processes the 3rd partition (which implies that three `swift sample` processes are typically run in parallel).
 - temperature: Defaults to `1.0`.
```
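As an illustration of combining the sampling parameters above, and in particular `data_range` for running several `swift sample` processes in parallel: a hedged sketch in which the dataset name and batch sizes are placeholders, and the exact `--data_range i N` flag spelling is assumed from the parameter list rather than confirmed.

```shell
# Build commands for 3 parallel sampling shards; shard i handles partition i of 3.
# Commands are printed, not executed (running them needs ms-swift and a model).
CMDS=""
for i in 0 1 2; do
  CMDS="${CMDS}swift sample --dataset swift/self-cognition --sampler_type sample --sampler_engine pt --num_sampling_batch_size 4 --num_sampling_batches 8 --output_file shard_${i}.jsonl --data_range ${i} 3"$'\n'
done
printf '%s' "$CMDS"
```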
```diff
@@ -715,15 +744,6 @@ Export Arguments include the [basic arguments](#base-arguments) and [merge argum
 - cache_files: To avoid loading both `prm` and `generator` simultaneously and causing GPU memory OOM, sampling can be done in two steps. In the first step, set `prm` and `orm` to `None`; all results will be written to a file. In the second run, set `sampler_engine` to `no` and pass `--cache_files` with the output file from the first sampling. This will use the results from the first run for `prm` and `orm` evaluation and output the final results.
 - Note: When using `cache_files`, `--dataset` still needs to be provided, because the ID for `cache_files` is calculated from the MD5 of the original data; both pieces of information must be used together.

-#### MCTS
-- rollout_depth: The maximum depth during rollouts. Default is `5`.
-- rollout_start_depth: The depth at which rollouts begin; nodes below this depth only undergo expand operations. Default is `3`.
-- max_iterations: The maximum number of MCTS iterations. Default is `100`.
-- process_reward_rate: The proportion of the process reward used when calculating value during selection. Default is `0.0`, meaning PRM is not used.
-- exploration_rate: A parameter in the UCT algorithm that balances exploration; a higher value gives more weight to less-explored nodes. Default is `0.5`.
-- api_key: Required when using the client as an inference engine. Default is `EMPTY`.
-- base_url: Required when using the client as an inference engine. Default is `https://dashscope.aliyuncs.com/compatible-mode/v1`.
-
 ## Specific Model Arguments

 In addition to the parameters listed above, some models support additional model-specific arguments. The meanings of these parameters can usually be found in the corresponding model's official repository or its inference code. **MS-Swift includes these parameters to ensure that the trained model aligns with the behavior of the official inference implementation**.
```
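The two-step `cache_files` workflow described in the sampling parameters above can be sketched as a pair of commands. Dataset, file, and reward-model names are placeholders, and the commands are only printed here rather than executed:

```shell
# Step 1: sample with prm_model/orm_model unset, writing raw results to a JSONL file.
STEP1="swift sample --dataset swift/self-cognition --sampler_engine pt --output_file raw_samples.jsonl"

# Step 2: no generation engine; rescore step 1's output with the reward models.
# --dataset must be repeated: the cache ID is an MD5 of the original data.
STEP2="swift sample --dataset swift/self-cognition --sampler_engine no --cache_files raw_samples.jsonl --prm_model my-prm"

printf '%s\n%s\n' "$STEP1" "$STEP2"
```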