> - When $0 < \beta < 1$: interpolate using the mixture-distribution formula above
By adjusting the $\beta$ parameter, interpolation can be performed between different divergence metrics. When $\beta = 0.5$, the divergence is the standard symmetric JSD.
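As an illustration (not the library's implementation), the generalized JSD can be sketched in a few lines of Python. The mixture and weighting follow the formula above; note that the raw value vanishes as $\beta \to 0$ or $\beta \to 1$, so implementations typically special-case those endpoints to plain forward/reverse KL.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions given as probability vectors."""
    return float(np.sum(p * np.log(p / q)))

def generalized_jsd(p, q, beta=0.5):
    """Generalized JSD: beta * KL(p || m) + (1 - beta) * KL(q || m),
    with mixture m = beta * p + (1 - beta) * q.

    beta = 0.5 recovers the standard symmetric JSD.
    """
    m = beta * p + (1 - beta) * q
    return beta * kl(p, m) + (1 - beta) * kl(q, m)

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.3, 0.6])
print(generalized_jsd(p, q, beta=0.5))  # symmetric at beta = 0.5
```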
We can perform GKD training by setting the following parameters:
| Parameter | Type | Default | Range | Description |
|------|------|--------|---------|------|
|`--teacher_model`| str | Required | - | Teacher model path or model ID |
|`--seq_kd`| bool | False | True/False | Whether to use teacher-generated sequences<br>• False: Use dataset when not on-policy<br>• True: Use teacher generation when not on-policy |
A reference training script is available [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/multimodal/rlhf/gkd/fast.sh).
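For example, a hypothetical invocation sketch (model IDs and dataset are placeholders to adapt to your setup; only the flags listed in the table above are taken from this document, other flags may differ by version):

```shell
# Hypothetical GKD launch sketch; <student-model>, <teacher-model>, and
# <dataset> are placeholders, not real identifiers.
swift rlhf \
    --rlhf_type gkd \
    --model <student-model> \
    --teacher_model <teacher-model> \
    --seq_kd false \
    --dataset <dataset>
```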
## On-Policy Distillation
We can achieve the [On-Policy Distillation](https://thinkingmachines.ai/blog/on-policy-distillation/) training described in the Thinking Machines Lab blog by setting the following parameters:
```bash
--lmbda 1 # fully on-policy: train only on student-generated sequences
--beta 1 # reverse KL divergence
```
For a complete implementation, refer to the example script [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/on_policy_distillation.sh).
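Conceptually, with these settings the loss is the per-token reverse KL between the student's and teacher's next-token distributions, evaluated on sequences the student itself generated. A minimal numpy sketch (illustrative only; function and variable names are our own, not ms-swift's API):

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax over the vocabulary dimension."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def reverse_kl(student_logits, teacher_logits):
    """Mean per-token KL(student || teacher), shapes (seq_len, vocab_size).

    On-policy distillation (--lmbda 1, --beta 1) evaluates this on the
    token positions of sequences sampled from the student.
    """
    p_s = softmax(student_logits)
    log_p_s = np.log(p_s)
    log_p_t = np.log(softmax(teacher_logits))
    return float(np.mean(np.sum(p_s * (log_p_s - log_p_t), axis=-1)))
```

Because the expectation is taken under the student's own distribution, reverse KL is mode-seeking: the student is penalized for placing mass where the teacher places little, which is the motivation for pairing it with on-policy sampling.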