dspy/teleprompt/gepa/gepa.py (+57 −33)
@@ -197,38 +197,60 @@ def metric(
     # pareto_frontier is a list of scores, one for each task in the batch.
     ```
 
-    Parameters:
-    - metric: The metric function to use for feedback and evaluation.
-
-    Budget configuration (exactly one of the following must be provided):
-    - auto: The auto budget to use for the run.
-    - max_full_evals: The maximum number of full evaluations to perform.
-    - max_metric_calls: The maximum number of metric calls to perform.
-
-    Reflection based configuration:
-    - reflection_minibatch_size: The number of examples to use for reflection in a single GEPA step.
-    - candidate_selection_strategy: The strategy to use for candidate selection. Default is "pareto", which stochastically selects candidates from the Pareto frontier of all validation scores.
-    - reflection_lm: [Required] The language model to use for reflection. GEPA benefits from a strong reflection model, and you can use `dspy.LM(model='gpt-5', temperature=1.0, max_tokens=32000)` to get a good reflection model.
-
-    Merge-based configuration:
-    - use_merge: Whether to use merge-based optimization. Default is True.
-    - max_merge_invocations: The maximum number of merge invocations to perform. Default is 5.
-
-    Evaluation configuration:
-    - num_threads: The number of threads to use for evaluation with `Evaluate`
-    - failure_score: The score to assign to failed examples. Default is 0.0.
-    - perfect_score: The maximum score achievable by the metric. Default is 1.0. Used by GEPA to determine if all examples in a minibatch are perfect.
-
-    Logging configuration:
-    - log_dir: The directory to save the logs. GEPA saves elaborate logs, along with all the candidate programs, in this directory. Running GEPA with the same `log_dir` will resume the run from the last checkpoint.
-    - track_stats: Whether to return detailed results and all proposed programs in the `detailed_results` attribute of the optimized program. Default is False.
-    - use_wandb: Whether to use wandb for logging. Default is False.
-    - wandb_api_key: The API key to use for wandb. If not provided, wandb will use the API key from the environment variable `WANDB_API_KEY`.
-    - wandb_init_kwargs: Additional keyword arguments to pass to `wandb.init`.
-    - track_best_outputs: Whether to track the best outputs on the validation set. track_stats must be True if track_best_outputs is True. `optimized_program.detailed_results.best_outputs_valset` will contain the best outputs for each task in the validation set.
-
-    Reproducibility:
-    - seed: The random seed to use for reproducibility. Default is 0.
+    Args:
+        metric: The metric function to use for feedback and evaluation.
+        auto: The auto budget to use for the run. Options: "light", "medium", "heavy".
+        max_full_evals: The maximum number of full evaluations to perform.
+        max_metric_calls: The maximum number of metric calls to perform.
+        reflection_minibatch_size: The number of examples to use for reflection in a single GEPA step. Default is 3.
+        candidate_selection_strategy: The strategy to use for candidate selection. Default is "pareto",
+            which stochastically selects candidates from the Pareto frontier of all validation scores.
+            Options: "pareto", "current_best".
+        reflection_lm: The language model to use for reflection. Required parameter. GEPA benefits from
+            a strong reflection model. Consider using `dspy.LM(model='gpt-5', temperature=1.0, max_tokens=32000)`
+            for optimal performance.
+        skip_perfect_score: Whether to skip examples with perfect scores during reflection. Default is True.
+        add_format_failure_as_feedback: Whether to add format failures as feedback. Default is False.
+        use_merge: Whether to use merge-based optimization. Default is True.
+        max_merge_invocations: The maximum number of merge invocations to perform. Default is 5.
+        num_threads: The number of threads to use for evaluation with `Evaluate`. Optional.
+        failure_score: The score to assign to failed examples. Default is 0.0.
+        perfect_score: The maximum score achievable by the metric. Default is 1.0. Used by GEPA
+            to determine if all examples in a minibatch are perfect.
+        log_dir: The directory to save the logs. GEPA saves elaborate logs, along with all candidate
+            programs, in this directory. Running GEPA with the same `log_dir` will resume the run
+            from the last checkpoint.
+        track_stats: Whether to return detailed results and all proposed programs in the `detailed_results`
+            attribute of the optimized program. Default is False.
+        use_wandb: Whether to use wandb for logging. Default is False.
+        wandb_api_key: The API key to use for wandb. If not provided, wandb will use the API key
+            from the environment variable `WANDB_API_KEY`.
+        wandb_init_kwargs: Additional keyword arguments to pass to `wandb.init`.
+        track_best_outputs: Whether to track the best outputs on the validation set. track_stats must
+            be True if track_best_outputs is True. The optimized program's `detailed_results.best_outputs_valset`
+            will contain the best outputs for each task in the validation set.
+        seed: The random seed to use for reproducibility. Default is 0.
+
+    Note:
+        Budget Configuration: Exactly one of `auto`, `max_full_evals`, or `max_metric_calls` must be provided.
+        The `auto` parameter provides preset configurations: "light" for quick experimentation, "medium" for
+        balanced optimization, and "heavy" for thorough optimization.
+
+        Reflection Configuration: The `reflection_lm` parameter is required and should be a strong language model.
+        GEPA performs best with models like `dspy.LM(model='gpt-5', temperature=1.0, max_tokens=32000)`.
+        The reflection process analyzes failed examples to generate feedback for program improvement.
+
+        Merge Configuration: GEPA can merge successful program variants using `use_merge=True`.
+        The `max_merge_invocations` parameter controls how many merge attempts are made during optimization.
+
+        Evaluation Configuration: Use `num_threads` to parallelize evaluation. The `failure_score` and
+        `perfect_score` parameters help GEPA understand your metric's range and optimize accordingly.
+
+        Logging Configuration: Set `log_dir` to save detailed logs and enable checkpoint resuming.
+        Use `track_stats=True` to access detailed optimization results via the `detailed_results` attribute.
+        Enable `use_wandb=True` for experiment tracking and visualization.
+
+        Reproducibility: Set `seed` to ensure consistent results across runs with the same configuration.
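The docstring's `metric` parameter pairs each evaluated example with a score (the context line above notes that the Pareto frontier is built from one score per task) and, for reflection, textual feedback. As an illustrative sketch only — not the dspy API, and the function name, dict-based inputs, and dict return shape here are all assumptions — such a feedback metric might look like:

```python
def exact_match_metric(gold, pred, trace=None):
    """Toy metric: a score in [0, 1] plus textual feedback for reflection."""
    score = 1.0 if gold["answer"] == pred["answer"] else 0.0
    feedback = (
        "Correct answer."
        if score == 1.0
        else f"Expected {gold['answer']!r} but got {pred['answer']!r}."
    )
    return {"score": score, "feedback": feedback}
```

The feedback string is what the reflection model would analyze on failed examples; the numeric score is what budget accounting (`failure_score`, `perfect_score`) and the Pareto frontier operate on.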
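The Note block states that exactly one of `auto`, `max_full_evals`, or `max_metric_calls` may be provided. A minimal sketch of that mutual-exclusion check (the helper name `resolve_budget` is hypothetical, not taken from gepa.py):

```python
def resolve_budget(auto=None, max_full_evals=None, max_metric_calls=None):
    """Enforce that exactly one budget option is set, as the docstring requires."""
    provided = {
        name: value
        for name, value in {
            "auto": auto,
            "max_full_evals": max_full_evals,
            "max_metric_calls": max_metric_calls,
        }.items()
        if value is not None
    }
    if len(provided) != 1:
        raise ValueError(
            "Exactly one of auto, max_full_evals, or max_metric_calls "
            f"must be provided; got {sorted(provided) or 'none'}."
        )
    return provided
```

For example, `resolve_budget(auto="light")` is accepted, while passing both `auto` and `max_metric_calls`, or none of the three, raises a `ValueError`.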