Commit 4889c02

adding keep_module_scores flag
1 parent 0a44512 commit 4889c02

File tree: 1 file changed, +36 −56 lines

dspy/teleprompt/gepa/gepa.py

Lines changed: 36 additions & 56 deletions
@@ -197,60 +197,38 @@ def metric(
     # pareto_frontier is a list of scores, one for each task in the batch.
     ```
 
-    Args:
-        metric: The metric function to use for feedback and evaluation.
-        auto: The auto budget to use for the run. Options: "light", "medium", "heavy".
-        max_full_evals: The maximum number of full evaluations to perform.
-        max_metric_calls: The maximum number of metric calls to perform.
-        reflection_minibatch_size: The number of examples to use for reflection in a single GEPA step. Default is 3.
-        candidate_selection_strategy: The strategy to use for candidate selection. Default is "pareto",
-            which stochastically selects candidates from the Pareto frontier of all validation scores.
-            Options: "pareto", "current_best".
-        reflection_lm: The language model to use for reflection. Required parameter. GEPA benefits from
-            a strong reflection model. Consider using `dspy.LM(model='gpt-5', temperature=1.0, max_tokens=32000)`
-            for optimal performance.
-        skip_perfect_score: Whether to skip examples with perfect scores during reflection. Default is True.
-        add_format_failure_as_feedback: Whether to add format failures as feedback. Default is False.
-        use_merge: Whether to use merge-based optimization. Default is True.
-        max_merge_invocations: The maximum number of merge invocations to perform. Default is 5.
-        num_threads: The number of threads to use for evaluation with `Evaluate`. Optional.
-        failure_score: The score to assign to failed examples. Default is 0.0.
-        perfect_score: The maximum score achievable by the metric. Default is 1.0. Used by GEPA
-            to determine if all examples in a minibatch are perfect.
-        log_dir: The directory to save the logs. GEPA saves elaborate logs, along with all candidate
-            programs, in this directory. Running GEPA with the same `log_dir` will resume the run
-            from the last checkpoint.
-        track_stats: Whether to return detailed results and all proposed programs in the `detailed_results`
-            attribute of the optimized program. Default is False.
-        use_wandb: Whether to use wandb for logging. Default is False.
-        wandb_api_key: The API key to use for wandb. If not provided, wandb will use the API key
-            from the environment variable `WANDB_API_KEY`.
-        wandb_init_kwargs: Additional keyword arguments to pass to `wandb.init`.
-        track_best_outputs: Whether to track the best outputs on the validation set. track_stats must
-            be True if track_best_outputs is True. The optimized program's `detailed_results.best_outputs_valset`
-            will contain the best outputs for each task in the validation set.
-        seed: The random seed to use for reproducibility. Default is 0.
-
-    Note:
-        Budget Configuration: Exactly one of `auto`, `max_full_evals`, or `max_metric_calls` must be provided.
-        The `auto` parameter provides preset configurations: "light" for quick experimentation, "medium" for
-        balanced optimization, and "heavy" for thorough optimization.
-
-        Reflection Configuration: The `reflection_lm` parameter is required and should be a strong language model.
-        GEPA performs best with models like `dspy.LM(model='gpt-5', temperature=1.0, max_tokens=32000)`.
-        The reflection process analyzes failed examples to generate feedback for program improvement.
-
-        Merge Configuration: GEPA can merge successful program variants using `use_merge=True`.
-        The `max_merge_invocations` parameter controls how many merge attempts are made during optimization.
-
-        Evaluation Configuration: Use `num_threads` to parallelize evaluation. The `failure_score` and
-        `perfect_score` parameters help GEPA understand your metric's range and optimize accordingly.
-
-        Logging Configuration: Set `log_dir` to save detailed logs and enable checkpoint resuming.
-        Use `track_stats=True` to access detailed optimization results via the `detailed_results` attribute.
-        Enable `use_wandb=True` for experiment tracking and visualization.
-
-        Reproducibility: Set `seed` to ensure consistent results across runs with the same configuration.
+    Parameters:
+    - metric: The metric function to use for feedback and evaluation.
+
+    Budget configuration (exactly one of the following must be provided):
+    - auto: The auto budget to use for the run.
+    - max_full_evals: The maximum number of full evaluations to perform.
+    - max_metric_calls: The maximum number of metric calls to perform.
+
+    Reflection based configuration:
+    - reflection_minibatch_size: The number of examples to use for reflection in a single GEPA step.
+    - candidate_selection_strategy: The strategy to use for candidate selection. Default is "pareto", which stochastically selects candidates from the Pareto frontier of all validation scores.
+    - reflection_lm: [Required] The language model to use for reflection. GEPA benefits from a strong reflection model, and you can use `dspy.LM(model='gpt-5', temperature=1.0, max_tokens=32000)` to get a good reflection model.
+
+    Merge-based configuration:
+    - use_merge: Whether to use merge-based optimization. Default is True.
+    - max_merge_invocations: The maximum number of merge invocations to perform. Default is 5.
+
+    Evaluation configuration:
+    - num_threads: The number of threads to use for evaluation with `Evaluate`
+    - failure_score: The score to assign to failed examples. Default is 0.0.
+    - perfect_score: The maximum score achievable by the metric. Default is 1.0. Used by GEPA to determine if all examples in a minibatch are perfect.
+
+    Logging configuration:
+    - log_dir: The directory to save the logs. GEPA saves elaborate logs, along with all the candidate programs, in this directory. Running GEPA with the same `log_dir` will resume the run from the last checkpoint.
+    - track_stats: Whether to return detailed results and all proposed programs in the `detailed_results` attribute of the optimized program. Default is False.
+    - use_wandb: Whether to use wandb for logging. Default is False.
+    - wandb_api_key: The API key to use for wandb. If not provided, wandb will use the API key from the environment variable `WANDB_API_KEY`.
+    - wandb_init_kwargs: Additional keyword arguments to pass to `wandb.init`.
+    - track_best_outputs: Whether to track the best outputs on the validation set. track_stats must be True if track_best_outputs is True. `optimized_program.detailed_results.best_outputs_valset` will contain the best outputs for each task in the validation set.
+
+    Reproducibility:
+    - seed: The random seed to use for reproducibility. Default is 0.
     """
     def __init__(
         self,
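The budget rule stated in the docstring (exactly one of `auto`, `max_full_evals`, or `max_metric_calls` must be provided) can be sketched as a small standalone check. This is an illustrative mock, not dspy's actual validation code, and `check_budget` is a hypothetical helper name:

```python
def check_budget(auto=None, max_full_evals=None, max_metric_calls=None):
    """Require exactly one budget option, mirroring the docstring's rule."""
    provided = [name for name, value in (
        ("auto", auto),
        ("max_full_evals", max_full_evals),
        ("max_metric_calls", max_metric_calls),
    ) if value is not None]
    if len(provided) != 1:
        raise ValueError(
            f"Exactly one budget option must be set, got {provided or 'none'}"
        )
    return provided[0]

print(check_budget(auto="light"))  # -> auto
```

Passing zero options or more than one raises, which matches the "exactly one" constraint described above.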
@@ -282,6 +260,7 @@ def __init__(
         track_best_outputs: bool = False,
         # Reproducibility
         seed: int | None = 0,
+        keep_module_scores: bool = False,
     ):
         try:
             inspect.signature(metric).bind(None, None, None, None, None)
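The `candidate_selection_strategy="pareto"` behavior described in the docstring (stochastically selecting from the Pareto frontier of per-task validation scores) can be illustrated with a toy sketch. This is an assumption, not GEPA's implementation; `select_candidate` and its score layout are hypothetical:

```python
import random

def select_candidate(scores_per_candidate, rng):
    """Pick a candidate uniformly at random from the Pareto frontier.

    scores_per_candidate[c][t] is the score of candidate c on validation
    task t. A candidate is on the frontier here if it attains the best
    score on at least one task (a simplified frontier criterion).
    """
    num_tasks = len(scores_per_candidate[0])
    frontier = set()
    for t in range(num_tasks):
        best = max(scores[t] for scores in scores_per_candidate)
        for c, scores in enumerate(scores_per_candidate):
            if scores[t] == best:
                frontier.add(c)
    return rng.choice(sorted(frontier))

rng = random.Random(0)
# Candidate 2 is never best on any task, so it is never selected.
print(select_candidate([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]], rng))
```

The stochastic pick keeps exploration alive among candidates that each excel on different validation tasks, instead of always choosing one "current best".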
@@ -334,6 +313,8 @@ def __init__(
         self.wandb_api_key = wandb_api_key
         self.wandb_init_kwargs = wandb_init_kwargs
 
+        self.keep_module_scores = keep_module_scores
+
         if track_best_outputs:
             assert track_stats, "track_stats must be True if track_best_outputs is True."
         self.track_best_outputs = track_best_outputs
@@ -452,6 +433,7 @@ def feedback_fn(
             num_threads=self.num_threads,
             add_format_failure_as_feedback=self.add_format_failure_as_feedback,
             rng=rng,
+            keep_module_scores=self.keep_module_scores,
         )
 
         reflection_lm = self.reflection_lm
@@ -486,8 +468,6 @@ def feedback_fn(
             wandb_api_key=self.wandb_api_key,
             wandb_init_kwargs=self.wandb_init_kwargs,
             track_best_outputs=self.track_best_outputs,
-            display_progress_bar=True,
-            raise_on_exception=True,
 
             # Reproducibility
             seed=self.seed,
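Putting the diff together, the new flag is just one more keyword passed through to the optimizer. A minimal sketch, assuming the GEPA keyword API shown above; `my_metric` and `reflection_lm` are hypothetical placeholders, and the dspy calls are left commented out so the snippet stays self-contained:

```python
# Collect GEPA options as documented in the diff above. The
# `keep_module_scores` key is the flag added by this commit; all other
# keys appear in the docstring.
gepa_kwargs = dict(
    auto="light",                 # exactly one budget option may be set
    reflection_minibatch_size=3,
    candidate_selection_strategy="pareto",
    keep_module_scores=True,      # new in commit 4889c02
    track_stats=True,
    seed=0,
)

# With dspy installed, this would be used roughly as:
# optimizer = dspy.GEPA(metric=my_metric, reflection_lm=reflection_lm, **gepa_kwargs)
# optimized = optimizer.compile(student, trainset=trainset, valset=valset)
print(sorted(gepa_kwargs))
```

Note that the diff also drops the hard-coded `display_progress_bar=True` and `raise_on_exception=True` arguments from the internal call, so those are no longer forced on.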

0 commit comments