Merged
58 commits
7e78e53
kube
Nov 27, 2025
b448767
uuid as name
Nov 27, 2025
9ac9226
normalize pod name
Nov 27, 2025
cebbf37
load dataset outside
Nov 27, 2025
ffd87cf
remove tolerations
Nov 27, 2025
0d29967
incorporate dataset loading
Nov 27, 2025
c7afaa2
some type annotations
Nov 27, 2025
c506fe1
fixture first fix
Nov 27, 2025
aab04ff
fix
Nov 27, 2025
cc8f813
fix tests
Nov 27, 2025
77aeb78
simplify filtering
Nov 27, 2025
28caf41
remove deps on swesmith! also fix excluded_ids for swesmith
Nov 27, 2025
d9b76c7
remove swesmith
Nov 27, 2025
e13462b
Merge remote-tracking branch 'origin/main' into envs_for_images
MarcCote Nov 28, 2025
928c1d8
load dataset as class method / setup_task
Nov 28, 2025
b338e1c
fix tests
Nov 28, 2025
0858bea
change run.py
Nov 28, 2025
35a4f66
blacked
Nov 28, 2025
e9600ed
remove imports
Nov 28, 2025
81b2eda
task name / task data adaptation
Nov 28, 2025
3468a62
pre commit
Nov 28, 2025
c56579c
cls keyword
Nov 28, 2025
4b01ac8
remove load dataset
Nov 28, 2025
0dd0f4e
Working on tests + refactoring
MarcCote Nov 28, 2025
e6fcd58
Adding back swesmith
MarcCote Nov 28, 2025
62a2eda
remove convert_tool_.. , refactor history
Nov 29, 2025
7f63767
Merge remote-tracking branch 'origin/envs_for_images' into simpler-agent
Nov 29, 2025
b690b93
more robust loop
Nov 29, 2025
de8715b
instance_prompt
Nov 30, 2025
fb9292d
contract
Nov 30, 2025
f4e79a9
should_stop
Nov 30, 2025
bd4b9b3
instance_prompt_template_file
Nov 30, 2025
bb9e26e
max rewrite is in froggy agent
Nov 30, 2025
5389a95
nicer loop
Nov 30, 2025
430b662
just move around
Nov 30, 2025
5295f8f
store message in history
Nov 30, 2025
134d248
cutoff logic
Dec 1, 2025
8c7b562
invert llm response and env observation
Dec 1, 2025
98328d3
Merging changed from main
MarcCote Dec 4, 2025
41cb3c4
Merge branch 'main' into simpler-agent-merged
MarcCote Dec 4, 2025
353ceb3
Remove apply_patch from FroggyAgent
MarcCote Dec 4, 2025
38b755e
Fixing tests
MarcCote Dec 4, 2025
a08dbd1
Continue clean up
MarcCote Dec 4, 2025
1ff4482
remove show_directory_tree and _auto_eval_on_rewrite from the shortcu…
xingdi-eric-yuan Dec 4, 2025
27ad8fa
remove unused imports
xingdi-eric-yuan Dec 4, 2025
d9363a7
remove test cases for legacy functions (format_tool_call_history)
xingdi-eric-yuan Dec 5, 2025
bdaf6c9
Update conftest.py
xingdi-eric-yuan Dec 5, 2025
82a50f0
remove debug_5_agent config
xingdi-eric-yuan Dec 5, 2025
b664cd7
remove rewrite agent and debug agent, now can just use froggy_agent a…
xingdi-eric-yuan Dec 5, 2025
db45a5b
minor
xingdi-eric-yuan Dec 5, 2025
baf93ef
skip and copy
xingdi-eric-yuan Dec 5, 2025
60e44de
Update r2egym.py
xingdi-eric-yuan Dec 5, 2025
d875895
Update r2egym.py
xingdi-eric-yuan Dec 5, 2025
1122bbd
Fixing tests
MarcCote Dec 5, 2025
03e47e0
Update solution agent
MarcCote Dec 5, 2025
ff199d7
Update tests
MarcCote Dec 5, 2025
4712bd0
Simplify system/instance prompt template
MarcCote Dec 5, 2025
48e76e7
Clean up utils
MarcCote Dec 5, 2025
2 changes: 2 additions & 0 deletions .github/workflows/tests.yml
@@ -117,7 +117,9 @@ jobs:
ALLOW_LOCAL_TERMINAL: true
DEBUG_GYM_DEBUG: 1
run: |
+ df -h
pytest -vv -n 16 -k "not test_swe_bench and not test_swe_smith and not test_r2egym and not test_kubernetes" --timeout=600 --cov=debug_gym --cov-report=term-missing
+ df -h
- name: Store coverage report
uses: actions/upload-artifact@v4
with:
28 changes: 13 additions & 15 deletions README.md
@@ -99,11 +99,9 @@ We provide the below LLM-based agents, they all have minimal design and serve th

| Agent name | Available Tools | Description |
| :-: | :-: | :----- |
- | `debug_agent` | `pdb`, `rewrite`, `view`, `eval` | A minimal agent that dumps all available information into its prompt and queries the LLM to generate a command. |
- | `rewrite_agent` | `rewrite`, `view`, `eval` | A `debug_agent` but `pdb` tool is disabled (an agent keeps rewriting). |
- | `debug_5_agent` | `pdb`, `rewrite`, `view`, `eval` | A `debug_agent`, but `pdb` tool is only enabled after certain amount of rewrites. |
- | `grep_agent` | `grep`, `rewrite`, `view`, `eval` | A variant of `rewrite_agent` that includes the `grep` tool for searching patterns in the codebase before making changes. |
+ | `froggy_agent` | `grep`, `pdb`, `view`, `rewrite`, `eval` (configurable) | Primary debugging agent. Adjust prompts and tool lists in YAML to mimic rewrite-only, grep-heavy, or other workflows. |
| `solution_agent` | `pdb`, `eval` | An oracle agent that applies a gold patch (only works with `swebench` and `swesmith` benchmarks for now). The agent checks that tests are failing before applying the patch, and passing after. It also checks that `pdb` tool can be used as expected. |
+ | `swe_agent` | `bash`, `rewrite`, `submit` | Baseline agent tailored for the SWE-bench setting that executes bash commands in addition to rewrites. |

---

@@ -171,27 +169,27 @@ We provide a human mode that enables developers to manually interact with `debug

#### 3.3. Overriding Values in Config

- The `-p` flag is a handy way to override values defined in the config file. For example, the command below will run the rewrite_agent agent on Aider with human mode (even if the config file specifies gpt-4o). The command also overrides the default system prompt (see below for more information).
+ The `-p` flag is a handy way to override values defined in the config file. For example, the command below will run the `froggy_agent` configuration on Aider with human mode (even if the config file specifies gpt-4o). The command also overrides the default system prompt (see below for more information).

python scripts/run.py scripts/config_aider.yaml \
- --agent debug_agent \
+ --agent froggy_agent \
-v \
- -p debug_agent.llm_name="human" \
- -p debug_agent.system_prompt_template_file="scripts/templates/human_friendly_system_prompt.jinja"
+ -p froggy_agent.llm_name="human" \
+ -p froggy_agent.system_prompt="scripts/templates/human_friendly_system_prompt.jinja"


#### 3.4. Customizing the System Prompt with Jinja Templates

`debug-gym` allows you to fully customize the system prompt by providing a [Jinja](https://jinja.palletsprojects.com/) template file. This enables you to control the format and content of the prompt sent to the LLM, making it easier to adapt the environment to your specific needs or research experiments.

- To use a custom system prompt template, specify the path to your Jinja template file in your agent's configuration under `system_prompt_template_file`. For example:
+ To use a custom system prompt template, specify the path to your Jinja template file in your agent's configuration under `system_prompt`. For example:

```yaml
- debug_agent:
-   system_prompt_template_file: scripts/templates/custom_system_prompt.jinja
+ froggy_agent:
+   system_prompt: scripts/templates/custom_system_prompt.jinja
```

- Alternatively, you can provide a custom template from the command line with `-p <agent>.system_prompt_template_file="<path/to/template.jinja>"` (see above).
+ Alternatively, you can provide a custom template from the command line with `-p <agent>.system_prompt="<path/to/template.jinja>"` (see above).

Within your Jinja template, you have access to the `agent` and `info` objects, which provide all relevant context about the current environment and agent state.
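
The `-p <agent>.<key>=<value>` overrides in the README hunk above can be pictured as dotted-path assignments into the loaded config dictionary. The sketch below is only an illustration under that assumption: `parse_value` and `apply_override` are hypothetical helpers, not functions from `debug_gym` or `scripts/run.py`, and the real override parsing may differ.

```python
import ast
from typing import Any


def parse_value(raw: str) -> Any:
    """Interpret a Python literal such as '"human"' or '3'; otherwise keep the raw string."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw


def apply_override(config: dict, override: str) -> None:
    """Apply one 'dotted.key=value' override to a nested dict, in place (hypothetical helper)."""
    dotted_key, _, raw_value = override.partition("=")
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        # Create intermediate sections on demand so new keys can be added.
        node = node.setdefault(part, {})
    node[leaf] = parse_value(raw_value)


# Stand-in for a loaded YAML config; the keys mirror the README example.
config = {"froggy_agent": {"llm_name": "gpt-4o"}}
for item in [
    'froggy_agent.llm_name="human"',
    "froggy_agent.system_prompt=scripts/templates/human_friendly_system_prompt.jinja",
]:
    apply_override(config, item)

print(config["froggy_agent"]["llm_name"])  # human
```

Under this reading, renaming the key from `system_prompt_template_file` to `system_prompt` means any existing `-p` override that still uses the old key would set an entry the agent no longer reads.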

@@ -215,10 +213,10 @@ In addition to all [built-in Jinja filters](https://jinja.palletsprojects.com/en

#### Example Template

- ```jinja
- System Prompt for Debug-Gym
+ Here is an example of a custom system prompt template using Jinja:

- Task: {{ agent.system_prompt }}
+ ```jinja
+ You are an autonomous debugging agent designed to fix bugs in Python code repositories.

Instructions:
{{ info.instructions }}
4 changes: 3 additions & 1 deletion debug_gym/__init__.py
@@ -1 +1,3 @@
- from debug_gym.version import __version__
+ from debug_gym.version import __version__ as __version__
+
+ __all__ = ["__version__"]
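
The change above pairs an explicit re-export (`import x as x`) with an `__all__` list. As a rough illustration of what `__all__` does at runtime, the sketch below builds a throwaway in-memory module (the name `demo_pkg` is hypothetical and unrelated to `debug_gym`) and shows that a star import only picks up the listed names. Whether the PR made this change for linting reasons or for API clarity is not stated in the diff.

```python
import sys
import types

# Source of a throwaway module; "demo_pkg" is a made-up name for illustration.
source = '''
from os.path import join as join   # explicit re-export form
from os.path import basename       # imported, but not listed in __all__

__all__ = ["join"]
'''

demo_pkg = types.ModuleType("demo_pkg")
exec(source, demo_pkg.__dict__)
sys.modules["demo_pkg"] = demo_pkg  # make it importable by name

# Run a star import in an isolated namespace and see which names arrive.
namespace = {}
exec("from demo_pkg import *", namespace)

exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)  # ['join']
```

Names left out of `__all__` remain reachable as attributes (`demo_pkg.basename`); they are just excluded from `from demo_pkg import *`.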
9 changes: 7 additions & 2 deletions debug_gym/agents/__init__.py
@@ -1,4 +1,9 @@
- from debug_gym.agents.debug_agent import Debug_5_Agent, DebugAgent
- from debug_gym.agents.rewrite_agent import RewriteAgent
+ from debug_gym.agents.froggy_agent import FroggyAgent
from debug_gym.agents.solution_agent import AgentSolution
from debug_gym.agents.swe_agent import SWEAgent
+
+ __all__ = [
+     "FroggyAgent",
+     "AgentSolution",
+     "SWEAgent",
+ ]