Draft
Commits
37 commits
f70c35f
and support for code viewing tool
saurabh111233212 Aug 5, 2025
cc5b3c8
syntax
saurabh111233212 Aug 5, 2025
9cd8edd
format
saurabh111233212 Aug 5, 2025
d052acf
pass /view_file in the launch config
saurabh111233212 Aug 5, 2025
c0d6068
add code_view to assert
saurabh111233212 Aug 5, 2025
c56c75d
format of tool changes
saurabh111233212 Aug 6, 2025
e1cf21b
Merge branch 'main' into saurabhs/coding-agent
saurabh111233212 Aug 6, 2025
f57cd27
get rid the the wrong code search verifier
saurabh111233212 Aug 7, 2025
85db46f
works?
saurabh111233212 Aug 7, 2025
207b6bb
tool stuff
saurabh111233212 Aug 11, 2025
598f01b
Resolve merge conflicts: update CI to uv 0.8.6 and use 'uv sync'; Doc…
saurabh111233212 Aug 11, 2025
2394c9c
data scrupts
saurabh111233212 Aug 12, 2025
3e0448f
Merge branch 'main' into saurabhs/view-tool-rl
saurabh111233212 Aug 12, 2025
af0df06
dumb fix
saurabh111233212 Aug 12, 2025
e3eabb7
Merge branch 'main' into saurabhs/view-tool-rl
saurabh111233212 Aug 13, 2025
6a50636
new try
saurabh111233212 Aug 14, 2025
9138842
more stuff
saurabh111233212 Aug 14, 2025
bcdceb7
Merge branch 'main' into saurabhs/view-tool-rl
saurabh111233212 Aug 14, 2025
0c2644e
decrease bszz
saurabh111233212 Aug 18, 2025
24d3cae
Merge branch 'main' into saurabhs/view-tool-rl
saurabh111233212 Aug 18, 2025
1d35d2c
Merge branch 'main' into saurabhs/view-tool-rl
saurabh111233212 Aug 18, 2025
83d689d
Merge branch 'main' into saurabhs/view-tool-rl
saurabh111233212 Aug 20, 2025
cbee129
merge main
saurabh111233212 Aug 20, 2025
9d1652c
Merge branch 'main' into saurabhs/view-tool-rl
saurabh111233212 Aug 20, 2025
ba87ed8
realign w main
saurabh111233212 Aug 22, 2025
d9c783b
Merge branch 'main' into saurabhs/view-tool-rl
saurabh111233212 Aug 22, 2025
abbfab0
getting things working...
saurabh111233212 Aug 26, 2025
e1ef9d7
Added submit.py file
Aug 26, 2025
d33de25
things
saurabh111233212 Aug 28, 2025
5bbceaa
works?
saurabh111233212 Aug 28, 2025
0b77872
termination works!
saurabh111233212 Aug 28, 2025
9d19b3f
works
saurabh111233212 Aug 29, 2025
1d28a3c
Merge branch 'main' into saurabhs/view-tool-rl
saurabh111233212 Sep 8, 2025
130886c
small fix
saurabh111233212 Sep 8, 2025
ad50787
changes
saurabh111233212 Sep 9, 2025
c8908bd
fix race condition, testbed viz
saurabh111233212 Sep 9, 2025
cfe63e4
kinda works but hangs : (
saurabh111233212 Sep 10, 2025
30 changes: 30 additions & 0 deletions .claude/settings.local.json
@@ -0,0 +1,30 @@
{
  "permissions": {
    "allow": [
      "Bash(chmod:*)",
      "Bash(make:*)",
      "Bash(python test:*)",
      "Bash(python:*)",
      "Bash(ls:*)",
      "Bash(rm:*)",
      "Bash(uv run ruff check:*)",
      "Bash(uv run pytest:*)",
      "Bash(uv run:*)",
      "Bash(ldd:*)",
      "Bash(uv pip show:*)",
      "Bash(uv sync:*)",
      "Bash(uv pip uninstall:*)",
      "Bash(curl:*)",
      "Bash(grep:*)",
      "Bash(bash:*)",
      "Bash(sed:*)",
      "Bash(awk:*)",
      "WebSearch",
      "WebFetch(domain:docs.ray.io)",
      "Bash(ss:*)",
      "Bash(uv pip:*)"
    ],
    "deny": [],
    "defaultMode": "acceptEdits"
  }
}
2 changes: 2 additions & 0 deletions .gitignore
@@ -159,3 +159,5 @@ dmypy.json
cache/
local_dataset_cache/
scratch/
open_instruct/code_utils/testbed/
open_instruct/code_utils/repos/
13 changes: 13 additions & 0 deletions build_and_push_image.sh
@@ -0,0 +1,13 @@
#!/bin/bash
image_name=open-coding-agent

# Build and push the Docker image to Beaker
docker build . --build-arg UV_CACHE_DIR="$UV_CACHE_DIR" -t "$image_name"

beaker_user=$(beaker account whoami --format json | jq -r '.[0].name')

# Use '|| true' to prevent script from exiting if image doesn't exist to delete.
beaker image delete "$beaker_user/$image_name" || true

# Create the image in the same workspace used for jobs.
beaker image create "$image_name" -n "$image_name" -w "ai2/$beaker_user"
255 changes: 255 additions & 0 deletions coding-agent/CODE_SEARCH_DATASETS_README.md
@@ -0,0 +1,255 @@
# Code Search Datasets Documentation

## Overview

This document describes the two Hugging Face datasets created for code search tasks:

1. **Multi-Step Tool Dataset** (`HF_OUTPUT_MULTI_STEP_TOOL`) - For models that use the CodeSearchTool with multiple interactions
2. **Single-Turn Dataset** - For models that output a single view call to find buggy code

## Dataset Creation

### Script: `create_code_search_datasets.py`

This is the main script. It transforms raw coding-agent data into structured datasets.

**Usage:**
```bash
# Process all data files
python create_code_search_datasets.py \
    --data-dir coding-agent/data \
    --output-dir code_search_datasets

# Test with sample.json only
python create_code_search_datasets.py \
    --single-file-test \
    --output-dir test_datasets

# Push to HuggingFace Hub
python create_code_search_datasets.py \
    --push-to-hub \
    --hub-org your-org-name
```
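
The flags in the commands above could be declared with `argparse` roughly as follows. This is only a sketch: the flag names come from the usage examples, but the defaults, help strings, and the `build_parser` helper are assumptions and may not match the actual script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the commands above."""
    parser = argparse.ArgumentParser(description="Build code search datasets (sketch).")
    # Assumption: every flag is optional, since each usage example omits some of them.
    parser.add_argument("--data-dir", default=None, help="Directory of raw data files.")
    parser.add_argument("--output-dir", default=None, help="Where to write the datasets.")
    parser.add_argument("--single-file-test", action="store_true", help="Process sample.json only.")
    parser.add_argument("--push-to-hub", action="store_true", help="Upload the results to the Hub.")
    parser.add_argument("--hub-org", default=None, help="Hub organization to push to.")
    return parser


args = build_parser().parse_args(["--single-file-test", "--output-dir", "test_datasets"])
print(args.single_file_test, args.output_dir)
```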

## Dataset Formats

### 1. Multi-Step Tool Dataset

**Purpose:** Training models to use CodeSearchTool for exploring repositories with multiple tool calls.

**Schema:**
```python
{
    "instance_id": str,      # Unique identifier (e.g., "starlette_10457")
    "messages": List[Dict],  # Full conversation with system, user, assistant messages
    "buggy_info": Dict,      # Information about the bug location
    "num_turns": int,        # Total number of conversation turns
    "tool_calls_made": int   # Number of tool calls in the conversation
}
```

**Message Format:**
- Uses the full conversation history including tool responses
- Includes system prompts with tool descriptions
- Multiple assistant responses with tool calls

**Example Entry:**
```python
{
    "instance_id": "starlette_10457",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant..."},
        {"role": "user", "content": "Find the bug in Config.__call__..."},
        {"role": "assistant", "content": "<tool_call>{...}</tool_call>"},
        {"role": "user", "content": "<tool_response>...</tool_response>"},
        ...
    ],
    "buggy_info": {
        "file_path": "/testbed/starlette/config.py",
        "buggy_line": 130,
        "view_range": [121, 138]
    },
    "num_turns": 32,
    "tool_calls_made": 15
}
```
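
As an illustration of how the two count fields relate to `messages`, the sketch below derives them from a message list. The `summarize_conversation` helper is hypothetical, and it assumes that every message counts as one turn and that each `<tool_call>` tag in an assistant message is one tool call. The dataset script may count differently.

```python
from typing import Dict, List

TOOL_CALL_OPEN = "<tool_call>"  # opening delimiter shown in the example entry


def summarize_conversation(messages: List[Dict[str, str]]) -> Dict[str, int]:
    """Derive the two count fields from a message list (hypothetical helper)."""
    tool_calls = sum(
        message["content"].count(TOOL_CALL_OPEN)
        for message in messages
        if message["role"] == "assistant"
    )
    # Assumption: every message, whatever its role, counts as one turn.
    return {"num_turns": len(messages), "tool_calls_made": tool_calls}


example = [
    {"role": "system", "content": "You are a helpful assistant..."},
    {"role": "user", "content": "Find the bug in Config.__call__..."},
    {"role": "assistant", "content": "<tool_call>{...}</tool_call>"},
    {"role": "user", "content": "<tool_response>...</tool_response>"},
]
print(summarize_conversation(example))
```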

### 2. Single-Turn Dataset

**Purpose:** Training models to directly identify and view the buggy file in a single response.

**Schema:**
```python
{
    "instance_id": str,      # Unique identifier
    "messages": List[Dict],  # Single-turn conversation (system, user, assistant)
    "buggy_file": str,       # Path to the file containing the bug
    "buggy_line": int,       # Line number of the bug (-1 if unknown)
    "bug_description": str   # Truncated description of the bug
}
```
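
A record can be checked against this schema before training. The following is a minimal sketch: `validate_single_turn` is a hypothetical helper, and the strict system, user, assistant role order is an assumption taken from the schema comment rather than from the dataset script.

```python
from typing import Any, Dict, List

# Field names and types copied from the schema above.
SINGLE_TURN_FIELDS = {
    "instance_id": str,
    "messages": list,
    "buggy_file": str,
    "buggy_line": int,
    "bug_description": str,
}


def validate_single_turn(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record looks valid."""
    problems = []
    for name, expected in SINGLE_TURN_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    messages = record.get("messages")
    if isinstance(messages, list):
        # Assumption: exactly one system, one user, and one assistant message, in order.
        roles = [m.get("role") for m in messages if isinstance(m, dict)]
        if roles != ["system", "user", "assistant"]:
            problems.append(f"unexpected roles: {roles}")
    return problems
```

An empty return value means every field is present with the expected type, so the check can be used as a filter when iterating over the dataset.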

**Message Format:**
- Simplified single-turn format
- System prompt explains the task
- User provides bug description
- Assistant responds with a single view tool call

**Example Entry:**
```python
{
    "instance_id": "starlette_10457",
    "messages": [
        {
            "role": "system",
            "content": "You are a code search assistant..."
        },
        {
            "role": "user",
            "content": "Repository location: /testbed\n\nBug description:\n..."
        },
        {
            "role": "assistant",
            "content": "I'll examine the file...\n\n<tool_call>{...}</tool_call>"
        }
    ],
    "buggy_file": "/testbed/starlette/config.py",
    "buggy_line": 130,
    "bug_description": "Possible bug in Config.__call__..."
}
```

## Integration with Existing Components

### 1. CodeViewTool (`tool_vllm.py`)

The `CodeViewTool` class processes tool calls from model outputs:

```python
from open_instruct.tool_utils.tool_vllm import CodeViewTool

tool = CodeViewTool(
    api_endpoint="http://localhost:1234",
    repo_name="repository-name",
    start_str="<tool_call>",
    end_str="</tool_call>"
)

# Process model output containing tool calls
result = tool(model_output)
```
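
To make the role of `start_str` and `end_str` concrete, the sketch below pulls out the raw text between the two delimiters. It is not the `CodeViewTool` implementation: `extract_tool_calls` is a hypothetical helper, and the format of the payload inside the tags is left untouched because this document does not specify it.

```python
import re
from typing import List


def extract_tool_calls(
    text: str,
    start_str: str = "<tool_call>",
    end_str: str = "</tool_call>",
) -> List[str]:
    """Return the raw payload of every delimited tool call in `text`."""
    # Non-greedy match so that consecutive tool calls are returned separately.
    pattern = re.escape(start_str) + r"(.*?)" + re.escape(end_str)
    return [payload.strip() for payload in re.findall(pattern, text, flags=re.DOTALL)]


model_output = "I'll examine the file...\n\n<tool_call>{...}</tool_call>"
print(extract_tool_calls(model_output))
```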

### 2. CodeSearchVerifier (`ground_truth_utils.py`)

The `CodeSearchVerifier` evaluates model predictions:

```python
from open_instruct.ground_truth_utils import CodeSearchVerifier, CodeVerifierConfig

config = CodeVerifierConfig(
    code_api_url="http://localhost:1234",
    code_max_execution_time=5.0,
    code_pass_rate_reward_threshold=0.5,
    code_apply_perf_penalty=True
)

verifier = CodeSearchVerifier(config)

# Evaluate prediction
result = verifier(
    tokenized_prediction=[],
    prediction=model_output,
    label=expected_files,
    query=original_query
)
```

### 3. View File API (`api.py`)

The API endpoint for viewing repository files:

```python
# POST /view_file
{
    "repo_name": "cool-RR/PySnooper",
    "path": "pysnooper/pycompat.py",
    "view_range": [86, 88],   # Optional
    "base_commit": "abc123"   # Optional
}

# Response
{
    "content": "file content here...",
    "repo_path": "/path/to/cloned/repo"
}
```
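
A client could assemble the request shown above using only the standard library. This is a sketch under assumptions: `build_view_file_request` is a hypothetical helper, the body is assumed to be sent as JSON, and the base URL reuses the `http://localhost:1234` address from the earlier examples.

```python
import json
import urllib.request
from typing import List, Optional


def build_view_file_request(
    base_url: str,
    repo_name: str,
    path: str,
    view_range: Optional[List[int]] = None,
    base_commit: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /view_file request."""
    payload = {"repo_name": repo_name, "path": path}
    # The two optional fields are only included when provided.
    if view_range is not None:
        payload["view_range"] = view_range
    if base_commit is not None:
        payload["base_commit"] = base_commit
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/view_file",
        data=json.dumps(payload).encode("utf-8"),  # assumption: JSON request body
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_view_file_request(
    "http://localhost:1234",
    "cool-RR/PySnooper",
    "pysnooper/pycompat.py",
    view_range=[86, 88],
)
# Sending the request requires a running server:
# with urllib.request.urlopen(request) as response:
#     body = json.loads(response.read())
```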

## Loading and Using the Datasets

```python
from datasets import load_from_disk

# Load datasets
multi_step = load_from_disk('code_search_datasets/multi_step_tool_dataset')
single_turn = load_from_disk('code_search_datasets/single_turn_dataset')

# Access samples
for sample in multi_step:
    print(f"Instance: {sample['instance_id']}")
    print(f"Tool calls: {sample['tool_calls_made']}")

for sample in single_turn:
    print(f"Bug file: {sample['buggy_file']}")
    print(f"Bug line: {sample['buggy_line']}")
```

## Training Considerations

### For Multi-Step Models:
- Use the full conversation history
- Train to generate appropriate tool calls based on context
- Handle tool responses and continue reasoning

### For Single-Turn Models:
- Focus on extracting key information from bug descriptions
- Generate precise file paths and view ranges
- Optimize for accuracy on the first attempt

## Evaluation Metrics

The CodeSearchVerifier provides several metrics:
- **File Match Score**: Whether the correct file was viewed
- **Line Coverage**: Whether the buggy line was within the view range
- **Efficiency Penalty**: Penalty for viewing unnecessary files
- **Response Time**: Time taken to identify the bug
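
The first two metrics can be illustrated with a small check against a `buggy_info` entry. This sketch is not the `CodeSearchVerifier` logic: `score_view` is a hypothetical helper, and it assumes that `view_range` is inclusive at both ends and that a view without a range covers the whole file. The efficiency penalty and response time are not modeled.

```python
from typing import Dict, List, Optional


def score_view(
    viewed_path: str,
    view_range: Optional[List[int]],
    buggy_info: Dict,
) -> Dict[str, bool]:
    """Check one view call against the known bug location (hypothetical helper)."""
    file_match = viewed_path == buggy_info["file_path"]
    if not file_match:
        line_covered = False
    elif view_range is None:
        # Assumption: a view without a range shows the whole file.
        line_covered = True
    else:
        start, end = view_range
        # Assumption: both ends of the range are inclusive.
        line_covered = start <= buggy_info["buggy_line"] <= end
    return {"file_match": file_match, "line_covered": line_covered}


buggy_info = {
    "file_path": "/testbed/starlette/config.py",
    "buggy_line": 130,
    "view_range": [121, 138],
}
print(score_view("/testbed/starlette/config.py", [121, 138], buggy_info))
```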

## ChatML Template Support

Both datasets support the ChatML template format implemented in `dataset_transformation.py`:

```python
from open_instruct.dataset_transformation import TokenizerConfig

tc = TokenizerConfig(
    tokenizer_name_or_path="your-model",
    chat_template_name="chatml"  # Use ChatML format
)
```

This formats messages as:
```
<|im_start|>system
You are a code search assistant.<|im_end|>
<|im_start|>user
Find the bug in this code.<|im_end|>
<|im_start|>assistant
I'll search for the bug.<|im_end|>
```
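
The layout above can be reproduced with a few lines of plain Python, which is handy for inspecting samples without loading a tokenizer. This is a sketch only: `to_chatml` is a hypothetical helper, and the template in `dataset_transformation.py` may differ in details such as trailing newlines or a generation prompt.

```python
from typing import Dict, List


def to_chatml(messages: List[Dict[str, str]]) -> str:
    """Render messages in the ChatML layout shown above (hypothetical helper)."""
    return "\n".join(
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>"
        for message in messages
    )


conversation = [
    {"role": "system", "content": "You are a code search assistant."},
    {"role": "user", "content": "Find the bug in this code."},
    {"role": "assistant", "content": "I'll search for the bug."},
]
print(to_chatml(conversation))
```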

## Future Enhancements

1. **Multi-repository support**: Extend to handle bugs across multiple repositories
2. **Enhanced scoring**: More sophisticated evaluation metrics for partial matches
3. **Difficulty levels**: Categorize bugs by complexity
4. **Cross-file bugs**: Support bugs that span multiple files
5. **Test generation**: Include test cases that reproduce the bugs