Artifact repository for the paper "Challenging Bug Prediction and Repair Models with Synthetic Bugs", accepted at SCAM 2025 (Auckland, New Zealand). The authors are Ali Reza Ibrahimzada, Yang Chen, Ryan Rong, and Reyhaneh Jabbarvand.
BugFarm is a framework that generates synthetic bugs by analyzing the least attended tokens and statements in code. These synthetic bugs are used to challenge and evaluate bug prediction and repair models. The pipeline extracts methods from projects, analyzes their attention weights, determines the least attended components, and uses LLMs to generate plausible bugs.
Please visit Zenodo to access the results of BugFarm. We will refer to certain files from this archive in the following sections.
The easiest way to set up BugFarm is with Docker:

```bash
# Build the Docker image
docker build -t bugfarm .
# Run the container
docker run -it bugfarm bash
```
If you prefer a manual setup:

- Install miniconda.
- Create and activate the environment:
  ```bash
  conda env create -f environment.yaml
  conda activate bugfarm
  ```
- Set up the tokenizer tool.
- Install dependencies and download projects:
  ```bash
  bash setup.sh
  ```
The Attention Analyzer module extracts methods from projects and analyzes their attention weights to determine the least attended tokens (LAT) and least attended statements (LAS).
Key steps:
- Extract methods from projects
- Extract attention weights
- Analyze attention weights to determine LAT/LAS
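As a rough illustration of the LAT/LAS step, the sketch below scores each token by the total attention it receives (the column sum of the attention matrix) and scores each statement by the mean score of its tokens. It assumes a single square attention matrix already averaged over layers and heads, and a hypothetical statement-to-token-index mapping; BugFarm's actual aggregation, model, and thresholds may differ, so treat this as a sketch rather than the analyzer's implementation.

```python
from typing import Dict, List, Sequence

Matrix = Sequence[Sequence[float]]

def attention_received(attn: Matrix) -> List[float]:
    """Total attention each token receives: the column sums of attn[query][key].

    Assumes `attn` is a single n x n matrix already averaged over layers and heads.
    """
    n = len(attn)
    return [sum(attn[q][col] for q in range(n)) for col in range(n)]

def least_attended_tokens(tokens: Sequence[str], attn: Matrix, k: int) -> List[str]:
    """Return the k tokens that receive the least attention (LAT)."""
    scores = attention_received(attn)
    # sorted() is stable, so tied tokens keep their original order.
    order = sorted(range(len(tokens)), key=lambda i: scores[i])
    return [tokens[i] for i in order[:k]]

def least_attended_statements(statements: Dict[str, List[int]], attn: Matrix,
                              k: int) -> List[str]:
    """Return the k statements whose tokens get the lowest mean attention (LAS).

    `statements` maps each statement's text to its token indices (hypothetical format).
    """
    scores = attention_received(attn)
    mean = {s: sum(scores[i] for i in idx) / len(idx) for s, idx in statements.items()}
    return sorted(mean, key=lambda s: mean[s])[:k]

if __name__ == "__main__":
    # Toy token sequence "i++ ; return i" with a made-up attention matrix.
    tokens = ["i++", ";", "return", "i"]
    attn = [
        [0.4, 0.3, 0.1, 0.2],
        [0.5, 0.2, 0.1, 0.2],
        [0.3, 0.3, 0.2, 0.2],
        [0.4, 0.2, 0.1, 0.3],
    ]
    print(least_attended_tokens(tokens, attn, 2))  # ['return', 'i']
    print(least_attended_statements({"i++;": [0, 1], "return i": [2, 3]}, attn, 1))  # ['return i']
```

Summing over query positions treats a token as "attended" when many other tokens look at it; averaging per statement keeps long statements from dominating simply because they contain more tokens.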
For detailed instructions, see the Attention Analyzer README.
The Bug Generator module uses LLMs to generate synthetic bugs from the attention analysis results.
Key steps:
- Prompt LLM with LAT/LAS information
- Parse LLM responses to extract buggy methods
- Select the most suitable bugs
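The three Bug Generator steps can be pictured with the sketch below. The prompt wording, `PROMPT_TEMPLATE`, and all helper names are hypothetical, and so is the selection rule (discard candidates that are identical to the original method or to an earlier candidate after whitespace normalization, then keep the first few). No LLM client is called: the model's response is passed in as a plain string. The real prompts and selection criteria are described in the Bug Generator README.

```python
import re
from typing import List

FENCE = "`" * 3  # a Markdown code fence, built here to keep this snippet readable

# Hypothetical prompt template; BugFarm's actual prompts may differ.
PROMPT_TEMPLATE = (
    "Here is a Java method:\n{method}\n\n"
    "Its least attended statements are:\n{las}\n\n"
    "Introduce {n} different bugs by modifying only these statements. "
    "Return each buggy method in its own {fence}java code block."
)

# Captures the body of each fenced block, with any (or no) language tag.
FENCE_RE = re.compile(FENCE + r"\w*[ \t]*\n(.*?)" + FENCE, re.DOTALL)

def build_prompt(method: str, las: List[str], n: int = 3) -> str:
    """Fill the hypothetical template with a method and its LAS."""
    return PROMPT_TEMPLATE.format(method=method, las="\n".join(las), n=n, fence=FENCE)

def parse_buggy_methods(response: str) -> List[str]:
    """Extract candidate buggy methods from the fenced blocks of an LLM response."""
    return [body.strip() for body in FENCE_RE.findall(response)]

def normalize(code: str) -> str:
    """Collapse whitespace so formatting-only edits compare equal."""
    return " ".join(code.split())

def select_bugs(original: str, candidates: List[str], limit: int) -> List[str]:
    """Keep up to `limit` candidates that differ from the original and from each other."""
    seen = {normalize(original)}
    selected: List[str] = []
    for cand in candidates:
        key = normalize(cand)
        if key not in seen:
            seen.add(key)
            selected.append(cand)
            if len(selected) == limit:
                break
    return selected
```

Filtering on normalized text discards "bugs" that only reformat the original; a real pipeline would likely add further checks (for example, that the mutant still compiles), which this sketch does not attempt.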
For detailed instructions, see the Bug Generator README.
We provide synthetic bugs on Zenodo. Please download mutants.zip from the BugFarm Zenodo archive.
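Once mutants.zip (or any other archive mentioned in this README) is downloaded, it can be unpacked with Python's standard `zipfile` module. The sketch below extracts an archive and returns its member names so you can inspect the layout; the function name and destination paths are hypothetical, and it makes no assumption about how the archive is organized internally.

```python
import zipfile
from pathlib import Path
from typing import List

def extract_archive(archive: str, dest: str) -> List[str]:
    """Extract a downloaded .zip archive into `dest` and return its member names."""
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
        zf.extractall(dest_path)
    return names
```

For example, `extract_archive("mutants.zip", "data/mutants")[:10]` would list the first ten entries, assuming the download was saved as `mutants.zip` in the working directory.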
The Create Defect Dataset module builds datasets for training and evaluating bug prediction models from the following sources:
- BugSwarm
- Mockito-Closure (from Defects4J)
- RegMiner
- LEAM
- muBERT
For detailed instructions, see the Create Defect Dataset README.
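Conceptually, a defect dataset pairs method texts with a binary label. The sketch below shows one plausible way to turn buggy/fixed method pairs from a source such as BugSwarm or RegMiner into labeled samples, drop duplicate method texts, and split the result deterministically. The `Sample` record, the label convention (1 = buggy), and the 80/10/10 split are assumptions for illustration, not the format produced by the Create Defect Dataset scripts.

```python
import random
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

@dataclass(frozen=True)
class Sample:
    source: str  # e.g. "BugSwarm", "RegMiner", "LEAM"
    code: str    # method text
    label: int   # 1 = buggy, 0 = correct (assumed convention)

def build_samples(source: str, pairs: Iterable[Tuple[str, str]]) -> List[Sample]:
    """Turn (buggy, fixed) method pairs into one buggy and one correct sample each."""
    samples: List[Sample] = []
    for buggy, fixed in pairs:
        samples.append(Sample(source, buggy, 1))
        samples.append(Sample(source, fixed, 0))
    return samples

def split_dataset(samples: List[Sample], seed: int = 42,
                  train: float = 0.8, valid: float = 0.1) -> Dict[str, List[Sample]]:
    """Drop duplicate method texts, shuffle with a fixed seed, and split the rest."""
    seen = set()
    unique: List[Sample] = []
    for s in samples:
        if s.code not in seen:  # keep the first occurrence of each method text
            seen.add(s.code)
            unique.append(s)
    random.Random(seed).shuffle(unique)
    n_train = int(len(unique) * train)
    n_valid = int(len(unique) * valid)
    return {
        "train": unique[:n_train],
        "valid": unique[n_train:n_train + n_valid],
        "test": unique[n_train + n_valid:],  # remainder goes to test
    }
```

Deduplicating before splitting keeps the same method text from appearing in both the training and test splits, and the fixed seed makes the split reproducible.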
We provide defect datasets on Zenodo. Please download defect_datasets.zip from the BugFarm Zenodo archive.
The Finetuning module finetunes models for bug prediction on the created defect datasets.
For detailed instructions, see the Finetuning README.
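Bug prediction is a binary classification task, so a finetuned model's predictions on a test split can be scored with standard metrics. The sketch below computes accuracy, precision, recall, and F1 for the buggy class from gold labels and predictions (assumed convention: 1 = buggy). It is a generic illustration; the metrics and evaluation scripts in the Finetuning README may differ.

```python
from typing import Dict, Sequence

def binary_metrics(gold: Sequence[int], pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for the positive (buggy = 1) class."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(gold) if gold else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Reporting precision and recall alongside accuracy matters when buggy and correct methods are imbalanced, since a model that always predicts "correct" can still score high accuracy.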
We use the FitRepair artifacts to perform bug repair on the generated mutants. Please refer to the original FitRepair repository for details on how to use it. We provide the patches generated by FitRepair on Zenodo. Please download apr.zip from the BugFarm Zenodo archive.
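A quick way to explore the patches in apr.zip is to check, for each mutant, whether any generated patch restores the original method text. The sketch below does this with a whitespace-insensitive exact match; the function names and the dictionary input shapes (mutant id to original method, mutant id to list of patches) are hypothetical. Exact match is a simpler and stricter criterion than test-based patch validation, so rates computed this way may differ from test-based repair results.

```python
from typing import Dict, List

def normalize(code: str) -> str:
    """Collapse whitespace so formatting differences are ignored."""
    return " ".join(code.split())

def exact_match_fixed(original: str, patches: List[str]) -> bool:
    """True if any patch equals the original method up to whitespace."""
    target = normalize(original)
    return any(normalize(p) == target for p in patches)

def repair_rate(originals: Dict[str, str], patches: Dict[str, List[str]]) -> float:
    """Fraction of mutants (by id) with at least one exact-match patch."""
    if not originals:
        return 0.0
    fixed = sum(exact_match_fixed(code, patches.get(mid, []))
                for mid, code in originals.items())
    return fixed / len(originals)
```

Mutants with no entry in `patches` count as unrepaired, so the rate is always computed over all mutants rather than only those the tool produced patches for.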
Please refer to human_study.zip in the BugFarm Zenodo archive for the results of our human study on the generated bugs. The human labelers' results are also available directly on UIUCPlus, where different branches hold the results for different human labelers and mutants.
The LEAM module generates mutants using the LEAM framework.
For detailed instructions, see the LEAM README.
The muBERT module generates mutants using the muBERT framework.
For detailed instructions, see the muBERT README.
For any questions or issues, please contact Ali Reza Ibrahimzada or open an issue on GitHub.