Artifact repository for the paper "Challenging Bug Prediction and Repair Models with Synthetic Bugs", In Proceedings of The 25th IEEE International Conference on Source Code Analysis & Manipulation (SCAM 2025), Auckland, New Zealand, September 2025

Intelligent-CAT-Lab/BugFarm

BugFarm

Artifact repository for the paper "Challenging Bug Prediction and Repair Models with Synthetic Bugs", accepted at SCAM 2025 (Auckland, New Zealand). The authors are Ali Reza Ibrahimzada, Yang Chen, Ryan Rong, and Reyhaneh Jabbarvand.

Overview

BugFarm is a framework that generates synthetic bugs through the analysis of least-attended tokens and statements in code. These synthetic bugs challenge and evaluate bug prediction and repair models. The pipeline involves extracting methods from projects, analyzing attention weights, determining least-attended components, and using LLMs to generate plausible bugs.

Data Archive

Please visit Zenodo to access the results of BugFarm. We will refer to certain files from this archive in the following sections.

Getting Started

Using Docker (Recommended)

The easiest way to set up BugFarm is using Docker:

# Build the Docker image
docker build -t bugfarm .

# Run the container
docker run -it bugfarm bash

Manual Setup

If you prefer a manual setup:

  1. Install miniconda

  2. Create and activate the environment:

    conda env create -f environment.yaml
    conda activate bugfarm

  3. Set up the tokenizer tool

  4. Install dependencies and download projects:

    bash setup.sh

Project Modules

Attention Analyzer

This module extracts methods from projects and analyzes attention weights to determine the least-attended tokens (LAT) and least-attended statements (LAS).

Key steps:

  1. Extract methods from projects
  2. Extract attention weights
  3. Analyze attention weights to determine LAT/LAS

For detailed instructions, see Attention Analyzer README.
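
To make the LAT/LAS idea concrete, here is a minimal sketch of ranking tokens and statements by the attention they receive. It is not the repository's implementation: the aggregation (column mean of a single square attention matrix), the function names, and the `stmt_of_token` mapping are all assumptions chosen for illustration.

```python
from typing import Dict, List, Sequence


def attention_received(attn: Sequence[Sequence[float]]) -> List[float]:
    """Average attention each token receives (column mean of a square matrix)."""
    n = len(attn)
    return [sum(row[j] for row in attn) / n for j in range(n)]


def least_attended_tokens(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k lowest-scoring tokens, lowest first (ties broken by index)."""
    return sorted(range(len(scores)), key=lambda i: (scores[i], i))[:k]


def least_attended_statements(
    scores: Sequence[float], stmt_of_token: Sequence[int], k: int
) -> List[int]:
    """Rank statements by the mean score of their tokens; return the k lowest ids."""
    totals: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for score, stmt in zip(scores, stmt_of_token):
        totals[stmt] = totals.get(stmt, 0.0) + score
        counts[stmt] = counts.get(stmt, 0) + 1
    means = {s: totals[s] / counts[s] for s in totals}
    return sorted(means, key=lambda s: (means[s], s))[:k]


# Toy 4-token example (made-up numbers): row i holds the attention token i pays to each token.
attn = [
    [0.70, 0.05, 0.15, 0.10],
    [0.60, 0.10, 0.20, 0.10],
    [0.50, 0.05, 0.35, 0.10],
    [0.60, 0.05, 0.15, 0.20],
]
scores = attention_received(attn)
lat = least_attended_tokens(scores, k=2)
# Hypothetical mapping: tokens 0-1 belong to statement 0, tokens 2-3 to statement 1.
las = least_attended_statements(scores, stmt_of_token=[0, 0, 1, 1], k=1)
```

In this toy input, tokens 1 and 3 receive the least attention, and statement 1 has the lower mean score. A real model exposes many layers and heads, so how those are combined is a separate design decision that this sketch leaves out.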

Bug Generator

This module uses LLMs to generate synthetic bugs based on the attention analysis results.

Key steps:

  1. Prompt LLM with LAT/LAS information
  2. Parse LLM responses to extract buggy methods
  3. Select the most suitable bugs

For detailed instructions, see Bug Generator README.
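
The three steps above can be sketched as follows. Everything here is hypothetical: the prompt wording, the assumption that the LLM returns buggy methods in fenced code blocks, the assumption that the methods are Java, and the selection filter (which only drops duplicates and unchanged methods) are stand-ins, not the repository's actual prompts or selection criteria.

```python
import re
from typing import List

FENCE = "`" * 3  # built at runtime so this example contains no literal fence


def build_prompt(method_source: str, least_attended: List[str]) -> str:
    """Hypothetical prompt asking for a bug in the least-attended parts."""
    targets = ", ".join(least_attended)
    return (
        "Inject a plausible bug into the method below by changing only "
        f"these tokens or statements: {targets}.\n"
        f"Return the buggy method in a {FENCE}java code block.\n\n{method_source}"
    )


def extract_code_blocks(response: str) -> List[str]:
    """Return the contents of every fenced code block in an LLM response."""
    pattern = re.compile(FENCE + r"[\w+-]*\n(.*?)" + FENCE, re.DOTALL)
    return [block.strip() for block in pattern.findall(response)]


def select_candidates(original: str, candidates: List[str]) -> List[str]:
    """Keep unique candidates that differ from the original (whitespace-insensitive)."""
    seen = set()
    kept = []
    normalized_original = " ".join(original.split())
    for cand in candidates:
        key = " ".join(cand.split())
        if key != normalized_original and key not in seen:
            seen.add(key)
            kept.append(cand)
    return kept


original = "int add(int a, int b) { return a + b; }"
prompt = build_prompt(original, ["a + b"])
# Made-up response: one real change and one verbatim copy of the original.
response = (
    f"Here is a bug:\n{FENCE}java\nint add(int a, int b) {{ return a - b; }}\n{FENCE}\n"
    f"Unchanged copy:\n{FENCE}java\n{original}\n{FENCE}\n"
)
blocks = extract_code_blocks(response)
selected = select_candidates(original, blocks)
```

Here `blocks` holds both extracted methods and `selected` keeps only the one that actually differs from the original.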

We provide synthetic bugs on Zenodo. Please download mutants.zip from the BugFarm Zenodo archive.

Create Defect Dataset

This module creates datasets for training and evaluating bug detection models using various sources:

  • BugSwarm
  • Mockito-Closure (from Defects4J)
  • RegMiner
  • LEAM
  • muBERT

For detailed instructions, see Create Defect Dataset README.

We provide defect datasets on Zenodo. Please download defect_datasets.zip from the BugFarm Zenodo archive.
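
As a rough illustration of what a defect dataset for bug detection can look like, the sketch below turns (fixed, buggy) method pairs into labeled rows and splits them at the pair level so that both versions of a method land in the same split. The row schema (`code`, `label`, `source`), the split ratio, and all function names are assumptions; the layout of the files in defect_datasets.zip may differ.

```python
import json
import random
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (fixed method, buggy method)


def split_pairs(
    pairs: List[Pair], test_ratio: float = 0.2, seed: int = 0
) -> Tuple[List[Pair], List[Pair]]:
    """Shuffle deterministically and split pairs into train and test lists."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_ratio)
    return shuffled[:cut], shuffled[cut:]


def make_rows(pairs: List[Pair], source: str) -> List[Dict]:
    """Emit two rows per pair: label 0 for the fixed method, 1 for the buggy one."""
    rows = []
    for fixed, buggy in pairs:
        rows.append({"code": fixed, "label": 0, "source": source})
        rows.append({"code": buggy, "label": 1, "source": source})
    return rows


def to_jsonl(rows: List[Dict]) -> str:
    """Serialize rows as JSON Lines text."""
    return "\n".join(json.dumps(row) for row in rows)


# Placeholder pairs; real data would come from a source such as the ones listed above.
pairs = [(f"m{i}_fixed", f"m{i}_buggy") for i in range(5)]
train_pairs, test_pairs = split_pairs(pairs)
train_rows = make_rows(train_pairs, source="example")
test_rows = make_rows(test_pairs, source="example")
train_jsonl = to_jsonl(train_rows)
```

Splitting before expanding pairs into rows avoids leaking a method's fixed version into training while its buggy version sits in the test set.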

Bug Prediction

This module fine-tunes models for bug prediction on the defect datasets created above.

For detailed instructions, see Finetuning README.

Bug Repair

We use the FitRepair artifacts to repair the generated mutants. Please refer to the original FitRepair repository for usage details. The patches FitRepair generated are available on Zenodo: download apr.zip from the BugFarm Zenodo archive.

Human Study

The results of our human study on the generated bugs are in human_study.zip in the BugFarm Zenodo archive. The human labeler results are also available directly on UIUCPlus, where different branches hold the results for different human labelers and mutants.

LEAM

This module generates mutants using the LEAM framework.

For detailed instructions, see LEAM README.

muBERT

This module generates mutants using the muBERT framework.

For detailed instructions, see muBERT README.

Contact

For any questions or issues, please contact Ali Reza Ibrahimzada or open an issue on GitHub.
