This is an open-source implementation of several versions of the SZZ algorithm for detecting bug-inducing commits.
To run PySZZ you need:
- Python 3
- srcML (https://www.srcml.org/) (i.e., the srcml command should be in the system path)
- git >= 2.23
- Tested on Unix-like OSes (Linux/MacOS)
Run the following command to install the required Python dependencies:
pip3 install --no-cache-dir -r requirements.txt
To run the tool, simply execute the following command:
python3 main.py /path/to/bug-fixes.json /path/to/configuration-file.yml /path/to/repo-directory
where:
bug-fixes.json contains a list of information about bug-fixing commits and (optionally) issues [1]. This is an example JSON that can be used with pyszz:
[
{
"repo_name": "amirmikhak/3D-GIF",
"fix_commit_hash": "645496dd3c5c89faee9dab9f44eb2dab1dffa3b9",
"best_scenario_issue_date": "2015-04-23T07:41:52"
},
...
]
alternatively:
[
{
"repo_name": "amirmikhak/3D-GIF",
"fix_commit_hash": "645496dd3c5c89faee9dab9f44eb2dab1dffa3b9",
"earliest_issue_date": "2015-04-23T07:41:52"
},
...
]
or, without an issue date [1]:
[
{
"fix_commit_hash": "30ae3f5421bcda1bc4ef2f1b18db6a131dcbbfd3",
"repo_name": "grosa1/szztest_mod_change"
},
...
]
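As a minimal sketch, an input file like the ones above could be sanity-checked before running pyszz. The helper `check_bug_fixes` is hypothetical and not part of pyszz. Treating `repo_name` and `fix_commit_hash` as required is an assumption based on the fact that every example above contains them.

```python
import json
from datetime import datetime

# Assumption: these two keys appear in every example, so treat them as required.
REQUIRED_KEYS = {"repo_name", "fix_commit_hash"}
# Optional issue-date keys shown in the examples above.
DATE_KEYS = ("earliest_issue_date", "best_scenario_issue_date")


def check_bug_fixes(entries):
    """Return a list of human-readable problems found in the entries."""
    problems = []
    for i, entry in enumerate(entries):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            problems.append(f"entry {i}: missing {sorted(missing)}")
        for key in DATE_KEYS:
            if key in entry:
                try:
                    datetime.fromisoformat(entry[key])
                except ValueError:
                    problems.append(f"entry {i}: {key} is not an ISO 8601 date")
    return problems


sample = json.loads("""
[
  {"repo_name": "amirmikhak/3D-GIF",
   "fix_commit_hash": "645496dd3c5c89faee9dab9f44eb2dab1dffa3b9",
   "earliest_issue_date": "2015-04-23T07:41:52"},
  {"repo_name": "grosa1/szztest_mod_change"}
]
""")
problems = check_bug_fixes(sample)
print(problems)
```

The second sample entry deliberately omits `fix_commit_hash`, so the check reports one problem for it.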
The issue date filter can be enabled by setting the param issue_date_filter: true in the config file. This filter removes all commits whose authored_date is after the issue date reported as earliest_issue_date or best_scenario_issue_date. Note that if the issue date is reported without timezone info, it is assumed to be UTC.
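The behaviour described above can be sketched roughly as follows. This illustrates the rule only and is not the code pyszz uses: the function names and the `(hash, authored_date)` pair representation are assumptions made for the example.

```python
from datetime import datetime, timezone


def parse_issue_date(text):
    """Parse an ISO 8601 date; treat a missing timezone as UTC."""
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def filter_by_issue_date(commits, issue_date_text):
    """Drop commits authored after the issue date.

    `commits` is a list of (hash, authored_date) pairs (hypothetical shape),
    where each authored_date is a timezone-aware datetime.
    """
    issue_date = parse_issue_date(issue_date_text)
    return [(h, d) for h, d in commits if d <= issue_date]


# Made-up commits for illustration only.
commits = [
    ("aaa111", datetime(2015, 4, 20, 12, 0, tzinfo=timezone.utc)),
    ("bbb222", datetime(2015, 4, 25, 9, 30, tzinfo=timezone.utc)),
]
kept = filter_by_issue_date(commits, "2015-04-23T07:41:52")
print([h for h, _ in kept])
```

Here the second commit is authored after the issue date, so only the first one is kept.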
To avoid infinite loops during blame, a default timeout of 1 hour is used. It can be changed manually at szz.ma_szz.MASZZ.find_bic()#135. This affects MA-SZZ, R-SZZ, L-SZZ, A-SZZ and DU-SZZ.
configuration-file.yml is one of the following, depending on the SZZ variant you want to run:
- conf/agszz.yaml: runs AG-SZZ
- conf/lszz.yaml: runs L-SZZ
- conf/rszz.yaml: runs R-SZZ
- conf/maszz.yaml: runs MA-SZZ
- conf/raszz.yaml: runs RA-SZZ
- conf/pdszz.yaml: runs PyDriller-SZZ
- conf/aszz.yaml: runs A-SZZ@R
- conf/aszz_ma.yaml: runs A-SZZ@MA
- conf/dfszz.yaml: runs DU-SZZ@R
- conf/dfszz_ma.yaml: runs DU-SZZ@MA
- conf/rszz+.yaml: runs REV-SZZ@R
- conf/maszz+.yaml: runs REV-SZZ@MA
Also, there are some variants of the default configuration files. For example, the conf files with the _issues_filter suffix run the corresponding SZZ variant with the issue filter enabled.
In each configuration file there is a comment for each param that explains its purpose. Note that for the param file_ext_to_parse, the file extension has to be lowercase because the filter calls tolower() on each file extension to check if there is a match.
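To illustrate why the entries of file_ext_to_parse need to be lowercase, here is a minimal sketch of a case-normalizing extension check. The function `has_parsed_extension` is hypothetical, and listing extensions without a leading dot is an assumption; check the comments in the configuration files for the actual format.

```python
import os


def has_parsed_extension(path, file_ext_to_parse):
    """Return True if the file's lowercased extension is in the configured list."""
    # The extension taken from the file name is lowercased before the lookup,
    # so an uppercase entry in the configured list can never match.
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    return ext in file_ext_to_parse


matches_lower = has_parsed_extension("src/Main.JAVA", ["java"])
matches_upper = has_parsed_extension("src/Main.JAVA", ["JAVA"])
print(matches_lower, matches_upper)
```

With a lowercase entry the file matches regardless of how its extension is capitalized. With an uppercase entry it does not.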
repo-directory is a folder that contains all the repositories required by bug-fixes.json. This parameter is optional. If repo-directory is not specified, pyszz downloads each repository required by each bug-fix commit into a temporary folder. Otherwise, pyszz searches for each required repository in the repo-directory folder. The directory structure must be the following:
.
|-- repo-directory
|   |-- repouser
|       |-- reponame
.
To have different run configurations, just create or edit the configuration files. The available parameters are described in each yml file. In order to use the issue date filter, you have to enable the parameter provided in each configuration file.
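Returning to the repo-directory layout shown above, the following sketch shows how a repo_name such as "amirmikhak/3D-GIF" would map onto that structure. The helper `local_repo_path` is hypothetical and only illustrates the expected layout. It is not how pyszz resolves repositories internally.

```python
import os


def local_repo_path(repo_directory, repo_name):
    """Map 'repouser/reponame' to repo-directory/repouser/reponame."""
    repo_user, repo = repo_name.split("/", 1)
    return os.path.join(repo_directory, repo_user, repo)


path = local_repo_path("/path/to/repo-directory", "amirmikhak/3D-GIF")
print(path)
```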
N.B. The difference between best_scenario_issue_date and earliest_issue_date is described in our paper. In short, use earliest_issue_date if you have the date of the issue linked to the bug-fix commit.
[1] To enable or disable the issue date filter for SZZ, edit the issue_date_filter flag in the configuration files under conf/.
- start_example1.sh, start_example2.sh and start_example3.sh are example usages of pyszz;
- start_test_lszz.sh and start_test_rszz.sh are test cases for L-SZZ and R-SZZ;
- The test directory contains some example resources, such as repos_test.zip and repos_test_with_issues.zip. They contain some downloaded repositories to be used with bugfix_commits_test.json and bugfix_commits_with_issues_test.json, which are two examples of input JSON containing bug-fixing commits;
- postfilter_lszz.py and postfilter_rszz.py can be used to apply only the heuristics of L-SZZ and R-SZZ to the output JSON of other SZZ variants (e.g., MA-SZZ) without performing a complete execution.
@article{rosa2023szzvariants,
title = {A comprehensive evaluation of SZZ Variants through a developer-informed oracle},
author = {Rosa, Giovanni and Pascarella, Luca and Scalabrino, Simone and Tufano, Rosalia and Bavota, Gabriele and Lanza, Michele and Oliveto, Rocco},
journal = {Journal of Systems and Software},
volume = {202},
pages = {111729},
year = {2023},
publisher = {Elsevier},
doi = {10.1016/j.jss.2023.111729}
}
The replication package of the study is available here