
Replication Package of Bugsplainer

Explaining Software Bugs Leveraging Code Structures in Neural Machine Translation

Accepted Papers at ICSE 2023

Parvez Mahbub, Ohiduzzaman Shuvo, and M. Masudur Rahman. Explaining Software Bugs Leveraging Code Structures in Neural Machine Translation. In Proceedings of the 45th IEEE/ACM International Conference on Software Engineering (ICSE 2023), 12 pages, Melbourne, Australia, May 2023 (to appear).

Abstract

Software bugs claim approximately 50% of development time and cost the global economy billions of dollars. Once a bug is reported, the assigned developer attempts to identify and understand the source code responsible for the bug and then corrects the code. Over the last five decades, there has been significant research on automatically finding or correcting software bugs. However, there has been little research on automatically explaining the bugs to the developers, which is essential but a highly challenging task. In this paper, we propose Bugsplainer, a transformer-based generative model, that generates natural language explanations for software bugs by learning from a large corpus of bug-fix commits. Bugsplainer can leverage structural information and buggy patterns from the source code to generate an explanation for a bug. Our evaluation using three performance metrics shows that Bugsplainer can generate understandable and good explanations according to Google's standard, and can outperform multiple baselines from the literature. We also conduct a developer study involving 20 participants where the explanations from Bugsplainer were found to be more accurate, more precise, more concise and more useful than the baselines.

File Structure

  • data: contains the data files
  • models: contains the model checkpoints. Each subdirectory has the format <checkpoint-name>/output/checkpoint-best-bleu.
  • output: contains sample explanations from Bugsplainer, Bugsplainer 220M, and the baselines.
  • runs: contains the output of each run during the replication. Each subdirectory is named after the --desc parameter discussed below.
  • scrape_github: the Python module that we used to scrape bug-fix commits from GitHub. Simply run python __main__.py to scrape a new dataset.
  • src:
    • bugsplainer: necessary scripts for replication
    • compute_scores_from_explanations.py: computes the BLEU score, Semantic Similarity, and Exact Match for the generated explanations. If no argument is given, it computes these scores from the sample explanations in the output/run4 directory. To evaluate pyflakes specifically (using the max-score scheme described in the paper), run python -m src.compute_scores_from_explanations pyflakes.
    • explore_scores.py: performs statistical tests on the metric scores across the 5 runs.
  • requirements.txt: the requirements file for pip.

Replicate

In the following sections, we describe how to replicate the results for our RQ1.

Prerequisites

We recommend using a virtual environment to install the packages and run the application. Learn to use a virtual environment here.
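For example, a standard setup with Python's built-in venv module looks like this (shell commands for a Unix-like system; an illustrative sketch, not part of this package):

python -m venv venv
source venv/bin/activate
pip install --upgrade pip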

To replicate our test scores, a CUDA-supported GPU with at least 16 GB of memory is recommended. (In practice, any modern NVIDIA GPU supports CUDA.) You can reduce the value of the --eval_batch_size parameter (discussed in detail later) to run on a smaller GPU, at the cost of increased execution time.

Download Dataset and Model Checkpoints

Now, download one or all of the available model variants and the data from the DOI link. Extract the downloaded file bugsplainer.zip. Then replace the data directory of this repository with the extracted data directory, and the models directory of this repository with the extracted models directory.
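Assuming bugsplainer.zip sits in the repository root and extracts into its own directory, the replacement could look like the following (the layout inside the archive may differ, so adjust the paths accordingly):

unzip bugsplainer.zip -d bugsplainer_extracted
rm -rf data models
mv bugsplainer_extracted/data bugsplainer_extracted/models .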

Installation

If you are running on a Linux system, then you can run ./scripts/main.sh to replicate all the experiments at once.

Make sure you have Python 3.9, CUDA 11.3, and the latest pip installed on your machine. Then, to install PyTorch, run

pip install torch==1.10.0+cu113 torchvision==0.11.0+cu113 torchaudio==0.10.0+cu113 torchtext==0.11.0 -f https://download.pytorch.org/whl/cu113/torch_stable.html

If you want to run Bugsplainer on CPU, remove the +cu113 part from all package versions. Note that we do not guarantee a proper replication on CPU, as the tensor operations may vary based on CPU architecture.
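Applying that change, the CPU-only command becomes the following (dropping the cu113 wheel index is our inference; the plain versions are then resolved from PyPI):

pip install torch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 torchtext==0.11.0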

Once PyTorch is installed, install the remaining packages by running

pip install -r requirements.txt

You can run python -m src.verify_installation to validate the installation. If it executes without any error, then all the necessary packages are installed.
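If you prefer a quick manual check, a minimal sketch along these lines exercises the core dependencies (an illustrative snippet, not the contents of src/verify_installation; it assumes torch and transformers are among the installed requirements):

# Illustrative installation check; NOT the actual src/verify_installation script.
import torch
import transformers

print("torch:", torch.__version__)                  # expected 1.10.0+cu113 (or 1.10.0 on CPU)
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())

# A tiny tensor operation confirms the backend works end to end.
x = torch.randn(2, 2)
print((x @ x.T).shape)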

Run The Application

This replication package covers 4 experiments from our RQ1, which evaluate Bugsplainer with different automatic metric scores. Running each of these experiments takes approximately 30 minutes.

To replicate the first experiment of RQ1, run

python -m src.bugsplainer \
 --do_test \
 --model_name_or_path=models/262.finetune-sbt-random-512-64-16-60m/output/checkpoint-best-bleu \
 --config_name=models/config_60m.json \
 --tokenizer_name=Salesforce/codet5-base \
 --task="finetune" \
 --sub_task="sbt-random" \
 --desc="replicating-bugsplainer" \
 --data_dir=data \
 --max_source_length=512 \
 --max_target_length=64 \
 --eval_batch_size=16 \
 --cache_path=cache \
 --output_dir=output \
 --res_dir=result

For the Windows command line, replace \ with ^ for line continuation:

python -m src.bugsplainer ^
 --do_test ^
 --model_name_or_path=models/262.finetune-sbt-random-512-64-16-60m/output/checkpoint-best-bleu ^
 --config_name=models/config_60m.json ^
 --tokenizer_name=Salesforce/codet5-base ^
 --task="finetune" ^
 --sub_task="sbt-random" ^
 --desc="replicating-bugsplainer" ^
 --data_dir=data ^
 --max_source_length=512 ^
 --max_target_length=64 ^
 --eval_batch_size=16 ^
 --cache_path=cache ^
 --output_dir=output ^
 --res_dir=result

To replicate the remaining experiments, replace the name of the model subdirectory in the --model_name_or_path parameter. In that case, you also need to change the values of the --config_name and --sub_task parameters to match those encoded in the subdirectory name. The subdirectory names follow the format {experiment_no}.{task}-{sub_task}-{input_len}-{output_len}-{batch_size}-{model_size}. The mapping from our experiment names to the checkpoint names is as follows.

| RQ | Experiment | Checkpoint | BLEU | Semantic Similarity | Exact Match |
|----|------------|------------|------|---------------------|-------------|
| 1 | Bugsplainer | 262.finetune-sbt-random-512-64-16-60m | 33.15 | 55.76 | 22.37 |
| 1 | Bugsplainer Cross-project | 264.finetune-sbt-project-512-64-16-60m | 17.16 | 44.98 | 7.15 |
| 1 | Bugsplainer 220M | 268.finetune-sbt-random-512-64-16-220m | 33.87 | 56.35 | 23.50 |
| 1 | Bugsplainer 220M Cross-project | 270.finetune-sbt-project-512-64-16-220m | 23.83 | 49.00 | 15.47 |
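For example, to replicate the Bugsplainer Cross-project experiment (checkpoint 264), only the --model_name_or_path, --sub_task, and --desc values change from the first command. The --desc label below is an arbitrary choice of ours; the 60m checkpoint keeps config_60m.json, while the 220m checkpoints need the corresponding model-size config file:

python -m src.bugsplainer \
 --do_test \
 --model_name_or_path=models/264.finetune-sbt-project-512-64-16-60m/output/checkpoint-best-bleu \
 --config_name=models/config_60m.json \
 --tokenizer_name=Salesforce/codet5-base \
 --task="finetune" \
 --sub_task="sbt-project" \
 --desc="replicating-bugsplainer-cross-project" \
 --data_dir=data \
 --max_source_length=512 \
 --max_target_length=64 \
 --eval_batch_size=16 \
 --cache_path=cache \
 --output_dir=output \
 --res_dir=result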

Compute Scores

Once you have the generated explanations, you can compute their metric scores by running

python -m src.compute_scores_from_explanations <desc>

where <desc> should be replaced with the value of the --desc flag from the previous step.
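For instance, with the --desc value used in the first experiment above:

python -m src.compute_scores_from_explanations replicating-bugsplainer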

Training Details

We validate the model after each epoch using the BLEU score. Training runs for up to 100 epochs with early stopping. We use the model with the best result on the validation set as our final model. We then compute the average BLEU score on the test set. We call this whole process a run. In RQ3, we report the average score of five such runs with different random initializations.
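As an illustration, a run amounts to the following loop (a minimal sketch with hypothetical stand-in functions, not the package's actual training code; the patience value of 5 comes from the table below):

import random

# Hypothetical stand-ins for the real training and BLEU evaluation routines.
def train_one_epoch():
    pass

def validate_bleu():
    return random.uniform(0, 100)  # placeholder; real code scores the validation set

best_bleu, bad_epochs, patience = 0.0, 0, 5
for epoch in range(100):              # up to 100 epochs
    train_one_epoch()
    bleu = validate_bleu()            # validate by BLEU after each epoch
    if bleu > best_bleu:              # new best model becomes the final checkpoint
        best_bleu, bad_epochs = bleu, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:    # early stopping
            break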

Hyper-parameters

The hyperparameters used during the experiments are listed below.

| Hyper-parameter | Value |
|-----------------|-------|
| Optimizer | AdamW |
| Adam Epsilon | 1e-8 |
| Beam Size | 10 |
| Batch Size | 16 |
| Learning Rate | 5e-5 |
| Input Length | 512 |
| Output Length | 64 |
| Epochs | 100 |
| Early Stopping Patience | 5 |
| Learning Rate Scheduler | Linear |
| Warmup Steps | 100 |

The model-specific hyperparameters can be found in the config files in the models directory (downloaded from Zenodo).
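As an illustration, the optimizer and scheduler settings from the table map onto standard PyTorch and Hugging Face Transformers APIs roughly as follows (a sketch under those assumptions, not the package's actual training code; the model and the total step count are placeholders):

import torch
from transformers import get_linear_schedule_with_warmup

model = torch.nn.Linear(512, 64)  # placeholder; the real model is a CodeT5 variant

# AdamW with learning rate 5e-5 and epsilon 1e-8, per the table above
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5, eps=1e-8)

# Linear schedule with 100 warmup steps; the total step count depends on the
# dataset size and epoch count (10_000 here is a placeholder)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=100, num_training_steps=10_000
)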


License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.


Something not working as expected? Create an issue here.
