

Large Language Models for Equivalent Mutant Detection: How Far are We?


In this study, we empirically investigate various LLMs with different learning strategies for equivalent mutant detection. This is a replication package for our empirical study.


1. Environment

  • Python 3.7.7

  • PyTorch 1.13.1+cu117

  • Scikit-learn 1.2.2

  • Transformers 4.37.0.dev0

  • TRL 0.7.11

  • NumPy 1.18.1

  • Pandas 1.3.0

  • Matplotlib 3.4.2

  • OpenAI 1.2.3


2. Dataset

(1) Statistics of Java programs from MutantBench


We construct a Java Equivalent Mutant Detection dataset based on MutantBench. It consists of MutantBenchtrain for fine-tuning and MutantBenchtest for testing. Specifically, the dataset is divided into two parts:

  • Codebase (i.e., ./dataset/MutantBench_code_db_java.csv) contains 3 columns that we used to conduct our experiments: (1) id (int): The code id, used to retrieve the Java methods. (2) code (str): The original method or mutant, written in Java. (3) operator (str): The type of mutation operator.

  • Mutant-Pair Datasets (i.e., MutantBenchtrain and MutantBenchtest) contain 4 columns that we used to conduct our experiments: (1) id (int): The id of the mutant pair. (2) code_id_1 (int): The code id used to retrieve the first Java method from the Codebase. (3) code_id_2 (int): The code id used to retrieve the second Java method from the Codebase. (4) label (int): The label that determines whether a mutant pair is equivalent or not (i.e., 1 indicates equivalent, 0 indicates non-equivalent).
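The relationship between the two parts can be sketched as a lookup: each pair row stores two code ids, and the Codebase maps those ids back to Java source. The snippet below is a minimal illustration using only the Python standard library. The column names follow the description above, but the rows, the operator values, and the helper names (`load_codebase`, `resolve_pairs`) are invented for the example and are not taken from the repository.

```python
import csv
import io

# Hypothetical stand-ins for the Codebase and a mutant-pair file.
# Column names follow the README; the rows themselves are made up.
CODEBASE_CSV = '''id,code,operator
1,"int abs(int x) { return x < 0 ? -x : x; }",original
2,"int abs(int x) { return x <= 0 ? -x : x; }",ROR
3,"int abs(int x) { return x > 0 ? -x : x; }",ROR
'''

PAIRS_CSV = '''id,code_id_1,code_id_2,label
0,1,2,1
1,1,3,0
'''


def load_codebase(text):
    """Map each code id to its row (code and operator)."""
    return {int(row["id"]): row for row in csv.DictReader(io.StringIO(text))}


def resolve_pairs(pairs_text, codebase):
    """Replace the two code ids of every pair with the Java source they refer to."""
    resolved = []
    for row in csv.DictReader(io.StringIO(pairs_text)):
        first = codebase[int(row["code_id_1"])]
        second = codebase[int(row["code_id_2"])]
        resolved.append({
            "id": int(row["id"]),
            "code_1": first["code"],
            "code_2": second["code"],
            "label": int(row["label"]),  # 1 = equivalent, 0 = non-equivalent
        })
    return resolved


codebase = load_codebase(CODEBASE_CSV)
pairs = resolve_pairs(PAIRS_CSV, codebase)
for pair in pairs:
    verdict = "equivalent" if pair["label"] == 1 else "non-equivalent"
    print(pair["id"], verdict)
```

With the real files, the same join could be done by reading the CSVs from ./dataset instead of the inline strings; a pandas merge on the id columns would work equally well.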

(2) How to access the dataset

All the pre-processed data used in our experiments is available in the ./dataset directory.


3. Models

How to access the models

All model checkpoints used in our experiments can be downloaded from our anonymous Zenodo (link1, link2).


4. Experiment Replication

To run the open-source LLMs, we recommend a GPU with at least 48 GB of memory for training and testing, since StarCoder (7B), CodeT5+ (7B), and Code Llama (7B) are compute-intensive.

To run the closed-source LLMs (i.e., ChatGPT and the text-embedding models), you need your own OpenAI account and API key.

Demo

Let's take the pre-trained UniXCoder as an example. The ./dataset folder contains the training and test data.

(1) Training phase

You can train the model through the following commands:

cd ./UniXCoder/code;
python train.py;

(2) Inference phase

To run inference with the fine-tuned model on the test dataset, use the following commands:

cd ./UniXCoder/code;
python test.py;
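Once inference has produced one prediction per mutant pair, the predictions can be scored against the label column. The sketch below assumes the usual binary classification metrics (precision, recall, and F1) computed on the equivalent class (label 1). This README does not state which metrics test.py reports, so the function name `binary_metrics` and the sample values are illustrative only.

```python
def binary_metrics(labels, predictions, positive=1):
    """Precision, recall, and F1 for the `positive` class (1 = equivalent)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)

    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up ground truth and predictions for six mutant pairs.
labels = [1, 0, 1, 1, 0, 0]
predictions = [1, 0, 0, 1, 1, 0]

metrics = binary_metrics(labels, predictions)
print({name: round(value, 3) for name, value in metrics.items()})
```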

How to run the remaining models and strategies

The code for each model and strategy is in its own directory. See the README.md file in each directory for instructions on running that model.


5. Experimental Results


1) The performance of baselines and state-of-the-art LLMs on equivalent mutant detection.



2) The performance of different LLM strategies on equivalent mutant detection.



3) Unique correct detections (↑) and unique incorrect detections (↓) across studied EMD techniques.



4) Detection performance on Top-10 mutation operators across various EMD techniques (x-axis shows mutation operators and y-axis shows the correct detection percentage).

4-1) Performance of 4 EMD categories on Top-10 mutation operators. Detailed results for all 28 mutation operators are available in ./results/EMD_categories_all_operators.csv.



4-2) Performance of 5 LLM strategies on Top-10 mutation operators. Detailed results for all 28 mutation operators are available in ./results/LLM_strategies_all_operators.csv.
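The correct detection percentage per mutation operator can be sketched as a simple grouped accuracy. The snippet below assumes each test pair has been reduced to an (operator, label, prediction) triple and that the "Top-10" operators are the ones with the most pairs. Both points are assumptions, and the operator names, data, and function names are placeholders rather than the repository's actual code.

```python
from collections import Counter, defaultdict


def correct_rate_by_operator(records):
    """Percentage of correctly classified pairs for each mutation operator."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for operator, label, prediction in records:
        totals[operator] += 1
        if label == prediction:
            correct[operator] += 1
    return {op: 100.0 * correct[op] / totals[op] for op in totals}


def top_operators(records, k):
    """The k operators with the most pairs (assumed ranking criterion)."""
    counts = Counter(operator for operator, _, _ in records)
    return [operator for operator, _ in counts.most_common(k)]


# Made-up (operator, label, prediction) triples.
records = [
    ("ROR", 1, 1), ("ROR", 0, 1), ("ROR", 0, 0),
    ("AOR", 1, 1), ("AOR", 0, 0),
    ("LCR", 1, 0),
]

rates = correct_rate_by_operator(records)
for operator in top_operators(records, k=2):
    print(f"{operator}: {rates[operator]:.1f}% correct")
```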



5) t-SNE plots showing the embeddings of mutant pairs. EQ and NEQ denote equivalent and non-equivalent pairs, respectively.


