This project forked from pmorvalho/multipas

MultIPAs : Applying Program Transformations to Introductory Programming Assignments for Data Augmentation

License: MIT License

MultIPAS

MultIPAs is a program transformation framework capable of augmenting small imperative C program benchmarks by performing six different syntactic program mutations and three semantic program mutilations. MultIPAs keeps the information about the types and the number of bugs present in each incorrect program generated, which can be used to train ML-based program repair frameworks. Furthermore, MultIPAs produces a variable mapping between the original program given as input and the mutated/mutilated program.

MultIPAs is divided into two modules: Program Mutator and Program Mutilator.

Demonstration Video.

Program Mutator

The following six syntactic program mutations are available on MultIPAs' program mutator:

  • M1 - Comparison Expression Mirroring (CEM): MultIPAs mirrors one or several comparison expressions e.g. $a \ge b$ becomes $b \le a$;
  • M2 - If-else-statements Swapping (IES): MultIPAs swaps the if-branch and the else-branch and negates the if-condition. This is done only for simple if-else-statements, i.e., there are no more if-statements inside the else-branch;
  • M3 - Increment/Decrement Operators Mirroring (IOM): MultIPAs mirrors the two increment (and decrement) operators in the C programming language (e.g. c++ and ++c), only when the return value of the expression that contains the increment/decrement operator is discarded e.g. the increment step of a for-loop;
  • M4 - Variable Declarations Reordering (VDR): MultIPAs reorders the variable declarations present in each code block. To do so, the framework takes into account the dependencies between declarations, i.e., whether a variable's declaration depends on other variables, and computes all possible topological orders of the declarations;
  • M5 - For-2-While Translation (F2W): MultIPAs translates for-loops into while-loops, but only for-loops that do not contain any continue instructions;
  • M6 - Variable Addition (VA): MultIPAs introduces a new dummy variable declaration in the program. The mutated program does not have the same set of variables as the original program.
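
The topological-order enumeration mentioned for M4 can be sketched as follows. This is a minimal illustration, not MultIPAs' own code: the function name `all_topological_orders` and the `deps` dictionary format are assumptions made for this example, and every dependency is assumed to be declared in the same block.

```python
from typing import Dict, Iterator, List, Set


def all_topological_orders(deps: Dict[str, Set[str]]) -> Iterator[List[str]]:
    """Yield every declaration order that respects the dependencies.

    `deps` maps each variable to the set of variables its declaration
    uses (hypothetical format, chosen for this sketch).
    """
    remaining = set(deps)
    order: List[str] = []

    def backtrack() -> Iterator[List[str]]:
        if not remaining:
            yield list(order)
            return
        # A declaration may come next once all its dependencies are placed.
        for var in sorted(remaining):
            if deps[var] <= set(order):
                remaining.remove(var)
                order.append(var)
                yield from backtrack()
                order.pop()
                remaining.add(var)

    yield from backtrack()


# Declarations: int n = 10; int i = 0; int half = n / 2;
deps = {"n": set(), "i": set(), "half": {"n"}}
for candidate in all_topological_orders(deps):
    print(candidate)
# Prints three orders; `half` never appears before `n`.
```

Each yielded order corresponds to one valid reordering of the block's declarations, which is why the number of possible mutated programs can grow quickly with the number of independent declarations.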

Usage:

usage: prog_mutator.py [-h] [-c] [-if] [-io] [-dv] [-rd] [-fw] [-a] [-p PERCENTAGE_TOTAL_PROGS] [-info] [-ea] [-v] -d INPUT_DIR -o OUTPUT_DIR 

optional arguments:
  -h, --help            show this help message and exit
  -c, --comp_ops        Swaps the comparison operators.
  -if, --if_else        Swaps the simple if-else-statements.
  -io, --incr_ops       Swaps the increment operators (e.g. i++, ++i and i+=1) if these are not used in a binary operation or in an assignment.
  -dv, --dummy_var      Declares a dummy variable in the beginning of the main function.
  -rd, --reord_decls    Reorder the order of variable declarations, when it is possible i.e., when two variables' declarations do not depend on each other
  -fw, --for_2_while    Translates simple for-loops (without any continue instruction) into a while-loop.
  -a, --all_mut         Performs all the mutations above.
  -p PERCENTAGE_TOTAL_PROGS, --percentage_total_progs PERCENTAGE_TOTAL_PROGS
                        Instead of generating all possible mutations the script only generates this percentage. Default 0.01 if the total number of possible mutations is higher than 100k or 0.1 otherwise.
  -d INPUT_DIR, --input_dir INPUT_DIR
                        Name of the input directory.
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        Name of the output directory.
  -info, --info         Prints the total number of programs the required mutations can produced and exits without producing the sets of programs.
  -ea, --enumerate_all  Enumerates all possible mutated programs. NOTE: Sometimes the number of mutated programs is more than 200K Millions of programs.
  -v, --verbose         Prints debugging information.

Program Mutilator

The following three semantic program mutilations are available on MultIPAs' program mutilator:

  • B1 - Wrong Comparison Operator (WCO): MultIPAs swaps a comparison operator in an expression for a syntactically similar one, e.g., < for <=. MultIPAs can also swap > for >=, <= for <, >= for >, == for =, and != for ==;
  • B2 - Variable Misuse (VM): MultIPAs replaces a variable in the program with another variable of the same type. Because both variables have the same type, the resulting mutilated program still compiles successfully;
  • B3 - Assignment Deletion (AD): MultIPAs deletes an assignment expression in the program.

Usage:

usage: prog_mutilator.py [-h] [-c] [-vm] [-ad] [-a] [-s] [-n NUM_MUT] [-pp NUM_PROGS_2_PROCESS] [-info] [-v] -d INPUT_DIR -o OUTPUT_DIR

optional arguments:
  -h, --help            show this help message and exit
  -c, --comp_ops        Swaps the comparison operators.
  -vm, --var_mu         Introduces a bug of variable misuse.
  -ad, --asg_del        Introduces a bug of expression deletion (assignments).
  -a, --all_mut         Performs all the mutilations above.
  -s, --single          Generates a single erroneous programs, instead of enumerating all possibilities.
  -n NUM_MUT, --num_mut NUM_MUT
                        Number of mutilations to be performed. (Default = 1).
  -pp NUM_PROGS_2_PROCESS, --num_progs_2_process NUM_PROGS_2_PROCESS
                        Number of programs to process from the input directory. (Default = 50).
  -info, --info         Prints the total number of programs the required mutilations can produced and exits without producing the sets of programs.
  -d INPUT_DIR, --input_dir INPUT_DIR
                        Name of the input directory.
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        Name of the output directory.
  -v, --verbose         Prints debugging information.
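
The operator swaps listed for B1 (WCO) can be illustrated with a small string-level sketch. This is only an approximation for simple expressions: MultIPAs itself depends on pycparser, whereas the regular expression, the `WCO_SWAPS` table name, and the function `wrong_comparison_variants` below are assumptions introduced for this example.

```python
import re
from typing import List

# Swap table taken from the B1 description above.
WCO_SWAPS = {"<": "<=", ">": ">=", "<=": "<", ">=": ">", "==": "=", "!=": "=="}

# Match comparison operators only; the lookarounds skip shifts (<<, >>),
# compound assignments (<<=, >>=) and the arrow operator (->).
_COMPARISON = re.compile(r"(?<![<>=!\-])(<=|>=|==|!=|<|>)(?![<>=])")


def wrong_comparison_variants(expr: str) -> List[str]:
    """Return one buggy variant of `expr` per comparison operator found."""
    variants = []
    for match in _COMPARISON.finditer(expr):
        swapped = WCO_SWAPS[match.group()]
        variants.append(expr[:match.start()] + swapped + expr[match.end():])
    return variants


print(wrong_comparison_variants("i < n && a[i] != x"))
# ['i <= n && a[i] != x', 'i < n && a[i] == x']
```

Each variant contains exactly one swapped operator, which mirrors introducing a single bug; a string-level approach like this would also touch operators inside string literals or comments, which an AST-based tool avoids.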

Variable Mapping

Every time MultIPAs mutates or mutilates a program, a mapping between the original program's set of variables and the mutated/mutilated program's sets of variables is generated. This variable mapping can help program repair frameworks that rely on mappings between the sets of variables of the correct implementation and the incorrect program they are trying to repair.
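
As a rough illustration of how such a mapping might be consumed, the sketch below renames identifiers in a code fragment according to a dictionary. The on-disk format of the mapping produced by MultIPAs is not described here, so the `Dict[str, str]` representation and the helper `rename_variables` are hypothetical.

```python
import re
from typing import Dict


def rename_variables(code: str, mapping: Dict[str, str]) -> str:
    """Rewrite whole-word identifiers in `code` according to `mapping`.

    All replacements happen in a single pass, so swapped names
    (a -> b and b -> a) are handled correctly.
    """
    if not mapping:
        return code
    names = "|".join(re.escape(name) for name in mapping)
    pattern = re.compile(r"\b(" + names + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], code)


# Hypothetical mapping: original variable name -> name in the other program.
mapping = {"sum": "total", "i": "k"}
print(rename_variables("sum = sum + i;", mapping))
# total = total + k;
```

A repair framework could use a translation of this kind to carry a fix written over the correct program's variables to the incorrect program. This text-based version does not distinguish identifiers from string literals or comments.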

References

P. Orvalho, M. Janota, and V. Manquinho. MultIPAs: Applying Program Transformations to Introductory Programming Assignments for Data Augmentation. In 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2022.

Introductory Programming Assignments (IPAs) Datasets

Installation Requirements

  • Python 3.8.5
  • pycparser : version 2.21
    pip install pycparser==2.21
    
  • numpy : version 1.19.2
    pip install numpy==1.19.2
    
