aflplusplus / grammar-mutator

A grammar-based custom mutator for AFL++

License: Apache License 2.0


grammar-mutator's Introduction

American Fuzzy Lop plus plus (AFL++)

AFL++ logo

Release version: 4.20c

GitHub version: 4.21a

Repository: https://github.com/AFLplusplus/AFLplusplus

AFL++ is maintained by:

Originally developed by Michal "lcamtuf" Zalewski.

AFL++ is a superior fork to Google's AFL - more speed, more and better mutations, more and better instrumentation, custom module support, etc.

You are free to copy, modify, and distribute AFL++ with attribution under the terms of the Apache-2.0 License. See the LICENSE for details.

Getting started

Here is some information to get you started:

  • For an overview of the AFL++ documentation and a very helpful graphical guide, please visit docs/README.md.
  • To get you started with tutorials, go to docs/tutorials.md.
  • For releases, see the Releases tab and branches. The best branches to use are, however, stable or dev - depending on your risk appetite. Also take a look at the list of important changes in AFL++ and the list of features.
  • If you want to use AFL++ for your academic work, check the papers page on the website.
  • To cite our work, look at the Cite section.
  • For comparisons, use the fuzzbench aflplusplus setup, or use afl-clang-fast with AFL_LLVM_CMPLOG=1. You can find the aflplusplus default configuration on Google's fuzzbench.

Building and installing AFL++

To have AFL++ easily available with everything compiled, pull the image directly from the Docker Hub (available for both x86_64 and arm64):

docker pull aflplusplus/aflplusplus
docker run -ti -v /location/of/your/target:/src aflplusplus/aflplusplus

This image is automatically published when a push to the stable branch happens (see branches). If you use the command above, you will find your target source code in /src in the container.

Note: you can also pull aflplusplus/aflplusplus:dev which is the most current development state of AFL++.

To build AFL++ yourself - which we recommend - continue at docs/INSTALL.md.

Quick start: Fuzzing with AFL++

NOTE: Before you start, please read about the common sense risks of fuzzing.

This is a quick start for fuzzing targets with the source code available. To read about the process in detail, see docs/fuzzing_in_depth.md.

To learn about fuzzing other targets, see:

Step-by-step quick start:

  1. Compile the program or library to be fuzzed using afl-cc. A common way to do this would be:

    CC=/path/to/afl-cc CXX=/path/to/afl-c++ ./configure --disable-shared
    make clean all
    
  2. Get a small but valid input file that makes sense to the program. When fuzzing verbose syntax (SQL, HTTP, etc.), create a dictionary as described in dictionaries/README.md, too.

  3. If the program reads from stdin, run afl-fuzz like so:

    ./afl-fuzz -i seeds_dir -o output_dir -- \
    /path/to/tested/program [...program's cmdline...]
    

    To add a dictionary, add -x /path/to/dictionary.txt to afl-fuzz.

    If the program takes input from a file, you can put @@ in the program's command line; AFL++ will put an auto-generated file name in there for you.

  4. Investigate anything shown in red in the fuzzer UI by promptly consulting docs/afl-fuzz_approach.md#understanding-the-status-screen.

  5. You will find the crashes and hangs that were found in the crashes/ and hangs/ subdirectories of the -o output_dir directory. You can replay the crashes by feeding them to the target, e.g. if your target reads from stdin:

    cat output_dir/crashes/id:000000,* | /path/to/tested/program [...program's cmdline...]
    

    You can generate cores or use gdb directly to follow up the crashes.

  6. We cannot stress this enough - if you want to fuzz effectively, read the docs/fuzzing_in_depth.md document!
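The crash replay in step 5 can be scripted to triage a whole output directory in one go. A minimal sketch, assuming a stdin-reading target (the function name and paths are illustrative, not part of AFL++):

```python
import subprocess
from pathlib import Path

def replay_crashes(output_dir, target_cmd, timeout=5):
    """Feed every saved crash input to the target on stdin and record
    the exit status (a negative value means killed by that signal)."""
    results = {}
    for crash in sorted(Path(output_dir, "crashes").glob("id:*")):
        with crash.open("rb") as stdin:
            proc = subprocess.run(target_cmd, stdin=stdin,
                                  capture_output=True, timeout=timeout)
        results[crash.name] = proc.returncode
    return results
```

For example, `replay_crashes("output_dir", ["/path/to/tested/program"])` returns a mapping from crash file name to exit status; entries with a negative status still crash and are the ones worth following up under gdb.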

Contact

Questions? Concerns? Bug reports?

Branches

The following branches exist:

  • release: the latest release
  • stable/trunk: stable state of AFL++ - it is synced from dev from time to time when we are satisfied with its stability
  • dev: development state of AFL++ - bleeding edge and you might catch a checkout which does not compile or has a bug. We only accept PRs (pull requests) for the 'dev' branch!
  • (any other): experimental branches to work on specific features or testing new functionality or changes.

Help wanted

We have several ideas we would like to see in AFL++ to make it even better. However, we already work on so many things that we do not have the time for all the big ideas.

This can be your way to support and contribute to AFL++ - extend it to do something cool.

For everyone who wants to contribute (and send pull requests), please read our contributing guidelines before you submit.

Special thanks

Many of the improvements to the original AFL and AFL++ wouldn't be possible without feedback, bug reports, or patches from our contributors.

Thank you! (For people sending pull requests - please add yourself to this list :-)

List of contributors
  Jann Horn                             Hanno Boeck
  Felix Groebert                        Jakub Wilk
  Richard W. M. Jones                   Alexander Cherepanov
  Tom Ritter                            Hovik Manucharyan
  Sebastian Roschke                     Eberhard Mattes
  Padraig Brady                         Ben Laurie
  @dronesec                             Luca Barbato
  Tobias Ospelt                         Thomas Jarosch
  Martin Carpenter                      Mudge Zatko
  Joe Zbiciak                           Ryan Govostes
  Michael Rash                          William Robinet
  Jonathan Gray                         Filipe Cabecinhas
  Nico Weber                            Jodie Cunningham
  Andrew Griffiths                      Parker Thompson
  Jonathan Neuschaefer                  Tyler Nighswander
  Ben Nagy                              Samir Aguiar
  Aidan Thornton                        Aleksandar Nikolich
  Sam Hakim                             Laszlo Szekeres
  David A. Wheeler                      Turo Lamminen
  Andreas Stieger                       Richard Godbee
  Louis Dassy                           teor2345
  Alex Moneger                          Dmitry Vyukov
  Keegan McAllister                     Kostya Serebryany
  Richo Healey                          Martijn Bogaard
  rc0r                                  Jonathan Foote
  Christian Holler                      Dominique Pelle
  Jacek Wielemborek                     Leo Barnes
  Jeremy Barnes                         Jeff Trull
  Guillaume Endignoux                   ilovezfs
  Daniel Godas-Lopez                    Franjo Ivancic
  Austin Seipp                          Daniel Komaromy
  Daniel Binderman                      Jonathan Metzman
  Vegard Nossum                         Jan Kneschke
  Kurt Roeckx                           Marcel Boehme
  Van-Thuan Pham                        Abhik Roychoudhury
  Joshua J. Drake                       Toby Hutton
  Rene Freingruber                      Sergey Davidoff
  Sami Liedes                           Craig Young
  Andrzej Jackowski                     Daniel Hodson
  Nathan Voss                           Dominik Maier
  Andrea Biondo                         Vincent Le Garrec
  Khaled Yakdan                         Kuang-che Wu
  Josephine Calliotte                   Konrad Welc
  Thomas Rooijakkers                    David Carlier
  Ruben ten Hove                        Joey Jiao
  fuzzah                                @intrigus-lgtm
  Yaakov Saxon                          Sergej Schumilo

Cite

If you use AFL++ in scientific work, consider citing our paper presented at WOOT'20:

Andrea Fioraldi, Dominik Maier, Heiko Eißfeldt, and Marc Heuse. “AFL++: Combining incremental steps of fuzzing research”. In 14th USENIX Workshop on Offensive Technologies (WOOT 20). USENIX Association, Aug. 2020.
BibTeX
@inproceedings {AFLplusplus-Woot20,
author = {Andrea Fioraldi and Dominik Maier and Heiko Ei{\ss}feldt and Marc Heuse},
title = {{AFL++}: Combining Incremental Steps of Fuzzing Research},
booktitle = {14th {USENIX} Workshop on Offensive Technologies ({WOOT} 20)},
year = {2020},
publisher = {{USENIX} Association},
month = aug,
}

grammar-mutator's People

Contributors

0x7fancy, andreafioraldi, h1994st, hexcoder-, realmadsci, vanhauser-thc


grammar-mutator's Issues

Issue with parallel build

make -j8 fails with

make[2]: Entering directory '/home/andrea/Grammar-Mutator/third_party/antlr4-cpp-runtime'
make[2]: *** No rule to make target 'antlr4-cpp-runtime-src/runtime/src/ANTLRErrorListener.o', needed by 'libantlr4-runtime.a'.  Stop.
make[2]: *** Waiting for unfinished jobs....

Statistics

In this issue I will collect statistics during my tests over the next 8 weeks :)

optimized syntax '+' causes 'random_recursive_mutation' error

Going further, I found a way to mitigate this;

based on the above issues, we create a simpler test case, test.json:

{
    "<entry>": [["I ", "<stmt1>", "like C++\n"]],
    "<stmt1>": [["<NODE>", "<stmt1>"], []],
    "<NODE>": [["very "]]
}

translated to test.g4:

grammar test;
entry: 'I ' stmt1 'like C++\n' EOF
     ;
stmt1: 
     | NODE stmt1
     ;
NODE : 'very '
     ;

and an input file 40960_very.txt:

I very very ...(*40956)... very very like C++

running with antlr4-parse:
[screenshot: antlr4-parse output, 2024-01-08]
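For reference, a tiny derivation sketch (my own illustration, not Grammar-Mutator code) shows that the JSON grammar above generates exactly inputs of this shape; the depth bound and shortest-rule fallback are my additions to guarantee termination:

```python
import json
import random

GRAMMAR = json.loads("""
{
    "<entry>": [["I ", "<stmt1>", "like C++\\n"]],
    "<stmt1>": [["<NODE>", "<stmt1>"], []],
    "<NODE>": [["very "]]
}
""")

def derive(grammar, symbol="<entry>", depth=0, max_depth=10, rng=random):
    """Expand a nonterminal into a concrete string by randomly picking
    alternatives; past max_depth, force the shortest rule to terminate."""
    if symbol not in grammar:          # terminal: emit as-is
        return symbol
    rules = grammar[symbol]
    if depth >= max_depth:
        rules = [min(rules, key=len)]  # e.g. the empty rule of <stmt1>
    rule = rng.choice(rules)
    return "".join(derive(grammar, s, depth + 1, max_depth, rng)
                   for s in rule)
```

Every derivation is of the form `I (very )*like C++\n`, which is why a long run of "very " tokens is a valid (if pathological) input for the parser.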

From the perspective of antlr4, we can use the + syntax in test.g4 and avoid this prefix matching, as follows:

grammar test;
entry: 'I ' stmt1 'like C++\n' EOF
     ;
stmt1: 
     | (NODE)+
     ;
NODE : 'very '
     ;

running again with antlr4-parse:
[screenshot: antlr4-parse output, 2024-01-08]

So I made a patch implementing the above ideas; please refer to 0x7Fancy@6eae7d1.

I have only implemented the optimization of head recursion and tail recursion here, which is simple and easy to understand. For intermediate recursion, I think it can be rewritten as head/tail recursion in the JSON grammar.

Of course, this is just a mitigation. When mutation generates a sufficiently complex syntax tree, antlr4 may still get stuck during parsing.
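The pattern the patch targets can be recognized mechanically. A small sketch (my own illustration, not the actual patch) that flags JSON rules rewritable as an ANTLR repetition:

```python
def is_tail_recursive(grammar, nt):
    """True if nt has an empty alternative plus an alternative that ends
    in nt itself, e.g. "<stmt1>": [["<NODE>", "<stmt1>"], []].
    Such a rule can be emitted as (X)* instead of a recursive rule."""
    rules = grammar[nt]
    has_empty = any(len(rule) == 0 for rule in rules)
    has_tail_call = any(rule and rule[-1] == nt for rule in rules)
    return has_empty and has_tail_call

TEST_JSON = {
    "<entry>": [["I ", "<stmt1>", "like C++\n"]],
    "<stmt1>": [["<NODE>", "<stmt1>"], []],
    "<NODE>": [["very "]],
}
```

Here `is_tail_recursive(TEST_JSON, "<stmt1>")` holds, so the generator could emit `stmt1: (NODE)*` rather than the recursive rule; head recursion is the mirror-image check on `rule[0]`.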

Originally posted by @0x7Fancy in #17 (comment)

A question about data length

Hi, I had a problem when using Grammar-Mutator.
I want to define a long hex value in my setup file, like this:
[screenshot]

But when I use it in AFL++, the program hits a bug, like this:
[screenshot]
I changed the code to printf the "ret" value. Normally, "ret" should be 1, but in my program "ret" is 0, like this:
[screenshot]
Since "ret" is 0, the program will not run in Grammar-Mutator mode. Can you help me? Thank you very much.

Grammar mutator issue : _pick_non_term_node

Hello.

When running the Grammar Mutator on a target, there is a problem right before AFL++ starts fuzzing the target.

Here is the log:

mic@mic-System-Product-Name:~/Documents/AFLplusplus$ ./afl-fuzz -m 128 -d -i testcases/others/js/ -o myouts4 -- /home/mic/Documents/jerryscript/build/bin/jerry @@
[+] Loaded environment variable AFL_CUSTOM_MUTATOR_ONLY with value 1
[+] Loaded environment variable AFL_CUSTOM_MUTATOR_LIBRARY with value /home/mic/Documents/AFLplusplus/custom_mutators/grammar_mutator/grammar_mutator/libgrammarmutator-javascript.so
afl-fuzz++4.00c based on afl by Michal Zalewski and a large online community
[+] afl++ is maintained by Marc "van Hauser" Heuse, Heiko "hexcoder" Eißfeldt, Andrea Fioraldi and Dominik Maier
[+] afl++ is open source, get it at https://github.com/AFLplusplus/AFLplusplus
[+] NOTE: This is v3.x which changes defaults and behaviours - see README.md
[+] No -M/-S set, autoconfiguring for "-S default"
[*] Getting to work...
[+] Using exponential power schedule (FAST)
[+] Enabled testcache with 50 MB
[*] Checking core_pattern...
[*] Checking CPU scaling governor...
[+] You have 24 CPU cores and 2 runnable tasks (utilization: 8%).
[+] Try parallel jobs - see docs/parallel_fuzzing.md.
[*] Setting up output directories...
[+] Output directory exists but deemed OK to reuse.
[*] Deleting old session data...
[+] Output dir cleanup successful.
[*] Checking CPU core loadout...
[+] Found a free CPU core, try binding to #0.
[*] Loading custom mutator library from '/home/mic/Documents/AFLplusplus/custom_mutators/grammar_mutator/grammar_mutator/libgrammarmutator-javascript.so'...
[*] optional symbol 'afl_custom_post_process' not found.
[*] optional symbol 'afl_custom_havoc_mutation' not found.
[*] optional symbol 'afl_custom_havoc_mutation_probability' not found.
[*] Symbol 'afl_custom_describe' not found.
[+] Custom mutator '/home/mic/Documents/AFLplusplus/custom_mutators/grammar_mutator/grammar_mutator/libgrammarmutator-javascript.so' installed successfully.
[*] Scanning 'testcases/others/js/'...
[+] Loaded a total of 1 seeds.
[*] Creating hard links for all input files...
[*] Validating target binary...
[*] Spinning up the fork server...
[+] All right - fork server is up.
[*] Target map size: 65536
[*] No auto-generated dictionary tokens to reuse.
[*] Attempting dry run with 'id:000000,time:0,execs:0,orig:small_script.js'...
    len = 20, map size = 1386, exec speed = 174 us
[+] All test cases processed.
[+] Here are some useful stats:

    Test case count : 1 favored, 0 variable, 0 ignored, 1 total
       Bitmap range : 1386 to 1386 bits (average: 1386.00 bits)
        Exec timing : 174 to 174 us (average: 174 us)

[*] No -t option specified, so I'll use an exec timeout of 20 ms.
[+] All set and ready to roll!
_pick_non_term_node returns NULL: No such file or directory

_pick_non_term_node returns NULL: No such file or directory

Flags :

export RANDOM_MUTATION_STEPS=10000
export RANDOM_RECURSIVE_MUTATION_STEPS=10000
export SPLICING_MUTATION_STEPS=10000
export AFL_CUSTOM_MUTATOR_LIBRARY=./libgrammarmutator-javascript.so
export AFL_CUSTOM_MUTATOR_ONLY=1

Ubuntu 20.04
AFL++ 4.00

Any ideas?

Test compilation error

When compiling with ENABLE_TESTING=ON, I get the following issue:

<some_path>/Grammar-Mutator/include/custom_mutator.h:26:9: error: empty struct has size 0 in C, size 1 in C++ [-Werror,-Wextern-c-compat]
typedef struct afl {
        ^
1 error generated.
make[2]: *** [tests/CMakeFiles/test_custom_mutator.dir/test_custom_mutator.cpp.o] Error 1
make[1]: *** [tests/CMakeFiles/test_custom_mutator.dir/all] Error 2
make: *** [all] Error 2

I think we need to fix this error somewhere in the code.

BTW I use the default Xcode compiler.

Is it possible to automatically eliminate indirect left-recursion

I generated the Lua grammar file using the following command:

python3 nautilus_py_grammar_to_json.py ./nautilus_py_grammars/lua.py ./lua.json

But when compiling, it prompts:

make GRAMMAR_FILE=grammars/lua.json

error(119): /home/eqqie/Fuzz/Grammar-Mutator/grammars/Grammar.g4::: The following sets of rules are mutually left-recursive [node_FUNCTIONCALL, node_EXPR]

Even though I can manually eliminate such easy indirect left-recursion, it is a big headache for a complex syntax. Is it currently possible to do this automatically? 😂
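For reference, the textbook transformation (Paull's algorithm) can be sketched over the JSON rule format. This is an illustration, not a Grammar-Mutator feature; it assumes the substituted nonterminals have no ε-productions or unit cycles, and the `_tail` naming is my own convention:

```python
def eliminate_left_recursion(grammar):
    """Substitute earlier nonterminals to surface direct left recursion,
    then remove the direct recursion via a fresh tail nonterminal."""
    nts = list(grammar)
    g = {nt: [list(rule) for rule in rules] for nt, rules in grammar.items()}
    for i, a_i in enumerate(nts):
        # Expand leading occurrences of already-processed nonterminals.
        for a_j in nts[:i]:
            expanded = []
            for rule in g[a_i]:
                if rule and rule[0] == a_j:
                    expanded.extend(sub + rule[1:] for sub in g[a_j])
                else:
                    expanded.append(rule)
            g[a_i] = expanded
        # Remove direct left recursion: A -> A x | y  becomes
        # A -> y A_tail ; A_tail -> x A_tail | ε.
        recursive = [rule[1:] for rule in g[a_i] if rule and rule[0] == a_i]
        if recursive:
            rest = [rule for rule in g[a_i] if not (rule and rule[0] == a_i)]
            tail = a_i[:-1] + "_tail>"  # assumes "<name>"-style nonterminals
            g[a_i] = [rule + [tail] for rule in rest]
            g[tail] = [rule + [tail] for rule in recursive] + [[]]
    return g
```

On a mutually left-recursive pair like node_FUNCTIONCALL/node_EXPR this yields a grammar free of left recursion, but the substituted rules get much harder to read, which may be why the tooling does not automate it.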

Grammar Mutator crashes due to null pointer dereference on write_tree_to_file

A crash happens when writing some trees to file:

881 ret = write(fd, tree->ser_buf, tree->ser_len);
(gdb) bt
#0 0x00007ffff6a3a40a in write_tree_to_file (tree=0x0, filename=0x555555650788 "fuzzer/custom/trees/id:000110,sync:main-pexploit,src:000001,+cov") at tree.c:881
#1 0x00007ffff6a38335 in afl_custom_queue_new_entry (data=0x55555564f6e0, filename_new_queue=0x55555a6e7720 "fuzzer/custom/queue/id:000110,sync:main-pexploit,src:000001,+cov", filename_orig_queue=0x55555565ac00 "fuzzer/custom/queue/id:000028,time:0,orig:34")
at grammar_mutator.c:563
#2 0x00005555555926e6 in add_to_queue (afl=0x7ffff7655010, fname=0x55555a6e7720 "fuzzer/custom/queue/id:000110,sync:main-pexploit,src:000001,+cov", len=, passed_det=) at src/afl-fuzz-queue.c:473
#3 0x0000555555562c91 in save_if_interesting (afl=afl@entry=0x7ffff7655010, mem=mem@entry=0x7ffff7ffb000, len=109, fault=0 '\000') at src/afl-fuzz-bitmap.c:516
#4 0x000055555556ded3 in sync_fuzzers (afl=) at src/afl-fuzz-run.c:667
#5 0x000055555555f578 in main (argc=, argv_orig=, envp=) at src/afl-fuzz.c:2037
0x00007ffff6a3a40a in write_tree_to_file (tree=0x0, filename=0x555555650a58 "fuzzer/custom/trees/id:000110,sync:main-pexploit,src:000001,+cov") at tree.c:881
881 ret = write(fd, tree->ser_buf, tree->ser_len);
(gdb) p fd
$4 = 14
(gdb) p tree->ser_buf
Cannot access memory at address 0x20
(gdb) p tree
$5 = (tree_t *) 0x0

As we can see, 'tree' here is NULL, and yet tree->ser_buf is dereferenced:

(gdb) l
876 return;
877
878 }
879
880 // Write the data
881 ret = write(fd, tree->ser_buf, tree->ser_len);
882 if (unlikely(ret < 0)) {
883
884 perror("Unable to write (write_tree_to_file)");
885 return;
(gdb) bt
#0 0x00007ffff6a3a40a in write_tree_to_file (tree=0x0, filename=0x555555650a58 "fuzzer/custom/trees/id:000110,sync:main-pexploit,src:000001,+cov") at tree.c:881
#1 0x00007ffff6a38335 in afl_custom_queue_new_entry (data=0x55555564f9b0, filename_new_queue=0x55555577e7f0 "fuzzer/custom/queue/id:000110,sync:main-pexploit,src:000001,+cov", filename_orig_queue=0x55555565af10 "fuzzer/custom/queue/id:000028,time:0,orig:34")
at grammar_mutator.c:563
#2 0x00005555555926e6 in add_to_queue (afl=0x7ffff7655010, fname=0x55555577e7f0 "fuzzer/custom/queue/id:000110,sync:main-pexploit,src:000001,+cov", len=, passed_det=) at src/afl-fuzz-queue.c:473
#3 0x0000555555562c91 in save_if_interesting (afl=afl@entry=0x7ffff7655010, mem=mem@entry=0x7ffff7ffb000, len=109, fault=0 '\000') at src/afl-fuzz-bitmap.c:516
#4 0x000055555556ded3 in sync_fuzzers (afl=) at src/afl-fuzz-run.c:667
#5 0x000055555555f578 in main (argc=, argv_orig=, envp=) at src/afl-fuzz.c:2037

My setup:
AFL_IMPORT_FIRST=1 AFL_NO_AFFINITY=1 AFL_MAP_SIZE=137344 afl-fuzz -x dict.dict -i ~/Grammar-Mutator/seeds/ -o fuzzer -S custom ./program
Other children (and the main instance) are not using grammar-mutator.
fuzzer/custom/trees/ exists and is populated per the manual.

SEGV in afl_custom_fuzz_count

I'm trying to fuzz mruby using the testcases in mruby/test/t/ (and not the testcases generated with grammar_generator) to test the ANTLR shim, and I get:

==1141== Memcheck, a memory error detector
==1141== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1141== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==1141== Command: afl-fuzz -i in -o out3 -m none -- ./bin/mruby @@
==1141== 
==1141== Conditional jump or move depends on uninitialised value(s)
==1141==    at 0x12775F: bind_to_free_cpu (afl-fuzz-init.c:215)
==1141==    by 0x10F6D4: main (afl-fuzz.c:1091)
==1141== 
==1141== Invalid read of size 4
==1141==    at 0x6D29781: afl_custom_fuzz_count (grammar_mutator.c:304)
==1141==    by 0x13AFD0: fuzz_one_original (afl-fuzz-one.c:1679)
==1141==    by 0x138004: fuzz_one (afl-fuzz-one.c:4893)
==1141==    by 0x10FC14: main (afl-fuzz.c:1437)
==1141==  Address 0x4 is not stack'd, malloc'd or (recently) free'd
==1141== 
==1141== 
==1141== Process terminating with default action of signal 11 (SIGSEGV)
==1141==  Access not within mapped region at address 0x4
==1141==    at 0x6D29781: afl_custom_fuzz_count (grammar_mutator.c:304)
==1141==    by 0x13AFD0: fuzz_one_original (afl-fuzz-one.c:1679)
==1141==    by 0x138004: fuzz_one (afl-fuzz-one.c:4893)
==1141==    by 0x10FC14: main (afl-fuzz.c:1437)
==1141==  If you believe this happened as a result of a stack
==1141==  overflow in your program's main thread (unlikely but
==1141==  possible), you can try to increase the size of the
==1141==  main thread stack using the --main-stacksize= flag.
==1141==  The main thread stack size used in this run was 8388608.
==1141== 
==1141== HEAP SUMMARY:
==1141==     in use at exit: 4,641,445 bytes in 44,091 blocks
==1141==   total heap usage: 76,448 allocs, 32,357 frees, 9,150,336 bytes allocated
==1141== 
==1141== LEAK SUMMARY:
==1141==    definitely lost: 0 bytes in 0 blocks
==1141==    indirectly lost: 0 bytes in 0 blocks
==1141==      possibly lost: 16,896 bytes in 2 blocks
==1141==    still reachable: 4,624,549 bytes in 44,089 blocks
==1141==         suppressed: 0 bytes in 0 blocks
==1141== Rerun with --leak-check=full to see details of leaked memory
==1141== 
==1141== For counts of detected and suppressed errors, rerun with: -v
==1141== Use --track-origins=yes to see where uninitialised values come from
==1141== ERROR SUMMARY: 3 errors from 2 contexts (suppressed: 0 from 0)

Exception in the ANTLR shim

After a few minutes of fuzzing mruby using the testcases from test/t/:

terminate called after throwing an instance of 'std::range_error'
  what():  wstring_convert::from_bytes
==13251== 
==13251== Process terminating with default action of signal 6 (SIGABRT)
==13251==    at 0x4E7AF47: raise (raise.c:51)
==13251==    by 0x4E7C8B0: abort (abort.c:79)
==13251==    by 0x71F2256: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==13251==    by 0x71FD605: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==13251==    by 0x71FD670: std::terminate() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==13251==    by 0x71FD904: __cxa_throw (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==13251==    by 0x71F4C0B: std::__throw_range_error(char const*) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==13251==    by 0x6D94479: std::__cxx11::wstring_convert<std::codecvt_utf8<char32_t, 1114111ul, (std::codecvt_mode)0>, char32_t, std::allocator<char32_t>, std::allocator<char> >::from_bytes(char const*, char const*) (locale_conv.h:324)
==13251==    by 0x6D94032: antlrcpp::utf8_to_utf32[abi:cxx11](char const*, char const*) (StringUtils.h:43)
==13251==    by 0x6D934D4: antlr4::ANTLRInputStream::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (ANTLRInputStream.cpp:40)
==13251==    by 0x6D9322B: antlr4::ANTLRInputStream::ANTLRInputStream(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (ANTLRInputStream.cpp:22)
==13251==    by 0x6D932CA: antlr4::ANTLRInputStream::ANTLRInputStream(char const*, unsigned long) (ANTLRInputStream.cpp:26)
==13251== 
==13251== HEAP SUMMARY:
==13251==     in use at exit: 1,082,303,564 bytes in 18,122,690 blocks
==13251==   total heap usage: 59,484,413 allocs, 41,361,723 frees, 3,887,947,618 bytes allocated
==13251== 
==13251== LEAK SUMMARY:
==13251==    definitely lost: 0 bytes in 0 blocks
==13251==    indirectly lost: 0 bytes in 0 blocks
==13251==      possibly lost: 69,776 bytes in 4 blocks
==13251==    still reachable: 1,082,233,788 bytes in 18,122,686 blocks
==13251==                       of which reachable via heuristic:
==13251==                         stdstring          : 52 bytes in 1 blocks
==13251==         suppressed: 0 bytes in 0 blocks
==13251== Rerun with --leak-check=full to see details of leaked memory
==13251== 
==13251== For counts of detected and suppressed errors, rerun with: -v
==13251== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Aborted

A question about ASCII conversion

Hi, I had a problem when using this tool.
I want to use some numbers (e.g. 1, 2, 3, ...) as seed content and then send them to a network server for fuzzing. But I found that when I use a socket to send them, the numbers are transmitted as ASCII. I want the numbers to keep their hex values (e.g. 1 -> 01, 30 -> 30).
If you could tell me how, thank you very much.
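For what it's worth, the distinction the report describes (ASCII digit characters vs. raw byte values) looks like this in Python; this is a generic illustration, not Grammar-Mutator code:

```python
def hex_tokens_to_bytes(text):
    """Interpret whitespace-separated hex tokens as raw byte values,
    so "1 30" becomes the two bytes 0x01 0x30 rather than ASCII text."""
    return bytes(int(token, 16) for token in text.split())

# Sending the string "1" over a socket transmits the ASCII byte 0x31;
# decoding first transmits the raw value 0x01 instead.
ascii_form = "1 30".encode()            # bytes 0x31 0x20 0x33 0x30
raw_form = hex_tokens_to_bytes("1 30")  # bytes 0x01 0x30
```

The same decoding has to happen somewhere between the seed file and the socket write, since the mutator itself operates on the textual form.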

Readme and copyright note

After the GSoC deadline, before we make this public, make sure to include copyright headers like this:

/*
   american fuzzy lop++ - grammar mutator
   --------------------------------------

   Written by Shengtuo Hu

   Copyright 2020 AFLplusplus Project. All rights reserved.

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at:

     http://www.apache.org/licenses/LICENSE-2.0

   A grammar-based custom mutator written for GSoC '20.

 */

In the readme, provide a bit of context (this is a GSoC project, etc.), a brief discussion of compatible grammar formats (ANTLR and JSON at the moment),
and a simple how-to, maybe with a screenshot.

Feedback

I used the mruby example and got this just when starting up:

mutation error: No such file or directory

[-] PROGRAM ABORT : Error in custom_fuzz. Size returned: 0
         Location : fuzz_one_original(), src/afl-fuzz-one.c:1747

It should all be there:

AFL_CUSTOM_MUTATOR_ONLY=1
AFL_CUSTOM_MUTATOR_LIBRARY=/prg/Grammar-Mutator/trunk/src/libgrammarmutator.so
afl-fuzz -i in -o out -- mruby/bin/mruby @@
ls out/trees/
...
id:000070,time:0,orig:70  id:000156,time:0,orig:156  id:000242,time:0,orig:242
id:000071,time:0,orig:71  id:000157,time:0,orig:157  id:000243,time:0,orig:243
id:000072,time:0,orig:72  id:000158,time:0,orig:158  id:000244,time:0,orig:244
id:000073,time:0,orig:73  id:000159,time:0,orig:159  id:000245,time:0,orig:245
...

more feedback:

  • IMHO the GRAMMAR_FILE env var should always be required. Having a JSON default is not helpful.

  • ./grammar_generator 123 100 1000 /tmp/seeds /tmp/trees -> not found; it is src/grammar_generator.
    Better to copy grammar_generator and the .so to the project root when done compiling, maybe even with the grammar type in their filenames?

  • export export AFL_CUSTOM_MUTATOR_LIBRARY=/path/to/libgrammarmutator.so -> doubled "export", also again below

  • Don't point -o to /tmp, this is not best practice. Just leave the paths out so the examples work in the current directory.

How to add extras dynamically during fuzzing

As we know, plain AFL adds some extras to the auto_extras directory in the queue and uses them as a 'dynamic dictionary'.
I would like to implement this function in the grammar mutator as well, so that when the fuzzer extracts strings during execution, they can be passed to this mutator.
Are there any plans for this feature? Or do I have to keep modifying the grammar JSON file and rebuilding the .so library?
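As a workaround, the rebuild loop described above can at least be automated: splice newly discovered strings into the JSON grammar as extra terminal alternatives, then rerun make GRAMMAR_FILE=... to regenerate the .so. A sketch (the function and nonterminal names are hypothetical; there is no runtime API for this today):

```python
import json

def add_extras(grammar_path, out_path, nonterminal, tokens):
    """Append each token as a new single-terminal alternative of the
    given nonterminal, skipping alternatives that already exist."""
    with open(grammar_path) as f:
        grammar = json.load(f)
    rules = grammar.setdefault(nonterminal, [])
    for token in tokens:
        alternative = [token]
        if alternative not in rules:
            rules.append(alternative)
    with open(out_path, "w") as f:
        json.dump(grammar, f, indent=2)
```

For example, `add_extras("grammars/ruby.json", "grammars/ruby_plus.json", "<STRING>", extracted)` followed by a rebuild gets the new tokens into the mutator, at the cost of restarting the fuzzer.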

Long recursive calls cause afl to segfault

As discussed in #14 the following grammar causes a segfault from AFL (maybe only on startup?): https://paste.pr0.tips/rm

This is due to very deep recursion:

#764 0x00007fffeed762f8 in antlr4::atn::ParserATNSimulator::closureCheckingStopState(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#765 0x00007fffeed7a67a in antlr4::atn::ParserATNSimulator::closure_(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#766 0x00007fffeed763a5 in antlr4::atn::ParserATNSimulator::closureCheckingStopState(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#767 0x00007fffeed7a67a in antlr4::atn::ParserATNSimulator::closure_(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#768 0x00007fffeed763a5 in antlr4::atn::ParserATNSimulator::closureCheckingStopState(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#769 0x00007fffeed762f8 in antlr4::atn::ParserATNSimulator::closureCheckingStopState(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#770 0x00007fffeed7a67a in antlr4::atn::ParserATNSimulator::closure_(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
#771 0x00007fffeed763a5 in antlr4::atn::ParserATNSimulator::closureCheckingStopState(std::shared_ptr<antlr4::atn::ATNConfig> const&, antlr4::atn::ATNConfigSet*, std::unordered_set<std::shared_ptr<antlr4::atn::ATNConfig>, antlr4::atn::ATNConfig::Hasher, antlr4::atn::ATNConfig::Comparer, std::allocator<std::shared_ptr<antlr4::atn::ATNConfig> > >&, bool, bool, int, bool) () from /home/jrogers/Grammar-Mutator/libgrammarmutator-http.so
[frames #772-#776 alternate between the same two closure_ / closureCheckingStopState frames]

Issue with recursive javascript grammar

Hi @h1994st

I'm trying to use Nautilus grammars. The Ruby grammar works just fine, but with the JavaScript grammar I get an error.

cityoflight@v8:~/Grammar-Mutator$ make -j8 GRAMMAR_FILE=grammars/javascript.json
Found antlr-4.8-complete: /usr/local/lib/antlr-4.8-complete.jar
Selected grammar name: javascript (from /home/cityoflight/Grammar-Mutator/grammars/javascript.json)
python3 grammars/f1_c_gen.py /home/cityoflight/Grammar-Mutator/grammars/javascript.json /home/cityoflight/Grammar-Mutator
python3 grammars/f1_c_gen.py /home/cityoflight/Grammar-Mutator/grammars/javascript.json /home/cityoflight/Grammar-Mutator
make[1]: Entering directory '/home/cityoflight/Grammar-Mutator/third_party'
make[2]: Entering directory '/home/cityoflight/Grammar-Mutator/third_party/Cyan4973_xxHash'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/cityoflight/Grammar-Mutator/third_party/Cyan4973_xxHash'
make[2]: Entering directory '/home/cityoflight/Grammar-Mutator/third_party/rxi_map'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/cityoflight/Grammar-Mutator/third_party/rxi_map'
make[2]: Entering directory '/home/cityoflight/Grammar-Mutator/third_party/antlr4-cpp-runtime'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/cityoflight/Grammar-Mutator/third_party/antlr4-cpp-runtime'
make[1]: Leaving directory '/home/cityoflight/Grammar-Mutator/third_party'

^CTraceback (most recent call last):
  File "grammars/f1_c_gen.py", line 646, in <module>
    main(json.load(fp), sys.argv[2])
  File "grammars/f1_c_gen.py", line 632, in main
    fuzz_hdr, fuzz_src = CFuzzer(c_grammar).fuzz_src()
  File "grammars/f1_c_gen.py", line 317, in __init__
    super().__init__(grammar)
  File "grammars/f1_c_gen.py", line 262, in __init__
    self.compute_rule_recursion()
  File "grammars/f1_c_gen.py", line 310, in compute_rule_recursion
    self.rule_recursion[n] = self.is_rule_recursive(n, rule, set())
  File "grammars/f1_c_gen.py", line 285, in is_rule_recursive
    v = self.is_rule_recursive(rname, trule, seen | {rn})
  File "grammars/f1_c_gen.py", line 285, in is_rule_recursive
    v = self.is_rule_recursive(rname, trule, seen | {rn})
  [Previous line repeated 16 more times]
KeyboardInterrupt
Traceback (most recent call last):
  File "grammars/f1_c_gen.py", line 646, in <module>
    main(json.load(fp), sys.argv[2])
  File "grammars/f1_c_gen.py", line 632, in main
    fuzz_hdr, fuzz_src = CFuzzer(c_grammar).fuzz_src()
  File "grammars/f1_c_gen.py", line 317, in __init__
    super().__init__(grammar)
  File "grammars/f1_c_gen.py", line 262, in __init__
    self.compute_rule_recursion()
  File "grammars/f1_c_gen.py", line 310, in compute_rule_recursion
    self.rule_recursion[n] = self.is_rule_recursive(n, rule, set())
  File "grammars/f1_c_gen.py", line 285, in is_rule_recursive
    v = self.is_rule_recursive(rname, trule, seen | {rn})
  File "grammars/f1_c_gen.py", line 285, in is_rule_recursive
    v = self.is_rule_recursive(rname, trule, seen | {rn})
  [Previous line repeated 15 more times]
  File "grammars/f1_c_gen.py", line 276, in is_rule_recursive
    for token in rule:
KeyboardInterrupt
make: *** [GNUmakefile:102: include/f1_c_fuzz.h] Interrupt
make: *** [GNUmakefile:102: src/f1_c_fuzz.c] Interrupt

I have to stop it with Ctrl-C. Changing ctx.rule(u'PROGRAM',u'{STATEMENT}\n{PROGRAM}') to ctx.rule(u'PROGRAM',u'{STATEMENT}\n') still gives the same error.

I don't know whether the issue is in the JavaScript Nautilus grammar file or in the generator, grammars/f1_c_gen.py. If the recursion issue is in the grammar, what kind of pattern should I avoid?
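For context, the traceback shows `is_rule_recursive` being re-entered with a fresh `seen` set along every path, which blows up combinatorially on a densely interconnected grammar like the JavaScript one. One possible direction, sketched here against a hypothetical grammar representation (a dict of rule name to alternatives; this is not the actual f1_c_gen.py code), is to compute recursion for all rules at once with a transitive-closure fixpoint:

```python
def compute_recursion(grammar):
    """grammar: dict mapping rule name -> list of alternatives, each a
    list of symbols (other rule names or literal strings).  Returns
    {rule: is_recursive} for all rules at once, instead of re-walking
    the grammar per rule with a fresh `seen` set."""
    # one-step reachability: which rules does each rule mention directly?
    step = {
        rule: {sym for alt in alts for sym in alt if sym in grammar}
        for rule, alts in grammar.items()
    }
    # transitive closure by fixpoint iteration
    reach = {rule: set(step[rule]) for rule in grammar}
    changed = True
    while changed:
        changed = False
        for rule in grammar:
            new = reach[rule] | {x for s in reach[rule] for x in step[s]}
            if new != reach[rule]:
                reach[rule] = new
                changed = True
    # a rule is recursive iff it can reach itself
    return {rule: rule in reach[rule] for rule in grammar}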

Inconsistency between compilations

Hi!

I was testing the project with this simple grammar rules:

{
	"<START>": [
		["<?php", "<FUZZ>", "\n?>"]
	],
	"<FUZZ>": [
		["\ntry {\n try {\n", "<DEFCLASS>", "\n} catch (Exception $e){}} catch(Error $e){}", "<FUZZ>"],[]
	],
	"<DEFCLASS>" : [
		["class ", "<CLASSNAME>", " {\n\t", "<CLASSBODY>","}\n"],[]
	],
	"<CLASSNAME>" : [
		["Class01"],
		["Class02"],
		["Class03"],
		["Class04"],
		["Class05"]
	],
	"<CLASSBODY>" : [
		["test01;\n"],
		["test02;\n"],
		["test03;\n"]
	]
}

Every time I compile (make -j$(nproc) GRAMMAR_FILE=grammars/phpexcept.json) and test the rules with the generator, I obtain different results: sometimes it picks only a single rule, sometimes a small chain of rules:

 psyconauta@insulaalchimia ᐓ  ~/Grammar-Mutator |stable⚡ ᐓ   ./grammar_generator-phpexcept 100 1000 ./seeds ./trees
 psyconauta@insulaalchimia ᐓ  ~/Grammar-Mutator |stable⚡ ᐓ   cat seeds/1
class Class03 {
	test01;
}
psyconauta@insulaalchimia ᐓ  ~/Grammar-Mutator |stable⚡ ᐓ   make clean && make -j$(nproc) GRAMMAR_FILE=grammars/phpexcept.json && ./grammar_generator-phpexcept 100 1000 ./seeds ./trees
psyconauta@insulaalchimia ᐓ  ~/Grammar-Mutator |stable⚡ ᐓ   cat seeds/1
<?php
try {
 try {
class Class04 {
	test03;

}

} catch (Exception $e){}} catch(Error $e){}
?>
psyconauta@insulaalchimia ᐓ  ~/Grammar-Mutator |stable⚡ ᐓ   make clean && make -j$(nproc) GRAMMAR_FILE=grammars/phpexcept.json && ./grammar_generator-phpexcept 100 1000 ./seeds ./trees
psyconauta@insulaalchimia ᐓ  ~/Grammar-Mutator |stable⚡ ᐓ   cat seeds/1
Class01

I am not sure if I am writing the rules the wrong way (but after checking the documentation and the Ruby example, they look fine to me).
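This looks like expected behavior rather than a bug: with the empty alternatives (`[]`) in `<FUZZ>` and `<DEFCLASS>`, the generator can legally stop expanding at any level, and a randomly seeded generator will pick different alternatives on every run. A minimal Python sketch of random expansion over the same grammar (hypothetical, not the actual generator) shows the same variance:

```python
import random

GRAMMAR = {
    "<START>": [["<?php", "<FUZZ>", "\n?>"]],
    "<FUZZ>": [["\ntry {\n try {\n", "<DEFCLASS>",
                "\n} catch (Exception $e){}} catch(Error $e){}", "<FUZZ>"],
               []],
    "<DEFCLASS>": [["class ", "<CLASSNAME>", " {\n\t", "<CLASSBODY>", "}\n"],
                   []],
    "<CLASSNAME>": [["Class01"], ["Class02"], ["Class03"], ["Class04"], ["Class05"]],
    "<CLASSBODY>": [["test01;\n"], ["test02;\n"], ["test03;\n"]],
}

def expand(symbol, rng, depth=0):
    """Randomly expand a non-terminal; literals are returned as-is."""
    if symbol not in GRAMMAR:
        return symbol
    alts = GRAMMAR[symbol]
    # past a depth limit, force the shortest alternative so expansion terminates
    alt = rng.choice(alts) if depth < 8 else min(alts, key=len)
    return "".join(expand(s, rng, depth + 1) for s in alt)

# Differently seeded runs (like differently timed builds) give different seeds:
a = expand("<START>", random.Random(1))
b = expand("<START>", random.Random(7))
```

Because `<FUZZ>` may expand to nothing, even a bare `<?php\n?>` is a valid output, which matches the very short seeds you are seeing.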

Progress Report

Hi Shengtuo,

please put your weekly progress reports in this issue.
Besides, I have not seen any progress in the last 5 days. Do you need assistance?

JSON-to-g4 conversion with only "parser" rules causes syntax errors

In my experiments, I found that converting JSON to g4 with only "parser" rules causes syntax errors, and these parsing errors may lead to losing a large amount of mutated data.

I made a minimal test case, lex.json:

{
    "<A>": [["<NUMBER>", "<STRING>", "\n"]],
    "<NUMBER>": [["10"], ["99"]],
    "<STRING>": [["(", "<HEXSTRING>", ")"]],
    "<HEXSTRING>": [["<CHAR>", "<HEXSTRING>"], []],
    "<CHAR>": [
            ["0"], ["1"], ["2"], ["3"], ["4"], ["5"], ["6"], ["7"],
            ["8"], ["9"], ["a"], ["b"], ["c"], ["d"], ["e"], ["f"]
    ]
}

Grammar-Mutator builds it and generates this Grammar.g4:

grammar Grammar;
entry
    : node_A EOF
    ;
node_A
    : node_NUMBER node_STRING '\n'
    ;
node_NUMBER
    : '10'
    | '99'
    ;
node_STRING
    : '(' node_HEXSTRING ')'
    ;
node_HEXSTRING
    : 
    | node_CHAR node_HEXSTRING
    ;
node_CHAR
    : '0'
    | '1'
    | '2'
    | '3'
    | '4'
    | '5'
    | '6'
    | '7'
    | '8'
    | '9'
    | 'a'
    | 'b'
    | 'c'
    | 'd'
    | 'e'
    | 'f'
    ;

We prepared input data seed1 / seed2 and used antlr4-parse to test them:

[Screenshot: antlr4-parse reports a syntax error when parsing the seed input 10(10)]

Why is 10(10) parsed incorrectly? Because ANTLR4 works in two stages, lexer and parser: during the lexer stage, 10 is recognized as the node_NUMBER token, so in the parser stage the input looks like node_NUMBER ( node_NUMBER ), and an error occurs.

In ANTLR4 grammars, lexer rules begin with an uppercase letter and parser rules begin with a lowercase letter, so we should state the lexical rules explicitly. Patched Grammar_patch.g4:

grammar Grammar_patch;
entry
    : node_A EOF
    ;
node_A
    : node_NUMBER Node_STRING '\n'
    ;
node_NUMBER
    : '10'
    | '99'
    ;
Node_STRING
    : '(' Node_HEXSTRING ')'
    ;
Node_HEXSTRING
    : 
    | Node_CHAR Node_HEXSTRING
    ;
Node_CHAR
    : '0'
    | '1'
    | '2'
    | '3'
    | '4'
    | '5'
    | '6'
    | '7'
    | '8'
    | '9'
    | 'a'
    | 'b'
    | 'c'
    | 'd'
    | 'e'
    | 'f'
    ;

testing again:

[Screenshot: antlr4-parse now accepts both seeds, with a warning that a lexer rule can match the empty string]

The "warning" tells us the rule can match the empty string; this may cause ANTLR4 backtracking issues, but we can easily address it by marking the rule as fragment Node_HEXSTRING.

Maybe we can improve the JSON-to-g4 generation code to distinguish between lexer and parser rules?
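One possible shape for that generator change, sketched in Python under the assumption that any rule whose alternatives contain only literals and other token-like rules can be emitted with an uppercase (lexer) name. This heuristic is coarser than the hand-written patch above (it would also uppercase node_NUMBER), so treat it as a starting point:

```python
def g4_names(grammar, start):
    """Map each JSON rule name to an ANTLR rule name, uppercasing the
    first letter (lexer rule) when the rule only ever expands to
    literals and other token-like rules."""
    def token_like(name, visiting=frozenset()):
        if name in visiting:          # self-recursion is fine in lexer rules
            return True
        v = visiting | {name}
        return all(
            sym not in grammar or token_like(sym, v)
            for alt in grammar[name]
            for sym in alt
        )

    names = {}
    for name in grammar:
        bare = name.strip("<>")
        if name != start and token_like(name):
            names[name] = "Node_" + bare     # lexer rule
        else:
            names[name] = "node_" + bare     # parser rule
    return names
```

On the lex.json example this emits Node_STRING, Node_HEXSTRING, and Node_CHAR as lexer rules while keeping the start rule as a parser rule.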

Replace C++ with C

Current C++ parts:

  • Chunk store relies on
  • Some parts rely on std::string, which can be replaced by C char[] and related functions

TBD:

  • googletest is implemented in C++. Other candidates for the testing framework: CUnit, CMocka

`tree_from_buf` hangs when parsing a small test case

Environment

Ubuntu 20.04.1 on amd64. Grammar-mutator commit cbe5e32752773945e0142fac9f1b7a0ccb5dcdff and afl++ version 4.01a

Description

The grammar mutator takes an exceptionally long time to generate a tree from a given input (even when that input is empty) when using the custom grammar file attached below. More specifically, it hangs in the function tree_from_buf; a backtrace is included below.
I suspect this might have to do with the grammar file, since I haven't been able to reproduce the bug with other grammars. But since the generated test case is also the simplest one possible according to the grammar, I am unsure what might be causing it.

This is concerning because a fuzzing campaign using this grammar can only be started from a few (at most) inputs, and when interrupted the campaign cannot be resumed since the time cost becomes prohibitive.

How to reproduce

  • Build Grammar-Mutator using the file attached below with make ENABLE_DEBUG=1 GRAMMAR_FILE=python.json
  • We don't need any specific instrumented binary, since the issue occurs before fuzzing starts, so I am going to use echo as an example
  • Generate a simple test case: grammar_generator-python 1 50 in out/default/trees. In my case the generated input was:
a=1
b=1.203213
foo='123213'
bar=b'123\x01'
1=a

which is one of the simplest test cases possible according to the grammar. With tree file (base64 encoded):

AQAAAAAAAAACAAAAAAAAAAIAAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAAAAAArAAAAYT0xCmI9MS4y
MDMyMTMKZm9vPScxMjMyMTMnCmJhcj1iJzEyM1x4MDEnCgMAAAAAAAAAAgAAAAAAAAAKAAAAAAAA
AAEAAAAAAAAACwAAAAAAAAACAAAAAAAAAAwAAAAAAAAAAQAAAAAAAAAOAAAAAAAAAAIAAAAAAAAA
DwAAAAAAAAABAAAAAAAAAC8AAAAAAAAAAQAAAAAAAAAzAAAAAAAAAAEAAAAAAAAANAAAAAAAAAAB
AAAAAAAAADUAAAAAAAAAAQAAAAAAAAA2AAAAAAAAAAEAAAAAAAAAOQAAAAAAAAABAAAAAAAAADoA
AAAAAAAAAQAAAAAAAAA7AAAAAAAAAAEAAAAAAAAAPAAAAAAAAAABAAAAAAAAAD0AAAAAAAAAAQAA
AAAAAAA+AAAAAAAAAAEAAAAAAAAAPwAAAAAAAAABAAAAAAAAAEEAAAAAAAAAAgAAAAAAAABDAAAA
AQAAAAEAAAAAAAAAXAAAAAEAAAABAAAAAAAAAAAAAAAAAAAAAAAAAAEAAAAxQAAAAAAAAAAAAAAA
AAAAAA0AAAAAAAAAAgAAAAAAAAAAAAAAAAAAAAAAAAABAAAAPQ8AAAAAAAAAAQAAAAAAAAAvAAAA
AAAAAAEAAAAAAAAAMwAAAAAAAAABAAAAAAAAADQAAAAAAAAAAQAAAAAAAAA1AAAAAAAAAAEAAAAA
AAAANgAAAAAAAAABAAAAAAAAADkAAAAAAAAAAQAAAAAAAAA6AAAAAAAAAAEAAAAAAAAAOwAAAAAA
AAABAAAAAAAAADwAAAAAAAAAAQAAAAAAAAA9AAAAAAAAAAEAAAAAAAAAPgAAAAAAAAABAAAAAAAA
AD8AAAAAAAAAAQAAAAAAAABBAAAAAAAAAAIAAAAAAAAAQwAAAAAAAAABAAAAAAAAAFgAAAAAAAAA
AQAAAAAAAAAAAAAAAAAAAAAAAAABAAAAYUAAAAAAAAAAAAAAAAAAAABaAAAAAAAAAAEAAAAAAAAA
AAAAAAAAAAAAAAAAAQAAAApaAAAAAAAAAAEAAAAAAAAAAAAAAAAAAAAAAAAAAQAAAAo=
  • Start fuzzing: AFL_CUSTOM_MUTATOR_ONLY=1 AFL_CUSTOM_MUTATOR_LIBRARY=libgrammarmutator-python.so afl-fuzz -i in -o out -- ./echo @@

The grammar file is:

# python.json
{
	"<START>": [["<defs>", "<program>"]],
	"<defs>": [["a=1\nb=1.203213\nfoo='123213'\nbar=b'123\\x01'\n"]],
	"<program>": [["<stmt>", "<NEWLINE>"], ["<stmt>", "<NEWLINE>", "<program>"]],
	
	"<decorator>": [["@", "<dotted_name>", "(", "<arglist>", ")", "<NEWLINE>"],
		["@", "<dotted_name>", "<NEWLINE>"]],
	"<decorators>": [["<decorator>"], ["<decorator>", "<decorators>"]],
	"<decorated>": [["<decorators>", "<classdef>"], ["<decorators>", "<funcdef>"]],
	"<funcdef>": [["def ", "<NAME>", "<parameters>", "->", "<test>", ":", "<suite>"],
		["def ", "<NAME>", "<parameters>", ":", "<suite>"]],
	"<parameters>": [["(", ")"], ["(", "<typedargslist>", ")"]],
	"<typedargslist>": [
		["<NAME>", ", ", "<typedargslist>"], ["<NAME>"], ["*", "<NAME>"], ["**", "<NAME>"], ["*", "<NAME>", ", ", "**", "<NAME>"], []
	],
	
	"<stmt>": [["<simple_stmt>"], ["<compound_stmt>"]],
	"<simple_stmt>": [["<small_stmt>", "<NEWLINE>"], ["<small_stmt>", ";", "<simple_stmt>"]],
	"<small_stmt>": [["<expr_stmt>"], ["<del_stmt>"], ["<pass_stmt>"], ["<flow_stmt>"],
		["<import_stmt>"], ["<global_stmt>"], ["<nonlocal_stmt>"], ["<assert_stmt>"]],
	"<aux_equals_sequence>": [["=", "<testlist_star_expr>"], ["=", "<yield_expr>"],
		["=", "<testlist_star_expr>", "<aux_equals_sequence>"], ["=", "<yield_expr>", "<aux_equals_sequence>"]],
	"<expr_stmt>": [["<testlist_star_expr>", "<augassign>", "<yield_expr>"],
		["<testlist_star_expr>", "<augassign>", "<testlist>"],
		["<testlist_star_expr>", "<aux_equals_sequence>"]],
	"<testlist_star_expr>": [["<test>"], ["<star_expr>"], ["<star_expr>", ",", "<testlist_star_expr>"],
		["<test>", ",", "<testlist_star_expr>"]],
	"<augassign>": [["+="], ["-="], ["*="], ["/="], ["%="],  ["&="],  ["|="],  ["^="], 
		["<<="],  [">>="],  ["**="],  ["//="]],
	"<del_stmt>": [["del ", "<exprlist>"]],
	"<pass_stmt>": [["pass"]],
	"<flow_stmt>": [["<break_stmt>"], ["<continue_stmt>"], ["<return_stmt>"], ["<raise_stmt>"], ["<yield_stmt>"]],
	"<break_stmt>": [["break"]],
	"<continue_stmt>": [["continue"]],
	"<return_stmt>": [["return ", "<testlist>"], ["return"]],
	"<yield_stmt>": [["<yield_expr>"]],
	"<raise_stmt>": [["raise"], ["raise ", "<test>"], ["raise ", "<test>", " from ", "<test>"]],
	"<import_stmt>": [["<import_name>"]],
	"<import_name>": [["import ", "<dotted_import_as_names>"]],
	"<dotted_import_as_name>": [["<dotted_import_name>"], ["<dotted_import_name>", " as ", "<NAME>"]],
	"<dotted_import_as_names>": [["<dotted_import_as_name>"], ["<dotted_import_as_name>", ", ", "<dotted_import_as_names>"]],
	"<aux_trailing_dots>": [[".", "<NAME>", "<aux_trailing_dots>"], [".", "<NAME>"]],
	"<dotted_import_name>": [["<IMODULE_NAME>"], ["<IMODULE_NAME>", "<aux_trailing_dots>"]],
	"<dotted_name>": [["<NAME>"], ["<IMODULE_NAME>"], ["<NAME>", "<aux_trailing_dots>"], ["<IMODULE_NAME>", "<aux_trailing_dots>"]],
	"<global_stmt>": [["global ", "<NAME>"]],
	"<nonlocal_stmt>": [["nonlocal ", "<NAME>"]],
	"<assert_stmt>": [["assert ", "<test>"]],
	"<compound_stmt>": [["<if_stmt>"], ["<while_stmt>"], ["<for_stmt>"], 
		["<try_stmt>"], ["<with_stmt>"], ["<funcdef>"], ["<classdef>"], ["<decorated>"]],
	"<aux_elif_stmts>": [[], ["elif ", "<test>", ":", "<suite>", "<aux_elif_stmts>"]],
	"<if_stmt>": [["if ", "<test>", ":", "<suite>", "<aux_elif_stmts>"], ["if ", "<test>", ":", "<suite>", "<aux_elif_stmts>", "else", ":", "<suite>"]],
	"<while_stmt>": [["while ", "<test>", ":", "<suite>"], ["while ", "<test>", ":", "<suite>", "else", ":", "<suite>"]],
	"<for_stmt>": [["for ", "<exprlist>", " in ", "<testlist>", " : ", "<suite>"],
		["for ", "<exprlist>", " in ", "<testlist>", ":", "<suite>", "else", ":", "<suite>"]],
	"<aux_except_seq>": [["<except_clause>", ":", "<suite>"], ["<except_clause>", ":", "<suite>", "<aux_except_seq>"]],
	"<try_stmt>": [["try", ":", "<suite>", "<aux_except_seq>", "else", ":", "<suite>"],
		["try", ":", "<suite>", "<aux_except_seq>", "finally", ":", "<suite>"],
		["try", ":", "<suite>", "<aux_except_seq>", "else", ":", "<suite>", "finally", ":", "<suite>"],
		["try", ":", "<suite>", "finally", ":", "<suite>"]],
	"<with_stmt>": [["with ", "<with_item>", ":", "<suite>"]],
	"<with_item>": [["<test>"], ["<test>", " as ", "<expr>"]],
	"<except_clause>": [["except"], ["except ", "<test>"], ["except ", "<test>", " as ", "<NAME>"]],
	"<aux_stmt_seq>": [["<stmt>"], ["<stmt>", "<aux_stmt_seq>"]],
	"<suite>": [["<simple_stmt>"], ["<NEWLINE>", "arrancalasuite", "<NEWLINE>", "<aux_stmt_seq>", "<NEWLINE>", "terminalasuite", "<NEWLINE>"]],
	"<test>": [["<or_test>", " if ", "<or_test>", " else ", "<test>"],
		["<or_test>"], ["<lambdef>"]],
	"<test_nocond>": [["<or_test>"], ["<lambdef_nocond>"]],
	"<lambdef>": [["lambda ", "<typedargslist>", ":", "<test>"], ["lambda ", ":", "<test>"]],
	"<lambdef_nocond>": [["lambda ", "<typedargslist>", ":", "<test_nocond>"], ["lambda ", ":", "<test_nocond>"]],
	"<or_test>": [["<and_test>"], ["<and_test>", " or ", "<or_test>"]],
	"<and_test>": [["<not_test>"], ["<not_test>", " and ", "<and_test>"]],
	"<not_test>": [["not ", "<not_test>"], ["<comparison>"]],
	"<comparison>": [["<expr>"], ["<expr>", "<comp_op>", "<comparison>"]],
	"<comp_op>": [["<"], [">"], ["=="], [">="], ["<="], ["!="], [" in "], [" not in "], [" is "], [" is not "]],
	"<star_expr>": [["*", "<expr>"]],
	"<expr>": [["<xor_expr>"], ["<xor_expr>", "|", "<expr>"]],
	"<xor_expr>": [["<and_expr>"], ["<and_expr>", "^", "<xor_expr>"]],
	"<and_expr>": [["<shift_expr>"], ["<shift_expr>", "&", "<and_expr>"]],
	"<shift_expr>": [["<arith_expr>"], ["<arith_expr>", "<<", "<shift_expr>"], ["<arith_expr>", ">>", "<shift_expr>"]],
	"<arith_expr>": [["<term>"], ["<term>", "+", "<arith_expr>"], ["<term>", "-", "<arith_expr>"]],
	"<term>": [["<factor>"],
		["<factor>", "*", "<term>"],
		["<factor>", "/", "<term>"],
		["<factor>", "%", "<term>"],
		["<factor>", "//", "<term>"]],
	"<factor>": [["+", "<factor>"],
		["-", "<factor>"],
		["~", "<factor>"],
		["<power>"]],
	"<aux_trailer_seq>": [[], ["<trailer>", "<aux_trailer_seq>"]],
	"<power>": [["<atom>", "<aux_trailer_seq>"],
		["<atom>", "<aux_trailer_seq>", "**", "<factor>"]],
	"<aux_string_seq>": [["<STRING>"], ["<STRING>", "<aux_string_seq>"]],
	"<atom>": [["<NAME>"], ["<NUMBER>"], ["<aux_string_seq>"], ["..."], ["None"], ["True"], ["False"],
		["[]"], ["{}"], ["[", "<testlist_comp>", "]"], ["{", "<dictorsetmaker>", "}"], ["()"],
		["(", "<yield_expr>", ")"], ["(", "<testlist_comp>", ")"]],
	"<aux_test_star_seq>": [[], [",", "<test>", "<aux_test_star_seq"], [",", "<star_expr>", "<aux_test_star_seq>"]],
	"<testlist_comp>": [["<test>", "<comp_for>"], ["<star_expr>", "<comp_for>"],
		["<test>", "<aux_test_star_seq>"], ["<star_expr>", "<aux_test_star_seq>"]],
	"<trailer>": [["()"], ["(", "<arglist>", ")"], ["[]"], ["[", "<subscriptlist>", "]"], [".", "<NAME>"]],
	"<subscriptlist>": [["<subscript>"], ["<subscript>", ",", "<subscriptlist>"]],
	"<subscript>": [["<test>"], [":"], ["<test>", ":"], [":", "<test>"], [":", "<sliceop>"], ["<test>", ":", "<test>"], [":", "<test>", "<sliceop>"], ["<test>", ":", "<sliceop>"], ["<test>", ":", "<test>", "<sliceop>"]],
	"<sliceop>": [[":"], [":", "<test>"]],
	"<exprlist>": [["<expr>"], ["<star_expr>"], ["<expr>", ",", "<exprlist>"], ["<star_expr>", ",", "<exprlist>"]],
	"<testlist>": [["<test>"], ["<test>", ",", "<testlist>"]],
	"<aux_test_seq>": [[], [",", "<test>", ":", "<test>", "<aux_test_seq>"]],
	"<aux_test_seq2>": [[], [",", "<test>", "<aux_test_seq>"]],
	"<dictorsetmaker>": [["<test>", ":", "<test>", " ", "<comp_for>"],
		["<test>", ":", "<test>", " ", "<aux_test_seq>"],
		["<test>", " ", "<comp_for>"],
		["<test>", " ", "<aux_test_seq2>"]],
	
	"<classdef>": [["class ", "<NAME>", "(", "<arglist>", ")", ":", "<suite>"], ["class ", "<NAME>", "()", ":", "<suite>"], ["class ", "<NAME>", ":", "<suite>"]],
	
	"<aux_arugment_seq>": [[], ["<argument>", ",", "<aux_arugment_seq>"]],
	"<arglist>": [["<aux_arugment_seq>", "**", "<test>"], ["<aux_arugment_seq>","<argument>"], ["<aux_arugment_seq>", "*", "<test>", ",", "<aux_arugment_seq>", "**", "<test>"], ["<aux_arugment_seq>", "*", "<test>", ",", "<aux_arugment_seq>"]],
	"<argument>": [["<NAME>"], ["<NAME>", "<comp_for>"], ["<NAME>", "=", "<test>"]],
	"<comp_iter>": [["<comp_for>"], ["<comp_if>"]],
	"<comp_for>": [["for ", "<exprlist>", " in ", "<or_test>"], ["for ", "<exprlist>", " in ", "<or_test>", "<comp_iter>"]],
	"<comp_if>": [["if ", "<test_nocond>"], ["if ", "<test_nocond> ", "<comp_iter>"]],
	
	"<yield_expr>": [["yield"], ["yield ", "<yield_arg>"]],
	"<yield_arg>": [["from ", "<test>"], ["from ", "<testlist>"]],
	"<NAME>": [["a"], ["b"], ["foo"], ["bar"], ["foobar"]],
	"<IMODULE_NAME>": [["math"]],
	"<NEWLINE>": [["\n"]],
	"<STRING>": [["'asd'"], ["(0x1000 * 'a')"], ["'aaaaaaaaaaaabbbbbbbbbbbbbbbb'"], ["'zarakatunga'"]],
	"<NUMBER>": [["0"], ["1"], ["0.1"], ["1/2"], ["-1"], ["11111111111"], ["1e100"], ["1e-100"], ["1e999999"], ["3.14"]]
}

I've run afl-fuzz with gdb, let it run for some minutes and interrupted it in order to check where it was hanging and got this backtrace (the same after several tries):

[...]
#61 0x00007fffeebd69dc in antlr4::atn::LL1Analyzer::_LOOK (this=0x7fffffffb340, s=0x555555637860, stopState=0x0, ctx=std::shared_ptr<antlr4::atn::PredictionContext> (use count 5, weak count 0) = {...}, look=..., lookBusy=std::unordered_set with 864043 elements = {...}, calledRuleStack=..., seeThruPreds=true, addEOF=true) at antlr4-cpp-runtime-src/runtime/src/atn/LL1Analyzer.cpp:136
#62 0x00007fffeebd6b2b in antlr4::atn::LL1Analyzer::_LOOK (this=0x7fffffffb340, s=0x555555637920, stopState=0x0, ctx=std::shared_ptr<antlr4::atn::PredictionContext> (use count 5, weak count 0) = {...}, look=..., lookBusy=std::unordered_set with 864043 elements = {...}, calledRuleStack=..., seeThruPreds=true, addEOF=true) at antlr4-cpp-runtime-src/runtime/src/atn/LL1Analyzer.cpp:145
#63 0x00007fffeebd6b2b in antlr4::atn::LL1Analyzer::_LOOK (this=0x7fffffffb340, s=0x555555634240, stopState=0x0, ctx=std::shared_ptr<antlr4::atn::PredictionContext> (use count 5, weak count 0) = {...}, look=..., lookBusy=std::unordered_set with 864043 elements = {...}, calledRuleStack=..., seeThruPreds=true, addEOF=true) at antlr4-cpp-runtime-src/runtime/src/atn/LL1Analyzer.cpp:145
#64 0x00007fffeebd69dc in antlr4::atn::LL1Analyzer::_LOOK (this=0x7fffffffb340, s=0x5555556394d0, stopState=0x0, ctx=std::shared_ptr<antlr4::atn::PredictionContext> (empty) = {...}, look=..., lookBusy=std::unordered_set with 864043 elements = {...}, calledRuleStack=..., seeThruPreds=true, addEOF=true) at antlr4-cpp-runtime-src/runtime/src/atn/LL1Analyzer.cpp:136
#65 0x00007fffeebd6b2b in antlr4::atn::LL1Analyzer::_LOOK (this=0x7fffffffb340, s=0x555555639770, stopState=0x0, ctx=std::shared_ptr<antlr4::atn::PredictionContext> (empty) = {...}, look=..., lookBusy=std::unordered_set with 864043 elements = {...}, calledRuleStack=..., seeThruPreds=true, addEOF=true) at antlr4-cpp-runtime-src/runtime/src/atn/LL1Analyzer.cpp:145
#66 0x00007fffeebd620b in antlr4::atn::LL1Analyzer::LOOK (this=0x7fffffffb340, s=0x555555639770, stopState=0x0, ctx=0x0) at antlr4-cpp-runtime-src/runtime/src/atn/LL1Analyzer.cpp:67
#67 0x00007fffeebd60d7 in antlr4::atn::LL1Analyzer::LOOK (this=0x7fffffffb340, s=0x555555639770, ctx=0x0) at antlr4-cpp-runtime-src/runtime/src/atn/LL1Analyzer.cpp:57
#68 0x00007fffeeba7aef in antlr4::atn::ATN::nextTokens (this=0x7fffeed46600 <GrammarParser::_atn>, s=0x555555639770, ctx=0x0) at antlr4-cpp-runtime-src/runtime/src/atn/ATN.cpp:86
#69 0x00007fffeeba7bd1 in antlr4::atn::ATN::nextTokens (this=0x7fffeed46600 <GrammarParser::_atn>, s=0x555555639770) at antlr4-cpp-runtime-src/runtime/src/atn/ATN.cpp:94
#70 0x00007fffeec45bb3 in antlr4::DefaultErrorStrategy::sync (this=0x555555664940, recognizer=0x7fffffffb750) at antlr4-cpp-runtime-src/runtime/src/DefaultErrorStrategy.cpp:103
#71 0x00007fffeeb2f354 in GrammarParser::node_program (this=0x7fffffffb750) at generated/GrammarParser.cpp:233
#72 0x00007fffeeb2eaff in GrammarParser::node_START (this=0x7fffffffb750) at generated/GrammarParser.cpp:132
#73 0x00007fffeeb2e668 in GrammarParser::entry (this=0x7fffffffb750) at generated/GrammarParser.cpp:75
#74 0x00007fffeeb226e6 in tree_from_buf (data_buf=0x7ffff7ffb000 "a=1\nb=1.203213\nfoo='123213'\nbar=b'123\\x01'\n1=a\n\n", data_size=48) at antlr4_shim.cpp:95
#75 0x00007fffeeb1c408 in load_tree_from_test_case (filename=0x5555556042c0 "out/default/queue/id:000000,time:0,execs:0,orig:0") at tree.c:861
#76 0x00007fffeeb1968f in afl_custom_queue_get (data=0x555555665d20, filename=0x5555556042c0 "out/default/queue/id:000000,time:0,execs:0,orig:0") at grammar_mutator.c:216
#77 0x00007fffeeb1a306 in afl_custom_queue_new_entry (data=0x555555665d20, filename_new_queue=0x5555556042c0 "out/default/queue/id:000000,time:0,execs:0,orig:0", filename_orig_queue=0x0) at grammar_mutator.c:681
#78 0x000055555557fb62 in run_afl_custom_queue_new_entry (afl=0x7ffff7552010, q=0x5555556610e0, fname=0x5555556042c0 "out/default/queue/id:000000,time:0,execs:0,orig:0", mother_fname=0x0) at src/afl-fuzz-mutators.c:41
#79 0x000055555559350f in pivot_inputs (afl=0x7ffff7552010) at src/afl-fuzz-init.c:1358
#80 0x000055555558b4d9 in main (argc=8, argv_orig=0x7fffffffdf18, envp=0x7fffffffdf60) at src/afl-fuzz.c:1832

Only part of the trace is included; from that point upward it just repeats the last three calls.

Please let me know if I can provide any more info, or if you've got any hint as to what might be the problem.

Wasteful rebuilding of non-terminal trees

tree_get_non_terminal_nodes() always recalculates the non_terminal_node_list and is called for every new sample in afl_custom_fuzz_count(). Those elements are then popped in afl_custom_fuzz() before being passed to the mutator.
Instead, we could keep the non-terminal list in the serialized tree format and update it per mutation, so it would not be recalculated for every sample.
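A sketch of the proposed bookkeeping (hypothetical Node/Tree types, not the actual C structures): cache the non-terminal list once and patch it in place when a subtree is replaced, instead of re-walking the whole tree for every sample:

```python
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

def non_terminals(node, acc=None):
    """Collect every node that has subnodes (i.e. the non-terminal nodes)."""
    acc = [] if acc is None else acc
    if node.children:
        acc.append(node)
        for child in node.children:
            non_terminals(child, acc)
    return acc

class Tree:
    def __init__(self, root):
        self.root = root
        self.nt_list = non_terminals(root)   # computed once, then maintained

    def replace_subtree(self, parent, idx, new_child):
        """Swap one subtree and patch nt_list incrementally, instead of
        recomputing the full list on the next afl_custom_fuzz_count()."""
        removed = set(map(id, non_terminals(parent.children[idx])))
        parent.children[idx] = new_child
        self.nt_list = [n for n in self.nt_list if id(n) not in removed]
        self.nt_list.extend(non_terminals(new_child))
```

The incremental cost is proportional to the replaced subtree, not to the whole tree.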

Segmentation fault when dealing with hex-ANSII conversion

Hi, I ran into some problems when trying to generate a hex corpus and use it in a fuzzing run.

The version I use is the AFL++ 4.01a release and the latest Grammar-Mutator from the stable branch. The fuzz target is compiled using afl-gcc-fast.

I'm trying to generate seeds based on the grammar shown below, following the solution in issue #29.

{
    "<start>": [["hex: ", "<hex>", "<hex2>"]],
    "<hex>": [["\u0087"], ["\u005a"]], 
    "<hex2>":[["\u0000"], ["\u0001"], ["\u0002"], ["\u0003"], ["\u0004"], ["\u0005"], ["\u0006"], ["\u0007"],
              ["\u0008"], ["\u0009"], ["\u000a"], ["\u000b"], ["\u000c"], ["\u000d"], ["\u000e"], ["\u000f"],
              ["\u0010"], ["\u0011"], ["\u0012"], ["\u0013"], ["\u0014"], ["\u0015"], ["\u0016"], ["\u0017"],
              ["\u0018"], ["\u0019"], ["\u001a"], ["\u001b"], ["\u001c"], ["\u001d"], ["\u001e"], ["\u001f"]]
}

I can successfully build the grammar mutator without any error.

Seeds can be generated using the grammar generator. I tested a few of them and they seem to be what I expected.

But when running afl-fuzz on the target, it causes a segmentation fault before reaching the fuzzing interface.

[*] Attempting dry run with 'id:000099,time:0,execs:0,orig:0'...
    len = 7, map size = 172, exec speed = 25 us
[!] WARNING: No new instrumentation output, test case may be useless.
[+] All test cases processed.
[!] WARNING: Some test cases look useless. Consider using a smaller set.
[!] WARNING: You have lots of input files; try starting small.
[+] Here are some useful stats:

    Test case count : 1 favored, 1 variable, 98 ignored, 100 total
       Bitmap range : 172 to 172 bits (average: 172.00 bits)
        Exec timing : 31 to 112 us (average: 28 us)

[*] No -t option specified, so I'll use an exec timeout of 20 ms.
[+] All set and ready to roll!
Segmentation fault

When I replaced "<hex>": [["\u0087"], ["\u005a"]], with "<hex>": [["\u001f"], ["\u001f"]] (some smaller values) in the grammar, the fuzzer works fine.
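One possible culprit worth ruling out: in JSON, `\u0087` denotes the Unicode code point U+0087, which UTF-8 encodes as two bytes, while every value below `\u0080` stays a single byte; any code that assumes one byte per character would then miscount lengths and offsets. A quick check in Python:

```python
# Code points below U+0080 are single bytes in UTF-8; U+0087 is two bytes.
assert "\u005a".encode("utf-8") == b"Z"
assert "\u001f".encode("utf-8") == b"\x1f"
assert "\u0087".encode("utf-8") == b"\xc2\x87"    # NOT the single byte 0x87

# If the raw byte 0x87 is what the target expects, latin-1 maps the
# code points 0-255 one-to-one onto bytes:
assert "\u0087".encode("latin-1") == b"\x87"
```

This would be consistent with the symptom that every escape below `\u0080` works while `\u0087` crashes.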

Can someone help me with this problem? Any help is much appreciated.

Let me know if any other information is needed.

incorrect rule index deduction from ANTLR

In the input-parsing shim, nodes are created using
node = node_create_with_rule_id(non_terminal_node->getRuleIndex(), non_terminal_node->getAltNumber() - 1);
However, in my tests the antlr4::ParserRuleContext node's getAltNumber() returns 0 on OUTER recursive grammar nodes. Therefore all nodes up to the inner one have an invalid rule_id.

For example, for this G4 grammar:
A : B | A B ;
B : "MYTOKEN" ;
entry : A ;

The input "MYTOKEN MYTOKEN MYTOKEN" will be parsed as
entry -> A -> A -> A -> B
         |    |
         |    +-> B
         +-> B

The last A will have rule_id = 0; the previous ones have rule_id = MAXUINT.
While, incidentally, this specific case does not break the fuzzer's behavior, it becomes a major issue once there are various recursive expansions.

Trimmed data returned by custom mutator is larger than original data

Error message:

[-] PROGRAM ABORT : Trimmed data returned by custom mutator is larger than original data
         Location : trim_case_custom(), src/afl-fuzz-mutators.c:287

The trimming strategies in the grammar mutator aim at reducing the tree size (i.e., the total number of non-terminal nodes). However, this does not guarantee that the corresponding string is also small. For example, in JSON:

  • An input buffer is "\r-10", which is 4 bytes and has 16 non-terminal nodes.
  • The output trimmed buffer is "false", which is 5 bytes and has 6 non-terminal nodes.

Potential solution: we can allow the execution of the target, even if the trimmed data returned by the custom mutator is larger than the original data. This moves the responsibility of checking the trimming "size" to the custom mutator instead of the fuzzer.

(Need to think more on the solution)
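A sketch of what the mutator-side check could look like (hypothetical helper, not existing code): accept a trim step only when the tree shrinks and the serialized buffer does not grow, so the fuzzer-side abort can never fire:

```python
def accept_trim(orig_buf, trimmed_buf, orig_nodes, trimmed_nodes):
    """Accept a trimming step only if it shrinks the tree AND does not
    enlarge the serialized test case."""
    return trimmed_nodes < orig_nodes and len(trimmed_buf) <= len(orig_buf)

# The JSON example above: "\r-10" (4 bytes, 16 non-terminal nodes) trimmed
# to "false" (5 bytes, 6 nodes) shrinks the tree but grows the buffer,
# so under this check it would be rejected.
```

The trade-off is that some tree-shrinking trims get discarded, but the fuzzer's size invariant is preserved without changing AFL++ itself.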

Idea list

Let's collect some ideas on how to improve the grammar mutator.
I am not an expert on this, so some ideas might be impossible, make no sense, or even make things worse.

  • Use the dictionary with the grammar (-x + LTO AUTODICT feature)
  • Increase the tree depth with every new cycle without finds (example on how to pass this to the mutator is in examples/honggfuzz/honggfuzz.c)
  • ... ?

Also:
document, for each mutation, which mutation strategies were used and whether it resulted in a new path, crash, or hang; write these away somewhere (fopen("a") ... fwrite() ... fclose() would be fine enough), learn which types are more effective than others, and then try to improve them: maybe weighting, maybe changing how unsuccessful techniques work, etc. (And of course guard this feature with an #ifdef TESTING or something like that.)

pinging @h1994st @andreafioraldi @eqv for more ideas
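The logging-and-weighting idea could be prototyped along these lines (Python sketch with hypothetical names): record per-strategy outcomes, then bias strategy selection toward historically productive strategies:

```python
import random
from collections import Counter

uses = Counter()       # strategy -> times tried
successes = Counter()  # strategy -> times it produced a new path/crash/hang

def record(strategy, result):
    """result: 'new_path', 'crash', 'hang', or 'nothing'."""
    uses[strategy] += 1
    if result in ("new_path", "crash", "hang"):
        successes[strategy] += 1

def pick(strategies, rng=random):
    # Laplace-smoothed success rate as the selection weight, so untried
    # strategies still get sampled occasionally
    weights = [(successes[s] + 1) / (uses[s] + 2) for s in strategies]
    return rng.choices(strategies, weights=weights, k=1)[0]
```

In the real mutator the counters would be flushed to a file periodically, as suggested above, and the weighting could sit behind the proposed #ifdef.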

Memory leaks in `splicing_mutation`

As indicated by the CI results, there are 5 memory leaks in splicing_mutation (see below).

Would it be better to throw an error when memory leaks are encountered?

==4365== 8 bytes in 1 blocks are indirectly lost in loss record 1 of 5
==4365==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4365==    by 0x48ED894: node_init_subnodes (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x48EDB6F: node_clone (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x48EE647: tree_clone (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x48EEE9C: splicing_mutation (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x117A81: TreeMutationTest_SplicingMutation_Test::TestBody() (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x14AD80: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13E369: testing::Test::Run() [clone .part.0] (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13E801: testing::TestInfo::Run() [clone .part.0] (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13EA0D: testing::TestSuite::Run() [clone .part.0] (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13FACC: testing::internal::UnitTestImpl::RunAllTests() (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x140037: testing::UnitTest::Run() (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365== 
==4365== 64 bytes in 1 blocks are indirectly lost in loss record 2 of 5
==4365==    at 0x483B723: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4365==    by 0x483E017: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4365==    by 0x48EDA66: node_set_val (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x48EDB45: node_clone (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x48EDB8B: node_clone (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x48EE647: tree_clone (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x48EEE9C: splicing_mutation (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x117A81: TreeMutationTest_SplicingMutation_Test::TestBody() (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x14AD80: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13E369: testing::Test::Run() [clone .part.0] (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13E801: testing::TestInfo::Run() [clone .part.0] (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13EA0D: testing::TestSuite::Run() [clone .part.0] (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365== 
==4365== 72 bytes in 1 blocks are indirectly lost in loss record 3 of 5
==4365==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4365==    by 0x48ED7A9: node_create (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x48EDB20: node_clone (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x48EE647: tree_clone (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x48EEE9C: splicing_mutation (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x117A81: TreeMutationTest_SplicingMutation_Test::TestBody() (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x14AD80: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13E369: testing::Test::Run() [clone .part.0] (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13E801: testing::TestInfo::Run() [clone .part.0] (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13EA0D: testing::TestSuite::Run() [clone .part.0] (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13FACC: testing::internal::UnitTestImpl::RunAllTests() (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x140037: testing::UnitTest::Run() (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365== 
==4365== 72 bytes in 1 blocks are indirectly lost in loss record 4 of 5
==4365==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4365==    by 0x48ED7A9: node_create (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x48EDB20: node_clone (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x48EDB8B: node_clone (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x48EE647: tree_clone (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x48EEE9C: splicing_mutation (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x117A81: TreeMutationTest_SplicingMutation_Test::TestBody() (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x14AD80: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13E369: testing::Test::Run() [clone .part.0] (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13E801: testing::TestInfo::Run() [clone .part.0] (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13EA0D: testing::TestSuite::Run() [clone .part.0] (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13FACC: testing::internal::UnitTestImpl::RunAllTests() (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365== 
==4365== 288 (72 direct, 216 indirect) bytes in 1 blocks are definitely lost in loss record 5 of 5
==4365==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4365==    by 0x48EE63C: tree_clone (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x48EEE9C: splicing_mutation (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/src/libgrammarmutator-ruby.so)
==4365==    by 0x117A81: TreeMutationTest_SplicingMutation_Test::TestBody() (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x14AD80: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13E369: testing::Test::Run() [clone .part.0] (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13E801: testing::TestInfo::Run() [clone .part.0] (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13EA0D: testing::TestSuite::Run() [clone .part.0] (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x13FACC: testing::internal::UnitTestImpl::RunAllTests() (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x140037: testing::UnitTest::Run() (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==    by 0x116E23: main (in /home/runner/work/Grammar-Mutator/Grammar-Mutator/tests/test_tree_mutation)
==4365==

Core dump

afl-fuzz core dumps in the grammar mutator with Program received signal SIGSEGV, Segmentation fault.

#0  0x00007ffff7fb6f56 in afl_custom_trim (data=0x555555647690, out_buf=0x7fffffffc638) at /prg/Grammar-Mutator/branches/dev/src/grammar_mutator.cpp:114
#1  0x0000555555562a20 in trim_case_custom (mutator=0x5555556459f0, in_buf=0x7ffff7ffb000 "30E-0", q=0x5555556d9f20, afl=0x5555555c0400) at src/afl-fuzz-mutators.c:277
#2  trim_case (afl=0x5555555c0400, q=0x5555556d9f20, in_buf=0x7ffff7ffb000 "30E-0") at src/afl-fuzz-run.c:629
#3  0x000055555558465d in fuzz_one_original (afl=0x5555555c0400) at src/afl-fuzz-one.c:526
#4  0x000055555555c82e in fuzz_one (afl=0x5555555c0400) at src/afl-fuzz-one.c:4731
#5  main (argc=<optimized out>, argv_orig=<optimized out>, envp=<optimized out>) at src/afl-fuzz.c:1278

The command line was:

# env|grep AFL
AFL_CUSTOM_MUTATOR_ONLY=1
AFL_CUSTOM_MUTATOR_LIBRARY=/prg/Grammar-Mutator/branches/dev/build/src/libgrammarmutator.so
# afl-fuzz -i in -o out -- ../../json-parser/test_json @@
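
Without digging into grammar_mutator.cpp:114, a common cause for a SIGSEGV at this point is the trim callback dereferencing state that was never set up for the current queue entry (e.g. no parsed tree). A defensive guard might look like the sketch below; my_mutator_t and its fields are invented here, only the afl_custom_trim signature matches the AFL++ custom mutator API:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical, simplified mutator state; the real layout lives in
 * grammar_mutator.cpp. */
typedef struct {
  void   *tree;        /* parsed tree for the current queue entry */
  uint8_t *trimmed;    /* serialized trimmed output */
  size_t  trimmed_len;
} my_mutator_t;

/* Sketch of a guard: if there is no tree to trim, report a
 * zero-length result instead of dereferencing a null pointer. */
size_t afl_custom_trim(void *data, uint8_t **out_buf) {
  my_mutator_t *m = (my_mutator_t *)data;
  if (!m || !m->tree) {
    *out_buf = NULL;
    return 0;
  }
  *out_buf = m->trimmed;
  return m->trimmed_len;
}
```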

UnicodeEncodeError

Hi,

I'm trying to use the tool to generate JavaScript test cases. However, when I run make GRAMMAR_FILE=grammars/javascript.json
I get the following error: UnicodeEncodeError: 'latin-1' codec can't encode character '\u2421' in position 0: ordinal not in range(256)

Then, if I try the same for ruby.json, the previous failed run somehow corrupts the .jar file: Error: Invalid or corrupt jarfile /usr/local/lib

Any help?
