Hi!
I have played around with the fuzzers and have several variations and new fuzzers in my private repo. I wanted to file an issue here to discuss whether and how to upstream this. I also have some thoughts, so I figured I would post everything here in case anyone wants to comment.
To prove this is not just hot air, here are the issues I have found, all luckily pretty harmless:
CMake and C++
To make debugging and development easy, I use C++ (it's my primary language) and CMake, because it is easy to get an IDE (I use Qt Creator) to navigate, debug and build the code. It is easy to work with, since one can simply set a breakpoint in the debugger when trying to get the fuzzer to reach a certain place in the code. I also refactored the fuzzer to RAII style so the cleanup is easier.
Using Boost.Asio instead of C-style select
I use Asio to asynchronously wait for network events. I am used to it, and it is the best-known way to do networking in C++, so it is a good choice. I think I also managed to handle timeouts in a way that improves fuzzing speed, but this is a tricky topic since it involves the interaction between curl, Asio and waiting on the OS.
I think it is a good foundation to build on, since it is possible, for instance, to interface OpenSSL through Asio, which would open up fuzzing the plaintext contents of HTTPS instead of the encrypted layer, as the current fuzzers do.
Internal fuzzers
The existing fuzzers use curl like any other libcurl user. I wanted to stress-test some functions directly, so I added fuzzers which access curl from within. I do this by adding an optional section inside curl's CMake files, which includes files from the curl-fuzzer repo. This will uglify curl, but it is off by default and placed within if(ENABLE_FUZZING) sections.
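The hook could look roughly like this; the target, file and variable names are made up for illustration, not the actual layout:

```cmake
# Optional fuzzing section inside curl's CMake build; off by default.
option(ENABLE_FUZZING "Build internal fuzz targets from curl-fuzzer" OFF)

if(ENABLE_FUZZING)
  # CURL_FUZZER_DIR would point at a checkout of the curl-fuzzer repo;
  # the source file name here is illustrative.
  add_executable(fuzz_internal_doh
    ${CURL_FUZZER_DIR}/internal/fuzz_doh.cc)
  target_link_libraries(fuzz_internal_doh PRIVATE libcurl)
  target_compile_options(fuzz_internal_doh PRIVATE -fsanitize=fuzzer)
  target_link_options(fuzz_internal_doh PRIVATE -fsanitize=fuzzer)
endif()
```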
The benefit is that these fuzz targets can be very focused, and do not need their input to be invented by the TLV fuzzer and sent over a socket. Not all functions can be reached by the existing fuzzers.
I added internal fuzzers for:
- cookie handling
- doh payload encode/decode
- escape/unescape
- netrc parsing
I think it may be a good idea to fuzz the setting of curl options here as well, so that does not have to happen only through the existing fuzzers.
Fuzzing a single function like this, for instance the DoH parser, is what libFuzzer is excellent at. It will not be limited by network timeouts or by setting up and tearing down sockets.
These internal fuzzers cover almost all of their target code within a few hours, so they are nice to have, but could perhaps run in a CI build instead of taking up slots at OSS-Fuzz (unless OSS-Fuzz schedules fuzzing effort to avoid that; I do not know).
Structure-aware fuzzing
The fuzz data uses TLV (type-length-value), which means the default mutation strategy will very likely break the content. Most of the input the fuzzer engine generates is garbage, and has to be rejected while unpacking the TLV during the setting of curl options.
This is inefficient, and it also makes it difficult for the fuzzing engine to produce meaningful input, since the type, the content and the alignment with other blocks all have to match.
So I wrote a custom mutator, which parses the TLV while throwing away parts that don't make sense. With a list of blocks, it can apply mutation to a single block and then serialize it again. It also implements the custom crossover, for mixing two test cases.
This works, but I have a hard time evaluating it, since it takes a very long time to build up the corpus for the existing fuzzers. It does find new paths when starting from an empty corpus, so I figure that if it can do that, it should also be good at exposing cases not yet known.
A proposal
I think it is good to discuss before sending pull requests. Here is my suggestion:
- the existing fuzzers are left as is, with build system and all. They work well, and until the new setup is proven it does not make sense to remove them
- a new fuzzer build is set up in parallel with the existing one. It uses CMake
- the fuzzers which do what the existing ones do, but with the custom mutator, are added to the new fuzzer build
- the internal fuzzers are added (but maybe not enabled by default on OSS-Fuzz)
What I would like help with now
I would like either access to the corpus from OSS-Fuzz or a copy of it, so I can see which parts of curl are not fuzzed and evaluate the new fuzzers I write. I think the link is this one, based on another project I work on; it needs login through one of the admins: https://console.cloud.google.com/storage/browser/curl-backup.clusterfuzz-external.appspot.com
Thanks,
Paul