
langston-barrett / treereduce

41 stars, 2 watchers, 1 fork, 1.94 MB

A fast, parallel, syntax-aware test case reducer based on tree-sitter grammars

Home Page: https://langston-barrett.github.io/treereduce/

License: MIT License

Languages: Makefile 0.15%, Rust 7.54%, Nix 0.04%, Shell 0.20%, C 12.91%, C++ 77.15%, Java 0.01%, Python 0.30%, ANTLR 1.69%
Topics: delta-debugging, test-reduction, tree-sitter, program-reduction, test-case-minimization, test-case-reduction

treereduce's Introduction

treereduce

treereduce is a fast, parallel, syntax-aware test case reducer based on tree-sitter grammars. In other words, treereduce helps you shrink structured data (especially source code) while maintaining some property of interest, for example, that the program causes a compiler crash or outputs a certain message. See the documentation for more information. Documentation is also available online.

treereduce's People

Contributors

dependabot[bot], langston-barrett, ligurio

Forkers

alexet

treereduce's Issues

v0.2.2

I should cut a new release after all the dependency bumps.

Statistics

Add a --stats option that outputs:

  • Number of passes, duration of each pass
  • Total duration
  • Initial size, final size, ratio of sizes
  • Bytes deleted per second
  • Attempted and successful reductions (total and per strategy)
  • Completed and pre-empted executions
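A minimal sketch of a container for these statistics follows. The `Stats` type and its method names are hypothetical and are not part of treereduce. The sketch assumes sizes are measured in bytes and approximates total duration as the sum of the pass durations.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical summary of one reduction run (not treereduce's actual type).
#[derive(Debug, Default)]
pub struct Stats {
    /// Wall-clock duration of each pass, in order.
    pub pass_durations: Vec<Duration>,
    /// Input size before and after reduction, assumed to be in bytes.
    pub initial_size: usize,
    pub final_size: usize,
    /// Per-strategy (attempted, successful) reduction counts.
    pub per_strategy: HashMap<String, (u64, u64)>,
    /// Interestingness-test executions that completed or were pre-empted.
    pub completed_execs: u64,
    pub preempted_execs: u64,
}

impl Stats {
    /// Record one attempted reduction for `strategy`.
    pub fn record_attempt(&mut self, strategy: &str, success: bool) {
        let entry = self.per_strategy.entry(strategy.to_string()).or_insert((0, 0));
        entry.0 += 1;
        if success {
            entry.1 += 1;
        }
    }

    pub fn passes(&self) -> usize {
        self.pass_durations.len()
    }

    /// Total time, approximated here as the sum of the pass durations.
    pub fn total_duration(&self) -> Duration {
        self.pass_durations.iter().sum()
    }

    /// Final size divided by initial size (1.0 for an empty input).
    pub fn size_ratio(&self) -> f64 {
        if self.initial_size == 0 {
            return 1.0;
        }
        self.final_size as f64 / self.initial_size as f64
    }

    /// Bytes removed per second of total duration (0.0 if no time elapsed).
    pub fn bytes_deleted_per_sec(&self) -> f64 {
        let secs = self.total_duration().as_secs_f64();
        if secs == 0.0 {
            return 0.0;
        }
        self.initial_size.saturating_sub(self.final_size) as f64 / secs
    }

    /// (attempted, successful) totals across all strategies.
    pub fn totals(&self) -> (u64, u64) {
        self.per_strategy
            .values()
            .fold((0, 0), |(a, s), &(x, y)| (a + x, s + y))
    }
}
```

Keeping per-strategy counts in a map lets the total figures be derived from them. The alternative, maintaining a second set of counters, could drift out of sync.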

Check program stdout, stderr against a regular expression

It is faster to check a program's stdout or stderr against a regex natively in Rust, rather than to use a Bash script and grep to do so. We should implement --interesting-stdout-regex and --interesting-stderr-regex for this purpose, similar to --interesting-exit-code.
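The sketch below shows the idea of checking captured output in-process instead of piping it through grep. The `Criteria` struct and function names are hypothetical. To keep the example dependency-free, plain substring matching stands in for regex matching. A real implementation would more likely compile each pattern once with the `regex` crate and call `Regex::is_match`.

```rust
use std::process::Command;

/// Hypothetical interestingness criteria, modeled on the proposed
/// --interesting-exit-code / --interesting-stdout-regex / --interesting-stderr-regex flags.
pub struct Criteria {
    pub exit_code: Option<i32>,
    pub stdout_pattern: Option<String>,
    pub stderr_pattern: Option<String>,
}

/// Stand-in for a regex match: a plain substring search over (lossily decoded) output.
fn matches(pattern: &str, haystack: &[u8]) -> bool {
    String::from_utf8_lossy(haystack).contains(pattern)
}

impl Criteria {
    /// A result is interesting only if every configured criterion holds;
    /// unset criteria are ignored.
    pub fn is_interesting(&self, code: Option<i32>, stdout: &[u8], stderr: &[u8]) -> bool {
        self.exit_code.map_or(true, |c| code == Some(c))
            && self.stdout_pattern.as_deref().map_or(true, |p| matches(p, stdout))
            && self.stderr_pattern.as_deref().map_or(true, |p| matches(p, stderr))
    }
}

/// Run the command, capture stdout/stderr, and check them without spawning grep.
pub fn run_and_check(criteria: &Criteria, cmd: &mut Command) -> std::io::Result<bool> {
    let out = cmd.output()?;
    Ok(criteria.is_interesting(out.status.code(), &out.stdout, &out.stderr))
}
```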

Awareness of matched delimiters

If a node to be deleted ends in {, [, or (, the next child of the same parent that begins with the matching delimiter should also be considered for deletion. Before introducing this, measurements should be taken to see if it comes up in practice.
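The following is a simplified sketch of the proposed pairing rule. It operates on the source text of a parent's children instead of real tree-sitter nodes, and `paired_sibling` is a hypothetical name. Like the rule as stated, it takes the first later sibling that starts with the closing delimiter and does not account for nesting.

```rust
/// Map an opening delimiter to its closing partner.
fn closing_for(open: char) -> Option<char> {
    match open {
        '{' => Some('}'),
        '[' => Some(']'),
        '(' => Some(')'),
        _ => None,
    }
}

/// Given the text of a parent's children (in order) and the index of a child
/// about to be deleted, return the index of the next sibling that begins with
/// the matching closing delimiter, if the child ends with an opening one.
pub fn paired_sibling(children: &[&str], idx: usize) -> Option<usize> {
    let last = children.get(idx)?.trim_end().chars().last()?;
    let close = closing_for(last)?;
    children
        .iter()
        .enumerate()
        .skip(idx + 1)
        .find(|(_, text)| text.trim_start().starts_with(close))
        .map(|(i, _)| i)
}
```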

hangs in v0.3.0

Since the update to v0.3.0, I have noticed that treereduce-rust often starts reducing but then hangs partway through and makes no further progress.

treereduce-rust  --passes=10  --min-reduction=10 --interesting-exit-code=101 --stats --source  /home/matthias/vcs/github/RUST_CODE/ALL_TESTS/icemaker/04FCCD075E8AFA9FB081E547DEE5329C2F2F0AF68FD1649DF975149BE0F1E10D.rs  -- ~/.rustup/toolchains/local-debug-assertions/bin/rustc   @@.rs
[INFO] Starting pass 1 / 10
[INFO] Original size: 10697
[INFO] Reduced to size: 7836 id=32 kind="delete" priority=2861 size=7836
[INFO] Reduced to size: 5001 id=33 kind="delete" priority=2835 size=5001
[INFO] Reduced to size: 3903 id=34 kind="delete" priority=1098 size=3903
[INFO] Reduced to size: 2893 id=35 kind="delete" priority=1010 size=2893
[INFO] Reduced to size: 2266 id=36 kind="delete" priority=627 size=2266
[INFO] Reduced to size: 1683 id=37 kind="delete" priority=583 size=1683
[INFO] Reduced to size: 1548 id=50 kind="delete" priority=135 size=1548
[INFO] Reduced to size: 1468 id=65 kind="delete" priority=80 size=1468
[INFO] Reduced to size: 1389 id=66 kind="delete" priority=79 size=1389
[INFO] Reduced to size: 1314 id=67 kind="delete" priority=75 size=1314
[INFO] Reduced to size: 1246 id=68 kind="delete" priority=68 size=1246
[INFO] Reduced to size: 1180 id=69 kind="delete" priority=66 size=1180
[INFO] Reduced to size: 1115 id=70 kind="delete" priority=65 size=1115
[INFO] Reduced to size: 1051 id=71 kind="delete" priority=64 size=1051
[INFO] Reduced to size: 987 id=72 kind="delete" priority=64 size=987
[INFO] Reduced to size: 924 id=73 kind="delete" priority=63 size=924
[INFO] Reduced to size: 862 id=74 kind="delete" priority=62 size=862
[INFO] Reduced to size: 805 id=75 kind="delete" priority=57 size=805
[INFO] Reduced to size: 765 id=77 kind="delete" priority=40 size=765
[INFO] Reduced to size: 728 id=79 kind="delete" priority=37 size=728
[INFO] Reduced to size: 692 id=80 kind="delete" priority=36 size=692
[INFO] Reduced to size: 657 id=81 kind="delete" priority=35 size=657
[INFO] Reduced to size: 622 id=82 kind="delete" priority=35 size=622
[INFO] Reduced to size: 588 id=83 kind="delete" priority=34 size=588
[INFO] Reduced to size: 554 id=84 kind="delete" priority=34 size=554
[INFO] Reduced to size: 523 id=86 kind="delete" priority=31 size=523
[INFO] Reduced to size: 497 id=91 kind="delete" priority=26 size=497
[INFO] Reduced to size: 473 id=93 kind="delete" priority=24 size=473
[INFO] Reduced to size: 449 id=94 kind="delete" priority=24 size=449
[INFO] Reduced to size: 425 id=95 kind="delete" priority=24 size=425
[INFO] Reduced to size: 401 id=96 kind="delete" priority=24 size=401
[INFO] Reduced to size: 385 id=98 kind="delete" priority=16 size=385
[INFO] Reduced to size: 370 id=99 kind="delete" priority=15 size=370
[INFO] Reduced to size: 357 id=101 kind="delete" priority=13 size=357

Downgrading to 0.2.2 fixed this for me :/

treereduce_hang.rs.zip

No reduction when interestingness test always succeeds

There's a bug where treereduce won't reduce a program if the interestingness check always succeeds:

treereduce-c --no-verify --jobs 24 --output bench.c -s crates/treereduce/benches/c/basic.c --stats -v -- /nix/store/i9q0jv6qnvg7zal98rqi7aq31k3p89hw-coreutils-9.0/bin/true |& grep -v Idling

Support additional languages

At the very least:

  • C#
  • Java
  • JavaScript
  • OCaml (no static binaries)
  • Python (no static binaries)
  • Rust
  • Souffle
  • Swift

Syntax-unaware reduction pass(es)

Since treereduce operates directly on the AST nodes, its output often has extraneous whitespace. It would be great to have a simple pass that attempts to delete sequences of more than two newlines.

Possibly, this could be (or evolve into) more sophisticated reduction strategies à la halfempty.
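The function below is a minimal sketch of such a pass. It reads "delete sequences of more than two newlines" as collapsing every run of three or more `\n` characters down to two. The name `collapse_newlines` is hypothetical, and the sketch does not treat `\r\n` specially. As with any other pass, the resulting candidate would still need to pass the interestingness test.

```rust
/// Collapse every run of more than two consecutive newlines to exactly two.
pub fn collapse_newlines(src: &str) -> String {
    let mut out = String::with_capacity(src.len());
    let mut run = 0; // length of the current run of '\n'
    for c in src.chars() {
        if c == '\n' {
            run += 1;
            if run > 2 {
                continue; // drop the third and later newline in a run
            }
        } else {
            run = 0;
        }
        out.push(c);
    }
    out
}
```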

`--timeout`

A flag that specifies a timeout for each interestingness test. Possibly also an --auto-timeout flag, which times out any test that takes 10x longer than the first one did.
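The sketch below shows one way a per-test timeout could be enforced using only the standard library. It polls `Child::try_wait` and kills the child once the limit passes. The function names are hypothetical, and the 10 ms polling interval is an arbitrary assumption. `auto_timeout` expresses the "10x the first run" policy with the factor as a parameter.

```rust
use std::io;
use std::process::{Child, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical --auto-timeout policy: allow each test `factor` times the
/// duration of the first test run.
pub fn auto_timeout(first_run: Duration, factor: u32) -> Duration {
    first_run * factor
}

/// Poll `child` until it exits or `limit` elapses. On timeout, kill and reap
/// the child and return Ok(None), so the caller can treat the variant as
/// uninteresting.
pub fn wait_with_timeout(child: &mut Child, limit: Duration) -> io::Result<Option<ExitStatus>> {
    let start = Instant::now();
    loop {
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if start.elapsed() >= limit {
            child.kill()?;
            child.wait()?; // reap the killed process so it does not linger as a zombie
            return Ok(None);
        }
        thread::sleep(Duration::from_millis(10));
    }
}
```

Polling is the simplest portable approach. A platform-specific implementation could avoid the polling latency, at the cost of more code.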

Caching

This paper points out that sundry situations can cause test-case reducers to retry inputs they've already attempted. It's not clear to me whether treereduce does this; either way, I should introduce a cache mapping test cases (or maybe their hashes) to the result of the interestingness test. At first, treereduce should produce an error when it's about to test an already-tested variant, so that I can uncover why it's happening and perhaps avoid it in the first place. If that's not possible, then the cache should be kept, since it will reduce testing time!

https://ieeexplore.ieee.org/abstract/document/7962327
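The following is a minimal sketch of such a cache, including the proposed "error on repeat" mode. `TestCache` is a hypothetical name. The cache is keyed by a 64-bit `DefaultHasher` digest of the candidate, which saves memory but carries a small chance of a collision returning the wrong cached answer. Keying by the full bytes would avoid that risk.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical cache from a hash of a candidate test case to the result of
/// its interestingness test.
pub struct TestCache {
    results: HashMap<u64, bool>,
    /// If true, a repeated candidate is reported as an error instead of being
    /// answered from the cache, to help find out why repeats happen.
    strict: bool,
}

fn digest(candidate: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    candidate.hash(&mut h);
    h.finish()
}

impl TestCache {
    pub fn new(strict: bool) -> Self {
        TestCache { results: HashMap::new(), strict }
    }

    /// Return the cached result, or run `test` once and remember its answer.
    pub fn check<F>(&mut self, candidate: &[u8], test: F) -> Result<bool, String>
    where
        F: FnOnce(&[u8]) -> bool,
    {
        let key = digest(candidate);
        if let Some(&hit) = self.results.get(&key) {
            if self.strict {
                return Err(format!("candidate {:016x} was already tested", key));
            }
            return Ok(hit);
        }
        let result = test(candidate);
        self.results.insert(key, result);
        Ok(result)
    }
}
```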

Synthetic benchmarks

By swapping in different implementations of the check that judges whether a given change is "interesting", we can easily produce synthetic benchmarks for tuning the performance of the reduction algorithm. Some ideas:

  • Never accept reductions
  • Always accept reductions
  • Accept half of reductions, randomly
  • Accept reductions only to well-formed programs (e.g., run a compiler)
  • Accept reductions that preserve some arbitrary set of spans
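Four of these oracles can be sketched without external crates, as shown below. The compiler-based one is omitted because it depends on a real toolchain. `Oracle` is a hypothetical name. "Spans" are simplified to substrings that must survive. The random oracle uses a seeded xorshift64 generator, chosen so that benchmark runs are reproducible; it accepts roughly, not exactly, half of the candidates.

```rust
/// Hypothetical synthetic interestingness oracles for benchmarking a reducer.
pub enum Oracle {
    Never,
    Always,
    /// Accept about half of candidates at random; `state` must be nonzero.
    Coin { state: u64 },
    /// Accept only candidates that still contain every one of these snippets.
    PreserveSpans(Vec<String>),
}

impl Oracle {
    pub fn is_interesting(&mut self, candidate: &str) -> bool {
        match self {
            Oracle::Never => false,
            Oracle::Always => true,
            Oracle::Coin { state } => {
                // One xorshift64 step, then use the lowest bit as a coin flip.
                *state ^= *state << 13;
                *state ^= *state >> 7;
                *state ^= *state << 17;
                (*state & 1) == 1
            }
            Oracle::PreserveSpans(spans) => spans.iter().all(|s| candidate.contains(s.as_str())),
        }
    }
}
```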

Scaling benchmarks

It would be nice to see how treereduce (and other tools) scale as the number of CPU cores increases.

rust: fails to reduce away items completely

Often, when reducing fuzzed code, for example:

impl<T> VSet<T, {  }> {
    pub fn new(
        _: for<'a> fn(
            capture: T,
            std::marker::PhantomData<&'a ()>,
        ) -> <M2::Yokeable as Yokeable<'a>>::Output,
    ) -> Self {
        
    }
}

treereduce manages to reduce away items but keeps the commas that separated them:
=>

impl<T> VSet<T, {  }> {
    pub fn new(
        ,
    ) -> Self {
        
    }
}

This then causes annoying syntax errors, hinders further reduction, or throws off rustfmt.

I think I have seen this with enums/structs too.
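One possible fix is to delete a list element together with its separator. The sketch below is a purely text-level illustration of that idea, not treereduce's approach. `delete_list_item` is a hypothetical helper that takes the byte range of the element. It first tries to remove a following comma (plus trailing whitespace). If there is none, the element is assumed to be last, and the sketch removes the comma before it. A grammar-aware version would instead look for an anonymous `,` sibling node in the tree-sitter parse.

```rust
/// Delete `src[start..end]` (one element of a comma-separated list) together
/// with the comma that separated it from a neighbour, so no stray `,` remains.
pub fn delete_list_item(src: &str, start: usize, end: usize) -> String {
    let bytes = src.as_bytes();
    let (mut s, mut e) = (start, end);
    // Look past whitespace for a trailing comma.
    let mut j = e;
    while j < bytes.len() && bytes[j].is_ascii_whitespace() {
        j += 1;
    }
    if j < bytes.len() && bytes[j] == b',' {
        // Remove the comma and any whitespace after it.
        e = j + 1;
        while e < bytes.len() && bytes[e].is_ascii_whitespace() {
            e += 1;
        }
    } else {
        // Last element: remove a preceding comma (and whitespace) instead.
        let mut i = s;
        while i > 0 && bytes[i - 1].is_ascii_whitespace() {
            i -= 1;
        }
        if i > 0 && bytes[i - 1] == b',' {
            s = i - 1;
        }
    }
    format!("{}{}", &src[..s], &src[e..])
}
```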

Zeroing

Zeroing is a language-specific reduction technique that replaces (non-optional) nodes with a specified, pre-configured "small" version. For example, the string node in the Python grammar might be replaced with "".
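A minimal sketch of zeroing a single node follows, given its byte range and tree-sitter node kind. The function names are hypothetical. The zero table assumes `string` and `integer` node kinds as in the Python grammar, with `""` and `0` as their replacements. The sketch does not check whether a node is optional, and it declines to "zero" a node that already equals its replacement, since that would make no progress.

```rust
use std::collections::HashMap;

/// Hypothetical zero table for Python, keyed by tree-sitter node kind.
pub fn python_zeros() -> HashMap<&'static str, &'static str> {
    HashMap::from([("string", "\"\""), ("integer", "0")])
}

/// Replace `src[start..end]`, a node of kind `kind`, with that kind's zero.
/// Returns None if the kind has no zero or the node is already its zero.
pub fn zero_node(
    src: &str,
    start: usize,
    end: usize,
    kind: &str,
    zeros: &HashMap<&'static str, &'static str>,
) -> Option<String> {
    let zero = *zeros.get(kind)?;
    if &src[start..end] == zero {
        return None; // already minimal; replacing would not shrink anything
    }
    Some(format!("{}{}{}", &src[..start], zero, &src[end..]))
}
```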

hang?

I came across this file:

#![feature(type_alias_impl_trait)]

#[derive(Clone)]
struct CopyIfEq<T, U>(T, U);

impl<T: Copy> Copy for CopyIfEq<T, T> {}

type E<'a, 'b> = impl Sized;

fn foo<'a: 'b, 'b, 'c>(x: &'static i32, mut y: &'a i32) -> E<'b, 'c> {
    let v = CopyIfEq::<*mut _, *mut _>(&mut { x }, &mut y);

    // This assignment requires that `x` and `y` have the same type due to the
    // `Copy` impl. The reason why we are using a copy to create a constraint
    // is that only borrow checking (not regionck in type checking) enforces
    // this bound.
    let u = v;
    let _: *mut &'a i32 = u.1;
    unsafe {
        let _: &'b i32 = *u.0;
    }
    u.0
    //~^ ERROR hidden type for `E<'b, 'c>` captures lifetime that does not appear in bounds
}

fn CopyIfEq() {}

for which treereduce does not seem to output anything at all.
Is this caused by some kind of parse error in the file?
