
datafrog's Introduction

datafrog

Datafrog is a lightweight Datalog engine intended to be embedded in other Rust programs.

Datafrog has no runtime, and relies on you to build and repeatedly apply the update rules. It tries to help you do this correctly. As an example, here is how you might write a reachability query using Datafrog (minus the part where we populate the nodes and edges initial relations).

extern crate datafrog;
use datafrog::Iteration;

fn main() {
    // Prepare initial values, ..
    let nodes: Vec<(u32,u32)> = vec![
        // ..
    ];
    let edges: Vec<(u32,u32)> = vec![
        // ..
    ];

    // Create a new iteration context, ..
    let mut iteration = Iteration::new();

    // .. some variables, ..
    let nodes_var = iteration.variable::<(u32,u32)>("nodes");
    let edges_var = iteration.variable::<(u32,u32)>("edges");

    // .. load them with some initial values, ..
    nodes_var.insert(nodes.into());
    edges_var.insert(edges.into());

    // .. and then start iterating rules!
    while iteration.changed() {
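        // nodes(a,c)  <-  nodes(a,b), edges(b,c)
        // nodes_var stores (b, a) pairs keyed by the node b, so the join key is b
        // and each derived tuple is stored back as (c, a).
        nodes_var.from_join(&nodes_var, &edges_var, |_b, &a, &c| (c,a));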
    }

    // extract the final results.
    // complete() returns a Relation; its sorted, deduplicated tuples are in `.elements`.
    let reachable: Vec<(u32,u32)> = nodes_var.complete().elements;
}
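Roughly speaking, each datafrog variable keeps the tuples it has already processed separate from the ones added in the last round. Each round then only has to join the new tuples, which is semi-naive evaluation. The std-only sketch below computes the same reachability rule this way so the loop is visible. It is an illustration, not datafrog's implementation: the function name `reachable` and the `stable`/`recent` sets are assumptions, and the nested-loop join stands in for datafrog's joins over sorted relations.

```rust
use std::collections::BTreeSet;

/// Semi-naive fixpoint for nodes(a, c) <- nodes(a, b), edges(b, c).
/// Tuples are stored as (node, origin), like `nodes_var` above, so the join key comes first.
fn reachable(roots: &[(u32, u32)], edges: &[(u32, u32)]) -> BTreeSet<(u32, u32)> {
    let mut stable: BTreeSet<(u32, u32)> = BTreeSet::new();
    let mut recent: BTreeSet<(u32, u32)> = roots.iter().copied().collect();
    // Keep going while the last round produced something new
    // (the role played by `iteration.changed()`).
    while !recent.is_empty() {
        let mut derived = BTreeSet::new();
        // Only newly found tuples need joining; `edges` never changes.
        for &(b, a) in &recent {
            for &(from, c) in edges {
                if from == b {
                    derived.insert((c, a));
                }
            }
        }
        stable.extend(recent.iter().copied());
        // Discard anything already known before the next round.
        recent = derived.into_iter().filter(|t| !stable.contains(t)).collect();
    }
    stable
}
```

For example, starting from the single root `(1, 1)` with edges `1→2, 2→3, 3→1, 1→4`, the loop ends after three productive rounds with every node paired with origin 1.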

If you'd like to read more about how it works, check out this blog post.

Authorship

Datafrog was initially developed by Frank McSherry and was later transferred to the rust-lang-nursery organization. Thanks Frank!

datafrog's People

Contributors

ecstatic-morse, erismart, frankmcsherry, gabrielmajeri, ia0, ljedrz, lqd, mark-simulacrum, michalt, nikomatsakis, regexident, sapphire-arches, sscdotopen

datafrog's Issues

Add simple profiling statistics

It could be interesting to have simple statistics, for example behind a feature flag, for Relations and/or Variables: the number of tuples, and the time it took to merge and create them.

There are commented-out Drop impls in the code (e.g. here) that show one way to add the final tuple-count statistic.

Similarly, a Duration could be added to the merged relations and updated in the operator functions, to display how long it took to create those tuples, as Frank describes in a bit more detail here.
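A minimal sketch of what such statistics could look like, assuming a per-variable record that counts tuples and accumulates a `Duration` around each operator call, then reports on `Drop`. The names `VariableStats` and `record` are hypothetical and do not exist in datafrog. In the crate this would presumably sit behind a Cargo feature gated with `#[cfg(feature = "...")]`, which is omitted here so the sketch stays self-contained.

```rust
use std::time::{Duration, Instant};

/// Hypothetical statistics for one variable; none of these names exist in datafrog.
#[derive(Debug)]
struct VariableStats {
    name: String,
    tuples: usize,
    elapsed: Duration,
}

impl VariableStats {
    fn new(name: &str) -> Self {
        VariableStats { name: name.to_string(), tuples: 0, elapsed: Duration::ZERO }
    }

    /// Times one operator invocation and counts the tuples it produced.
    fn record<T>(&mut self, op: impl FnOnce() -> Vec<T>) -> Vec<T> {
        let start = Instant::now();
        let tuples = op();
        self.elapsed += start.elapsed();
        self.tuples += tuples.len();
        tuples
    }
}

/// Report the totals when the stats go out of scope, in the spirit of the
/// commented-out Drop impls mentioned above.
impl Drop for VariableStats {
    fn drop(&mut self) {
        eprintln!("{}: {} tuples in {:?}", self.name, self.tuples, self.elapsed);
    }
}
```

Wrapping the operator in a closure keeps the timing logic in one place, so each operator function would only need a single `record` call.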

Unit tests

We don't presently have any unit tests... for anything. I think we should create a "base layer" of unit tests that generate some simple operations and random inputs and check that they produce the correct result, using some kind of naive computation as an oracle.

Roughly as described here.
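One possible shape for such a test is sketched below: a small deterministic generator produces random sorted relations, and an optimized join is compared against an obviously correct nested-loop join that serves as the oracle. This is a std-only sketch under stated assumptions. `merge_join` is a stand-in for the operator under test; a real test would call datafrog's join instead. The LCG constants are the common 64-bit ones and were chosen only so that no external crate is needed.

```rust
use std::cmp::Ordering;

/// Tiny deterministic PRNG (a 64-bit LCG) so the sketch needs no external crates.
fn next(seed: &mut u64) -> u32 {
    *seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
    (*seed >> 33) as u32
}

/// Random sorted, deduplicated relation with small keys so joins actually match.
fn random_relation(seed: &mut u64, len: usize) -> Vec<(u32, u32)> {
    let mut rel = Vec::with_capacity(len);
    for _ in 0..len {
        let key = next(seed) % 8;
        let val = next(seed) % 8;
        rel.push((key, val));
    }
    rel.sort();
    rel.dedup();
    rel
}

/// Oracle: the obviously correct nested-loop join.
fn naive_join(a: &[(u32, u32)], b: &[(u32, u32)]) -> Vec<(u32, u32, u32)> {
    let mut out = Vec::new();
    for &(ka, va) in a {
        for &(kb, vb) in b {
            if ka == kb {
                out.push((ka, va, vb));
            }
        }
    }
    out.sort();
    out
}

/// Implementation under test: a merge join over sorted inputs (a stand-in for datafrog's join).
fn merge_join(a: &[(u32, u32)], b: &[(u32, u32)]) -> Vec<(u32, u32, u32)> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < a.len() && j < b.len() {
        match a[i].0.cmp(&b[j].0) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                let key = a[i].0;
                // Find the run of equal keys on each side and emit their cross product.
                let i_end = i + a[i..].iter().take_while(|t| t.0 == key).count();
                let j_end = j + b[j..].iter().take_while(|t| t.0 == key).count();
                for &(_, va) in &a[i..i_end] {
                    for &(_, vb) in &b[j..j_end] {
                        out.push((key, va, vb));
                    }
                }
                i = i_end;
                j = j_end;
            }
        }
    }
    out.sort();
    out
}
```

A test then loops over many seeds and asserts that both functions agree on every generated pair of inputs.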

Add repo to Cargo.toml

It would be cool if the repository URL could be added to Cargo.toml; then it'd be easier to find the repository from crates.io!

CI no longer runs

Looks like Travis CI is no longer working. Maybe we want to follow rustc and switch to GH Actions?

Investigate the effect of "treefrog leapjoin"-ing more than one Variable

Right now, the TFLJ algorithm reduces the need for temporary variables when there is one dynamic data source (one Variable) and "leapers" over Relations only.

In some situations, this can mean a temporary relation is still required if one needs to join more than one variable. It would be interesting to investigate how/if being able to remove such temporaries would help the polonius benchmarks.

This might make it possible to "inline" intermediate relations themselves completely (and not just the "steps" or indices), for instance the big subset and requires relations, whose tuples might not all need to be produced in the first place.

Some details on how to do this are in this comment by Frank.

(Note: this might need to wait until TFLJ lands on master and polonius uses it, and until there are more diverse benchmarks: checking the rustc-perf benchmarks as well, or adding datafrog-specific benchmarks.)
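The core step of treefrog leapjoin works roughly like this. For each tuple of the single dynamic source, every leaper reports how many values it would propose. The leaper with the smallest count proposes its values, and the others intersect that list. The std-only sketch below models this with `count` / `propose` / `intersect`, in the spirit of datafrog's leapers. It is a simplification, not the crate's API: it uses trait objects for brevity (datafrog avoids dynamic dispatch here), and `ExtendWith` is a hypothetical stand-in that works over `(u32, u32)` relations only.

```rust
/// One participant in a leapjoin: it can count, propose, or filter values for a source tuple.
trait Leaper<Src, Val> {
    fn count(&self, src: &Src) -> usize;
    fn propose(&self, src: &Src, out: &mut Vec<Val>);
    fn intersect(&self, src: &Src, vals: &mut Vec<Val>);
}

/// Proposes every `v` such that `(key(src), v)` is in `relation` (sorted, deduplicated).
struct ExtendWith<F> {
    relation: Vec<(u32, u32)>,
    key: F,
}

impl<F> ExtendWith<F> {
    /// The contiguous block of tuples whose first component equals `key(src)`.
    fn range<Src>(&self, src: &Src) -> &[(u32, u32)]
    where
        F: Fn(&Src) -> u32,
    {
        let k = (self.key)(src);
        let start = self.relation.partition_point(|&(x, _)| x < k);
        let end = self.relation.partition_point(|&(x, _)| x <= k);
        &self.relation[start..end]
    }
}

impl<Src, F: Fn(&Src) -> u32> Leaper<Src, u32> for ExtendWith<F> {
    fn count(&self, src: &Src) -> usize {
        self.range(src).len()
    }
    fn propose(&self, src: &Src, out: &mut Vec<u32>) {
        out.extend(self.range(src).iter().map(|&(_, v)| v));
    }
    fn intersect(&self, src: &Src, vals: &mut Vec<u32>) {
        let block = self.range(src);
        // Within one key block the values are sorted, so binary search works.
        vals.retain(|v| block.binary_search_by(|&(_, x)| x.cmp(v)).is_ok());
    }
}

/// For each source tuple: the leaper with the smallest count proposes,
/// every other leaper intersects, and `logic` builds the output.
fn leapjoin<Src, Val, Out>(
    source: &[Src],
    leapers: &[&dyn Leaper<Src, Val>],
    logic: impl Fn(&Src, &Val) -> Out,
) -> Vec<Out> {
    let mut result = Vec::new();
    let mut vals = Vec::new();
    for src in source {
        let (best, min_count) = leapers
            .iter()
            .enumerate()
            .map(|(i, l)| (i, l.count(src)))
            .min_by_key(|&(_, c)| c)
            .expect("leapjoin needs at least one leaper");
        if min_count == 0 {
            continue; // some leaper has nothing to offer for this tuple
        }
        vals.clear();
        leapers[best].propose(src, &mut vals);
        for (i, l) in leapers.iter().enumerate() {
            if i != best {
                l.intersect(src, &mut vals);
            }
        }
        result.extend(vals.iter().map(|v| logic(src, v)));
    }
    result
}
```

For example, with `edges` as the source and two leapers over the same edge relation, one keyed on `b` and one keyed on `a`, the join finds each `c` with `a→b`, `b→c` and `a→c`. No temporary relation is needed, because both leapers read from static data.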

Datalog query with leapjoin

Hey, I'm just playing around with datafrog and trying to translate some basic datalog queries to the equivalent Rust code. I am wondering what something like the following snippet looks like in datafrog?

node(1, 'A').
node(2, 'B').
node(3, 'C').
node(4, 'D').

edge(1, 2).
edge(2, 3).
edge(3, 1).
edge(1, 4).

depthTwo(W1, W2) :-
    page(T1, W1),
    page(T2, W2),
    link(T1, T2).

I'm mostly interested in leapjoin and have tried something like:

let mut iteration = Iteration::new();

let pages: Relation<_> = vec![
    (1, "super.com"),
    (2, "page.com"),
    (3, "subpage.com"),
    (4, "minpage.com"),
]
.into();
let edges: Relation<(u32, u32)> = vec![(1, 2), (2, 3), (3, 1), (1, 4)].into();
let edges_rev: Relation<_> = edges.iter().map(|&(from, to)| (to, from)).collect();

let var = iteration.variable::<(u32, &str)>("var");
var.insert(pages.clone());

while iteration.changed() {
    var.from_leapjoin(
        &var,
        // So I technically would like to filter by the edges
        // but the nodes i want to filter by are not really there yet?
        // And I can't extend with nodes and filter with the edges
        edges_rev.extend_with(|&(a, _)| a),
        //
        |&(a, b), &c| (a, b),
    );
}

But I can't quite wrap my head around how the leapers work even after reading the comment given in the source code of the repo. Any pointers would be appreciated!
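Because depthTwo is not recursive, it needs no fixpoint. For each link (T1, T2), it looks up the page names for T1 and for T2. The question's facts are named node/edge while the rule uses page/link; the std-only sketch below treats them as the same relations and uses the page names from the question's Rust code. It computes the result a datafrog translation should reproduce. It is a reference oracle, not datafrog code, and the function name `depth_two` is hypothetical.

```rust
use std::collections::BTreeMap;

/// Naive evaluation of depthTwo(W1, W2) :- page(T1, W1), page(T2, W2), link(T1, T2).
fn depth_two<'a>(pages: &[(u32, &'a str)], links: &[(u32, u32)]) -> Vec<(&'a str, &'a str)> {
    // Index page names by id: the rule looks up page(T, W) for both ends of a link.
    let mut by_id: BTreeMap<u32, Vec<&'a str>> = BTreeMap::new();
    for &(id, name) in pages {
        by_id.entry(id).or_default().push(name);
    }
    let mut out = Vec::new();
    for &(t1, t2) in links {
        if let (Some(w1s), Some(w2s)) = (by_id.get(&t1), by_id.get(&t2)) {
            for &w1 in w1s {
                for &w2 in w2s {
                    out.push((w1, w2));
                }
            }
        }
    }
    out.sort();
    out.dedup();
    out
}
```

With the four pages and four edges from the question, this yields one (W1, W2) pair per link, e.g. ("super.com", "page.com") for link (1, 2).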

`Relation::from_antijoin` is broken

Relation::from_antijoin always returns an empty Relation, regardless of its inputs. That's because the antijoin helper, which takes a JoinInput as its first parameter, operates only on recent tuples.

datafrog/src/join.rs, lines 65 to 73 in 5bda2f0:

let results = input1
    .recent()
    .iter()
    .filter(|(ref key, _)| {
        tuples2 = gallop(tuples2, |k| k < key);
        tuples2.first() != Some(key)
    })
    .map(|(ref key, ref val)| logic(key, val))
    .collect::<Vec<_>>();

This is correct for variables, but Relations, which don't change during iteration, only have stable tuples. See #36 (comment) for the reason this must be the case.

To fix this, we should refactor the antijoin helper to work directly on Relations, and pass the proper input from Variable::from_antijoin and Relation::from_antijoin. A regression test is needed as well.
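A minimal sketch of the proposed refactor follows. The helper takes plain sorted slices, standing in for a Relation's elements. That way Relation::from_antijoin can pass all of its (stable) tuples, and Variable::from_antijoin can pass only its recent ones. The signatures are illustrative rather than datafrog's. `gallop` is written out as an exponential-then-binary search in the same spirit as the `gallop` call in the excerpt, so the sketch is self-contained. The regression test at the end checks that a non-empty result comes back.

```rust
/// Galloping search: skips every leading element for which `cmp` is true.
/// Assumes `slice` is sorted so that `cmp` holds on a prefix.
fn gallop<'a, T>(mut slice: &'a [T], mut cmp: impl FnMut(&T) -> bool) -> &'a [T] {
    if !slice.is_empty() && cmp(&slice[0]) {
        let mut step = 1;
        // Exponential phase: double the step while it still lands inside the prefix.
        while step < slice.len() && cmp(&slice[step]) {
            slice = &slice[step..];
            step <<= 1;
        }
        // Binary phase: narrow down to the last element of the prefix.
        step >>= 1;
        while step > 0 {
            if step < slice.len() && cmp(&slice[step]) {
                slice = &slice[step..];
            }
            step >>= 1;
        }
        slice = &slice[1..];
    }
    slice
}

/// Keeps the tuples of `input1` whose key does not appear in `input2`.
/// Both inputs are sorted; `input1` is whatever the caller decides to pass,
/// e.g. a Relation's full contents rather than only "recent" tuples.
fn antijoin<K: Ord, V, R>(
    input1: &[(K, V)],
    input2: &[K],
    mut logic: impl FnMut(&K, &V) -> R,
) -> Vec<R> {
    let mut tuples2 = input2;
    input1
        .iter()
        .filter(|(key, _)| {
            // Skip past every key in input2 smaller than `key`...
            tuples2 = gallop(tuples2, |k| k < key);
            // ...and keep the tuple only if `key` itself is absent.
            tuples2.first() != Some(key)
        })
        .map(|(key, val)| logic(key, val))
        .collect()
}
```

Because `input1` is an ordinary slice, the choice between stable and recent tuples moves to the caller, which is where the Variable/Relation distinction belongs.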

Disjunctive (OR) *filters*

@lqd opened rust-lang/polonius#157 a while ago, which solves the Location::All problem in what I think is the "correct" way. Essentially, it transforms all occurrences of origin_live_at(O, P) in rule bodies into (origin_live_at(O, P) OR placeholder(O)). In other words, placeholder regions are live at all points in the CFG.

Unfortunately it's actually kind of difficult to express that simple idea as a datafrog program with the current API. The PR manually expanded each rule body into two (one for each side of the OR), but this leads to code that is really difficult to maintain. Another option would be to create an intermediate Relation, origin_live_at_or_placeholder(O, P) defined below, to hold the disjunction. That would waste memory, however, and we would also need a new is_point relation that holds all possible points in the CFG.

origin_live_at_or_placeholder(O, P) :- origin_live_at(O, P).
origin_live_at_or_placeholder(O, P) :- placeholder(O), is_point(P).

Ideally, we would express this disjunction directly as a leaper. This is possible, but is more work than you might expect. An impl that's generic over Leapers won't work, since (among other things) there's no way to implement a disjunctive intersect efficiently with the current Leaper interface. I think you could do something like the following, but you'd need to handle heterogeneous Value types as well:

struct LeaperOr<A, B>(A, B);

/* This is the combination of concrete leapers we need, though ideally all combinations would be implemented.*/
impl<...> Leaper<...> for LeaperOr<ExtendWith<...>, FilterWith<...>> {
    /* ... */
}

Obviously, this doesn't scale, but maybe it's okay to implement just what we need for rust-lang/polonius#157? The alternative is to adjust the Leaper interface so that it composes better, but I don't see a straightforward way to do this without resorting to boxed trait objects, which is not an option since they would be dispatched in a tight loop (GATs + RPIT in trait fns would solve this, however).
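For the pure-filter case, the disjunction itself is cheap. A tuple (O, P) passes if origin_live_at(O, P) holds or placeholder(O) holds, which takes two membership tests and no is_point relation. The std-only sketch below shows just that predicate over sorted relations. It does not address the harder case the issue raises, where one side of the OR must propose values (ExtendWith) through the Leaper interface. `LiveOrPlaceholder` and the use of `u32` for origins and points are assumptions for illustration.

```rust
/// Hypothetical disjunctive filter: accepts (origin, point) if
/// origin_live_at(origin, point) OR placeholder(origin) holds.
struct LiveOrPlaceholder {
    origin_live_at: Vec<(u32, u32)>,
    placeholder: Vec<u32>,
}

impl LiveOrPlaceholder {
    fn new(mut origin_live_at: Vec<(u32, u32)>, mut placeholder: Vec<u32>) -> Self {
        // Sort and deduplicate so membership is a binary search, as in a Relation.
        origin_live_at.sort();
        origin_live_at.dedup();
        placeholder.sort();
        placeholder.dedup();
        LiveOrPlaceholder { origin_live_at, placeholder }
    }

    /// Placeholders are live at every point, so the point is only checked otherwise.
    fn accepts(&self, origin: u32, point: u32) -> bool {
        self.placeholder.binary_search(&origin).is_ok()
            || self.origin_live_at.binary_search(&(origin, point)).is_ok()
    }
}
```

Because a filter only accepts or rejects, the "all points" side never has to be materialized, which is exactly what the intermediate origin_live_at_or_placeholder relation would have cost.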

Datafrog does not support early termination

It seems that the following assertions ensure that datafrog will not give partial results.

assert!(self.recent.borrow().is_empty());

This is often useful for completeness, but unproductive in real-world situations where there are often unbounded numbers of solutions to problems and 'good enough' is fine (e.g. estimation, path finding, etc.).

Are patches accepted here? I'd like to add an incomplete method which would just be the 'complete' method without the assertions, and then replace the body of complete with just the assertions and the result of incomplete.
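The proposed split can be sketched as follows. `incomplete` gathers whatever has been derived so far without checking for a fixpoint, and `complete` keeps its assertions and then delegates to it. `ToyVariable` is a hypothetical stand-in: datafrog's real Variable keeps these fields behind RefCell (hence `self.recent.borrow()` above) and stores stable tuples differently. Including the pending `to_add` batches in the partial result is a design choice made here, not something the issue specifies.

```rust
/// A toy stand-in for a datafrog-like variable; not datafrog's actual `Variable`.
struct ToyVariable<T: Ord> {
    stable: Vec<T>,
    recent: Vec<T>,
    to_add: Vec<Vec<T>>,
}

impl<T: Ord> ToyVariable<T> {
    /// Proposed: return everything derived so far, even if iteration has not
    /// reached a fixpoint.
    fn incomplete(self) -> Vec<T> {
        let mut all = self.stable;
        all.extend(self.recent);
        for batch in self.to_add {
            all.extend(batch);
        }
        all.sort();
        all.dedup();
        all
    }

    /// Existing behaviour, re-expressed: assert the fixpoint, then delegate.
    fn complete(self) -> Vec<T> {
        assert!(self.recent.is_empty());
        assert!(self.to_add.is_empty());
        self.incomplete()
    }
}
```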

Execution Result is different from racket datalog engine

datalog code

#lang datalog
edge(0, 1). edge(1, 2). edge(2, 3).
path(X, Y) :- edge(X, Y).
path(X, Y) :- edge(X, Z), path(Z, Y).

exec path(A, B)?
I get

path(2, 3).
path(1, 2).
path(0, 1).
path(0, 2).
path(0, 3).
path(1, 3).

rust code

use datafrog::Iteration;

fn main() {
    // Prepare initial values, ..
    let path: Vec<(u32, u32)> = vec![
        (0, 1),
        (1, 2),
        (2, 3), // ..
    ];
    let edges: Vec<(u32, u32)> = vec![
        (0, 1),
        (1, 2),
        (2, 3), // ..
    ];

    // Create a new iteration context, ..
    let mut iteration = Iteration::new();

    // .. some variables, ..
    let path_var = iteration.variable::<(u32, u32)>("path");
    let edges_var = iteration.variable::<(u32, u32)>("edges");

    // .. load them with some initial values, ..
    path_var.insert(path.into());
    edges_var.insert(edges.into());

    // .. and then start iterating rules!
    while iteration.changed() {
        // b: k, a: v1, c: v2 
        // path(a,c)  <-  edges(a,b), path(b,c)
        path_var.from_join(&edges_var, &path_var, |_b, &a, &c| (a, c));
    }

    // extract the final results.
    let reachable = path_var.complete();
    for (a, b) in reachable.iter() {
        println!("({}, {})", a, b);
    }
}

I get

(0, 1)
(1, 1)
(1, 2)
(2, 1)
(2, 2)
(2, 3)
(3, 1)
(3, 2)
(3, 3)
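The difference comes from the join key. datafrog's `from_join` matches tuples on their first component, so the `(a, b)` tuples in `edges_var` are keyed by `a`, not by `b` as the comment assumes. Joining `edges(k, a)` with `path(k, c)` on the source node `k` produces exactly the nine tuples above. Storing the edges reversed, as `(b, a)`, makes `b` the key. In the datafrog program, that would mean loading `edges_var` with the reversed pairs. The std-only sketch below imitates the key-first join (`join_on_first` is a hypothetical helper, not datafrog's API) and reaches the same six paths as the Racket engine.

```rust
use std::collections::BTreeSet;

/// Joins two relations on their first component, as datafrog's `from_join` does,
/// emitting `logic(key, v1, v2)` for every matching pair.
fn join_on_first(
    left: &BTreeSet<(u32, u32)>,
    right: &BTreeSet<(u32, u32)>,
    logic: impl Fn(u32, u32, u32) -> (u32, u32),
) -> BTreeSet<(u32, u32)> {
    let mut out = BTreeSet::new();
    for &(k1, v1) in left {
        for &(k2, v2) in right {
            if k1 == k2 {
                out.insert(logic(k1, v1, v2));
            }
        }
    }
    out
}

/// Naive fixpoint for path(a, c) <- edge(a, b), path(b, c), with edges keyed by b.
fn paths(edges: &[(u32, u32)]) -> BTreeSet<(u32, u32)> {
    // Reverse the edges so the join key is the edge's target `b`.
    let edges_rev: BTreeSet<(u32, u32)> = edges.iter().map(|&(a, b)| (b, a)).collect();
    // path(x, y) <- edge(x, y)
    let mut path: BTreeSet<(u32, u32)> = edges.iter().copied().collect();
    loop {
        let derived = join_on_first(&edges_rev, &path, |_b, a, c| (a, c));
        let before = path.len();
        path.extend(derived);
        if path.len() == before {
            return path;
        }
    }
}
```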
