rust-lang / datafrog
A lightweight Datalog engine in Rust
License: Apache License 2.0
Hey, I'm just playing around with datafrog and trying to translate some basic datalog queries to the equivalent Rust code. I am wondering what something like the following snippet looks like in datafrog?
node(1, 'A').
node(2, 'B').
node(3, 'C').
node(4, 'D').
edge(1, 2).
edge(2, 3).
edge(3, 1).
edge(1, 4).
depthTwo(W1, W2) :-
page(T1, W1),
page(T2, W2),
link(T1, T2).
I'm mostly interested in leapjoin and have tried something like:
let mut iteration = Iteration::new();
let pages: Relation<_> = vec![
    (1, "super.com"),
    (2, "page.com"),
    (3, "subpage.com"),
    (4, "minpage.com"),
]
.into();
let edges: Relation<(u32, u32)> = vec![(1, 2), (2, 3), (3, 1), (1, 4)].into();
let edges_rev: Relation<_> = edges.iter().map(|&(from, to)| (to, from)).collect();
let var = iteration.variable::<(u32, &str)>("var");
var.insert(pages.clone());
while iteration.changed() {
    var.from_leapjoin(
        &var,
        // So I technically would like to filter by the edges
        // but the nodes I want to filter by are not really there yet?
        // And I can't extend with nodes and filter with the edges
        edges_rev.extend_with(|&(a, _)| a),
        //
        |&(a, b), &c| (a, b),
    );
}
But I can't quite wrap my head around how the leapers work even after reading the comment given in the source code of the repo. Any pointers would be appreciated!
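For reference, here is a std-only sketch of what the depthTwo rule computes. It is not datafrog code and does not use leapers. It assumes that page and link in the rule are the node and edge facts above, and that each id maps to exactly one page name. The function name depth_two is made up for illustration. The shape is loosely the same as a leapjoin: start from one relation (the links) and look up the remaining columns by key.

```rust
use std::collections::BTreeMap;

/// Evaluates depthTwo(W1, W2) :- page(T1, W1), page(T2, W2), link(T1, T2).
/// Hypothetical helper, std only; assumes one page name per id.
fn depth_two<'a>(pages: &[(u32, &'a str)], links: &[(u32, u32)]) -> Vec<(&'a str, &'a str)> {
    // Index pages by id so both ends of a link can be looked up by key.
    let by_id: BTreeMap<u32, &str> = pages.iter().copied().collect();
    let mut out: Vec<(&str, &str)> = links
        .iter()
        // A link contributes a tuple only if both of its ends are known pages.
        .filter_map(|&(t1, t2)| Some((*by_id.get(&t1)?, *by_id.get(&t2)?)))
        .collect();
    // Sort and deduplicate, as a Datalog relation is a set.
    out.sort();
    out.dedup();
    out
}
```

With the four pages and four edges from the question, this yields one (W1, W2) pair per edge, which is a handy oracle to compare a datafrog version against.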
It seems that the following assertions ensure that datafrog will not give partial results.
Line 307 in 5455139
This is often useful for completeness, but unproductive in real-world situations where there are often unbounded numbers of solutions to a problem and 'good enough' is fine (e.g. estimation, path finding etc.).
Are patches accepted here? I'd like to add an incomplete method which would just be the 'complete' method without the assertions, and then replace the body of complete with just the assertions and the result of incomplete.
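A rough sketch of the proposed split, on a toy type rather than datafrog's actual Variable. ToyVariable and its plain Vec fields are hypothetical stand-ins; only the idea that complete becomes "assert, then call incomplete" comes from the proposal above.

```rust
/// Hypothetical stand-in for a datafrog-style variable (not the real type).
struct ToyVariable<T: Ord> {
    stable: Vec<Vec<T>>, // batches already fully processed
    recent: Vec<T>,      // tuples found in the last round
    to_add: Vec<Vec<T>>, // tuples waiting for the next round
}

impl<T: Ord> ToyVariable<T> {
    /// Proposed method: merge the stable batches without asserting a fixpoint.
    /// In this sketch, tuples still in `recent` or `to_add` are ignored.
    fn incomplete(self) -> Vec<T> {
        let mut result: Vec<T> = self.stable.into_iter().flatten().collect();
        result.sort();
        result.dedup();
        result
    }

    /// `complete` keeps its assertions and delegates the merging.
    fn complete(self) -> Vec<T> {
        assert!(self.recent.is_empty());
        assert!(self.to_add.is_empty());
        self.incomplete()
    }
}
```

Whether a partial result should also include the pending tuples is a design question the proposal leaves open.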
Right now, the TFLJ algorithm reduces the need for temporary variables when there is one dynamic data source (one Variable) and "leapers" over Relations only.
In some situations, this can mean a temporary relation is still required if one needs to join more than one variable. It would be interesting to investigate how/if being able to remove such temporaries would help the polonius benchmarks.
This might open the possibility to "inline" intermediate relations themselves (and not just the "steps" or indices) completely, for instance the big subset and requires relations, whose tuples might not all need to be produced in the first place.
Some details on how to do this are in this comment by Frank.
(Note: this might need to wait for TFLJ to land on master, polonius using it, and having more diverse benchmarks, checking with the rustc-perf benchmarks as well, or also having datafrog-specific benchmarks)
@lqd opened rust-lang/polonius#157 a while ago, which solves the Location::All problem in what I think is the "correct" way. Essentially, it transforms all occurrences of origin_live_at(O, P) in rule bodies into (origin_live_at(O, P) OR placeholder(O)). In other words, placeholder regions are live at all points in the CFG.
Unfortunately it's actually kind of difficult to express that simple idea as a datafrog program with the current API. The PR manually expanded each rule body into two (one for each side of the OR), but this leads to code that is really difficult to maintain. Another option would be to create an intermediate Relation, origin_live_at_or_placeholder(O, P), defined below, to hold the disjunction. That would waste memory, however, and we would also need a new is_point relation that holds all possible points in the CFG.
origin_live_at_or_placeholder(O, P) :- origin_live_at(O, P).
origin_live_at_or_placeholder(O, P) :- placeholder(O), is_point(P).
Ideally, we would express this disjunction directly as a leaper. This is possible, but is more work than you might expect. An impl that's generic over Leapers won't work, since (among other things) there's no way to implement a disjunctive intersect efficiently with the current Leaper interface. I think you could do something like the following, but you'd need to handle heterogeneous Value types as well:
struct LeaperOr<A, B>(A, B);
/* This is the combination of concrete leapers we need, though ideally all combinations would be implemented. */
impl<...> Leaper<...> for LeaperOr<ExtendWith<...>, FilterWith<...>> {
    /* ... */
}
Obviously, this doesn't scale, but maybe it's okay to implement just what we need for rust-lang/polonius#157? The alternative is to adjust the Leaper interface so that it composes better, but I don't see a straightforward way to do this without resorting to boxed trait objects, which is not an option since they would be dispatched in a tight loop (GATs + RPIT in trait fns would solve this, however).
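To make the intended semantics concrete, here is a std-only sketch of the disjunction used as a filter. It does not implement datafrog's Leaper trait; the function names and the u32 encoding of origins and points are assumptions for illustration. It checks (origin_live_at(O, P) OR placeholder(O)) per candidate tuple, so neither the intermediate relation nor is_point has to be materialized.

```rust
/// True when origin_live_at(o, p) holds or o is a placeholder.
/// Both slices are assumed to be sorted (hypothetical encoding: u32 ids).
fn live_or_placeholder(origin_live_at: &[(u32, u32)], placeholder: &[u32], o: u32, p: u32) -> bool {
    origin_live_at.binary_search(&(o, p)).is_ok() || placeholder.binary_search(&o).is_ok()
}

/// Keeps the candidate (origin, point) pairs that satisfy the disjunction.
fn filter_live(
    candidates: &[(u32, u32)],
    origin_live_at: &[(u32, u32)],
    placeholder: &[u32],
) -> Vec<(u32, u32)> {
    candidates
        .iter()
        .copied()
        .filter(|&(o, p)| live_or_placeholder(origin_live_at, placeholder, o, p))
        .collect()
}
```

A filter like this only covers the case where both sides restrict existing values; proposing new values from either side, as an extend-style leaper does, is where the heterogeneous types mentioned above come in.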
It would be cool if the repository could be added to Cargo.toml. Then it'd be easier to find it on crates.io!
datalog code
#lang datalog
edge(0, 1). edge(1, 2). edge(2, 3).
path(X, Y) :- edge(X, Y).
path(X, Y) :- edge(X, Z), path(Z, Y).
exec path(A, B)?
I get
path(2, 3).
path(1, 2).
path(0, 1).
path(0, 2).
path(0, 3).
path(1, 3).
rust code
use datafrog::Iteration;
fn main() {
    // Prepare initial values, ..
    let path: Vec<(u32, u32)> = vec![
        (0, 1),
        (1, 2),
        (2, 3), // ..
    ];
    let edges: Vec<(u32, u32)> = vec![
        (0, 1),
        (1, 2),
        (2, 3), // ..
    ];
    // Create a new iteration context, ..
    let mut iteration = Iteration::new();
    // .. some variables, ..
    let path_var = iteration.variable::<(u32, u32)>("path");
    let edges_var = iteration.variable::<(u32, u32)>("edges");
    // .. load them with some initial values, ..
    path_var.insert(path.into());
    edges_var.insert(edges.into());
    // .. and then start iterating rules!
    while iteration.changed() {
        // b: k, a: v1, c: v2
        // path(a,c) <- edges(a,b), path(b,c)
        path_var.from_join(&edges_var, &path_var, |_b, &a, &c| (a, c));
    }
    // extract the final results.
    let reachable = path_var.complete();
    for (a, b) in reachable.iter() {
        println!("({}, {})", a, b);
    }
}
I get
(0, 1)
(1, 1)
(1, 2)
(2, 1)
(2, 2)
(2, 3)
(3, 1)
(3, 2)
(3, 3)
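A likely cause of the difference: from_join matches tuples on their first component, so with edges stored as (from, to) the call above effectively computes path(a, c) <- edges(b, a), path(b, c) rather than the rule in the comment. For path(X, Y) :- edge(X, Z), path(Z, Y) the shared variable is Z, so the edges need to be keyed by their destination, e.g. loaded as (to, from) pairs. The std-only sketch below (not datafrog code; transitive_closure is a made-up name) shows the same keying with a simple semi-naive loop.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Computes path(X, Y) :- edge(X, Y).  path(X, Y) :- edge(X, Z), path(Z, Y).
fn transitive_closure(edges: &[(u32, u32)]) -> BTreeSet<(u32, u32)> {
    // Index edges by destination Z, so they share a key with path(Z, Y).
    let mut by_dest: BTreeMap<u32, Vec<u32>> = BTreeMap::new();
    for &(x, z) in edges {
        by_dest.entry(z).or_default().push(x);
    }
    // Base case: every edge is a path.
    let mut path: BTreeSet<(u32, u32)> = edges.iter().copied().collect();
    let mut recent: Vec<(u32, u32)> = path.iter().copied().collect();
    // Semi-naive loop: only join the tuples discovered in the previous round.
    while !recent.is_empty() {
        let mut next = Vec::new();
        for &(z, y) in &recent {
            for &x in by_dest.get(&z).into_iter().flatten() {
                if path.insert((x, y)) {
                    next.push((x, y));
                }
            }
        }
        recent = next;
    }
    path
}
```

On the three edges from the example this produces the same six tuples as the Datalog program.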
Relation::from_antijoin always returns an empty Relation, regardless of its inputs. That's because the antijoin helper, which takes a JoinInput as its first parameter, operates only on recent tuples.
Lines 65 to 73 in 5bda2f0
This is correct for variables, but Relations, which don't change during iteration, only have stable tuples. See #36 (comment) for the reason this must be the case.
To fix this, we should refactor the antijoin helper to work directly on Relations, and pass the proper input from Variable::from_antijoin and Relation::from_antijoin. A regression test is needed as well.
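As a reference for what the regression test should expect, here is a std-only sketch of antijoin semantics on plain slices. It is not datafrog's helper and the signature is an assumption: it keeps every (key, value) tuple whose key does not appear in the second input, which is assumed to be sorted.

```rust
/// Keeps the tuples of `input` whose key is absent from `excluded`.
/// Hypothetical stand-in; `excluded` must be sorted for binary_search.
fn antijoin<K: Ord + Clone, V: Clone>(input: &[(K, V)], excluded: &[K]) -> Vec<(K, V)> {
    input
        .iter()
        .filter(|(k, _)| excluded.binary_search(k).is_err())
        .cloned()
        .collect()
}
```

A regression test along these lines would assert that antijoining a non-empty relation against a relation that excludes only some of its keys gives a non-empty result.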
The compiler team is working on a standard policy around its external crates (see rust-lang/compiler-team#19) -- we need to bring this crate into conformance, though the policy is not yet finalized, so it's not 100% clear what this means.
Looks like Travis CI is no longer working. Maybe we want to follow rustc and switch to GH Actions?
It could be interesting to have simple statistics, for example behind a feature flag, of the number of tuples and the time it took to merge and create them, for Relations and/or Variables.
There are commented-out Drop impls in the code, e.g. here, as an example of the way to add the final tuple counts statistic.
Similarly, a Duration could be added to the merged relations, updating it in the operator functions, to display the time it took to create those tuples, as described a bit more here by Frank.
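One possible shape for such statistics, sketched with std only. MergeStats and merge_with_stats are hypothetical names, and the sort-and-dedup merge is a simplification rather than datafrog's merge; in the crate this would presumably sit behind a feature flag.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-merge statistics: resulting tuple count and time spent.
struct MergeStats {
    tuples: usize,
    elapsed: Duration,
}

/// Merges two batches into a sorted, deduplicated one and records statistics.
fn merge_with_stats<T: Ord>(mut a: Vec<T>, mut b: Vec<T>) -> (Vec<T>, MergeStats) {
    let start = Instant::now();
    a.append(&mut b);
    a.sort();
    a.dedup();
    let stats = MergeStats { tuples: a.len(), elapsed: start.elapsed() };
    (a, stats)
}
```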
We don't presently have any unit tests...for anything. I think we should create a "base layer" of unit tests where we generate some simple operations and some random inputs and check that they get the correct result, using some kind of naive computation to act as an oracle.
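A minimal sketch of the oracle idea, using std only. All names are hypothetical: indexed_join stands in for the operation under test, naive_join is the nested-loop oracle, and pseudo_random_pairs is a small linear congruential generator so that no extra crate is needed (modulus is assumed to be non-zero).

```rust
use std::collections::BTreeMap;

/// Oracle: nested-loop join on the first component.
fn naive_join(a: &[(u32, u32)], b: &[(u32, u32)]) -> Vec<(u32, u32, u32)> {
    let mut out = Vec::new();
    for &(k1, v1) in a {
        for &(k2, v2) in b {
            if k1 == k2 {
                out.push((k1, v1, v2));
            }
        }
    }
    out.sort();
    out.dedup();
    out
}

/// Stand-in for the operation under test: join through a key index.
fn indexed_join(a: &[(u32, u32)], b: &[(u32, u32)]) -> Vec<(u32, u32, u32)> {
    let mut index: BTreeMap<u32, Vec<u32>> = BTreeMap::new();
    for &(k, v) in b {
        index.entry(k).or_default().push(v);
    }
    let mut out = Vec::new();
    for &(k, v1) in a {
        for &v2 in index.get(&k).into_iter().flatten() {
            out.push((k, v1, v2));
        }
    }
    out.sort();
    out.dedup();
    out
}

/// Deterministic pseudo-random input from a small LCG; values are < modulus.
fn pseudo_random_pairs(seed: u64, len: usize, modulus: u32) -> Vec<(u32, u32)> {
    let mut state = seed;
    let mut next = move || {
        state = state.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        ((state >> 33) as u32) % modulus
    };
    (0..len).map(|_| (next(), next())).collect()
}
```

A test then loops over a range of seeds, generates two inputs per seed, and asserts that both joins return the same tuples.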