
badwolf's Introduction

BadWolf

Test Status Go Report Card GoDoc

BadWolf is a temporal graph store loosely modeled after the concepts introduced by the Resource Description Framework (RDF). It presents a flexible storage abstraction, efficient query language, and data-interchange model for representing a directed graph that accommodates the storage and linking of arbitrary objects without the need for a rigid schema.

BadWolf began as a triplestore, but triples have been expanded to quads to allow simpler and more flexible temporal reasoning. Because BadWolf is designed for generalized relationship storage rather than the web, most of the web-related idiosyncrasies of RDF have been toned down or removed outright, and the focus has shifted to temporal reasoning.
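
As an illustration of the data model, here is a minimal sketch in Go. The types are illustrative only, not BadWolf's actual API; in BadWolf the time anchor travels with the predicate, which is written "name"@[] for immutable facts and "name"@[timestamp] for temporal ones.

package main

import (
    "fmt"
    "time"
)

// Quad is an illustrative temporal quad: subject, predicate, optional
// time anchor, and object. A nil anchor denotes an immutable fact; in
// BadWolf itself the anchor is carried by the predicate.
type Quad struct {
    Subject   string
    Predicate string
    Anchor    *time.Time
    Object    string
}

func main() {
    t := time.Date(2016, 4, 10, 4, 25, 0, 0, time.UTC)
    facts := []Quad{
        {"/room<Kitchen>", "connects_to", nil, "/room<Bedroom>"},
        {"/item/book<000>", "in", &t, "/room<Bedroom>"},
    }
    for _, f := range facts {
        anchor := ""
        if f.Anchor != nil {
            anchor = f.Anchor.Format(time.RFC3339)
        }
        // Print each fact in a BQL-like notation.
        fmt.Printf("%s %q@[%s] %s\n", f.Subject, f.Predicate, anchor, f.Object)
    }
}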

In case you are curious about the name, BadWolf is named after the BadWolf entity as it appeared in the Doctor Who episode "The Parting of the Ways", after Rose Tyler looked into the Time Vortex itself. The BadWolf entity scattered events in time as self-encoded messages, creating a looped ontological paradox. Hence, naming a temporal graph store after the entity seemed appropriate.

You can find more detailed information on each of the components of BadWolf in the documentation included in the repository.

Please keep in mind that this project is under active development and there are no guarantees of API stability until the first stable 1.0 release.

You can reach us through the project's contact channels or via @badwolf_project on Twitter.

For more information, presentations, or to find other related projects that are using BadWolf, check the project website.

badwolf's People

Contributors

aebrahim, aerostitch, altmas5, apr94, brcooley, chrisdusovic, claushellsing, defrager, hickford, iccananea, lsb, mrisher, pombredanne, rbkloss, rogerlucena, rossdakin, sidnei, thiagovas, xllora


badwolf's Issues

Cut final stable release

Do one last pass over the initial conformance tests and, if everything checks out, cut the first release.

Still active?

The idea of using an immutable graph store in conjunction with event sourcing makes a lot of sense for an upcoming project; however, it looks like BadWolf has been abandoned. Is this the case?

CONTRIBUTING guide has typo/extra text

The CONTRIBUTING.md file has a cryptic sentence at the end:

This commit can be part of your first [Differential][] code review.

This appears to be missing a link and is out of context. What is "Differential"? Is it a component of Phabricator? And if so, how does it relate to GitHub PRs, which are presumably the way to contribute to this project?

Either there should be a link there to explain where Differential comes in, or this sentence should be removed entirely.

Implement GLOBAL TIME BOUNDS clause

Add the collection of anchor bounds, properly compute the intervals, enforce validation, and extend the query planner to use the provided bounds.
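
A possible sketch of the interval computation this implies, in plain Go (not the planner's actual code): the effective global bound is the intersection of the provided anchor bounds, and validation should reject an empty result.

package main

import (
    "errors"
    "fmt"
    "time"
)

// Interval is an illustrative [Start, End] time bound.
type Interval struct {
    Start, End time.Time
}

// intersect folds a list of anchor bounds into the single effective
// global bound; an empty intersection should fail validation before the
// planner ever uses it.
func intersect(bounds []Interval) (Interval, error) {
    if len(bounds) == 0 {
        return Interval{}, errors.New("no bounds provided")
    }
    out := bounds[0]
    for _, b := range bounds[1:] {
        if b.Start.After(out.Start) {
            out.Start = b.Start
        }
        if b.End.Before(out.End) {
            out.End = b.End
        }
    }
    if out.Start.After(out.End) {
        return Interval{}, errors.New("empty global time bound")
    }
    return out, nil
}

func main() {
    a := Interval{time.Date(2016, 1, 1, 0, 0, 0, 0, time.UTC), time.Date(2016, 12, 31, 0, 0, 0, 0, time.UTC)}
    b := Interval{time.Date(2016, 6, 1, 0, 0, 0, 0, time.UTC), time.Date(2017, 6, 1, 0, 0, 0, 0, time.UTC)}
    fmt.Println(intersect([]Interval{a, b}))
}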

Ability to retract a triple or set an "unanchor" time

A triple is asserted with an anchor time, but there is no mechanism for unanchoring the triple, that is, invalidating it. One approach I thought about was to have a nil type that denotes the triple has been retracted. Another would be to implement this in the logic of the storage layer: when a triple is "deleted", it is stored in a retracted set, and when triples are requested, matched triples are evaluated against the retracted set before being returned. Have you thought about this at all?
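
A minimal sketch of the second (retracted set) approach, assuming nothing about BadWolf's actual storage interface; all types and method names below are hypothetical.

package main

import "fmt"

// Triple is a hypothetical, simplified triple key.
type Triple struct{ S, P, O string }

// store keeps the asserted triples plus a retracted set, as described
// in the second approach above.
type store struct {
    asserted  map[Triple]bool
    retracted map[Triple]bool
}

func newStore() *store {
    return &store{asserted: map[Triple]bool{}, retracted: map[Triple]bool{}}
}

func (s *store) Add(t Triple)     { s.asserted[t] = true }
func (s *store) Retract(t Triple) { s.retracted[t] = true }

// Query returns the matching triples that have not been retracted.
func (s *store) Query(match func(Triple) bool) []Triple {
    var out []Triple
    for t := range s.asserted {
        if match(t) && !s.retracted[t] {
            out = append(out, t)
        }
    }
    return out
}

func main() {
    s := newStore()
    t := Triple{"/item/book<000>", "in", "/room<Bedroom>"}
    s.Add(t)
    s.Retract(t)
    // Prints []: retracted triples are filtered out at read time.
    fmt.Println(s.Query(func(Triple) bool { return true }))
}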

Merge efforts with Cayley

Hey! I'm the maintainer of the other Google graph project, https://github.com/google/cayley

I know I've been out of the Google-sphere for a year now, but I've still been contributing to Cayley and it's been used in a number of projects and a few production instances. In short, it works pretty well and I'd like to grow it more.

I've read over your repo (you've written it in Go too, good choice ;) ). For storage, you've got 99% the same primitives as Cayley: triples to quads (went down that road, believe me), a memory store, indices. Your methods for things like "TriplesForPredicateAndObject" are pretty standard Cayley iterators.

You're doing some nice things with regard to RDF literals I'd be excited to add to Cayley.

It seems like most of your novel work is in BQL. I'm just reading up on it now, so I haven't quite gotten the full notion of what makes this a new and interesting query language (would love to discuss), but even proposed as a black box, I'd be happy to add it as a query language in Cayley.

I've always been more of a storage guy, so if your interest is in inference and query languages, that works great. The advantages you'd get would be all sorts of backends, various optimizations on the iterators, while still being able to push forward with your temporal graph idea. Everybody wins. What do you think?

Infinite loop when querying a time anchor?

create graph ?world;

insert data into ?world {
  /room<Hallway> "connects_to"@[] /room<Kitchen>.

  /room<Kitchen> "connects_to"@[] /room<Hallway>.
  /room<Kitchen> "connects_to"@[] /room<Bathroom>.
  /room<Kitchen> "connects_to"@[] /room<Bedroom>.

  /room<Bathroom> "connects_to"@[] /room<Kitchen>.

  /room<Bedroom> "connects_to"@[] /room<Kitchen>.
  /room<Bedroom> "connects_to"@[] /room<Fire Escape>.

  /room<Fire Escape> "connects_to"@[] /room<Kitchen>.

  /item/book<000> "in"@[2016-04-10T4:21:00.000000000Z] /room<Hallway>.
  /item/book<000> "in"@[2016-04-10T4:23:00.000000000Z] /room<Kitchen>.
  /item/book<000> "in"@[2016-04-10T4:25:00.000000000Z] /room<Bedroom>
};

select ?item, ?t from ?world where {
  ?item "in"@[?t] /room<Bedroom>
};

drop graph ?world;

results in an infinite loop at

Processing statement (3/4):
select ?item, ?t from ?world where { ?item "in"@[?t] /room<Bedroom> };

AS keyword error

According to the BQL overview document, the "as" keyword should allow returning a variable under a different name. However, the keyword causes an error when running the program; it only works when used with an aggregation.

When running this program:

# Create a graph.
CREATE GRAPH ?family;

# Insert some data into the graph.
INSERT DATA INTO ?family {
  /u<joe> "parent_of"@[] /u<mary> .
  /u<joe> "parent_of"@[] /u<peter> .
  /u<peter> "parent_of"@[] /u<john> .
  /u<peter> "parent_of"@[] /u<eve>
};

# Find all Joe's offspring names.
# Works fine without "as" keyword.
SELECT ?name
FROM ?family
WHERE {
  /u<joe> "parent_of"@[] ?offspring ID ?name
};

# Find all Joe's offspring names.
# Fails with "as" keyword.
SELECT ?name as ?n
FROM ?family
WHERE {
  /u<joe> "parent_of"@[] ?offspring ID ?name
};

# Count offspring.
# Works with "as" keyword.
SELECT ?parent_name, count(?name) as ?n
FROM ?family
WHERE {
  ?parent ID ?parent_name "parent_of"@[] ?offspring ID ?name
}
GROUP BY ?parent_name;

# Drop the graph.
DROP GRAPH ?family;

The output is:

Processing file bug.bql

Processing statement (1/6):
CREATE GRAPH ?family;

Result:
OK

Processing statement (2/6):
INSERT DATA INTO ?family { /u<joe> "parent_of"@[] /u<mary> . /u<joe> "parent_of"@[] /u<peter> . /u<peter> "parent_of"@[] /u<john> . /u<peter> "parent_of"@[] /u<eve> };

Result:
OK

Processing statement (3/6):
SELECT ?name FROM ?family WHERE { /u<joe> "parent_of"@[] ?offspring ID ?name };

Result:
?name
mary
peter

OK

Processing statement (4/6):
SELECT ?name as ?n FROM ?family WHERE { /u<joe> "parent_of"@[] ?offspring ID ?name };

[FAIL] [ERROR] Failed to execute BQL statement with error cannot project against unknow binding ?n; known bindinds are [?offspring ?name]

Processing statement (5/6):
SELECT ?parent_name, count(?name) as ?n FROM ?family WHERE { ?parent ID ?parent_name "parent_of"@[] ?offspring ID ?name } GROUP BY ?parent_name;

Result:
?parent_name    ?n
joe "2"^^type:int64
peter   "2"^^type:int64

OK

Processing statement (6/6):
DROP GRAPH ?family;

Result:
OK

[RFC] Advanced graph structural query operations

In preparation for 2017, besides working on extending BQL (see issues #45, #46, #47, and #48), we are planning to start exploring support for graph structural query operations. Some examples we could focus on include:

  • Predicate transitive closures and traversals.
  • Compute the minimal spanning tree.
  • Path calculations, including shortest path.
  • Basic structural measures (e.g., betweenness).

At this point we are considering the list above more or less in the order we would approach it. Is there any other operation you would need added? Do you have a pressing operation that would simplify your usage?
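
As a sketch of the first item, here is a minimal breadth-first traversal over a plain adjacency map; BadWolf's storage layer would supply the neighbor lookup, which is assumed away here.

package main

import "fmt"

// closure returns every node reachable from start by repeatedly
// following one predicate, i.e. the transitive closure of that
// predicate over the graph.
func closure(edges map[string][]string, start string) []string {
    seen := map[string]bool{start: true}
    queue := []string{start}
    var out []string
    for len(queue) > 0 {
        n := queue[0]
        queue = queue[1:]
        for _, next := range edges[n] {
            if !seen[next] {
                seen[next] = true
                out = append(out, next)
                queue = append(queue, next)
            }
        }
    }
    return out
}

func main() {
    connects := map[string][]string{
        "/room<Hallway>": {"/room<Kitchen>"},
        "/room<Kitchen>": {"/room<Bedroom>", "/room<Bathroom>"},
        "/room<Bedroom>": {"/room<Fire Escape>"},
    }
    fmt.Println(closure(connects, "/room<Hallway>"))
}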

Implement GROUP BY

Use the table grouping implemented in issue #17 to provide a functional GROUP BY clause.
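
For reference, the grouping step amounts to something like the following plain-Go sketch (illustrative Row type, not the actual Table API), mirroring the count(?name) ... GROUP BY ?parent_name example reported elsewhere in these issues.

package main

import "fmt"

// Row is an illustrative result row keyed by binding name.
type Row map[string]string

// groupCount groups rows by one binding and counts the members of each
// group, which is what GROUP BY combined with count() boils down to.
func groupCount(rows []Row, key string) map[string]int {
    counts := map[string]int{}
    for _, r := range rows {
        counts[r[key]]++
    }
    return counts
}

func main() {
    rows := []Row{
        {"?parent_name": "joe", "?name": "mary"},
        {"?parent_name": "joe", "?name": "peter"},
        {"?parent_name": "peter", "?name": "john"},
        {"?parent_name": "peter", "?name": "eve"},
    }
    fmt.Println(groupCount(rows, "?parent_name")) // map[joe:2 peter:2]
}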

Implement sort ORDER BY

Add the collection of bindings and directions to the Statement, enforce validation, and extend the query planner to use the table sort functionality.
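
The table sort itself is essentially a stable multi-key comparison; a rough sketch with an illustrative Row type (not BadWolf's table package) follows.

package main

import (
    "fmt"
    "sort"
)

// Row is an illustrative result row keyed by binding name.
type Row map[string]string

// orderBy sorts rows by the given bindings; desc[i] flips the direction
// of the i-th binding, as an ORDER BY ... ASC/DESC clause would.
func orderBy(rows []Row, bindings []string, desc []bool) {
    sort.SliceStable(rows, func(i, j int) bool {
        for k, b := range bindings {
            x, y := rows[i][b], rows[j][b]
            if x == y {
                continue
            }
            if desc[k] {
                return x > y
            }
            return x < y
        }
        return false
    })
}

func main() {
    rows := []Row{{"?name": "peter"}, {"?name": "mary"}, {"?name": "eve"}}
    orderBy(rows, []string{"?name"}, []bool{false})
    fmt.Println(rows) // [map[?name:eve] map[?name:mary] map[?name:peter]]
}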

Add DECONSTRUCT query to BQL

DECONSTRUCT queries allow removing derived facts from graphs. The facts are defined based on the bindings provided in the WHERE clause. Basic filtering capabilities are provided by adding a HAVING clause. This is the complementary statement to CONSTRUCT, introduced in issue #45.

DECONSTRUCT { 
       ?p "grandmother of"@[] ?g .
       ?g "grandchild of"@[]  ?p
}
AT ?graph1, ?graph2
FROM ?graph3, ?graph4
WHERE {
       ?p         "parent of"@[] ?parent .
       ?parent    "parent of"@[] ?g .
       ?p         "gender"@[]    ?gender 
}
HAVING ?gender == /gender<male>;

It is worth mentioning that the above query could be simplified as shown below. Nevertheless, the goal here was to show the full structure of a DECONSTRUCT query.

DECONSTRUCT {
       ?p "grandmother of"@[] ?g .
       ?g "grandchild of"@[]  ?p
}
AT ?graph1, ?graph2
FROM ?graph3, ?graph4
WHERE {
       ?p         "parent of"@[] ?parent .
       ?parent    "parent of"@[] ?g .
       ?p         "gender"@[]    /gender<male>
};

Blank nodes (_) are not allowed in DECONSTRUCT clauses.

BW BQL entry seems to have sticky parser errors

$ bw --driver=VOLATILE bql
...
bql> CREATE GRAPH ?foo;
[OK]
bql> INSERT DATA INTO ?foo {
/u<joe> "parent_of"@[2016-12-12T15:00Z] /u<julia>
};
[ERROR] failed to parse BQL statement with error predicate.Parse failed to parse time anchor
2016-12-12T15:00Z in "parent_of"@[2016-12-12T15:00Z] with error parsing time 
"2016-12-12T15:00Z" as "2006-01-02T15:04:05.999999999Z07:00": cannot parse "Z" as ":"

bql> INSERT DATA INTO ?foo { /u<joe> "parent_of"@[] /u<julia> };
[ERROR] failed to parse BQL statement with error hook.DataAccumulator requires a predicate to
create a predicate, got &{NODE /u<joe> } instead

# This second error is spurious, but sticky. Only quitting and restarting bw seems to allow data to be
# inserted. If I enter the same sequence but with an acceptable timestamp, the second error does not
# occur.

bql> INSERT DATA INTO ?foo { /u<joe> "parent_of"@[] /u<fred> };
[ERROR] failed to parse BQL statement with error hook.DataAccumulator requires a predicate to create a predicate, got &{NODE /u<joe> } instead

bql> quit;
Thanks for all those BQL queries!

$ bw --driver=VOLATILE bql
Welcome to BadWolf vCli (0.5.1-dev @141940248)
Using driver "VOLATILE". Type quit; to exit
Session started at 2016-12-21 13:50:07.001652809 -0500 EST

bql> CREATE GRAPH ?foo;
[OK]
bql> INSERT DATA INTO ?foo {
/u<joe> "parent_of"@[2016-12-12T15:00:00Z] /u<julia>
};
[OK]
bql> INSERT DATA INTO ?foo { /u<joe> "parent_of"@[] /u<julia> };
[OK]
bql> INSERT DATA INTO ?foo { /u<joe> "parent_of"@[] /u<fred> };
[OK]
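
The first error above is by design: time anchors must be full RFC 3339 timestamps, as the layout string quoted in the error (Go's time.RFC3339Nano) indicates. A standalone check, independent of BadWolf:

package main

import (
    "fmt"
    "time"
)

func main() {
    // Fails: the layout requires seconds, so "15:00Z" cannot be parsed.
    _, err := time.Parse(time.RFC3339Nano, "2016-12-12T15:00Z")
    fmt.Println(err)

    // Succeeds once seconds are included, matching the second session above.
    t, err := time.Parse(time.RFC3339Nano, "2016-12-12T15:00:00Z")
    fmt.Println(t, err)
}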

Table merging may have a bug

This is not the expected result; it looks like the table merge is producing an incorrect merge with duplicated rows.

Welcome to BadWolf vCli (0.4.2-dev)
Using driver "VOLATILE". Type quit; to exit

Session started at 2016-05-17 12:46:28.098374381 -0700 PDT

bql> create graph ?family;
[OK]
bql> load /tmp/family.txt ?family;
Successfully processed 6 lines from file "/tmp/family.txt".
Triples loaded into graphs:
    - ?family
bql> select ?grandparent from ?family where {?s "parent of"@[] /person<Amy Schumer> . ?grandparent "parent of"@[] ?s};
?grandparent
/person<Gavin Belson>
/person<Gavin Belson>
/person<Mary Belson>
/person<Mary Belson>

[OK]
bql> 

The data in /tmp/family.txt used to run this command is:

/person<Gavin Belson>  "born in"@[]    /city<Springfield>
/person<Gavin Belson>  "parent of"@[]  /person<Peter Belson>
/person<Gavin Belson>  "parent of"@[]  /person<Mary Belson>
/person<Mary Belson>   "parent of"@[]  /person<Amy Schumer>
/person<Mary Belson>   "parent of"@[]  /person<Joe Schumer>

TestIsEmptyClause fails

In bql/semantic/semantic_test.go, there is a typo in a test name that causes it to be ignored:

func TesIsEmptyClause(t *testing.T) {
    testTable := []struct {
        in  *GraphClause
        out bool
    }{
        {
            in:  &GraphClause{},
            out: true,
        },
        {
            in:  &GraphClause{SBinding: "?foo"},
            out: true,
        },
    }
    for _, entry := range testTable {
        if got, want := entry.in.IsEmpty(), entry.out; got != want {
            t.Errorf("IsEmpty for %v returned %v, but should have returned %v", entry.in, got, want)
        }
    }

}

After changing the name to the proper TestIsEmptyClause and running the tests, the test fails:

--- FAIL: TestIsEmptyClause (0.00s)
    semantic_test.go:148: IsEmpty for &{<nil> ?foo    <nil>       <nil> <nil>   false <nil>        <nil> <nil>   false} returned false, but should have returned true
FAIL
FAIL    github.com/google/badwolf/bql/semantic  0.028s

Filtering clause returns unexpected failure

Given the following data set

/_<c175b457-e6d6-4ce3-8312-674353815720>	"_predicate"@[]	"/some/immutable/id"@[]
/_<c175b457-e6d6-4ce3-8312-674353815720>	"_owner"@[2017-05-23T16:41:12.187373-07:00]	/gid<0x9>
/_<c175b457-e6d6-4ce3-8312-674353815720>	"_subject"@[]	/aid</some/subject/id>
/_<c175b457-e6d6-4ce3-8312-674353815720>	"_object"@[]	/aid</some/object/id>
/_<cd8bae87-be96-41af-b1a8-27df990c9825>	"_object"@[2017-05-23T16:41:12.187373-07:00]	/aid</some/object/id>
/_<cd8bae87-be96-41af-b1a8-27df990c9825>	"_owner"@[2017-05-23T16:41:12.187373-07:00]	/gid<0x6>
/_<cd8bae87-be96-41af-b1a8-27df990c9825>	"_predicate"@[2017-05-23T16:41:12.187373-07:00]	"/some/temporal/id"@[2017-05-23T16:41:12.187373-07:00]
/_<cd8bae87-be96-41af-b1a8-27df990c9825>	"_subject"@[2017-05-23T16:41:12.187373-07:00]	/aid</some/subject/id>
/aid</some/subject/id>	"/some/temporal/id"@[2017-05-23T16:41:12.187373-07:00]	/aid</some/object/id>
/aid</some/subject/id>	"/some/immutable/id"@[]	/aid</some/object/id>
/aid</some/subject/id>	"/some/ownerless_temporal/id"@[2017-05-23T16:41:12.187373-07:00]	/aid</some/object/id>

The following query succeeds as expected.

bql> SELECT ?bn,?p, ?o 
     FROM ?test 
     WHERE { 
          ?bn "_subject"@[,]    /aid</some/subject/id>. 
          ?bn "_predicate"@[,] ?p .
          ?bn "_object"@[,] ?o 
      };

?bn	?p	?o
/_<cd8bae87-be96-41af-b1a8-27df990c9825>	"/some/temporal/id"@[2017-05-23T16:41:12.187373-07:00]	/aid</some/object/id>

[OK] Time spent:  578.963µs

However, when ?o is replaced with the concrete object node, the query fails with a filtering error.

bql> SELECT ?bn,?p  
     FROM ?test 
     WHERE { 
          ?bn "_subject"@[,]    /aid</some/subject/id>. 
          ?bn "_predicate"@[,] ?p . 
          ?bn "_object"@[,] /aid</some/object/id>
      };

[ERROR] planner.Execute: failed to execute insert plan with error failed to fully specify clause { ?bn "_object"@[,] /aid</some/object/id> } for row map[?bn:/_<cd8bae87-be96-41af-b1a8-27df990c9825>]
Time spent:  514.294µs

Given that this is just a variation of the query above, it should not have failed and should have returned one row with the ?bn and ?p bindings.

Add CONSTRUCT query to BQL

CONSTRUCT queries allow creating new facts to be added to graphs. The facts are defined based on the bindings provided in the WHERE clause. Basic filtering capabilities are provided by adding a HAVING clause.

A simple example adding new facts based on the current ones:

CONSTRUCT { 
       ?p "grandmother of"@[] ?g .
       ?g "grandchild of"@[]  ?p
}
INTO ?graph1, ?graph2
FROM ?graph3, ?graph4
WHERE {
       ?p         "parent of"@[] ?parent .
       ?parent    "parent of"@[] ?g .
       ?p         "gender"@[]    ?gender 
}
HAVING ?gender == /gender<male>;

It is worth mentioning that the above query could be simplified as shown below. Nevertheless, the goal here was to show the full structure of a CONSTRUCT query.

CONSTRUCT {
       ?p "grandmother of"@[] ?g .
       ?g "grandchild of"@[]  ?p
}
INTO ?graph1, ?graph2
FROM ?graph3, ?graph4
WHERE {
       ?p         "parent of"@[] ?parent .
       ?parent    "parent of"@[] ?g .
       ?p         "gender"@[]    /gender<male>
};

Subjects are allowed to specify _ instead of a WHERE clause binding. This will inject a new blank node.

Questions: EAV vs Triplestore, Gremlin, Geographical data, immutability, further readings?

Héllo,

First and foremost thanks for sharing this project! This is very interesting!

Me, Myself and I

I am a database modeling enthusiast. I created a database in Python called AjguDB, which is a graph database on top of EAV (itself on top of WiredTiger, an ordered key-value store similar to boltdb). I did a similar project in Scheme, which can be queried using miniKanren (a logic language embedded in Scheme). My inspiration is mostly the Datomic database, even if I skipped the immutable part.

EAV vs Triplestores

I used to think that EAV was a triplestore; I am reconsidering that. It seems like the EAV model is less generic than the triplestore model. My understanding is that both are good at modeling sparse matrix / multidimensional data, but EAV is really good at representing documents whereas triplestores are good at representing triples (or facts). One might say that a document is a set of triples. But in the EAV model you don't have control over the entity; it is randomly generated. At the end of the day, I think a triplestore is just EAV where E is not a unique identifier. WDYT?

Gremlin querying

Is it possible to adapt Gremlin to work on quads?

Geographical data

I am surprised that there is no mention of geographical data in some way. Is it something you plan to add?

Immutability

How do you cope with immutability during querying? Here is a practical example: say there is a triple that says "there are a hundred people in a town in 2017". Now it's 2018; do I need to create a new triple or update the old one? Do triples have a history? It seems to me that a database kept in BadWolf must be clean: you cannot fix typos, or it will clutter the results.

Further readings

Can you recommend things to read about BadWolf?

I will dive into boltdb drivers.

Set up continuous testing

Right now I am manually running all tests before commits. We should set up continuous testing for the whole project to run at least all the available unit tests.

Cut a first release candidate.

Do another pass over the documentation and compliance stories. Once done, label the latest master commit as RC1 after updating the version number.

Long running instance

Excuse the probably very naive question. I think I have a working badwolf instance, which I obtained by running:

go get golang.org/x/net/context
go get github.com/peterh/liner
go get github.com/google/badwolf/...

(Is this the right way? How to install is not mentioned anywhere.)

In any case, I am able to use the bw tool and follow the examples, use bw bql to get a REPL and so on.

The question is: how do I leave a long-running instance of badwolf? Even assuming I want to keep the data in RAM (persistence is not a priority right now, even though I see there are persistent backends), each time I run bw an entirely new instance of badwolf is created and apparently destroyed.

I assume there must be some way to leave badwolf running in the background and keep querying the existing graphs (even using the bw tool, preferably with some kind of driver/network interface), but I could not find any information on this.

For instance, it is not clear to me how to use the bw export command: by the time I run a new bw process, everything from the previous runs is lost, hence there is nothing to export. Similarly, I can run bw load, but then the data is lost as soon as the command returns. I am sure I am missing something obvious and fundamental here.

example using wikidata

Is there any wikidata example available?

If not, what would be the rough steps to use badwolf with wikidata?

Implement HAVING clause

Add the collection of bindings and conditions to the Statement, enforce validation, and extend the query planner to use the table filter functionality.
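
The filter itself reduces to evaluating a condition per row, as in this plain-Go sketch (illustrative Row type; the condition shown mirrors the ?gender == /gender<male> example from the CONSTRUCT proposal).

package main

import "fmt"

// Row is an illustrative result row keyed by binding name.
type Row map[string]string

// having keeps only the rows that satisfy the condition, which is what
// a clause such as HAVING ?gender == /gender<male> reduces to.
func having(rows []Row, cond func(Row) bool) []Row {
    var out []Row
    for _, r := range rows {
        if cond(r) {
            out = append(out, r)
        }
    }
    return out
}

func main() {
    rows := []Row{
        {"?p": "/u<joe>", "?gender": "/gender<male>"},
        {"?p": "/u<ann>", "?gender": "/gender<female>"},
    }
    fmt.Println(having(rows, func(r Row) bool { return r["?gender"] == "/gender<male>" }))
}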

Build BQL test workbench

Build a test corpus to validate BQL behavior. This workbench also needs to support multiple backends.

Bad interaction between parsing predicates and string literals.

create graph ?world;

insert data into ?world {
  /room<000> "named"@[] "Hallway"^^type:text.
  /room<000> "connects_to"@[] /room<001>
};

fails with:

[FAIL] [ERROR] Failed to parse BQL statement with error Parser.parse: Failed to consume symbol INSERT_OBJECT, with error Parser.consume: could not consume token &{ERROR "Hallway" [lexer:0:57] predicates require time anchor information; missing "@[} in production INSERT_OBJECT
