
badwolf's Introduction

BadWolf

Test Status Go Report Card GoDoc

BadWolf is a temporal graph store loosely modeled after the concepts introduced by the Resource Description Framework (RDF). It presents a flexible storage abstraction, efficient query language, and data-interchange model for representing a directed graph that accommodates the storage and linking of arbitrary objects without the need for a rigid schema.

BadWolf began as a triplestore, but triples have been expanded to quads to allow simpler and more flexible temporal reasoning. Because BadWolf is designed for generalized relationship storage rather than the web, most of the web-related idiosyncrasies of RDF have been toned down or removed outright, and the focus has shifted to temporal reasoning.
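
As an illustration of the data model, here is a minimal sketch in Go. The types are illustrative only, not BadWolf's actual API; in BadWolf the time anchor travels with the predicate, which is written "name"@[] for immutable facts and "name"@[timestamp] for temporal ones.

package main

import (
    "fmt"
    "time"
)

// Quad is an illustrative temporal quad: subject, predicate, optional
// time anchor, and object. A nil anchor denotes an immutable fact; in
// BadWolf itself the anchor is carried by the predicate.
type Quad struct {
    Subject   string
    Predicate string
    Anchor    *time.Time
    Object    string
}

func main() {
    t := time.Date(2016, 4, 10, 4, 25, 0, 0, time.UTC)
    facts := []Quad{
        {"/room<Kitchen>", "connects_to", nil, "/room<Bedroom>"},
        {"/item/book<000>", "in", &t, "/room<Bedroom>"},
    }
    for _, f := range facts {
        anchor := ""
        if f.Anchor != nil {
            anchor = f.Anchor.Format(time.RFC3339)
        }
        // Print each fact in a BQL-like notation.
        fmt.Printf("%s %q@[%s] %s\n", f.Subject, f.Predicate, anchor, f.Object)
    }
}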

In case you are curious about the name, BadWolf is named after the BadWolf entity as it appeared in the Doctor Who episode "The Parting of the Ways", after Rose Tyler looked into the Time Vortex itself. The BadWolf entity scattered events in time as self-encoded messages, creating a looped ontological paradox. Hence, naming a temporal graph store after the entity seemed appropriate.

You can find more detailed information on each of the components of BadWolf in the documentation included in the repository.

Please keep in mind that this project is under active development and there are no guarantees of API stability until the first stable 1.0 release.

You can reach us through the project's contact channels or via @badwolf_project on Twitter.

For more information, presentations, or to find other related projects that are using BadWolf, check the project website.

badwolf's People

Contributors

aebrahim, aerostitch, altmas5, apr94, brcooley, chrisdusovic, claushellsing, defrager, hickford, iccananea, lsb, mrisher, pombredanne, rbkloss, rogerlucena, rossdakin, sidnei, thiagovas, xllora


badwolf's Issues

Cut final stable release

Do one last pass over the initial conformance tests and, if everything checks out, cut the first release.

Still active?

The idea of using an immutable graph store in conjunction with event sourcing makes a lot of sense for an upcoming project; however, it looks like BadWolf has been abandoned. Is this the case?

CONTRIBUTING guide has typo/extra text

The CONTRIBUTING.md file has a cryptic sentence at the end:

This commit can be part of your first [Differential][] code review.

This appears to be missing a link and is out of context. What is "Differential"? Is it a component of Phabricator? And if so, how does it relate to GitHub PRs, which are presumably the way to contribute to this project?

Either there should be a link there to explain where Differential comes in, or this sentence should be removed entirely.

Implement GLOBAL TIME BOUNDS clause

Add the collection of anchor bounds, properly compute the intervals, enforce validation, and extend the query planner to use the provided bounds.
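
A possible sketch of the interval computation this implies, in plain Go (not the planner's actual code): the effective global bound is the intersection of the provided anchor bounds, and validation should reject an empty result.

package main

import (
    "errors"
    "fmt"
    "time"
)

// Interval is an illustrative [Start, End] time bound.
type Interval struct {
    Start, End time.Time
}

// intersect folds a list of anchor bounds into the single effective
// global bound; an empty intersection should fail validation before the
// planner ever uses it.
func intersect(bounds []Interval) (Interval, error) {
    if len(bounds) == 0 {
        return Interval{}, errors.New("no bounds provided")
    }
    out := bounds[0]
    for _, b := range bounds[1:] {
        if b.Start.After(out.Start) {
            out.Start = b.Start
        }
        if b.End.Before(out.End) {
            out.End = b.End
        }
    }
    if out.Start.After(out.End) {
        return Interval{}, errors.New("empty global time bound")
    }
    return out, nil
}

func main() {
    a := Interval{time.Date(2016, 1, 1, 0, 0, 0, 0, time.UTC), time.Date(2016, 12, 31, 0, 0, 0, 0, time.UTC)}
    b := Interval{time.Date(2016, 6, 1, 0, 0, 0, 0, time.UTC), time.Date(2017, 6, 1, 0, 0, 0, 0, time.UTC)}
    fmt.Println(intersect([]Interval{a, b}))
}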

Ability to retract a triple or set an "unanchor" time

A triple is asserted with an anchor time, but there is no mechanism for unanchoring the triple, that is, invalidating it. One approach I thought about was to have a nil type that denotes the triple has been retracted. Another would be to implement this in the logic of the storage layer: when a triple is "deleted", it is stored in a retracted set, and when triples are requested, matched triples are evaluated against the retracted set before being returned. Have you thought about this at all?
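
A minimal sketch of the second (retracted set) approach, assuming nothing about BadWolf's actual storage interface; all types and method names below are hypothetical.

package main

import "fmt"

// Triple is a hypothetical, simplified triple key.
type Triple struct{ S, P, O string }

// store keeps the asserted triples plus a retracted set, as described
// in the second approach above.
type store struct {
    asserted  map[Triple]bool
    retracted map[Triple]bool
}

func newStore() *store {
    return &store{asserted: map[Triple]bool{}, retracted: map[Triple]bool{}}
}

func (s *store) Add(t Triple)     { s.asserted[t] = true }
func (s *store) Retract(t Triple) { s.retracted[t] = true }

// Query returns the matching triples that have not been retracted.
func (s *store) Query(match func(Triple) bool) []Triple {
    var out []Triple
    for t := range s.asserted {
        if match(t) && !s.retracted[t] {
            out = append(out, t)
        }
    }
    return out
}

func main() {
    s := newStore()
    t := Triple{"/item/book<000>", "in", "/room<Bedroom>"}
    s.Add(t)
    s.Retract(t)
    // Prints []: retracted triples are filtered out at read time.
    fmt.Println(s.Query(func(Triple) bool { return true }))
}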

Merge efforts with Cayley

Hey! I'm the maintainer of the other Google graph project, https://github.com/google/cayley

I know I've been out of the Google-sphere for a year now, but I've still been contributing to Cayley and it's been used in a number of projects and a few production instances. In short, it works pretty well and I'd like to grow it more.

I've read over your repo (you've written it in Go too, good choice ;) ). For storage, you've got 99% the same primitives as Cayley: triples to quads (went down that road, believe me), a memory store, indices. Your methods for things like "TriplesForPredicateAndObject" are pretty standard Cayley iterators.

You're doing some nice things with regard to RDF literals I'd be excited to add to Cayley.

It seems like most of your novel work is in BQL. I'm just reading up on it now, so I haven't quite gotten the full notion of what makes this a new and interesting query language (would love to discuss), but even proposed as a black box, I'd be happy to add it as a query language in Cayley.

I've always been more of a storage guy, so if your interest is in inference and query languages, that works great. The advantages you'd get would be all sorts of backends, various optimizations on the iterators, while still being able to push forward with your temporal graph idea. Everybody wins. What do you think?

Infinite loop when querying a time anchor?

create graph ?world;

insert data into ?world {
  /room<Hallway> "connects_to"@[] /room<Kitchen>.

  /room<Kitchen> "connects_to"@[] /room<Hallway>.
  /room<Kitchen> "connects_to"@[] /room<Bathroom>.
  /room<Kitchen> "connects_to"@[] /room<Bedroom>.

  /room<Bathroom> "connects_to"@[] /room<Kitchen>.

  /room<Bedroom> "connects_to"@[] /room<Kitchen>.
  /room<Bedroom> "connects_to"@[] /room<Fire Escape>.

  /room<Fire Escape> "connects_to"@[] /room<Kitchen>.

  /item/book<000> "in"@[2016-04-10T4:21:00.000000000Z] /room<Hallway>.
  /item/book<000> "in"@[2016-04-10T4:23:00.000000000Z] /room<Kitchen>.
  /item/book<000> "in"@[2016-04-10T4:25:00.000000000Z] /room<Bedroom>
};

select ?item, ?t from ?world where {
  ?item "in"@[?t] /room<Bedroom>
};

drop graph ?world;

results in an infinite loop at

Processing statement (3/4):
select ?item, ?t from ?world where { ?item "in"@[?t] /room<Bedroom> };

AS keyword error

According to the BQL overview document, the "as" keyword should allow returning a variable under a different name. However, the keyword causes an error when running the program; it only works when used with an aggregation.

When running this program:

# Create a graph.
CREATE GRAPH ?family;

# Insert some data into the graph.
INSERT DATA INTO ?family {
  /u<joe> "parent_of"@[] /u<mary> .
  /u<joe> "parent_of"@[] /u<peter> .
  /u<peter> "parent_of"@[] /u<john> .
  /u<peter> "parent_of"@[] /u<eve>
};

# Find all Joe's offspring names.
# Works fine without "as" keyword.
SELECT ?name
FROM ?family
WHERE {
  /u<joe> "parent_of"@[] ?offspring ID ?name
};

# Find all Joe's offspring names.
# Fails with "as" keyword.
SELECT ?name as ?n
FROM ?family
WHERE {
  /u<joe> "parent_of"@[] ?offspring ID ?name
};

# Count offspring.
# Works with "as" keyword.
SELECT ?parent_name, count(?name) as ?n
FROM ?family
WHERE {
  ?parent ID ?parent_name "parent_of"@[] ?offspring ID ?name
}
GROUP BY ?parent_name;

# Drop the graph.
DROP GRAPH ?family;

The output is:

Processing file bug.bql

Processing statement (1/6):
CREATE GRAPH ?family;

Result:
OK

Processing statement (2/6):
INSERT DATA INTO ?family { /u<joe> "parent_of"@[] /u<mary> . /u<joe> "parent_of"@[] /u<peter> . /u<peter> "parent_of"@[] /u<john> . /u<peter> "parent_of"@[] /u<eve> };

Result:
OK

Processing statement (3/6):
SELECT ?name FROM ?family WHERE { /u<joe> "parent_of"@[] ?offspring ID ?name };

Result:
?name
mary
peter

OK

Processing statement (4/6):
SELECT ?name as ?n FROM ?family WHERE { /u<joe> "parent_of"@[] ?offspring ID ?name };

[FAIL] [ERROR] Failed to execute BQL statement with error cannot project against unknow binding ?n; known bindinds are [?offspring ?name]

Processing statement (5/6):
SELECT ?parent_name, count(?name) as ?n FROM ?family WHERE { ?parent ID ?parent_name "parent_of"@[] ?offspring ID ?name } GROUP BY ?parent_name;

Result:
?parent_name    ?n
joe "2"^^type:int64
peter   "2"^^type:int64

OK

Processing statement (6/6):
DROP GRAPH ?family;

Result:
OK

[RFC] Advanced graph structural query operations

In preparation for 2017, besides working on extending BQL (see issues #45, #46, #47, and #48), we are planning to start exploring support for graph structural query operations. Some examples we could focus on include:

  • Predicate transitive closures and traversals.
  • Compute the minimal spanning tree.
  • Path calculations, including shortest path.
  • Basic structural measures (e.g., betweenness).

At this point we are considering the list above more or less in the order we would approach it. Is there any other operation you would need added? Do you have a pressing operation that would simplify your usage?
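
As a sketch of the first item, here is a minimal breadth-first traversal over a plain adjacency map; BadWolf's storage layer would supply the neighbor lookup, which is assumed away here.

package main

import "fmt"

// closure returns every node reachable from start by repeatedly
// following one predicate, i.e. the transitive closure of that
// predicate over the graph.
func closure(edges map[string][]string, start string) []string {
    seen := map[string]bool{start: true}
    queue := []string{start}
    var out []string
    for len(queue) > 0 {
        n := queue[0]
        queue = queue[1:]
        for _, next := range edges[n] {
            if !seen[next] {
                seen[next] = true
                out = append(out, next)
                queue = append(queue, next)
            }
        }
    }
    return out
}

func main() {
    connects := map[string][]string{
        "/room<Hallway>": {"/room<Kitchen>"},
        "/room<Kitchen>": {"/room<Bedroom>", "/room<Bathroom>"},
        "/room<Bedroom>": {"/room<Fire Escape>"},
    }
    fmt.Println(closure(connects, "/room<Hallway>"))
}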

Implement GROUP BY

Use the table grouping implemented in issue #17 to provide a functional GROUP BY clause.
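
For reference, the grouping step amounts to something like the following plain-Go sketch (illustrative Row type, not the actual Table API), mirroring the count(?name) ... GROUP BY ?parent_name example reported elsewhere in these issues.

package main

import "fmt"

// Row is an illustrative result row keyed by binding name.
type Row map[string]string

// groupCount groups rows by one binding and counts the members of each
// group, which is what GROUP BY combined with count() boils down to.
func groupCount(rows []Row, key string) map[string]int {
    counts := map[string]int{}
    for _, r := range rows {
        counts[r[key]]++
    }
    return counts
}

func main() {
    rows := []Row{
        {"?parent_name": "joe", "?name": "mary"},
        {"?parent_name": "joe", "?name": "peter"},
        {"?parent_name": "peter", "?name": "john"},
        {"?parent_name": "peter", "?name": "eve"},
    }
    fmt.Println(groupCount(rows, "?parent_name")) // map[joe:2 peter:2]
}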

Implement sort ORDER BY

Add the collection of bindings and directions to the Statement, enforce validation, and extend the query planner to use the table sort functionality.
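
The table sort itself is essentially a stable multi-key comparison; a rough sketch with an illustrative Row type (not BadWolf's table package) follows.

package main

import (
    "fmt"
    "sort"
)

// Row is an illustrative result row keyed by binding name.
type Row map[string]string

// orderBy sorts rows by the given bindings; desc[i] flips the direction
// of the i-th binding, as an ORDER BY ... ASC/DESC clause would.
func orderBy(rows []Row, bindings []string, desc []bool) {
    sort.SliceStable(rows, func(i, j int) bool {
        for k, b := range bindings {
            x, y := rows[i][b], rows[j][b]
            if x == y {
                continue
            }
            if desc[k] {
                return x > y
            }
            return x < y
        }
        return false
    })
}

func main() {
    rows := []Row{{"?name": "peter"}, {"?name": "mary"}, {"?name": "eve"}}
    orderBy(rows, []string{"?name"}, []bool{false})
    fmt.Println(rows) // [map[?name:eve] map[?name:mary] map[?name:peter]]
}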

Add DECONSTRUCT query to BQL

DECONSTRUCT queries allow removing derived facts from graphs. The facts are defined based on the bindings provided in the WHERE clause. Basic filtering capabilities are provided by adding a HAVING clause. This is the complementary statement to CONSTRUCT, introduced in issue #45.

DECONSTRUCT { 
       ?p "grandmother of"@[] ?g .
       ?g "grandchild of"@[]  ?p
}
AT ?graph1, ?graph2
FROM ?graph3, ?graph4
WHERE {
       ?p         "parent of"@[] ?parent .
       ?parent    "parent of"@[] ?g .
       ?p         "gender"@[]    ?gender 
}
HAVING ?gender == /gender<male>;

It is worth mentioning that the above query could be simplified as shown below. Nevertheless, the goal here was to show the full structure of a DECONSTRUCT query.

DECONSTRUCT {
       ?p "grandmother of"@[] ?g .
       ?g "grandchild of"@[]  ?p
}
AT ?graph1, ?graph2
FROM ?graph3, ?graph4
WHERE {
       ?p         "parent of"@[] ?parent .
       ?parent    "parent of"@[] ?g .
       ?p         "gender"@[]    /gender<male>
};

Blank nodes (_) are not allowed in DECONSTRUCT clauses.

BW BQL entry seems to have sticky parser errors

$ bw --driver=VOLATILE bql
...
bql> CREATE GRAPH ?foo;
[OK]
bql> INSERT DATA INTO ?foo {
/u<joe> "parent_of"@[2016-12-12T15:00Z] /u<julia>
};
[ERROR] failed to parse BQL statement with error predicate.Parse failed to parse time anchor
2016-12-12T15:00Z in "parent_of"@[2016-12-12T15:00Z] with error parsing time 
"2016-12-12T15:00Z" as "2006-01-02T15:04:05.999999999Z07:00": cannot parse "Z" as ":"

bql> INSERT DATA INTO ?foo { /u<joe> "parent_of"@[] /u<julia> };
[ERROR] failed to parse BQL statement with error hook.DataAccumulator requires a predicate to
create a predicate, got &{NODE /u<joe> } instead

# This second error is spurious, but sticky. Only quitting and restarting bw seems to allow data to be
# inserted. If I enter the same sequence but with an acceptable timestamp, the second error does not
# occur.

bql> INSERT DATA INTO ?foo { /u<joe> "parent_of"@[] /u<fred> };
[ERROR] failed to parse BQL statement with error hook.DataAccumulator requires a predicate to create a predicate, got &{NODE /u<joe> } instead

bql> quit;
Thanks for all those BQL queries!

$ bw --driver=VOLATILE bql
Welcome to BadWolf vCli (0.5.1-dev @141940248)
Using driver "VOLATILE". Type quit; to exit
Session started at 2016-12-21 13:50:07.001652809 -0500 EST

bql> CREATE GRAPH ?foo;
[OK]
bql> INSERT DATA INTO ?foo {
/u<joe> "parent_of"@[2016-12-12T15:00:00Z] /u<julia>
};
[OK]
bql> INSERT DATA INTO ?foo { /u<joe> "parent_of"@[] /u<julia> };
[OK]
bql> INSERT DATA INTO ?foo { /u<joe> "parent_of"@[] /u<fred> };
[OK]
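
The first error above is by design: time anchors must be full RFC 3339 timestamps, as the layout string quoted in the error (Go's time.RFC3339Nano) indicates. A standalone check, independent of BadWolf:

package main

import (
    "fmt"
    "time"
)

func main() {
    // Fails: the layout requires seconds, so "15:00Z" cannot be parsed.
    _, err := time.Parse(time.RFC3339Nano, "2016-12-12T15:00Z")
    fmt.Println(err)

    // Succeeds once seconds are included, matching the second session above.
    t, err := time.Parse(time.RFC3339Nano, "2016-12-12T15:00:00Z")
    fmt.Println(t, err)
}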

Table merging may have a bug

This is not the expected result; it looks like the table merge is producing an incorrect merge with duplicated rows.

Welcome to BadWolf vCli (0.4.2-dev)
Using driver "VOLATILE". Type quit; to exit

Session started at 2016-05-17 12:46:28.098374381 -0700 PDT

bql> create graph ?family;
[OK]
bql> load /tmp/family.txt ?family;
Successfully processed 6 lines from file "/tmp/family.txt".
Triples loaded into graphs:
    - ?family
bql> select ?grandparent from ?family where {?s "parent of"@[] /person<Amy Schumer> . ?grandparent "parent of"@[] ?s};
?grandparent
/person<Gavin Belson>
/person<Gavin Belson>
/person<Mary Belson>
/person<Mary Belson>

[OK]
bql> 

The data in /tmp/family.txt used to run this command is:

/person<Gavin Belson>  "born in"@[]    /city<Springfield>
/person<Gavin Belson>  "parent of"@[]  /person<Peter Belson>
/person<Gavin Belson>  "parent of"@[]  /person<Mary Belson>
/person<Mary Belson>   "parent of"@[]  /person<Amy Schumer>
/person<Mary Belson>   "parent of"@[]  /person<Joe Schumer>

TestIsEmptyClause fails

In bql/semantic/semantic_test.go, there is a typo in a test name that causes it to be ignored:

func TesIsEmptyClause(t *testing.T) {
    testTable := []struct {
        in  *GraphClause
        out bool
    }{
        {
            in:  &GraphClause{},
            out: true,
        },
        {
            in:  &GraphClause{SBinding: "?foo"},
            out: true,
        },
    }
    for _, entry := range testTable {
        if got, want := entry.in.IsEmpty(), entry.out; got != want {
            t.Errorf("IsEmpty for %v returned %v, but should have returned %v", entry.in, got, want)
        }
    }

}

After changing the name to the proper TestIsEmptyClause and running the tests, the test fails:

--- FAIL: TestIsEmptyClause (0.00s)
    semantic_test.go:148: IsEmpty for &{<nil> ?foo    <nil>       <nil> <nil>   false <nil>        <nil> <nil>   false} returned false, but should have returned true
FAIL
FAIL    github.com/google/badwolf/bql/semantic  0.028s

Filtering clause returns unexpected failure

Given the following data set

/_<c175b457-e6d6-4ce3-8312-674353815720>	"_predicate"@[]	"/some/immutable/id"@[]
/_<c175b457-e6d6-4ce3-8312-674353815720>	"_owner"@[2017-05-23T16:41:12.187373-07:00]	/gid<0x9>
/_<c175b457-e6d6-4ce3-8312-674353815720>	"_subject"@[]	/aid</some/subject/id>
/_<c175b457-e6d6-4ce3-8312-674353815720>	"_object"@[]	/aid</some/object/id>
/_<cd8bae87-be96-41af-b1a8-27df990c9825>	"_object"@[2017-05-23T16:41:12.187373-07:00]	/aid</some/object/id>
/_<cd8bae87-be96-41af-b1a8-27df990c9825>	"_owner"@[2017-05-23T16:41:12.187373-07:00]	/gid<0x6>
/_<cd8bae87-be96-41af-b1a8-27df990c9825>	"_predicate"@[2017-05-23T16:41:12.187373-07:00]	"/some/temporal/id"@[2017-05-23T16:41:12.187373-07:00]
/_<cd8bae87-be96-41af-b1a8-27df990c9825>	"_subject"@[2017-05-23T16:41:12.187373-07:00]	/aid</some/subject/id>
/aid</some/subject/id>	"/some/temporal/id"@[2017-05-23T16:41:12.187373-07:00]	/aid</some/object/id>
/aid</some/subject/id>	"/some/immutable/id"@[]	/aid</some/object/id>
/aid</some/subject/id>	"/some/ownerless_temporal/id"@[2017-05-23T16:41:12.187373-07:00]	/aid</some/object/id>

The following query succeeds as expected.

bql> SELECT ?bn,?p, ?o 
     FROM ?test 
     WHERE { 
          ?bn "_subject"@[,]    /aid</some/subject/id>. 
          ?bn "_predicate"@[,] ?p .
          ?bn "_object"@[,] ?o 
      };

?bn	?p	?o
/_<cd8bae87-be96-41af-b1a8-27df990c9825>	"/some/temporal/id"@[2017-05-23T16:41:12.187373-07:00]	/aid</some/object/id>

[OK] Time spent:  578.963µs

However, when ?o is replaced with the concrete object node, the query fails with a filtering error.

bql> SELECT ?bn,?p  
     FROM ?test 
     WHERE { 
          ?bn "_subject"@[,]    /aid</some/subject/id>. 
          ?bn "_predicate"@[,] ?p . 
          ?bn "_object"@[,] /aid</some/object/id>
      };

[ERROR] planner.Execute: failed to execute insert plan with error failed to fully specify clause { ?bn "_object"@[,] /aid</some/object/id> } for row map[?bn:/_<cd8bae87-be96-41af-b1a8-27df990c9825>]
Time spent:  514.294µs

Given that this is just a variation of the query above, it should not have failed and should have returned one row with the ?bn and ?p bindings.

Add CONSTRUCT query to BQL

CONSTRUCT queries allow creating new facts to be added to graphs. The facts are defined based on the bindings provided in the WHERE clause. Basic filtering capabilities are provided by adding a HAVING clause.

A simple example adding new facts based on the current ones:

CONSTRUCT { 
       ?p "grandmother of"@[] ?g .
       ?g "grandchild of"@[]  ?p
}
INTO ?graph1, ?graph2
FROM ?graph3, ?graph4
WHERE {
       ?p         "parent of"@[] ?parent .
       ?parent    "parent of"@[] ?g .
       ?p         "gender"@[]    ?gender 
}
HAVING ?gender == /gender<male>;

It is worth mentioning that the above query could be simplified as shown below. Nevertheless, the goal here was to show the full structure of a CONSTRUCT query.

CONSTRUCT {
       ?p "grandmother of"@[] ?g .
       ?g "grandchild of"@[]  ?p
}
INTO ?graph1, ?graph2
FROM ?graph3, ?graph4
WHERE {
       ?p         "parent of"@[] ?parent .
       ?parent    "parent of"@[] ?g .
       ?p         "gender"@[]    /gender<male>
};

Subjects are allowed to specify _ instead of a WHERE clause binding. This will inject a new blank node.

Questions: EAV vs Triplestore, Gremlin, Geographical data, immutability, further readings?

Héllo,

First and foremost thanks for sharing this project! This is very interesting!

Me, Myself and I

I am a database modeling enthusiast. I created a database in Python called AjguDB, which is a graph database on top of EAV (itself on top of WiredTiger, an ordered key-value store similar to boltdb). I did a similar project in Scheme, which can be queried using miniKanren (a logic language embedded in Scheme). My inspiration is mostly the Datomic database, even if I skipped the immutable part.

EAV vs Triplestores

I used to think that EAV was a triplestore; I am reconsidering that. It seems like the EAV model is less generic than the triplestore model. My understanding is that both are good at modeling sparse matrix / multidimensional data, but EAV is really good at representing documents whereas triplestores are good at representing triples (or facts). One might say that a document is a set of triples. But in the EAV model you don't have control over the entity; it is randomly generated. At the end of the day, I think a triplestore is just EAV where E is not a unique identifier. WDYT?

Gremlin querying

Is it possible to adapt Gremlin to work on quads?

Geographical data

I am surprised that there is no mention of geographical data in some way. Is it something you plan to add?

Immutability

How do you cope with immutability during querying? Here is a practical example: say there is a triple that says "there are a hundred people in a town in 2017". Now it's 2018; do I need to create a new triple or update the old one? Do triples have a history? It seems to me that a database kept in BadWolf must be clean: you cannot fix typos, or it will clutter the results.

Further readings

Can you recommend things to read about BadWolf?

I will dive into boltdb drivers.

Set up continuous testing

Right now I am manually running all tests before commits. We should set up continuous testing for the whole project to run at least all the available unit tests.

Cut a first release candidate.

Do another pass over the documentation and compliance stories. Once done, label the latest master commit as RC1 after updating the version number.

Long running instance

Excuse the probably very naive question. I think I have a working badwolf instance, which I obtained by running:

go get golang.org/x/net/context
go get github.com/peterh/liner
go get github.com/google/badwolf/...

(Is this the right way? How to install is not mentioned anywhere.)

In any case, I am able to use the bw tool and follow the examples, use bw bql to get a REPL and so on.

The question is: how do I leave a long-running instance of badwolf? Even assuming I want to keep the data in RAM (persistence is not a priority right now, even though I see there are persistent backends), each time I run bw an entirely new instance of badwolf is created and apparently destroyed.

I assume there must be some way to leave badwolf running in the background and keep querying the existing graphs (even using the bw tool, preferably with some kind of driver/network interface), but I could not find any information on this.

For instance, it is not clear to me how to use the bw export command: by the time I run a new bw process, everything from the previous runs is lost, hence there is nothing to export. Similarly, I can run bw load, but then the data is lost as soon as the command returns. I am sure I am missing something obvious and fundamental here.

example using wikidata

Is there any wikidata example available?

If not, what would be the rough steps to use badwolf with wikidata?

Implement HAVING clause

Add the collection of bindings and conditions to the Statement, enforce validation, and extend the query planner to use the table filter functionality.
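
The filter itself reduces to evaluating a condition per row, as in this plain-Go sketch (illustrative Row type; the condition shown mirrors the ?gender == /gender<male> example from the CONSTRUCT proposal).

package main

import "fmt"

// Row is an illustrative result row keyed by binding name.
type Row map[string]string

// having keeps only the rows that satisfy the condition, which is what
// a clause such as HAVING ?gender == /gender<male> reduces to.
func having(rows []Row, cond func(Row) bool) []Row {
    var out []Row
    for _, r := range rows {
        if cond(r) {
            out = append(out, r)
        }
    }
    return out
}

func main() {
    rows := []Row{
        {"?p": "/u<joe>", "?gender": "/gender<male>"},
        {"?p": "/u<ann>", "?gender": "/gender<female>"},
    }
    fmt.Println(having(rows, func(r Row) bool { return r["?gender"] == "/gender<male>" }))
}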

Build BQL test workbench

Build a test corpus to validate BQL behavior. This workbench also needs to support multiple backends.

Bad interaction between parsing predicates and string literals.

create graph ?world;

insert data into ?world {
  /room<000> "named"@[] "Hallway"^^type:text.
  /room<000> "connects_to"@[] /room<001>
};

fails with:

[FAIL] [ERROR] Failed to parse BQL statement with error Parser.parse: Failed to consume symbol INSERT_OBJECT, with error Parser.consume: could not consume token &{ERROR "Hallway" [lexer:0:57] predicates require time anchor information; missing "@[} in production INSERT_OBJECT
