pouch-datalog's People

Contributors

dahjelle, nolanlawson, sajinshrestha

pouch-datalog's Issues

Case insensitive support.

I don't know how Datomic handles this, but this article on IndexedDB might have some hints on other ways to do case-insensitive search.

One could also just .toLowerCase() everything going into the index and everything coming in as a search, which might be easier and perhaps more efficient in this case.
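A minimal sketch of the lowercase idea, assuming AVE-style [A, V, E] keys like the ones discussed further down. foldCase, aveKeysForDoc and lookup are hypothetical helpers, not part of pouch-datalog, and the in-memory filter only stands in for a startkey/endkey query on a real view. The point it illustrates is that the same folding has to run on both the index side and the query side.

```javascript
// Hypothetical helper: fold strings to lower case; other value types pass through.
// Note: toLowerCase() is plain lowercasing, not full Unicode case folding.
function foldCase(value) {
  return typeof value === 'string' ? value.toLowerCase() : value;
}

// Index side: [A, V, E] keys for a doc, with V folded.
// Skips CouchDB's own underscore-prefixed fields (_id, _rev, ...).
function aveKeysForDoc(doc) {
  return Object.keys(doc)
    .filter(function (attr) { return attr.charAt(0) !== '_'; })
    .map(function (attr) { return [attr, foldCase(doc[attr]), doc._id]; });
}

// Query side: fold the search value the same way, then match on [A, V].
// (An in-memory stand-in for a startkey/endkey range on the real view.)
function lookup(keys, attr, value) {
  var needle = foldCase(value);
  return keys
    .filter(function (key) { return key[0] === attr && key[1] === needle; })
    .map(function (key) { return key[2]; });
}
```

One tradeoff: the index then only holds folded values, so anything that reads V back out of the index gets it in lower case.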

More Examples

While Learn Datalog Today and the Datomic docs are a good start, it'd be nice to have some more examples here, both for learning purposes and for seeing what kinds of queries do and don't work.

Random feedback

With the caveat that I still know very little about Datalog, here are some thoughts on going whole hog without needing tons of index space:

According to https://github.com/tonsky/datascript#project-status you need:

EAVT, AEVT and AVET indexes

I understand these to be permutations of the Entity Attribute Value Transaction concepts from reading e.g. http://docs.datomic.com/query.html#sec-2. In our case "entity" is basically doc._id; "attribute" is the name of a property within that doc (i.e. key); "value" is the value of said property, and I'm beginning to suspect that "transaction" is just doc._local_seq.

[Within a single instance of a database, doc._local_seq stores a monotonically increasing number: the database "_changes" sequence at that point. So for the same _id/_rev in a replica it may be different, but locally it's kinda what you need. Although I bet your trouble's gonna be that the view only includes emits from the most recent/winning version…so if you go to use it, your data could actually disappear mid-execution, which is obviously the opposite of the point!]
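To make the permutation idea concrete, here is a small sketch assuming the mapping guessed at above (E = doc._id, A = property name, V = property value, T = doc._local_seq). datomsForDoc and indexKey are hypothetical names for illustration; each index key is simply the same datom with its components read in a different order.

```javascript
// Hypothetical: split a doc into datoms, assuming E = _id, A = property name,
// V = property value and T = _local_seq. Underscore fields are skipped.
function datomsForDoc(doc) {
  return Object.keys(doc)
    .filter(function (attr) { return attr.charAt(0) !== '_'; })
    .map(function (attr) {
      return { e: doc._id, a: attr, v: doc[attr], t: doc._local_seq };
    });
}

// Build the key for one index by reading the datom's fields in the order
// spelled out by the index name, e.g. 'avet' -> [A, V, E, T].
function indexKey(datom, indexName) {
  return indexName.split('').map(function (part) { return datom[part]; });
}
```

In CouchDB, at least, _local_seq is only passed to map functions when the design document sets options.local_seq to true.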

Code review

So anyway, putting aside transactions for a bit (I have a few more thoughts on those, maybe lower down ;-), let's review the code at https://github.com/dahjelle/pouch-datalog/blob/c9ae7f421c93d3f5b5ac64681fca5fe07b567912/index.js#L21:

callback(data.rows.map(function (el) {
  return el.value;
}));

Clean up emits

For starters, why not just drop the emit(tuple, tuple) silliness and allow plain emit(tuple) with a simple change here:

callback(data.rows.map(function (row) {
  return row.key;
}));

Right?

Avoid redundant _id

For your AVE view you really just need to emit([k,v]). This will save a bit of redundant space usage. Your query connector can fix it back up without much trouble:

callback(data.rows.map(function (row) {
  if (index === 'ave') row.key.push(row.id);
  return row.key;
}));

Note that [since we've left transactions out, this is actually the AVET index!] this doesn't change the sort order or anything: internally, when you emit(k,v), CouchDB more or less stores/sorts that as [k,id] -> v, so for array keys it's like you have one more item at the end that you can hop to via ?startkey_docid.
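A rough way to sanity-check the sort-order claim without a real CouchDB. The sketch uses a deliberately simplified collation (numbers before strings, strings by code unit; real CouchDB collation is Unicode-aware and covers more types) and models view rows as { id, key }. Sorting [A, V] rows by key with the doc id as tie-breaker, then appending the id, should give the same sequence as sorting [A, V, E] keys directly. All function names here are hypothetical.

```javascript
// Simplified stand-in for CouchDB collation: numbers sort before strings,
// strings compare by UTF-16 code unit. Only handles numbers and strings.
function compareScalar(a, b) {
  if (typeof a !== typeof b) return typeof a === 'number' ? -1 : 1;
  return a < b ? -1 : a > b ? 1 : 0;
}

// Element-by-element array comparison; a shorter prefix sorts first.
function compareArrays(a, b) {
  for (var i = 0; i < Math.min(a.length, b.length); i++) {
    var c = compareScalar(a[i], b[i]);
    if (c !== 0) return c;
  }
  return a.length - b.length;
}

// View rows are ordered by key, with the doc id breaking ties.
function compareRows(r1, r2) {
  return compareArrays(r1.key, r2.key) || compareScalar(r1.id, r2.id);
}

// Rows from emit([k, v]) -> [A, V, E] tuples, without mutating the rows.
function toAveTuples(rows) {
  return rows.slice().sort(compareRows).map(function (row) {
    return row.key.concat([row.id]);
  });
}
```

Using concat rather than push keeps the original row objects untouched, a small deviation from the snippet above.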

Getting rid of views, pt. 1

I'm of a mixed mind on this one, but for completeness: the EAV[T] index is sort of redundant with the built-in _all_docs one, assuming it always just gets called for a single E+A lookup. What I mean is that you would do something like this pseudocode in the scope above the code we've been reviewing:

if (index === 'eav') db.get(E).then(function (doc) { callback([[doc._id, A, doc[A]]]); });
else /* code above, though could now be simplified to always just push id onto key */;

That is, an index by E is basically what CouchDB already has underneath db.get (or db.allDocs, if you do need a range of E, which seems unlikely here?); you then just pick out the relevant keys.
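A self-contained version of that E+A lookup, written against anything with a PouchDB-style promise-returning get(). eavLookup is a hypothetical name. It also treats a missing E as "no tuples" rather than an error, keying off the 404 status PouchDB puts on not-found errors.

```javascript
// Hypothetical: answer an [E, A, ?V] lookup straight from the document
// instead of from an EAV view. Resolves to an array of [E, A, V] tuples.
function eavLookup(db, e, a) {
  return db.get(e).then(function (doc) {
    return Object.prototype.hasOwnProperty.call(doc, a) ? [[doc._id, a, doc[a]]] : [];
  }, function (err) {
    if (err.status === 404) return []; // no such entity
    throw err;
  });
}
```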

Now, I'm of a mixed mind because if your documents are large and you only want to fetch a single attribute from each, having this index saves I/O and parsing overhead. So it's a tradeoff between storing an additional copy of your database in the form of this index and getting better query performance, and keeping it is in the Couch tradition of optimizing for disk usage last.

Getting rid of views, alternate universe

So what is interesting about this, from the perspective of a user of this library, is that my job is really just to emit whatever [A,V[, T]] tuples I think are relevant from a document!

There are a couple of tricks that might be useful in this vein:

  • at least in CouchDB [assuming PouchDB too] you can require() CommonJS modules from your views
  • you can actually just emit the index type as the first element of the array (assuming you do keep the EAV index)
  • so you could actually host several "Datalog" stores within a single database (or even a single design document!) by doing all the emits for each from a single map function, perhaps even with a helper.
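With the store and index type leading each emitted key, reading one of them back is just a key-range query. A small sketch, assuming keys shaped like [store, indexType, ...rest] (the store prefix is an assumption layered on the index-type trick above) and relying on CouchDB/PouchDB collation sorting objects after all strings, numbers and arrays, so a trailing {} works as a high sentinel as long as the values themselves aren't objects. prefixRange is a hypothetical helper.

```javascript
// Hypothetical: startkey/endkey covering every key that begins with
// [store, indexType, ...prefix]. Relies on objects collating after all
// scalars and arrays, so a trailing {} means "anything after this prefix".
function prefixRange(store, indexType, prefix) {
  var start = [store, indexType].concat(prefix || []);
  return { startkey: start, endkey: start.concat([{}]) };
}
```

The result could then be passed as query options, e.g. db.query('datalog/index', prefixRange('people', 'avet', ['name', 'Ada'])), where the view and store names are only illustrative.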

So you could do things like:

var emitTuple = require('…datalog helper…')(doc)
emitTuple('name', doc.firstName+' '+doc.lastName);

where (unbeknownst to the user) the helper would be something like:

module.exports = function (doc) {
  function emitTuple(k, v, t) { // t optional
    emit(['eavt', doc._id, k, v, t]);
    emit(['avet', k, v, doc._id, t]);   // or see simplification without id/t above, but the point is the caller is insulated from this anyway!
  }
  return emitTuple; // hand the caller a tuple emitter bound to this doc
};

Or I can even imagine an inverted version where pouch-datalog provides the "real" map and the user provides the "tuple" one; either way, the point is splitting the actual CouchDB details from the Datalog tuple concept.

There's a tradeoff here: you do insulate the user from the internal indexing details, but now they have to copy-paste the right version of your code into their ddoc before using this plugin (or is Kanso not actually dead yet?).

Although! You could just add an initialization phase to your plugin? Something like:

db.configureDataquery(function (doc) { emitTuple(…); });
db.dataquery(…);
db.dataquery(…);
db.dataquery(…);

…and it would basically write/overwrite "_design/datalog" with something generated from, roughly:

function configureDataquery(tupleMapper) {
  function realMap(doc) {
    function emitTuple(k, v, t) { …as above… }
    (TUPLE_MAP_PLACEHOLDER)(doc);
  }
  var ddoc = {
    _id: "_design/datalog",
    views: { index: { map: realMap.toString().replace('TUPLE_MAP_PLACEHOLDER', tupleMapper.toString()) } }
    //, options: { local_seq: true }
  };
  return db.put(ddoc); // NB: overwriting an existing ddoc also needs its current _rev
}
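A runnable sketch of the stringify-and-splice approach in the pseudocode above, with T left out for brevity. mapTemplate and buildDatalogDdoc are hypothetical names; the template is written as ordinary JavaScript so it parses, then turned into source text with the placeholder swapped for the user's mapper. Because the mapper's source lands lexically inside the template, its emitTuple calls resolve to the generated helper.

```javascript
// Hypothetical template for the generated map function. It is never called
// here, only stringified, so the undefined placeholder is harmless.
var mapTemplate = function (doc) {
  function emitTuple(a, v) {
    emit(['eavt', doc._id, a, v]);
    emit(['avet', a, v, doc._id]);
  }
  (TUPLE_MAP_PLACEHOLDER)(doc);
};

// Hypothetical: build a design doc whose map wraps a user-supplied tuple mapper.
function buildDatalogDdoc(tupleMapper) {
  var source = tupleMapper.toString();
  return {
    _id: '_design/datalog',
    views: {
      index: {
        // A replacer function keeps "$&"-style sequences in the mapper's
        // source from being treated as replacement patterns.
        map: mapTemplate.toString().replace('TUPLE_MAP_PLACEHOLDER', function () { return source; })
      }
    }
  };
}
```

Writing the result with db.put only works as an overwrite if the existing design doc's _rev is copied onto it first.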

Okay, in the time I spent explaining this I could have experimented with this for real, but at least now you know what I think you should do ;-)

One more thing

Oh, on the transaction thing, which I think is kind of cool (although I also agree it makes sense to at least leave it optional): basically, you tell users (or give them a helper method…) that instead of saving over top of the old doc, they just post a new doc each time it changes. So instead of having a replacing sequence (pretend it's a changes feed):

{_id:"mydoc", _rev:"1-abc", _local_seq:1, …}
{_id:"mydoc", _rev:"2-cba", _local_seq:2, …}
{_id:"mydoc", _rev:"3-acb", _local_seq:3, …}

you store:

{_id:"mydoc@after@0", _rev:"1-abcd", _local_seq:1, …}
{_id:"mydoc@after@1-abcd", _rev:"1-cbad", _local_seq:2, …}
{_id:"mydoc@after@1-cbad", _rev:"1-acbd", _local_seq:3, …}

You'll still get conflict detection, since each id is deterministically derived from the previous version's _rev (the same MVCC token), but now your emitTuple can use doc._id.split("@after@")[0] as E and doc._local_seq as T (and the provided k/v as A and V, natch), and voilà, your user gets Datalog when their ops team gave them CouchDB!
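A sketch of the id scheme and of how a mapper might recover E and T, assuming the "@after@" separator from the example and that entity ids never contain it themselves. nextVersionId and datomsForVersion are hypothetical, and the sketch assumes the caller already knows the latest stored version. Two writers who both start from the same latest version compute the same new _id, so the second put should be rejected as a conflict.

```javascript
// Hypothetical helpers for the append-only scheme: each version is a new doc
// whose _id is the entity id plus the _rev of the version it replaces.
var SEPARATOR = '@after@';

// prevDoc is the latest stored version, or null for the first one.
function nextVersionId(entityId, prevDoc) {
  return entityId + SEPARATOR + (prevDoc ? prevDoc._rev : '0');
}

// Inside a map function: E is the part before the separator, T the _local_seq.
function datomsForVersion(doc) {
  var e = doc._id.split(SEPARATOR)[0];
  return Object.keys(doc)
    .filter(function (attr) { return attr.charAt(0) !== '_'; })
    .map(function (attr) { return [e, attr, doc[attr], doc._local_seq]; });
}
```

Because older versions stay around as separate, non-deleted docs, their tuples remain in the view, which also sidesteps the worry about data disappearing mid-execution.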
