iconcmo / pouch-datalog
Datalog query engine for PouchDB. This project is forked from pouchdb/plugin-seed.
I don't know how Datomic handles this, but perhaps this article on IndexedDB might have some hints on other ways of accomplishing case-insensitive search. One could also perhaps just `.toLowerCase()` everything going into the index and coming in as searches, which might be easier and perhaps more efficient in this case.
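A minimal sketch of the `.toLowerCase()` approach (the `normalize` name is illustrative, not part of the plugin):

```js
// Case-insensitive matching by lowercasing both the values going into
// the index and the search terms. Illustrative sketch only.
function normalize(value) {
  // Only strings need case folding; other value types pass through.
  return typeof value === 'string' ? value.toLowerCase() : value;
}

// A view map function would then emit normalized keys, e.g.
//   emit([key, normalize(doc[key])]);
// and the query side would call normalize(searchTerm) before building
// startkey/endkey, so "Alice", "ALICE", and "alice" all match.
```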
While Learn Datalog Today and the Datomic docs are a good start, it'd be nice to have some more examples here, both for learning purposes and for seeing what kinds of queries do and don't work.
With the caveat that I still know very little about Datalog, here are some thoughts on going whole hog while not needing tons of index space:
According to https://github.com/tonsky/datascript#project-status you need:
EAVT, AEVT and AVET indexes
I understand these to be permutations of the Entity/Attribute/Value/Transaction concepts from reading e.g. http://docs.datomic.com/query.html#sec-2. In our case, "entity" is basically `doc._id`; "attribute" is the name of a property within that doc (i.e. key); "value" is the value of said property; and I'm beginning to suspect that "transaction" is just `doc._local_seq`.
[Within a single instance of a database, `doc._local_seq` stores a monotonically increasing number — the database `_changes` sequence at that point. So for the same `_id`/`_rev` in a replica it may be different, but locally it's kinda what you need. Although I bet your trouble's gonna be that the view is only going to include `emit`s from the most recent/winning version…so if you go to use it, your data could actually disappear mid-execution, which is obviously the opposite of the point!]
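To make that E/A/V/T mapping concrete, here's a worked example under the assumptions above — one tuple per non-underscore property; the shapes are illustrative:

```js
var doc = { _id: 'bob', _local_seq: 7, name: 'Bob', age: 30 };

var tuples = Object.keys(doc)
  .filter(function (k) { return k[0] !== '_'; }) // skip CouchDB metadata fields
  .map(function (k) {
    return [doc._id, k, doc[k], doc._local_seq]; // [E, A, V, T]
  });
// tuples → [['bob', 'name', 'Bob', 7], ['bob', 'age', 30, 7]]
```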
So anyway, putting aside transactions for a bit (I have a few more thoughts, maybe lower down ;-) let's review the code at https://github.com/dahjelle/pouch-datalog/blob/c9ae7f421c93d3f5b5ac64681fca5fe07b567912/index.js#L21:
```js
callback(data.rows.map(function (el) {
  return el.value;
}));
```
For starters, why not just drop the `emit(tuple, tuple)` silliness and allow just `emit(tuple)` by a simple change here:
```js
callback(data.rows.map(function (row) {
  return row.key;
}));
```
Right?
For your AVE view, you really just need to `emit([k, v])`. This will save a bit of redundant space usage. Your query connector can fix it back up without much trouble:
```js
callback(data.rows.map(function (row) {
  if (index === 'ave') row.key.push(row.id);
  return row.key;
}));
```
Note that [since we've left transactions out — this is actually the AVET index!] this doesn't change the sort order or anything — internally, when you `emit(k, v)`, CouchDB sort of stores/sorts that as `[k, id] -> v`; for array keys it's like you have one more item at the end that you can hop to via `?startkey_docid`.
I'm of a mixed mind on this one, but for completeness: the EAV[T] index is sort of redundant with the built-in _all_docs one, assuming it always just gets called for a single E+A lookup. What I mean is that you would do something like this pseudocode in the scope above the code we've been reviewing:
```js
if (index === 'eav') {
  db.get(E).then(function (doc) {
    callback([[doc._id, A, doc[A]]]);
  });
} else {
  // code above, though it could now be simplified to always just push id onto key
}
```
That is, an index by E is basically just what CouchDB has underlying `db.get` (aka `db.allDocs` if you do need a range of E, which seems unlikely in this case?) and then picking out the relevant keys.
Now, I'm of a mixed mind, because if your documents are large and you only want to fetch a single attribute across each, having this index will save I/O and parsing overhead — so it's a tradeoff between storing an additional copy of your database in the form of this index, versus better query performance. So keeping it is in the Couch tradition of optimizing for disk usage last.
So what is interesting about this, from a user-of-this-library perspective, is that really my job is just to emit whatever `[A, V[, T]]` tuples I think are relevant from a document!
There's a couple tricks that might be useful in this vein:
**`require()` logic in your views**

So you could do things like:
```js
var emitTuple = require('…datalog helper…')(doc);
emitTuple('name', doc.firstName + ' ' + doc.lastName);
```
where (unbeknownst to the user) the helper would be something like:
```js
module.exports = function (doc) {
  function emitTuple(k, v, t) { // t optional
    emit(['eavt', doc._id, k, v, t]);
    emit(['avet', k, v, doc._id, t]); // or see the simplification without id/t above — the point is the caller is insulated from this anyway!
  }
  return emitTuple;
};
```
Or I can almost imagine even an inverted version where pouch-datalog provides the "real" map and the user provides the "tuple" one, but basically splitting the actual CouchDB details from the Datalog tuple concept.
There's kind of a tradeoff here: you do insulate the user from the internal indexing details, but now they have to copy-paste the right version of your code into their ddoc before using this plugin (or is Kanso not actually dead yet?).
Although! You could just add an initialization phase to your plugin? Something like:
```js
ddb = db.configureDataquery(function (doc) { emitTuple(…); });
db.dataquery(…);
db.dataquery(…);
db.dataquery(…);
```
…and it would basically write/overwrite `_design/datalog` with something generated from, roughly:
```js
function configureDataquery(tupleMapper) {
  var ddoc = {views: {}};
  function realMap(doc) {
    function emitTuple(k, v, t) { /* …as above… */ }
    (TUPLE_MAP_PLACEHOLDER)(doc);
  }
  ddoc.views.index = {
    map: realMap.toString().replace('TUPLE_MAP_PLACEHOLDER', tupleMapper.toString())
  };
  ddoc._id = "_design/datalog";
  //ddoc.options = {local_seq: true};
  return db.put(ddoc);
}
```
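The `toString()`/`replace()` splicing trick can be tried in isolation, without PouchDB; everything below is an illustrative sketch, not the plugin's actual code:

```js
// The view skeleton references TUPLE_MAP_PLACEHOLDER, which is never
// executed directly — we only ever take this function's source text.
function realMap(doc) {
  var out = [];
  function emitTuple(k, v) { out.push([doc._id, k, v]); }
  (TUPLE_MAP_PLACEHOLDER)(doc);
  return out;
}

// The user's mapper, as they'd pass it to configureDataquery.
var userMapper = function (doc) { emitTuple('name', doc.name); };

// Splice the user's source into the skeleton's source, then rebuild
// a callable function from the combined text.
var src = realMap.toString().replace('TUPLE_MAP_PLACEHOLDER', userMapper.toString());
var combined = new Function('return (' + src + ')')();

combined({ _id: 'doc1', name: 'Ada' }); // → [['doc1', 'name', 'Ada']]
```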
Okay, in the time I spent explaining this I could have experimented with this for real, but at least now you know what I think you should do ;-)
Oh, on the transaction thing (which I think is kind of cool, although I also agree it makes sense to at least leave it optional): basically you tell users (or give them a helper method…) that instead of saving over top of the old doc, they just post a new doc each time it changes. So instead of having a replacing sequence (pretend it's a changes feed):
```js
{_id:"mydoc", _rev:"1-abc", _local_seq:1, …}
{_id:"mydoc", _rev:"2-cba", _local_seq:2, …}
{_id:"mydoc", _rev:"3-acb", _local_seq:3, …}
```
you store:
```js
{_id:"mydoc@after@0", _rev:"1-abcd", _local_seq:1, …}
{_id:"mydoc@after@1-abcd", _rev:"1-cbad", _local_seq:2, …}
{_id:"mydoc@after@2-cbad", _rev:"1-acbd", _local_seq:3, …}
```
You'll still get conflict detection since the ids are deterministically based on the same MVCC token, but now your emitTuple can use `doc._id.split("@after@")[0]` as E and `doc._local_seq` as T (and the provided k/v as A and V, natch), and voilà: your user gets Datalog when their ops team gave them CouchDB!
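A hypothetical helper for that append-only scheme (the names `versionedId` and `baseId` are mine, not the plugin's):

```js
// Instead of updating "mydoc" in place, each change is stored as a
// fresh doc whose _id embeds the previous revision's MVCC token.
function versionedId(docId, prevRev) {
  // The very first write has no previous revision; use "0" as a sentinel.
  return docId + '@after@' + (prevRev || '0');
}

// Recover E for the Datalog tuples, as described above.
function baseId(versioned) {
  return versioned.split('@after@')[0];
}

versionedId('mydoc');           // → 'mydoc@after@0'
versionedId('mydoc', '1-abcd'); // → 'mydoc@after@1-abcd'
baseId('mydoc@after@1-abcd');   // → 'mydoc'
```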
We should be able to get the document's contents back with the query instead. In some cases this might be rather more data; I'm not sure what the tradeoff might be.