mapreduce's People

Contributors

adboomlodestar, calvinmetcalf, carlo-colombo, daleharvey, ddouglascarr, klaemo, marten-de-vries, neojski, ngconsulti, nolanlawson, peterdavehello, valotas

mapreduce's Issues

dist

Why here? Why outdated?

port over toPromise

I believe the main issue is going to be dealing with temp views. Currently toPromise just checks whether the last argument is a function and, if so, assumes it's a callback. With temp views, though, you will sometimes want to do db.query(function).then..., so we would probably want to switch it to check that the last argument is a function and that the arguments length is greater than 2, as I can't see a situation in this plugin of only passing a callback and nothing else.
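
A minimal sketch of the kind of check described above. The helper name splitCallback and its minArgs parameter are hypothetical and not part of toPromise; the cutoff is left as a parameter so that the exact threshold can be decided separately.

```javascript
// Hypothetical helper: separate a trailing node-style callback from the
// other arguments, but only when enough arguments were passed that the
// trailing function cannot be the temp view function itself.
function splitCallback(args, minArgs) {
  var last = args[args.length - 1];
  var hasCallback = typeof last === 'function' && args.length >= minArgs;
  return {
    args: hasCallback ? args.slice(0, -1) : args.slice(),
    callback: hasCallback ? last : null
  };
}

// With minArgs = 2, a lone function is treated as a temp view:
var tempView = function (doc) {};
var onlyView = splitCallback([tempView], 2);
var viewAndCallback = splitCallback([tempView, function (err, res) {}], 2);
```

Here onlyView.callback is null, so the caller would return a promise, while viewAndCallback.callback holds the trailing function.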

Intermittent error: local with temp views, Testing query with keys

Occasionally when I run npm test I see:

  174 passing (11s)
  1 failing

  1) local with temp views: views Testing query with keys:
     AssertionError: returns one doc: expected [] to have a length of 1 but got 0

This may just be a leveldb problem, since I haven't seen it in the browser yet.

mapreduce depends on pouchdb#master

I'm not a dependencies master but currently if someone downloads the repo it's broken because mapreduce depends on some newer pouchdb.

Shouldn't we do something like devDependency.pouchdb = daleharvey/pouchdb#master?

Add more tests to confirm [key, docid, value] ordering

Before we release persisted mapreduce to the world in PouchDB 2.2.0, we should revisit the sort order and confirm that it's in line with CouchDB. I just checked, and the only test where we confirm the assumed [key, docid, value] ordering is this one.

It would be nice to test e.g. values with different object types (arrays, objects, booleans, the doc itself, the doc with a joined id, etc.).

Handle multiple emits

See discussion in #12. Apparently we just need to sort on values in case the key and the docid are both the same (i.e. a doc emits the same key multiple times, but with different values).
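
As a rough illustration of the [key, docid, value] ordering, here is a sketch of a three-level comparator. It assumes keys and values are plain numbers or strings of the same type and uses a naive comparison, not CouchDB's collation rules; naiveCompare and compareRows are hypothetical names.

```javascript
// Naive comparison for primitives of the same type. This is NOT
// CouchDB collation; it only illustrates the tie-breaking order.
function naiveCompare(a, b) {
  if (a < b) { return -1; }
  if (a > b) { return 1; }
  return 0;
}

// Order rows by key, then by doc id, then by value.
function compareRows(x, y) {
  return naiveCompare(x.key, y.key) ||
    naiveCompare(x.id, y.id) ||
    naiveCompare(x.value, y.value);
}

// doc1 emits the same key twice with different values.
var rows = [
  {key: 'a', id: 'doc1', value: 2},
  {key: 'a', id: 'doc1', value: 1},
  {key: 'a', id: 'doc0', value: 5}
];
var sorted = rows.slice().sort(compareRows);
```

After sorting, the doc0 row comes first, followed by the two doc1 rows ordered by value.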

Support Group Level

Right now only group_level=exact is supported.
CouchDB supports numeric group_level values:

The group and group_level options control whether the reduce function reduces to a set of distinct keys or to a single result row. group_level lets you specify how many items of the key array are used in grouping; group=true is effectively the same as group_level=999 (for an arbitrarily high value of 999.) Don't specify both group and group_level; the second one given will override the first.

(see HTTP view API)
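
To make the quoted behaviour concrete, here is a small sketch of grouping by a positive numeric group_level. It assumes array keys are truncated to their first group_level items and that the reduce is a simple sum; groupKey and sumByGroupLevel are hypothetical helpers, not functions from this plugin.

```javascript
// Truncate an array key to its first groupLevel items.
// Non-array keys are used as-is.
function groupKey(key, groupLevel) {
  if (Array.isArray(key) && typeof groupLevel === 'number') {
    return key.slice(0, groupLevel);
  }
  return key;
}

// Sum row values per truncated key, keeping first-seen order.
function sumByGroupLevel(rows, groupLevel) {
  var groups = new Map();
  rows.forEach(function (row) {
    var key = groupKey(row.key, groupLevel);
    var id = JSON.stringify(key);
    if (!groups.has(id)) {
      groups.set(id, {key: key, value: 0});
    }
    groups.get(id).value += row.value;
  });
  return Array.from(groups.values());
}

var rows = [
  {key: [2014, 1, 5], value: 1},
  {key: [2014, 1, 9], value: 2},
  {key: [2014, 2, 1], value: 3},
  {key: [2015, 1, 1], value: 4}
];
var byYear = sumByGroupLevel(rows, 1);
var byMonth = sumByGroupLevel(rows, 2);
```

With these rows, byYear has one entry per year and byMonth one entry per [year, month] pair.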

Return error if using multi-key fetch and group=false

Another weird edge case. CouchDB returns this error if you use keys with >1 key and group=false:

{
  "status":400,
  "name":"query_parse_error",
  "message":"Multi-key fetchs for reduce views must use `group=true`"
}
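
A sketch of how such a check might look. It assumes the view has a reduce function and only inspects the keys and group options; checkQueryOptions is a hypothetical name, and the returned fields simply mirror the CouchDB response quoted above.

```javascript
// Hypothetical validation: return an error object when more than one key
// is requested without group=true, otherwise return null.
function checkQueryOptions(opts) {
  var multiKey = Array.isArray(opts.keys) && opts.keys.length > 1;
  if (multiKey && !opts.group) {
    return {
      status: 400,
      name: 'query_parse_error',
      message: 'Multi-key fetchs for reduce views must use `group=true`'
    };
  }
  return null;
}

var rejected = checkQueryOptions({keys: ['a', 'b'], group: false});
var accepted = checkQueryOptions({keys: ['a', 'b'], group: true});
```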

Sugar functions for creating/deleting/fetching indexes

Also it would be nice if the index names didn't have to have a slash in them. For instance, if they had a slash we could do what we normally do, but if not we could just rename myIndex to myIndex/myIndex. It's a good Couch practice anyway to have one map/reduce function per design doc.

_conflict test uses wrong assertion

This value will always exist, so the assertion does not test what it's supposed to test:

should.exist(res.rows[0].value);

By the way, it really bothers me that if I negate that line (should.not.exist(...)), the test report is:

  1) local views Views should include _conflicts:
     Error: timeout of 2000ms exceeded
      at null.<anonymous> (/home/neo/mapreduce/node_modules/mocha/lib/runnable.js:165:14)
      at Timer.listOnTimeout [as ontimeout] (timers.js:110:15)

That makes no sense. I suppose it has to do with the assertion library throwing an error, so done is never called, but it should not work like that.
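
If that supposition is right, one possible workaround is to catch the assertion error and hand it to done explicitly, so the failure is reported instead of timing out. The guard helper below is a hypothetical sketch, not part of mocha or of this test suite.

```javascript
// Hypothetical helper: wrap a callback so that any error thrown by the
// assertions is passed to done instead of being lost.
function guard(done, assertions) {
  return function () {
    try {
      assertions.apply(this, arguments);
    } catch (err) {
      return done(err);
    }
    done();
  };
}

// Intended use inside a test (shown as a comment, since it needs mocha):
// db.query(queryFun, guard(done, function (err, res) {
//   should.not.exist(res.rows[0].value);
// }));
```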

query swallows callback errors

When providing a callback to db.query(...) any exceptions thrown in that callback are silently swallowed. For example:

function myView (callback) {
  db.query('mydesign/myview', {startkey: 'a'}, function (err, result) {
    throw new Error('whoops!')
    callback(err, result)
  })
}

The error will never be seen, and instead you end up with what appears to be a hung query. This makes sense if you are planning to return the promise higher up the call stack:

function myView () {
  return db.query('mydesign/myview', {startkey: 'a'})
}

but that pretty much forces your clients to use promises, which is a bit obnoxious. In my app, I've been doing this where I have calls to query:

function myView (callback) {
  db.query('mydesign/myview', {startkey: 'a'}, function (err, result) {
    process.nextTick(callback.bind(this, err, result))
  })
}

That way any exceptions thrown in my callback bubble up as expected (crashing my server or whatever, which is what I expect them to do).

I propose that query should do this automatically when a callback is provided, as there is no meaningful way to handle callback exceptions in a promise without also forcing clients to consume promises.

Since @calvinmetcalf is a maintainer both here and for lie, I'd also humbly suggest that lie should implement something like Bluebird's nodeify so that pouchdb.query can behave as outlined above when given a node-style callback.
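
For illustration, a nodeify-style helper could look roughly like the sketch below. This is a hypothetical standalone function written for this discussion, not the API of lie or Bluebird; it defers the callback with process.nextTick so that exceptions thrown inside it escape the promise chain.

```javascript
// Hypothetical nodeify-style helper. Without a callback the promise is
// returned unchanged; with one, the callback is invoked on a later tick,
// outside the promise's own error handling.
function nodeify(promise, callback) {
  if (typeof callback !== 'function') {
    return promise;
  }
  promise.then(function (result) {
    process.nextTick(function () {
      callback(null, result);
    });
  }, function (err) {
    process.nextTick(function () {
      callback(err);
    });
  });
}
```

With a helper like this, query could end with return nodeify(promise, callback), and an exception thrown in the callback would surface as an uncaught exception rather than a swallowed rejection.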

Duplicated test

I think we can remove the fake design test as a duplicate of the more advanced one:

mapreduce/test/test.js

Lines 505 to 523 in a2442c4

it("Query non existing view returns error", function (done) {
  pouch(dbName, function (err, db) {
    var doc = {
      _id: '_design/barbar',
      views: {
        scores: {
          map: 'function(doc) { if (doc.score) { emit(null, doc.score); } }'
        }
      }
    };
    db.post(doc, function (err, info) {
      db.query('barbar/dontExist', {key: 'bar'}, function (err, res) {
        err.name.should.equal('not_found');
        err.message.should.equal('missing_named_view');
        done();
      });
    });
  });
});

Thoughts?

Batch the index update operations

I've been working hard to improve performance of IDB/WebSQL's basic get/put/allDocs/bulkDocs operations, which are the main bottleneck in persisted mapreduce, since we use a regular PouchDB under the hood.

However, the main drag on performance is just that we do a single atomic operation for each change we get from db.changes(). There's a lot of unnecessary reading and writing from _local docs that we could avoid.

I'm going to wait until #100 is merged, but for now I'm thinking of aiming for a batch size of about 50/100 or so. Holding 50 docs in memory shouldn't cause OutOfMemory errors on most mobile devices for most doc sizes, and if it's a problem in the future, we can make it configurable.
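
As a sketch of the batching idea, the changes could be split into fixed-size groups before each group is written in one operation. toBatches is a hypothetical helper and the batch size of 50 is just the figure mentioned above.

```javascript
// Hypothetical helper: split a list of changes into batches of at most
// batchSize items, preserving order.
function toBatches(changes, batchSize) {
  var batches = [];
  for (var i = 0; i < changes.length; i += batchSize) {
    batches.push(changes.slice(i, i + batchSize));
  }
  return batches;
}

// 120 fake changes split into batches of 50.
var changes = [];
for (var seq = 1; seq <= 120; seq++) {
  changes.push({seq: seq, id: 'doc' + seq});
}
var batches = toBatches(changes, 50);
```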

emit('error')

In my big commit (dfe44b0) I removed emit('error'). This was undocumented, untested so can anyone tell me what it was supposed to do?

Documentation

need docs on how to run tests, probably links back to the pouchdb repo / contributors file etc

start and end keys when descending is true

moved from daleharvey/pouchdb#1432

If I am querying CouchDB
AND I provide a startkey + endkey
AND I provide descending = true
THEN I am also expected to reverse the values of startkey and endkey parameters for the query to work

see http://docs.couchdb.org/en/latest/couchapp/views/intro.html?highlight=descending

If I am querying PouchDB
AND I provide a startkey + endkey
AND I provide descending = true
THEN I am NOT expected to reverse the values of startkey and endkey parameters

Is this a deliberate feature of PouchDB? PouchDB does indeed seem to reverse the order of the returned values but I don't need to also switch the startkey/endkey parameters.

total_rows different from couchdb

Throw a console.log in here:

data.rows.should.have.length(2, 'returns 2 non-empty docs');

I'd be delighted to see an exact description of total_rows somewhere in the documentation, because it's just my guess that it describes the number of documents before any filtering (startkey, endkey, keys, key).

What's more, I'm not sure what offset means. My guess is that it's the offset (position in the full list) of the first returned row.

Mark mapreduce dbs as such in a local doc

If we ever need to run migrations on the IDs themselves (e.g. because we find out we made a mistake in toIndexableString), it'd be useful to be able to know we're in a mapreduce db.

Rejection reason if sumsqr fails

I'm not sure whether this is the correct rejection reason for the promise:

{ sum: '0lala',
  min: NaN,
  max: NaN,
  count: 1,
  sumsqr: { [invalid_value: builtin _stats function requires map values to be numbers] name: 'invalid_value', status: 500 } }

Should it just be the thrown error? (i.e. this.sumsqr)

I have to dig a little more because it looks like my couchdb kinda crashes (does not respond to the view query) if I emit non-number value and use _stats.

Handle keys containing NaN, Infinity, dates, etc.

If you include any of these weird objects as a key or part of a key, Couch converts:

null, undefined, Infinity, -Infinity, NaN -> null
date -> JSON.stringify(date)
'' -> '' // no conversion 

We're already doing undefined, null, and the empty string correctly, but not the others. Is there anything I missed?
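
The conversions listed above could be sketched as a small normalization function. This is an assumption-laden illustration: normalizeKey is a hypothetical name, dates are converted with toJSON() (the ISO string that JSON serialization produces), and only scalars and arrays are handled; nested objects are left untouched.

```javascript
// Hypothetical normalization of a key before indexing.
function normalizeKey(key) {
  if (key === null || key === undefined) {
    return null;
  }
  if (typeof key === 'number') {
    // NaN, Infinity and -Infinity all become null
    return isFinite(key) ? key : null;
  }
  if (key instanceof Date) {
    return key.toJSON();
  }
  if (Array.isArray(key)) {
    return key.map(normalizeKey);
  }
  return key; // strings (including ''), booleans, objects
}

var normalized = normalizeKey([NaN, -Infinity, '', new Date(0)]);
```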

Keys lookup does not work for complex keys

I had a bad feeling about a10c698 and it looks like I was right.

You can find broken tests in my commit.

Reason: you can only use strings as keys in JS objects. Also, I don't like that this lookup is needed only for keys, yet part of its implementation is inside emit. It should be enclosed inside mapUsingKeys.
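
The string-coercion problem can be reproduced in a few lines, along with one possible workaround: indexing keys by their JSON serialization. buildKeyIndex and lookup are hypothetical helpers, and the approach assumes object keys are always serialized with the same property order.

```javascript
// Plain objects coerce property names to strings, so the array [1, 2]
// and the string '1,2' end up in the same slot.
var naive = {};
naive[[1, 2]] = true;
var collides = ('1,2' in naive);

// Hypothetical alternative: index keys by their JSON serialization.
function buildKeyIndex(keys) {
  var index = new Map();
  keys.forEach(function (key, position) {
    index.set(JSON.stringify(key), position);
  });
  return index;
}

// Return the position of key in the original keys list, or -1.
function lookup(index, key) {
  var position = index.get(JSON.stringify(key));
  return position === undefined ? -1 : position;
}

var index = buildKeyIndex([[1, 2], '1,2', {a: 1}]);
```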

cache queries

Currently all non-HTTP queries are done from scratch each time. We could save the result of the map query, along with the sequence number, to a _local document, so that subsequent queries avoid having to iterate through the whole database. We could even listen to the changes feed and update the cache every time a document is created or updated.
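
A minimal in-memory sketch of that idea follows. Everything here is hypothetical: the cache shape ({seq, rows}), the change shape ({seq, id, doc, deleted}) and a map function that receives emit as its second argument are assumptions made for illustration, and persistence to a _local document is left out.

```javascript
// Hypothetical incremental cache update: apply only changes newer than
// the cached sequence number and re-map the affected documents.
function updateCache(cache, changes, mapFun) {
  changes.forEach(function (change) {
    if (change.seq <= cache.seq) {
      return; // already processed
    }
    // Drop rows previously emitted by this document.
    cache.rows = cache.rows.filter(function (row) {
      return row.id !== change.id;
    });
    if (!change.deleted) {
      mapFun(change.doc, function emit(key, value) {
        cache.rows.push({id: change.id, key: key, value: value});
      });
    }
    cache.seq = change.seq;
  });
  return cache;
}

function mapScore(doc, emit) {
  if (doc.score) { emit(null, doc.score); }
}

var cache = {seq: 0, rows: []};
updateCache(cache, [
  {seq: 1, id: 'a', doc: {score: 1}},
  {seq: 2, id: 'b', doc: {score: 2}}
], mapScore);
```

A later call with only the changes since cache.seq would then touch just the documents that were created, updated or deleted in the meantime.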

Sugar for view names

If a view name contains a slash, then create a design doc with the name on the left and a view with the name on the right.

If the view name doesn't contain a slash, then just use the same string for both.
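
The rule above could be sketched as follows. parseViewName is a hypothetical helper, and splitting at the first slash is an assumption, since the issue does not say what should happen with names containing several slashes.

```javascript
// Hypothetical helper: derive the design doc name and view name from a
// single view name string.
function parseViewName(name) {
  var slash = name.indexOf('/');
  if (slash === -1) {
    // No slash: use the same string for both parts.
    return {designDoc: name, view: name};
  }
  return {
    designDoc: name.slice(0, slash),
    view: name.slice(slash + 1)
  };
}

var plain = parseViewName('myIndex');
var explicit = parseViewName('mydesign/myview');
```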
