
couchtato's Introduction

Couchtato

Couchtato is a CouchDB database iterator tool.

This is handy when you want to apply a set of JavaScript functions against all documents in a CouchDB database or view, or against a subset of them by specifying a start key, an end key, or both. In each JavaScript function, you can save a document, remove a document, log a message, or count the documents.

Performance and resource utilisation can be tuned by adjusting how many documents are retrieved per page, how many documents are updated/removed per bulk update, and how many milliseconds to wait between page retrievals.

Installation

npm install -g couchtato

Usage

Create a sample couchtato.js configuration file:

couchtato config

Iterate through all documents in a CouchDB database:

couchtato iterate -u http://user:pass@host:port/db

Iterate through all documents in a CouchDB view:

couchtato iterate -u http://user:pass@host:port/db/design/view

Use a custom configuration file:

couchtato iterate -u http://user:pass@host:port/db -c ../somecouchtato.js

Iterate through documents within a range of IDs:

couchtato iterate -u http://user:pass@host:port/db -s Astartkey -e Zendkey

Only iterate the first 5 pages where each page contains 1000 documents:

couchtato iterate -u http://user:pass@host:port/db -n 5 -p 1000

Save/remove documents in bulk, 20000 at a time:

couchtato iterate -u http://user:pass@host:port/db -b 20000

Pause for 5 seconds between each page retrieval:

couchtato iterate -u http://user:pass@host:port/db -i 5000

Hide progress and summary info:

couchtato iterate -u http://user:pass@host:port/db -q
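
As a rough mental model of the -p (page size) and -n (number of pages) options, the sketch below pages through an in-memory array. It is not couchtato's retrieval code, which talks to CouchDB; `paginate` and its parameters are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: split docs into pages of pageSize, stopping after maxPages.
function paginate(docs, pageSize, maxPages) {
  const pages = [];
  for (let start = 0; start < docs.length && pages.length < maxPages; start += pageSize) {
    pages.push(docs.slice(start, start + pageSize));
  }
  return pages;
}

// 12000 fake documents, read as at most 5 pages of 1000 (like -n 5 -p 1000).
const allDocs = Array.from({ length: 12000 }, (_, i) => ({ _id: 'doc' + i }));
const pages = paginate(allDocs, 1000, 5);
const retrieved = pages.reduce((total, page) => total + page.length, 0);
console.log('Retrieved ' + retrieved + ' documents in ' + pages.length + ' pages');
```

With these numbers, the remaining 7000 documents are never visited, which is why -n is mostly useful for trial runs.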

Configuration

Specify the task functions in the config file. Each function in exports.conf.tasks will be applied to each retrieved document, one by one.

exports.conf = {
    "tasks": {
        "log-all-docs": function (util, doc) {
            util.log(doc);
        },
        "log-by-criteria": function (util, doc) {
            if (doc.title && doc.title.match(/^The/)) {
                util.log(doc);
            }
        },
        "update-by-criteria": function (util, doc) {
            if (doc.status === 'new') {
                doc.owner = 'Bob McFred';
                util.save(doc);
            }
        },
        "delete-by-criteria": function (util, doc) {
            if (doc.status === 'spam') {
                util.remove(doc);
            }
        },
        "count-by-field": function (util, doc) {
            util.count(doc.status);
        },
        "hash-doc": function (util, doc) {
            const hash = util.hash(doc);
            util.log('hash:' + hash);
        },
        "audit-object": function (util, doc) {
            util.audit(doc);
        },
        "whatever": function (util, doc) {
            // you need to implement whatever function
            whatever(doc);
        }
    }
};

The database driver is available via util.driver from the task function; it returns nano(url).use(db):

exports.conf = {
    "tasks": {
        "use-database-driver": function (util, doc) {
            util.driver.something();
        }
    }
};

Note that you can also require other Node.js modules in the config file if you need to.

The util variable

The util in function (util, doc) is a utility variable that provides the following convenience functions:

// save the document back to the database
util.save(doc)

// remove the document from the database
util.remove(doc)

// increment a counter associated with a particular key
// all counters will be displayed in the summary report
util.count('somekey')

// log a message to both the console and to couchtato.log file
// if you only want to display a message on the console,
// simply use good old console.log(message)
util.log(message)

// generate a SHA256 hash for a given document, object, or string
util.hash(doc)

// add an object to the audit array, which is returned in the
// callback and can be used for downstream processing
util.audit(doc)
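
To try task functions without a CouchDB instance, one option is a small stand-in for util that only records what the tasks ask for. The sketch below is an assumption-laden dry run, not couchtato's implementation: `makeStubUtil` and `runTasks` are hypothetical helpers, and only save, remove, count, and log are stubbed.

```javascript
// Hypothetical stand-in for couchtato's util: it only records calls.
function makeStubUtil() {
  const calls = { saved: [], removed: [], counts: {}, logged: [] };
  const util = {
    save: (doc) => calls.saved.push(doc),
    remove: (doc) => calls.removed.push(doc),
    count: (key) => { calls.counts[key] = (calls.counts[key] || 0) + 1; },
    log: (message) => calls.logged.push(message)
  };
  return { util, calls };
}

// Apply every task to every document, one document at a time.
function runTasks(tasks, docs, util) {
  docs.forEach((doc) => {
    Object.keys(tasks).forEach((name) => tasks[name](util, doc));
  });
}

const tasks = {
  'update-by-criteria': function (util, doc) {
    if (doc.status === 'new') {
      doc.owner = 'Bob McFred';
      util.save(doc);
    }
  },
  'delete-by-criteria': function (util, doc) {
    if (doc.status === 'spam') {
      util.remove(doc);
    }
  },
  'count-by-field': function (util, doc) {
    util.count(doc.status);
  }
};

const docs = [
  { _id: 'a', status: 'new' },
  { _id: 'b', status: 'spam' },
  { _id: 'c', status: 'new' }
];

const { util, calls } = makeStubUtil();
runTasks(tasks, docs, util);
console.log(calls.saved.length, calls.removed.length, calls.counts);
```

Because nothing is written to a database, the same tasks can be checked against a handful of sample documents before a real run.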

Report

A summary report will be displayed at the end of the run:

------------------------
Retrieved 2601388 documents in 5203 pages
Processed 10356 saves and 302 removes
- New data count: 1012
- Moderated data count: 4578
- Flagged data count: 88

The summary report can be excluded from the log output by using the -q/--quiet option.

FAQ

Q: Why am I getting 'exports' is undefined Microsoft JScript runtime error on Windows?

A: Since Couchtato's default config file is called couchtato.js, Windows tries to execute couchtato.js instead of the couchtato command, which results in the above error. A workaround is to rename couchtato.js to config.js and then use the -c/--config-file flag, e.g. couchtato --config-file config.js iterate --url http://user:pass@host:port/db.

Q: What is the purpose of util.audit and/or the audit array?

A: The audit array is a convenient way to store data while iterating through documents. All objects added via util.audit() will be returned in the callback response upon completion. This is a powerful way to chain processing steps via messaging queues, lambda functions, or monitoring tools.
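
A minimal sketch of that flow, using an in-memory stand-in rather than couchtato itself: `runWithAudit` and the shape of its callback arguments are assumptions made for illustration, since the real callback signature is not shown here.

```javascript
// Hypothetical runner: collects whatever tasks pass to util.audit()
// and hands the collected array to a callback once all docs are processed.
function runWithAudit(docs, task, callback) {
  const audit = [];
  const util = { audit: (item) => audit.push(item) };
  docs.forEach((doc) => task(util, doc));
  callback(null, audit);
}

const sampleDocs = [
  { _id: 'a', status: 'flagged' },
  { _id: 'b', status: 'ok' },
  { _id: 'c', status: 'flagged' }
];

let flaggedIds = [];
runWithAudit(
  sampleDocs,
  function (util, doc) {
    if (doc.status === 'flagged') {
      util.audit({ id: doc._id, reason: 'flagged' });
    }
  },
  function (err, audit) {
    // Downstream step: here the IDs are just collected; a queue
    // publisher or monitoring hook could take their place.
    flaggedIds = audit.map((entry) => entry.id);
  }
);
console.log(flaggedIds);
```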

Colophon

Developer's Guide

Related Projects:

  • couchpenter - CouchDB database and document setup tool

couchtato's People

Contributors

cliffano, dsquier, ronjouch, sqisher

couchtato's Issues

Broken with Node 0.6.X because of require.paths()

It seems that couchtato uses log4js, which itself seems to be making a call to require.paths() in the code. Node 0.6 removed require.paths, so couchtato is broken with the stable release of node.js.

Still accepting PRs?

Appreciate your work on this library. It's been helpful.

Are you still maintaining couchtato (i.e., accepting PRs)?

  • If so, I've added some new features (maintaining backward compatibility) I'd like to submit.
  • If not, would you be interested in allowing other maintainers?

Thanks much!

db level features?

hello:
couchtato looks neat. i want to, say, delete all my databases! Sounds crazy. Are there DB level opts vs doc level?
thx,
chris

Multi-pages removes (>1000docs) are ineffective

Steps to reproduce

  • Make a rule file that will run a util.remove(doc) for more than one page (i.e. 1000 documents, if I understand correctly)
  • Run

Expected

Documents are removed

Actual

Documents are not removed, each successive run prints the same amount "removed":

[2016-03-16 13:57:31.765] [INFO] [default] - 
------------------------
Retrieved 1161 documents in 2 pages
Processed 0 saves and 1100 removes

But looking at futon, the documents were not removed. And indeed, re-trying the same command, they still appear and are still not deleted:

[2016-03-16 13:57:47.698] [INFO] [default] - 
------------------------
Retrieved 1161 documents in 2 pages
Processed 0 saves and 1100 removes

Now, if I first delete a batch of ~200, this one is successful:

[2016-03-16 13:58:04.639] [INFO] [default] - 
------------------------
Retrieved 1161 documents in 2 pages
Processed 0 saves and 199 removes

... and now deleting the rest works too:

[2016-03-16 13:58:32.490] [INFO] [default] - 
------------------------
Retrieved 962 documents in 1 pages
Processed 0 saves and 901 removes

embed user and password

Hi, I need to invoke couchtato from crontab, but in the prod environment I don't want to hardcode the DB user and password. Is there a way to include these auth params in the config file?

How do you deal with async scenarios

For example:

"update-by-criteria": function (util, doc) {
    if (doc.status === 'new') {
        getNewData(function(data) {
            doc.data = data;
            util.save(doc);
        });
    }
}

Improved --help output

It would be nice if couchtato's --help output had more information about the command line switches that it supports, such as -u, -s, -c etc.

Batch size

First, thank you for the incredible tool. I wouldn't have switched from MySQL to CouchDB without it.

I think there is an issue with batch size and clearing out the queue. I had many tasks that were working, but in one task that called util.save(doc), the docs would not update and no error was thrown. After checking all the logs (nothing), I poked around the source code.

I finally fixed this issue using the flag -b 1. I think the number of documents with changes never reached the batch size and so were never saved. Ideally couchtato would save everything in the queue periodically or at least at the end of the job. Alternatively, you could log a useful error on the cli that says that some documents were not saved and that you should rerun that task with a smaller batch size.

Thanks for your consideration
Adam
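
The symptom described above, and the end-of-run flush it suggests, can be pictured with a small in-memory queue. This is an illustration of the reported behaviour, not couchtato's actual code; `BulkQueue` and its methods are hypothetical.

```javascript
// Hypothetical bulk queue: sends documents only when batchSize is reached.
class BulkQueue {
  constructor(batchSize, send) {
    this.batchSize = batchSize;
    this.send = send;       // called with an array of queued docs
    this.pending = [];
  }

  add(doc) {
    this.pending.push(doc);
    if (this.pending.length >= this.batchSize) {
      this.flush();
    }
  }

  // Send whatever is still queued, even if it is less than a full batch.
  flush() {
    if (this.pending.length > 0) {
      this.send(this.pending);
      this.pending = [];
    }
  }
}

const sent = [];
const queue = new BulkQueue(1000, (docs) => sent.push(docs.length));

// Only 3 documents change, far below the batch size of 1000.
['a', 'b', 'c'].forEach((id) => queue.add({ _id: id }));
const sentBeforeFlush = sent.length;   // nothing has been sent yet

queue.flush();                         // an end-of-run flush sends the remainder
console.log(sentBeforeFlush, sent);
```

With a batch size of 1, as in the -b 1 workaround, every add triggers a send immediately, which would explain why that flag made the saves go through.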

Hide summary

When Couchtato is used to generate a report, the user wants to control what goes to the output and may want to omit the summary at the end.

Perhaps add --exclude-summary flag to do this.

how to iterate all db

I have a Cloudant account and I want to iterate over all databases, not just one.
How should I do that?
Thanks

Access to nano or database handler

Is there any way to get access to nano from within a task to do other work/verification? In the previous version, one could call c.stool.driver() from a task to get access to cradle, but since the arguments are now util and doc, there doesn't seem to be an easy way to get the database handle...

Custom config file support

Users want to be able to switch between config files (via -f arg) without renaming the file to couchtato.js.

It's more convenient to have several config files in the same directory and use -f.

Improve performance by traversing dataset once

At the moment Couchtato traverses the retrieved data twice:

  • when it plucks the docs out of result.rows
  • when it passes each doc from the plucked docs to each function

This is an extra cost for no gain, especially on a large dataset.
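
The difference can be sketched as follows. This is a simplified illustration, not couchtato's source: it assumes each row in result.rows carries its document under a doc property, and `twoPass` and `onePass` are hypothetical names.

```javascript
// Two passes: first pluck docs out of result.rows, then run each task.
function twoPass(result, tasks, util) {
  const docs = result.rows.map((row) => row.doc);
  docs.forEach((doc) => tasks.forEach((task) => task(util, doc)));
}

// One pass: run the tasks while walking result.rows, with no intermediate array.
function onePass(result, tasks, util) {
  result.rows.forEach((row) => tasks.forEach((task) => task(util, row.doc)));
}

const result = { rows: [{ doc: { _id: 'a' } }, { doc: { _id: 'b' } }] };
const seenTwoPass = [];
const seenOnePass = [];
twoPass(result, [(util, doc) => seenTwoPass.push(doc._id)], {});
onePass(result, [(util, doc) => seenOnePass.push(doc._id)], {});
console.log(seenTwoPass, seenOnePass);
```

Both variants visit the same documents in the same order; the single-pass version simply skips building the temporary docs array.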
