ceejbot / fivebeans

A beanstalkd client for node.js & a simple framework for running beanstalkd workers.

Home Page: http://ceejbot.github.io/fivebeans/

License: MIT License


fivebeans's Introduction

A straightforward and (nearly) complete beanstalkd client for node.js, along with a more opinionated beanstalkd jobs worker & runner.


FiveBeansClient

Heavily inspired by node-beanstalk-client, which is a perfectly usable client but somewhat dusty. I wanted more complete support of the beanstalkd protocol in a project written in plain JavaScript.

All client method names are the same case & spelling as the beanstalk text commands, with hyphens replaced by underscores. The single exception is delete, which is renamed to destroy().

For complete details on the beanstalkd commands, see its protocol documentation.

Creating a client

The client constructor takes two arguments:

host: The address of the beanstalkd server. Defaults to 127.0.0.1.
port: Port to connect to. Defaults to 11300.

The client emits three events that you should listen for: connect, error, and close.

The client is not usable until you call its connect() method. Here's an example of setting up a client:

var fivebeans = require('fivebeans');

var client = new fivebeans.client('10.0.1.1', 11300);
client
    .on('connect', function()
    {
        // client can now be used
    })
    .on('error', function(err)
    {
        // connection failure
    })
    .on('close', function()
    {
        // underlying connection has closed
    })
    .connect();

Producing jobs

use

client.use(tube, function(err, tubename) {});

Use the specified tube. Responds with the name of the tube being used.

list_tube_used

client.list_tube_used(function(err, tubename) {});

Responds with the name of the tube currently being used by the client.

put

client.put(priority, delay, ttr, payload, function(err, jobid) {});

Submit a job with the specified priority (smaller integers are higher priority), delay in seconds, and allowed time-to-run in seconds. The payload contains the job data the server will return to clients reserving jobs; it can be either a Buffer object or a string. No processing is done on the data. Responds with the id of the newly-created job.
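As a sketch of typical usage, the helper below (not part of fivebeans; the name `enqueue` is made up for illustration) selects a tube and then submits a JSON-serialized job on it:

```javascript
// Sketch (not fivebeans API): select a tube, then put a JSON-serialized job.
// `client` is assumed to be an already-connected fivebeans client.
function enqueue(client, tube, jobdata, callback) {
    client.use(tube, function(err, tubename) {
        if (err) return callback(err);
        // priority 0 (highest), no delay, 60 seconds time-to-run
        client.put(0, 0, 60, JSON.stringify(jobdata), callback);
    });
}
```

Because put responds with the new job id, the callback receives `(err, jobid)` just as with a direct client.put call.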

peek_ready

client.peek_ready(function(err, jobid, payload) {});

Peek at the data for the job at the top of the ready queue of the tube currently in use. Responds with the job id and payload of the next job, or 'NOT_FOUND' in err if there are no qualifying jobs in the tube. The payload is a Buffer object.

peek_delayed

client.peek_delayed(function(err, jobid, payload) {});

Peek at the data for the delayed job with the shortest delay in the tube currently in use. Responds with the job id and payload of the next job, or 'NOT_FOUND' in err if there are no qualifying jobs in the tube. The payload is a Buffer object.

peek_buried

client.peek_buried(function(err, jobid, payload) {});

Peek at the data for the next buried job in the tube currently in use. Responds with the job id and payload of the next job, or 'NOT_FOUND' in err if there are no qualifying jobs in the tube. The payload is a Buffer object.

Consuming jobs

watch

client.watch(tube, function(err, numwatched) {});

Watch the named tube. Responds with the number of tubes currently watched by the client.

ignore

client.ignore(tube, function(err, numwatched) {});

Ignore the named tube. Responds with the number of tubes currently watched by the client.

list_tubes_watched

client.list_tubes_watched(function(err, tubelist) {});

Responds with an array containing the names of the tubes currently watched by the client.

reserve

client.reserve(function(err, jobid, payload) {});

Reserve a job. Responds with the id and the job data. The payload is a Buffer object.

reserve_with_timeout

client.reserve_with_timeout(seconds, function(err, jobid, payload) {});

Reserve a job, waiting the specified number of seconds before timing out. err contains the string "TIMED_OUT" if the specified time elapsed before a job became available. The payload is a Buffer object.
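A common consumer pattern is to treat "TIMED_OUT" as a normal "nothing available yet" outcome and keep polling. A minimal sketch (the function names here are hypothetical, not fivebeans API):

```javascript
// Sketch (not fivebeans API): poll for jobs, tolerating TIMED_OUT.
// `client` is assumed to be a connected fivebeans client.
function pollForJobs(client, handleJob, onFatal) {
    client.reserve_with_timeout(10, function(err, jobid, payload) {
        // TIMED_OUT just means no job arrived within 10 seconds; try again.
        if (err === 'TIMED_OUT') return pollForJobs(client, handleJob, onFatal);
        if (err) return onFatal(err);
        // Process the job, then resume polling once the handler is done.
        handleJob(jobid, payload, function() {
            pollForJobs(client, handleJob, onFatal);
        });
    });
}
```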

touch

client.touch(jobid, function(err) {});

Inform the server that the client is still processing a job, thus requesting more time to work on it.

destroy

client.destroy(jobid, function(err) {});

Delete the specified job. Responds with null if successful, a string error otherwise. This is the only method not named identically to its beanstalkd counterpart, because delete is a reserved word in JavaScript.

release

client.release(jobid, priority, delay, function(err) {});

Release the specified job and assign it the given priority and delay (in seconds). Responds with null if successful, a string error otherwise.

bury

client.bury(jobid, priority, function(err) {});

Bury the specified job and assign it the given priority. Responds with null if successful, a string error otherwise.

kick

client.kick(maxToKick, function(err, numkicked) {});

Kick at most maxToKick delayed and buried jobs back into the active queue. Responds with the number of jobs kicked.

kick_job

client.kick_job(jobID, function(err) {});

Kick the specified job id. Responds with NOT_FOUND if the job was not found. Supported in beanstalkd versions >= 1.6.

Server statistics

peek

client.peek(id, function(err, jobid, payload) {});

Peek at the data for the specified job. Payload is a Buffer object.

pause_tube

client.pause_tube(tubename, delay, function(err) {});

Pause the named tube for the given number of seconds. No new jobs may be reserved from the tube while it is paused.

list_tubes

client.list_tubes(function(err, tubenames) {});

List all the existing tubes. Responds with an array of tube names.

stats_job

client.stats_job(jobid, function(err, response) {});

Request statistics for the specified job. Responds with a hash containing information about the job. See the beanstalkd documentation for a complete list of stats.

stats_tube

client.stats_tube(tubename, function(err, response) {});

Request statistics for the specified tube. Responds with a hash containing information about the tube. See the beanstalkd documentation for a complete list of stats.

stats

client.stats(function(err, response) {});

Request statistics for the beanstalkd server. Responds with a hash containing information about the server. See the beanstalkd documentation for a complete list of stats.

FiveBeansWorker

Inspired by node-beanstalk-worker but updated & rewritten to work with jobs queued by Stalker.

The worker pulls jobs off the queue & passes them to matching handlers. It deletes successful jobs & buries unsuccessful ones. It continues processing past all recoverable errors, though it emits events on error.

API

constructor

new FiveBeansWorker(options)

Returns a new worker object. options is a hash containing the following keys:

id: how this worker should identify itself in log events
host: beanstalkd host
port: beanstalkd port
handlers: hash with handler objects, with handler types as keys
ignoreDefault: true if this worker should ignore the default tube
timeout: timeout in seconds passed to reserve_with_timeout; defaults to 10

start

start(tubelist)

Connect the worker to the beanstalkd server & make it watch the specified tubes. Emits the 'started' event when it is complete.

stop

stop()

Finish processing the current job then close the client. Emits the 'stopped' event when complete.

watch

watch(tubelist, callback)

Begin watching the tubes named in the list.

ignore

ignore(tubelist, callback)

Ignore the tubes named in the list.

Events

The worker is intended to continue processing jobs through most errors. Its response to exceptions encountered when processing jobs is to bury the job and emit an event that can be logged or handled somewhere else.

error: Emitted on error in the underlying client. Payload is the error object. Execution is halted. You must listen for this event.

close: Emitted on close in the underlying client. No payload.

started: Worker has started processing. No payload.

stopped: Worker has stopped processing. No payload.

info: The worker has taken some action that you might want to log. The payload is an object with information about the action, with two fields:

{
    clientid: 'id-of-worker',
    message: 'a logging-style description of the action'
}

This event is the tattered remnants of what used to be built-in logging, and it might go away.

warning: The worker has encountered an error condition that will not stop processing, but that you might wish to act upon or log. The payload is an object with information about the error. Fields guaranteed to be present are:

{
    clientid: 'id-of-worker',
    message: 'the context of the error',
    error: errorObject
}

Some errors might have additional fields providing context, such as a job id.

job.reserved: The worker has reserved a job. The payload is the job id.

job.handled: The worker has completed processing a job. The payload is an object with information about the job.

{
    id: job id,
    type: job type,
    elapsed: elapsed time in ms,
    action: [ 'success' | 'release' | 'bury' | custom error message ]
}

job.deleted: The worker has deleted a job. The payload is the job id.

job.buried: The worker has buried a job. The payload is the job id.

Jobs

Each job must be a JSON-serialized object with two fields:

type: type string matching a handler
payload: job data, in whatever format the job defines

The worker looks up a handler using the given type string and calls work() on the job payload.

The job may also be a JSON array containing two items:

[ tubename, jobdata ]

Where the second item is an object as specified above. This is for compatibility with the Stalker library, which wraps the job data this way.
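As a sketch, here are both accepted formats serialized the way they would be put() on the queue (the tube and handler names are illustrative):

```javascript
// The two wire formats the worker accepts. Tube name ('my-tube') and
// handler type ('emitkeys') are illustrative, not required values.
var plain = JSON.stringify({ type: 'emitkeys', payload: { key: 'value' } });
var stalkerStyle = JSON.stringify(['my-tube', { type: 'emitkeys', payload: { key: 'value' } }]);
```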

Handlers

Handler modules must export a single function that returns an object. The object must have a field called 'type' with a brief descriptive string. It must also expose a function called work() with this signature:

work(jobdata, callback(action, delay))

jobdata: job payload
action: 'success' | 'release' | 'bury' | custom error message
delay: seconds to delay if the job is released; otherwise unused

If the action is "success", the job is deleted. If it is "release", the job is released with the specified delay. If it is "bury", the job is buried. All other actions are treated as errors & the job is buried in response.

Here's a simple handler example.

module.exports = function()
{
    function EmitKeysHandler()
    {
        this.type = 'emitkeys';
    }

    EmitKeysHandler.prototype.work = function(payload, callback)
    {
        var keys = Object.keys(payload);
        for (var i = 0; i < keys.length; i++)
            console.log(keys[i]);
        callback('success');
    }

    var handler = new EmitKeysHandler();
    return handler;
};

The examples directory has another sample handler.

Example

This example starts a worker capable of handling the emitkeys example from above.

var Beanworker = require('fivebeans').worker;
var options =
{
    id: 'worker_4',
    host: '127.0.0.1',
    port: 11300,
    handlers:
    {
        emitkeys: require('./emitkeyshandler')()
    },
    ignoreDefault: true
}
var worker = new Beanworker(options);
worker.start(['high', 'medium', 'low']);

FiveBeansRunner

A wrapper that runs a single beanstalkd worker as a daemon. Responds to the USR2 signal by reloading the configuration and restarting the worker. Handles SIGINT, SIGHUP, and SIGQUIT by completing processing on the current job then stopping.

Example use:

var fivebeans = require('fivebeans');
var runner = new fivebeans.runner('worker_id_1', '/path/to/config.yml');
runner.go();

bin/beanworker

The above code plus yargs wrapped in a node shell script for your convenience.

bin/beanworker --id=[ID] --config=[config.yml]

Creates a runner for a worker with the specified ID & configured with the specified yaml file.

Here's the complete source:

#!/usr/bin/env node

var argv = require('yargs')
    .usage('Usage: beanworker --id=[ID] --config=[config.yml]')
    .default('id', 'defaultID')
    .demand(['config'])
    .argv;

var FiveBeans = require('fivebeans');

var runner = new FiveBeans.runner(argv.id, argv.config);
runner.go();

Configuration file

Here's an example yaml configuration:

beanstalkd:
    host: "127.0.0.1"
    port: 11300
watch:
    - 'circle'
    - 'picadilly'
    - 'northern'
    - 'central'
handlers:
    - "./handlers/holborn.js"
    - "./handlers/greenpark.js"
    - "./handlers/knightsbridge.js"
ignoreDefault: true

beanstalkd: where to connect
watch: a list of tubes to watch.
handlers: a list of handler files to require
ignoreDefault: true if this worker should ignore the default tube

If the handler paths don't start with / the current working directory will be prepended to them before they are required.

Why yaml not json? Because when I originally wrote this, it was in support of a ruby service, and yaml is the native config format over in that land. I continue using it because it's more readable than json and easier for humans to type.

Contributors

@AVVS
@crackcomm
@zr40
Jon Keating
Jevgenij Tsoi

Many thanks!

TODO

  • Handle DEADLINE_SOON from the server.


fivebeans's Issues

Connections left open

I have got fivebeans running as a part of a small system that uses supervisor to run a node server and the queue runner.
I've noticed that every time a job gets added to the queue a new connection is opened and never closed after the job has been successful or buried.
I thought it might be the server script but that only initialises a singleton of the fivebeans client.
Is there potentially something I could be missing, do I need to manually close the connection somewhere?
I know this is going to be hard to respond to with no code and very little context but unfortunately I can't disclose any code.
Any help or suggestions would be hugely appreciated!

SyntaxError: Unexpected token

Hello, the error is:
warn: { message: 'parsing job JSON', ... error: [SyntaxError: Unexpected token P] ...

It throws this error every time the worker gets a payload with non-Latin characters in it.

My solution is to change line 155 of /lib/worker.js:
try { job = JSON.parse(payload.toString('ascii')); }
into:
try { job = JSON.parse(payload.toString()); }

Can you include this bugfix in the main branch?

unhandled exceptions are warnings, not errors

An unknown exception in a node.js program may leave your process in an invalid state. The only sane response to an unhandled exception is to exit the process and restart it.

Sending a warning on unhandled exception implies that the process may continue, and is very dangerous.

Support put job with loop

hi mate,
I'm trying to put jobs in a loop as below, but it seems I need to exit the process manually.
Is there any better idea?

thank you.

var fivebeans = require('fivebeans');
var host = '127.0.0.1';
var port = 11300;
var tube = 'my_tube';

for (var i=0;i<3;i++){
  var stalker=new fivebeans.client(host, port);
  stalker.connect(function(err){
    stalker.use(tube, function(err, tubename){
      stalker.put(0,0,60,"foo", function(err, jobid){
        console.log(jobid);
        // process.exit(0);
      });
    });
  });
}

bug: if .connect() called more than once, incorrect callbacks are used

@ceejbot,

if FiveBeansClient.prototype.connect() is called more than once, a callback for an old socket command may still exist in self.handlers even after a new socket is created. In this case, the first command on the new socket will get stuck, because its callback will not be called (a callback for the old socket will be called instead).

Suggested fix: #132.

Delayed job executes immediately

I am pushing a job to the queue as such :

Queue.use(Config.beanstalkd.tube, function(err, tname){
    Queue.put(0,15,60,JSON.stringify(['aggregator', job1]),onJobPushedToQueue);
});

I have a simple worker like so :

module.exports = function() {
    function aggregatorWorker() {
        this.type = 'aggregator';
    }


    aggregatorWorker.prototype.work = function (payload, callback) {
        console.log('payload', payload);
        callback('success');
    };

    var handler = new aggregatorWorker();
    return handler;
};

And the moment I push the job to the queue it gets processed. According to the documentation, shouldn't it be executed at least 15 seconds later?

Running on Ubuntu @ version 1.10+1+g86231ba

runner, why class?

Why is runner a class, since all you can do with it is call go()?

In JS we use the function pattern... this smells like a Java "command".

TIP: In fact it should be a class, but it must inherit from EventEmitter and pass on events from the Worker....

Race condition

We're starting to see jobs showing up in the wrong tube in production.

It appears to be a race condition. Our code looks like this:

stalk.use(tube, function(err, tubename) {
  if (err) ...
  stalk.put(...)
});

And of course stalk.use does I/O, so node schedules other work before calling the callback, opening up an opportunity for a race condition if another request also calls stalk.use.

I imagine that's how a good number of people have their code structured. To prevent the race condition, we need to either call use & put sequentially without a potential schedule between them, or we need to use a different stalk connection per tube.

A use_and_put function would make the first easier for people, and also make the problem more obvious.

Regardless of solution, the README should probably warn people not to do what I did. :)
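A sketch of the second option above, a dedicated connection per tube (nothing here is fivebeans API; `makeClient` is a hypothetical factory that creates and connects a client):

```javascript
// Sketch of a per-tube client pool: because each tube has its own
// connection, concurrent use()/put() pairs for different tubes can never
// interleave on the same socket.
function TubeClients(makeClient) {
    this.makeClient = makeClient;
    this.clients = {};
}

// Return the one client dedicated to this tube, creating it on first use.
TubeClients.prototype.forTube = function(tube) {
    if (!this.clients[tube]) this.clients[tube] = this.makeClient(tube);
    return this.clients[tube];
};
```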

Confused: Is Watch not allowed while in Reserve?

Is WATCH not allowed while in RESERVE call? Is this the way beanstalkd works or is it fivebeans?

In the code below, the reserve command runs first and then the watch command 2 seconds later, but the watch command's callback is never run.

consumer.js

var fivebeans = require('fivebeans');

var client = new fivebeans.client('127.0.0.1', 11300);
client
    .on('connect', function () {
        console.log('Connected');
    })
    .on('error', function (err) {
        // connection failure
    })
    .on('close', function () {
        // underlying connection has closed
    })
    .connect();

client.reserve(function (err, jobid, payload) {
    if (err) { console.log(err) }
    console.log(`Job Id ${jobid}`);
    console.log(`Job data ${payload.toString()}`);
    client.destroy(jobid, function (err) {
        if (err) { console.log(err) }
        console.log(`Job deleted ${jobid}`);
    });
});

setTimeout(() => {
    console.log('going to watch')
    client.watch('low-volume', (err, tubecount) => {
        console.log(err);
        console.log(tubecount);
    })
}, 2000)    

Output

Connected
going to watch

Obsolete documentation for yaml configuration

In version 1.4.1 we used to put host and port inside beanstalk:

var options = {
    id: this.id,
    host: config.beanstalkd.host,
    port: config.beanstalkd.port,
    handlers: config.handlers,
    ignoreDefault: config.ignoreDefault
};

It was changed in this commit. And now in version 1.5.0 we put host and port just inside config:

var options =
{
    id: 'worker_4',
    host: '127.0.0.1',
    port: 11300,
    handlers:
    {
        emitkeys: require('./emitkeyshandler')()
    },
    ignoreDefault: true
}

But the documentation for the yaml configuration hasn't been updated:

beanstalkd:
    host: "127.0.0.1"
    port: 11300
watch:
    - 'circle'
    - 'picadilly'
    - 'northern'
    - 'central'
handlers:
    - "./handlers/holborn.js"
    - "./handlers/greenpark.js"
    - "./handlers/knightsbridge.js"
ignoreDefault: true

serial access on localhost too slow

If I serialize put operations, it's painfully slow. I'm used to 25-30K operations per second on beanstalkd with PHP; this one operates at 10 operations per second. When I queue up several thousand operations in async mode, it's moderately fast.

Release/Destroy fails to fire when using multiple Reserve in parallel

If I launch, say, 20 reserves at the same time, then the release/destroy for each of them doesn't get called. If I lower the number of parallel requests to, say, 9, it works fine.

Is there some type of edge case I'm hitting?

example: I have 1 thing in the queue.
I spin up 20 requests to beanstalkd. The reserve works, but the destroy inside the reserve doesn't.
If I only spin up 1 request, I can reserve/destroy just fine.

support for binary payload

The payload is currently a string. stream.write() and Buffer.toString() are called without an encoding specified, so Node.js will encode/decode the payload into/from a UTF-8 Buffer. If the payload is binary, like a PNG image or a protocol buffer, it will get mangled, and the payload length will not be calculated correctly (you'll see an EXPECTED_CRLF error if you try to send binary data).

Would you like to add an option to specify the encoding?

Thanks.

Homepage documentation example is wrong

emitkeys: require('./emitkeyshandler')
needs to be
emitkeys: require('./emitkeyshandler')()

Not a great experience when the only example in the documentation is incorrect!

Handlers issues

Hello everyone, this is not really an issue, but I'm having some trouble with the handlers.

I have one worker and two handlers doing completely different tasks.

I want to use Scribe-js to log what my worker is doing.

If I 'require' Scribe-js where my worker is, it will not be available in the handlers.

If I 'require' Scribe-js in both handlers I get an error, because it looks like two separate instances of Scribe-js are trying to write to the same files (it did not work when I configured two different paths for the logs either).

Why can't I use a variable in my handler that was declared where I instantiate my worker?

Thank you for any help.

Why do I have to destroy before I reserve?

After reserving a job, I cannot reserve again until I delete the job. That means I can't do any jobs in parallel.

What if I queue a job that takes 40 seconds due to some asynchronous call out to the internet. I can't handle any other jobs in the mean time, until that call returns and then I call destroy.

What I've been doing is calling reserve right after a reserve returns:

client.reserve(function handleJob(error, jobId, payload) {
    setImmediate(function() {
        client.reserve(handleJob);
    });

    someCallThatTakesALongTimeToReturn(function() {
       client.destroy(jobId, ...);
    });
});

and for whatever reason, that does not work. I have to do

client.reserve(function handleJob(error, jobId, payload) {
    someCallThatTakesALongTimeToReturn(function() {
       client.destroy(jobId, function() {
            client.reserve(handleJob);
       });
    });
});

Is there a particular reason it has to work this way? Can beanstalkd not handle parallel requests?

Multiple worker

Hello,

I'm running two workers on the same tube with different IDs using the fiverunner, but it seems that sometimes both workers are trying to process the same job. From what I understood this shouldn't be possible, because once a worker reserves the job, the other isn't supposed to get the job as well, no?

And I tested with different jobs being put on the tube, and it seems a bit random whether both workers try to process the same job.

client.stats_job() never responds

I'm new to Beanstalkd. I'm trying to use the stats_job method in a worker, like this:

worker.client.stats_job(job_info.id, function(err, stats) {
    if (!err) {
        if (stats.buries !== 0) {
            worker.client.destroy(stats.id, function(err) {});
        } else if (stats.releases >= settings.facebook.max_releases) {
            worker.client.bury(stats.id, stats.pri, function(err) {});
        }
    }
});

However, the callback never seems to be called. There is no response at all, and processing of the queue seems to be stuck as well. If I remove this call, everything works fine.

What could be wrong? Thanks in advance! I compiled Beanstalkd from source, I'm on MacOS.

Use events for close/error instead of original callback on connect

I am not sure if it's just my inexperience with Fivebeans, but I find that the way the connect callback can be called multiple times makes things kinda weird and breaks the general node.js convention.

Effectively, the callback is being used as a quasi event loop, so it's better to use an actual event.

is connected method?

Is there an "is connected" method available on the client object?

What's the best way to make sure the client is connected to beanstalkd?

Job info in handler

This is not really an issue, but more like a feature idea.

So the handler's work() method has access to the payload, but it doesn't know anything about the job's details. Using this info, the work method could make better decisions about how to handle the job.

Example:

The jobstat - among other things - includes the number of tries, buries and time-left. These could be used to implement some kinda functionality that buries / deletes jobs after X tries and runs each retry with decreasing frequency.

But this is only one use-case. I guess having access to the job data (and the beans client) would make the handler more powerful.

So something like this would be cool:

.work(jobid, payload, callback)

or

.work(stats-job, payload, callback)

In the first case, the handler should have access to the beanstalkd client (created by worker).

In second case, by "jobs-stat" I meant the most important info about the job and its history, not necessarily the whole stats-job() response.

What do you think? :)
