breejs / bree

Bree is a Node.js and JavaScript job task scheduler with worker threads, cron, Date, and human syntax. Built for @ladjs, @forwardemail, @spamscanner, @cabinjs.

Home Page: https://jobscheduler.net

License: MIT License

JavaScript 94.70% HTML 1.37% Shell 0.15% TypeScript 3.77%
job scheduler node cron ms human simple workers sandboxed processes

bree's People

Contributors

climba03003, cronid, danibram, dynamitec, flyingpumba, knicola, mat813, mikevalstar, naz, niftylettuce, nsylke, olliechick, r00b, revadike, shadowgate15, snrmwg, spence-s, thewebdevel, tipiwiny, titanism, zanechua

bree's Issues

Cron interval doesn't execute functional job

When using a cron string, the job does not execute at the specified time (6:20pm every day). It does, however, execute on start. Not sure exactly why, but there's that.

const Bree = require('bree');

function test () { console.log('yay') }

const bree = new Bree({
  root: false,
  jobs: [
    {
      name: 'snapshot',
      path: test,
      cron: '20 18 */1 * *'
    }
  ]
});

bree.start()

[question] workerData parameter shape

The problem I'm facing is a confusing (from job developer perspective) use of the data (workerData) passed into the worker. After looking closer into how workerData parameter is constructed, I have a proposal on how to structure it so it's more intuitive for the job developer.

Let's consider an example bree job initialization with a value passed in:

// bree client configuring a job with passed in data

const job = {
    cron:'0/5 * * * * *',
    name:'job-with-data',
    path:'/path/to/job-using-data.js',
    worker:{
      workerData: 42
    }
};

bree.add(job);
bree.start('job-with-data');

// job-using-data.js file content

const {workerData} = require('worker_threads');

(async () => {
    const data = workerData.job.worker.workerData;
    console.log(data); // prints '42'
})();

To access the data inside the job file, the job developer has to know bree's fairly complicated internal object structure, workerData.job.worker.workerData. This structure doesn't map 1:1 to the parameter passed during job initialization. As a developer writing a job, I would expect to have access to the same data structure as if I had initialized the job through the native Worker constructor:

new Worker('/path/to/job-using-data.js', {workerData: 42});

// job-using-data.js file content

const {workerData} = require('worker_threads');

(async () => {
    const data = workerData;
    console.log(data); // prints '42'
})();

I don't see a good reason why a bree job would need to have all the meta information about itself accessible from within the script. I think it's unnecessary for the job to know about its name, interval, or any other internal bree configuration. It should be up to bree's client to use these configurations, and the job itself should execute in a self-contained manner, with as little knowledge as possible. @niftylettuce what are your thoughts? Could you please clarify the use case for passing job metadata into the workerData parameter?

Proposed change

I think initializing the worker in the following way would make reading workerData inside jobs the most intuitive:

const workerData = job.worker && job.worker.workerData ? job.worker.workerData : undefined;
const object = {
  ...(this.config.worker ? this.config.worker : {}),
  ...(job.worker ? job.worker : {}),
  workerData
};

this.workers[name] = new threads.Worker(job.path, object);
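
Until the shape changes, a job script can hide the nesting behind a small helper. The following is a minimal sketch, assuming the nested layout described above (workerData.job.worker.workerData); the helper name getJobData is hypothetical and not part of bree.

```javascript
// Hypothetical helper (not part of bree): returns the value the client passed
// as job.worker.workerData, whether the script was started by bree (nested
// shape described in this issue) or directly via new Worker(path, { workerData }).
function getJobData(workerData) {
  const nested =
    workerData && workerData.job && workerData.job.worker
      ? workerData.job.worker
      : undefined;
  // Prefer the nested bree shape when it is present.
  if (nested && typeof nested === 'object' && 'workerData' in nested) {
    return nested.workerData;
  }
  // Otherwise assume the native Worker shape, where workerData is the payload.
  return workerData;
}

// Inside a job file this would be used as:
//   const { workerData } = require('worker_threads');
//   const data = getJobData(workerData);
```

With this in place, the job file reads the same payload regardless of how the worker was created.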

[feat] browser support

Right now browser support is unstable due to these issues:

I previously had modified index.html with a basic example such as this:

  <script src="https://unpkg.com/bree"></script>
  <script>
    (function() {
      function hello() {
        console.log('hello');
        postMessage('done');
      }

      var bree = new Bree({
        jobs: [
          {
            name: 'hello',
            path: hello,
            interval: '5s',
          }
        ]
      });

      bree.start();
    })();
  </script>

However since postMessage is not working, I have it commented out right now.

Note that I do expose threads from bthreads as Bree.threads, so one could write a browser test (e.g. with puppeteer) that uses Bree.threads.backend === 'web_worker' and Bree.threads.browser === true to check that the proper backends load.

Error when used with pino logging

In jobs/test.js:

export {}
require('dotenv').config()
const pino = require('pino')
let logger = pino({
  customLevels: {
    critical: 70
  }
})
process.on('uncaughtException', pino.final(logger, (err, finalLogger) => {
  finalLogger.error(err, 'uncaughtException')
  process.exit(1)
}))
  

err: Error: final requires a stream that has a flushSync method, such as pino.destination and pino.extreme

I think the issue was related to this:
pinojs/pino#761

Using the same file for different jobs

I need to create a number of jobs dynamically from the same file using different entry parameters, however the following won't work since the name of the job is the same as the name of the worker (which needs to be unique).

Since I want to execute the same job many times (a hundred or so), creating a file for each one doesn't seem like a valid solution.

const Bree = require('bree');

const params = [1, 2, 3, 4];
var myJobs = [];

params.forEach((param) => {
  myJobs.push({
    name: 'test-worker', // execute the file "test-worker.js" with different params
    interval: param,
    worker: {
      workerData: param,
    },
  });
});

const bree = new Bree({
  jobs: myJobs,
  errorHandler: (error, workerMetaData) => {
    console.log(error);
  },
});

bree.start();

However, this doesn't work which is to be expected since the name is the same for every job.

OUTPUT:

Error: Job #1 has a duplicate job name of test-worker
Error: Job #2 has a duplicate job name of test-worker
...

Is there a method to use the same file for different jobs? It feels like there should be, but I'm missing something.

Same job file, different parameters

Hello, is it possible to have

jobs: [
  { name: "jobA", interval: 15, workerData: { test: "taska" } },
  { name: "jobA", interval: 15, workerData: { test: "taskb" } }
]

Rare one off jobs & dynamic scheduled jobs

I was really excited to see this library pop up - it's awesome to see something using native worker threads and not requiring redis/mongo/some other store. But was then a bit confused by the configuration method when it came to trying it out.

I have use cases for different types of jobs:

  1. recurring jobs that operate like a cron
  2. one off, super rare, but long-running tasks that might never be run in the lifetime of the process
  3. jobs that are generated dynamically and need to be run at a specific time
  • 1 seems to be the main use case for Bree, but in my case the use case is smallest
  • 2 is my main use case - I can sort of see how I might manage it by not calling bree.start() and only calling bree.run('task') if the long-running task is needed, but that feels like I'm not using the tool properly
  • 3 is a nice to have - already have code doing this, but unless I'm missing something there's no way to achieve it with Bree - except approximation with a cron running every minute to check

Given the rareness of the one-off jobs, it's a shame to have to declare them upfront, rather than being able to add them if and when they show up - otherwise there's overhead for no good reason.

With type 3 specifically, I find it odd that Bree has support for setting an exact date when a job should run, but that can only be set on instantiation?

I guess I'm looking for bree.add({jobConfig}) or bree.run({jobConfig}) - or am I massively missing something?
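
For reference, the README excerpt quoted in another issue on this page lists bree.add() for adding jobs after initialization, followed by a start call. A sketch of building a one-off job at runtime along those lines, assuming the per-job date option accepts a JavaScript Date; makeOneOffJob, the job name, and the file path are hypothetical.

```javascript
const path = require('path');

// Hypothetical helper: builds the config for a job that should run once at `when`.
function makeOneOffJob(name, file, when) {
  if (!(when instanceof Date) || Number.isNaN(when.getTime())) {
    throw new TypeError('when must be a valid Date');
  }
  return {
    name,
    path: path.resolve(file),
    date: when // assumption: bree's per-job date option
  };
}

const job = makeOneOffJob(
  'export-report',                      // hypothetical job name
  './jobs/export-report.js',            // hypothetical job file
  new Date(Date.now() + 60 * 60 * 1000) // one hour from now
);

// With an existing instance, this would be followed by:
//   bree.add(job);
//   bree.start(job.name);
```

This keeps rare jobs out of the upfront configuration; they are only declared when they actually show up.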

BTHREADS_BACKEND=child_process SIGINT handling

For my specific use case in a node app, I'd like jobs created as processes (lots of IO and long running). Experimenting with the examples, I've come up with the parent / child code as per below. The problem is I cannot get the exception handling to work properly without a "sleep" after it calls bree.stop(). With it, it works as expected. My naive understanding is that signal handlers should not be async in nature. Is there a better way? BTW, same problem exists if I use Graceful.

Main "dispatch" app 

const path = require('path');
 
// optional
const ms = require('ms');
const dayjs = require('dayjs');
const Graceful = require('@ladjs/graceful');
const Cabin = require('cabin');
const later = require('@breejs/later');
 
// required
const Bree = require('bree');
const sleep = require('util').promisify(setTimeout);

process.on('SIGINT',  () => {
    myExit('SIGINT');
});
process.on('SIGQUIT',  () => {
    myExit('SIGQUIT');
});
process.on('SIGTERM',  () => {
    myExit('SIGTERM');
});
async function myExit(name) {
    console.log(`\nBREE Parent Ending ${name}`);
    bree.stop();
    //removing sleep here now does not allow child to close properly (at least no logging is seen but process is closed)
    await sleep(500);
    process.exit(0);
}

console.log(`BREE starting with PID:${process.pid}`);

//
// NOTE: see the "Instance Options" section below in this README
// for the complete list of options and their defaults
//
const bree = new Bree({
  //
  // NOTE: by default the `logger` is set to `console`
  // however we recommend you to use CabinJS as it
  // will automatically add application and worker metadata
  // to your log output, and also masks sensitive data for you
  // <https://cabinjs.com>
  //
  // logger: new Cabin(),
 
  //
  // NOTE: instead of passing this Array as an option
  // you can create a `./jobs/index.js` file, exporting
  // this exact same array as `module.exports = [ ... ]`
  // doing so will allow you to keep your job configuration and the jobs
  // themselves all in the same folder and very organized
  //
  // See the "Job Options" section below in this README
  // for the complete list of job options and configurations
  //
  jobs: [
    // runs `./jobs/foo.js` on start
    // 'job-async',
//    'job-async',

    //'job-compute'
    
    // runs `./jobs/job-async-copy.js` every 5 seconds
    {   
      name: 'job-async-copy',
      interval: 5000,//later.parse.text('every 5 seconds'),
      worker: {
          workerData: { name: 'joe' }
      }

    },
    
  ]
});
 
// handle graceful reloads, pm2 support, and events like SIGHUP, SIGINT, etc.
// const graceful = new Graceful({ brees: [bree] });
// graceful.listen();
 
// start all jobs (this is the equivalent of reloading a crontab):
bree.start();

//bree.stop('job-async');

/*
// start only a specific job:
bree.start('foo');
 
// stop all jobs
bree.stop();
 
// stop only a specific job:
bree.stop('beep');
 
// run all jobs (this does not abide by timeout/interval/cron and spawns workers immediately)
bree.run();
 
// run a specific job (...)
bree.run('beep');
 
// add a job array after initialization:
bree.add(['boop']);
// this must then be started using one of the above methods
 
// add a job after initialization:
bree.add('boop');
// this must then be started using one of the above methods
 
// remove a job after initialization:
bree.remove('boop');
*/

bree.on('worker deleted', (name) => {
    console.log('worker deleted', name);
    // if (name === 'job-async' && !signal) 
    //     bree.run(name);
  });

bree.on('worker created', (name) => {
    console.log(`new worker ${name}`);
});

Job

const threads = require('bthreads');
const sleep = require('util').promisify(setTimeout);

let signal = false;

process.on('SIGINT',  () => {
    myExit('SIGINT');
});
process.on('SIGQUIT',  () => {
    myExit('SIGQUIT');
});
process.on('SIGTERM',  () => {
    myExit('SIGTERM');
});

function myExit(name) {
    console.log(`ASYNC received ${name}`);
    signal = true;

    // if (threads.parentPort) {
    //     threads.parentPort.postMessage('cancel');
    //     console.log(`Cancel sent`);
    // } else 
    //     console.log(`Exiting...`);
}
if (!threads.isMainThread) {
    threads.parentPort.on('message', async (message) => {
        console.log(`Received message from parent [${message}]`);
    //    await sleep(300);
        if (message === 'cancel') {
            signal = true;
        }   
    });
}

if (threads.isMainThread) {
    console.log(`ASYNC on main threads`);
    
} else {
    console.log(`ASYNC on worker thread`);

}

// if (threads.parentPort) {
//     console.log(`Adding .... event receiver`);
//     threads.parentPort.once('message', async (message) => {
//     });
// }


let a = 1;
let name = 'll'; //threads.hasOwnProperty('workerData.name')  ? "UNKNOWN" : threads.workerData.name
console.log(`async-COPY starting with PID:${process.pid} and name ${name}`);
async function x() {
    for (;;) {
        a++;
        await sleep(300);
        console.log(`COPY loop ${a}`);
        // if (a==5) 
        //     threads.parentPort.postMessage('error');
        if (a>10 || signal) {
            // if (threads.parentPort) {
            //     threads.parentPort.postMessage('COPY exit');
            // }
            if (signal) {
                console.log(`COPY Exiting due to signal...`);
                if (!threads.isMainThread) {
                    console.log(`Sending to Parent...`);
                    threads.parentPort.postMessage('cancelled');
                }
            }
            process.exit(0);
        }
    }
}

x();
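
Instead of a fixed sleep, the parent could wait until every worker has reported that it is gone and only then exit. The following is a sketch of that idea, assuming the 'worker created' and 'worker deleted' events used above fire once per worker; onAllWorkersGone is a hypothetical helper, and the emitter is a plain EventEmitter standing in for the bree instance.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: tracks live workers through the two events and calls
// `done` once the last tracked worker has been deleted.
function onAllWorkersGone(emitter, done) {
  const live = new Set();
  emitter.on('worker created', (name) => live.add(name));
  emitter.on('worker deleted', (name) => {
    live.delete(name);
    if (live.size === 0) done();
  });
  return live;
}

// Stand-in for the bree instance, to show the flow without spawning anything.
const fakeBree = new EventEmitter();
let exited = false;
onAllWorkersGone(fakeBree, () => {
  exited = true; // the real handler would call process.exit(0) here
});

fakeBree.emit('worker created', 'job-async-copy');
fakeBree.emit('worker deleted', 'job-async-copy');
console.log(exited); // true
```

In a real signal handler, the exit callback would only be armed after the signal arrives, and a timeout fallback is still advisable in case a child never reports back.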

[bug] interval jobs overlapping with initial job.run() execution

Problem

When an interval job is started, an unnecessary Job "${name}" is already running error is logged.

Context

When an interval job is added to bree, it runs immediately upon start. This does not take into account that job execution could take some time (for the sake of example, let's say 5 seconds). Because of the nature of the schedules produced by later, they execute on "whole" time intervals instead of taking into account the exact time of execution plus the interval.

Example

Schedule a job that executes every 10 seconds with a job that takes roughly 5 seconds to execute:

const bree = new Bree({...config});
const job = {
    interval: 'every 10 seconds',
    name: 'run-five-seconds',
    path: '/path/to/jobs/run-five-seconds.js',
    outputWorkerMetadata: true
};
bree.add(job);
bree.start('run-five-seconds');

The schedule produced by the every 10 seconds interval is:

> job.interval.schedules[0].s
(6) [0, 10, 20, 30, 40, 50]

Meaning the job will run at 0 seconds, 10 seconds, 20 seconds ... etc

If bree.start('run-five-seconds'); runs at second 6, the job will still be executing when the first interval hits, because 6s + 5s (execution time) = 11s, which runs past the first interval at second 10.
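
The arithmetic can be checked with a few lines of plain JavaScript. This is only an illustration of the overlap condition described here, not code from bree or later; overlapsNextTick is a hypothetical name.

```javascript
// Returns true when a run started at `startSecond` (0-59) is still executing
// when the next whole-interval tick fires.
function overlapsNextTick(startSecond, durationSeconds, intervalSeconds) {
  // Next tick strictly after the start, on the "whole" interval grid (0, 10, 20, ...).
  const nextTick = (Math.floor(startSecond / intervalSeconds) + 1) * intervalSeconds;
  return startSecond + durationSeconds > nextTick;
}

console.log(overlapsNextTick(6, 5, 10)); // true: 6 + 5 = 11 > 10
console.log(overlapsNextTick(2, 5, 10)); // false: 2 + 5 = 7 <= 10
```

In other words, with a 5 second job on a 10 second grid, any start in the second half of a grid slot collides with the next tick.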

Possible approaches

Options:

  1. Disallow immediate run() for interval jobs and let them run when their interval comes
  2. Skip first interval run
  3. Other ideas?

Imo, option 1 is the way to go, as I don't think it's ever an expectation for a job to run right away when it's given a specific schedule (for reference, that's how crontab executes cron jobs). @niftylettuce wdyt?

Using cron

Can somebody give me a simple example? I'm trying to use bree with a cron, but it just exits and nothing happens.

const Bree = require('bree');

const bree = new Bree({
    jobs: [{
        name: 'test1',
        cron : '* * * * *'
    }]

});

bree.start();

job/test1

const { parentPort } = require("worker_threads");

console.log("working")
if (parentPort) parentPort.postMessage("done");
else process.exit(0);

Using typescript files

Just a tip for anyone wanting to use typescript files instead of javascript. Make sure your project is already set up for ts-node.

Set up the job like this:

{ 
    name: 'Typescript Worker', 
    path: typescript_worker, 
    interval: 'every 10 seconds', 
    worker: { workerData: { __filename: './src/job_specific_filename_worker.ts' } } 
}

With the typescript_worker function like this:

function typescript_worker() {
    const path = require('path')
    require('ts-node').register()
    const workerData = require('worker_threads').workerData
    require(path.resolve(__dirname, workerData.__filename))
}

Passing data

Hello, is there a way currently to pass data on to worker threads?

[feat] custom error handler

Problem

Allow custom error handling when an error is produced by the worker.

Current state

At the moment, when a worker throws an unhandled exception, it should bubble up to bree to handle it. When bree picks up an 'error' event, it logs the error.

Proposed improvement

Allow providing a custom error handler that is called when errors happen in bree, making it possible to modify the default behavior.

An alternative or additional solution would be extending the events bree emits with a new 'error' event.

An example use-case: giving the library client a possibility to send error reports to an external error tracking service like Honeybadger, Airbrake, Rollbar, BugSnag, Sentry, Exceptiontrap, Raygun, etc.

References

  • sidekiq - allows to configure handler
  • resque - same as sidekiq
  • agenda - no specific pattern; the library only partially logs some of the errors and seems to leave handling to the job
  • bull - not documented specifically but allows to register handlers for failed jobs
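
To make the use case concrete, here is a sketch of a handler that forwards errors to an external reporter. It uses the (error, workerMetadata) signature that appears with the errorHandler option in another snippet on this page; the name field on the metadata and the makeErrorHandler factory are assumptions for illustration.

```javascript
// Hypothetical factory: wraps any reporting function (for example a call into
// an error tracking client) in an (error, workerMetadata) handler.
function makeErrorHandler(report, logger = console) {
  return (error, workerMetadata) => {
    // Assumption: the metadata carries the job name under `name`.
    const jobName = workerMetadata && workerMetadata.name ? workerMetadata.name : 'unknown';
    try {
      report(error, { job: jobName });
    } catch (reportError) {
      // Never let a failing reporter take the scheduler down.
      logger.error('error reporter failed', reportError);
    }
  };
}

const reported = [];
const errorHandler = makeErrorHandler((error, context) => {
  reported.push({ message: error.message, ...context });
});

errorHandler(new Error('boom'), { name: 'email' });
console.log(reported); // [ { message: 'boom', job: 'email' } ]
```

Keeping the reporter behind a function argument means the same handler works with any tracking service, or with a stub in tests.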

[feat] Option to execute jobs on the same event loop "inline"

Problem

Executing jobs on a separate thread through worker_threads or in a separate process comes at the price of additional memory allocation and the time taken to spin up a thread/process. Because of these constraints, it's sometimes more efficient to run jobs in the same event loop. For example, a job that has a few non-blocking (async) operations and needs to be executed on a specific schedule.

Solution

Introduce a job configuration option to execute the job as an "inline" function. Proposed option name: offloaded, which would have to be set to false to run in the same event loop.

The solution would need to look into restrictions that would come with running such functions.

@niftylettuce @shadowgate15 would love to hear your opinions on this topic and hear your thoughts about the new option naming.

Failed to stop bree

{"level":30,"time":1604236413590,"pid":7466,"hostname":"ip-10-0-0-21","scope":["bree"],"msg":"Worker for job \"get_lastest_from_rss\" exited with code 0"}
Stopping bree...
bree.service: State 'stop-final-sigterm' timed out. Killing.
Stopped bree.
bree.service: Unit entered failed state.
bree.service: Failed with result 'timeout'.
Stopped bree.
Started bree.
I saw these lines in the log.
It was caused by my script staying connected to postgresql forever.

Interval parsing issue

Specifying a job interval of every 10 seconds as opposed to 10s results in the following very confusing error:

    this[kPort].postMessage({
                ^

DOMException [DataCloneError]: function(d) {
        return getInstances("next", 1, d, d) !== later.NEVER;
      } could not be cloned.
    at new Worker (internal/worker.js:200:17)
    at Bree.run (/Users/cedgington/repos/personal/temp/express-example/node_modules/bree/index.js:573:28)

You can see this by simply using the example project (as I did) and modifying only the interval.
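
The error text points at a function that could not be cloned while data was being posted to the worker. Reading the trace, one plausible explanation (an assumption, not confirmed in this issue) is that the parsed schedule object contains functions and ends up in the posted data. The cloning limitation itself is easy to show with plain worker_threads; isCloneable is a hypothetical helper.

```javascript
const { MessageChannel } = require('worker_threads');

// Returns true when `value` survives the structured clone used by postMessage.
function isCloneable(value) {
  const { port1, port2 } = new MessageChannel();
  try {
    port1.postMessage(value);
    return true;
  } catch (error) {
    return false;
  } finally {
    port1.close();
    port2.close();
  }
}

console.log(isCloneable({ interval: '10s' }));            // true
console.log(isCloneable({ interval: { isValid() {} } })); // false: functions cannot be cloned
```

A check like this can help narrow down which part of a job definition is not safe to pass to a worker.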

Dynamically set `workerData`

This is more of a question and not an issue--not sure where might be the best place to post. I can configure a worker with a workerData hash so that the scheduled job can have access to some seed data when it runs. However, is there a way to dynamically set this workerData hash on each run? Say I have a job that runs every 5 minutes, and every time it runs, I want to repopulate workerData with some data from another class in my project fetched via a getter, for example. Is that possible?

[test] cover node's internal 'messageerror' event with tests

Problem

Test coverage is turned off for messageerror events with a comment to fix it:

// NOTE: you cannot catch messageerror since it is a Node internal
// (if anyone has any idea how to catch this in tests let us know)

Research

Have started a discussion in node repository around best ways to handle such cases - nodejs/node#36333 (comment). The best advice to tackle it so far has been:

I would currently recommend just to emit the events directly

Next

Simulating these events directly as suggested would be the next thing to try here.
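
A sketch of what "emit the events directly" could look like in a test. Worker instances are EventEmitters, so a plain EventEmitter stands in for one here; the listener is a placeholder for whatever handler the test wants to cover.

```javascript
const { EventEmitter } = require('events');

// Stand-in for a worker: the test emits 'messageerror' on it directly instead
// of trying to provoke Node into emitting it.
const worker = new EventEmitter();

const seen = [];
worker.on('messageerror', (error) => seen.push(error.message));

worker.emit('messageerror', new Error('deserialization failed'));
console.log(seen); // [ 'deserialization failed' ]
```

The same emit call should work on a real worker object held by the code under test, which is what makes the handler reachable for coverage.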

Note, opening up an issue here so it's easier to track progress and any information that comes up on this topic in the future.

Human-readable "every *" style intervals are still broken

This should produce two jobs that run every 2 seconds. The job with the interval "every 2 seconds" only runs once. I was using bree 3.0.1 for testing.

const Bree = require('bree')

const taskman = new Bree({
    jobs: [
        { name: 'Worker 1', path: worker_1, interval: '2s' },
        { name: 'Worker 2', path: worker_2, interval: 'every 2 seconds' },
    ]
})

function worker_1() {
    console.log('worker_1 working ...')
}

function worker_2() {
    console.log('worker_2 working ...')
}

taskman.start()

[feat] worker thread pool and other performance optimizations

Was looking into how resource intensive worker creation is through bree. I did not come across concrete numbers of the overhead associated with worker_thread instantiation. Based on the reference below it should not be considered as a "cheap" operation to run.

From Node.js worker threads documentation:

use a pool of Workers instead for these kinds of tasks. Otherwise, the overhead of creating Workers would likely exceed their benefit.
When implementing a worker pool, use the AsyncResource API to inform diagnostic tools (e.g. in order to provide asynchronous stack traces) about the correlation between tasks and their outcomes. See "Using AsyncResource for a Worker thread pool" in the async_hooks documentation for an example implementation.

(linked Worker thread pool implementation)

Was wondering if there are concrete performance metrics available illustrating the overhead associated with worker thread creation? Is it planned to utilize the worker thread pool pattern (there's a Pool implementation available in bthreads)?

Another idea I was contemplating was around optimizing non-CPU intensive jobs. Would it make sense to bypass worker creation and run such jobs in the main thread without breaking out of event loop? What's the threshold which would determine if it's more resource efficient to run the task on the main thread or offload it to a worker thread?

[discussion] merges vs rebases

Is there a preference on how PRs/changes from branches are merged into master?

I have a slight preference towards keeping master as clean as possible, so I would prefer a clean rebase or a squash+rebase combination. You can check an example project history using this pattern in Ghost - https://github.com/TryGhost/Ghost/commits/master.

Don't mind keeping the current pattern as well. Just wanted to have this written down somewhere for clarity :)

[feat] Allow forced thread termination after timeout period

Problem

Graceful thread termination currently relies on correct handling of the 'cancel' event inside the job, so an incorrectly implemented job can leave a thread hanging instead of terminating.

Solution

To make thread termination bullet-proof, bree could introduce a grace period after which the thread would be terminated no matter what. The termination period could be parameterized through a property like terminateWorkerAfterMs (similar to the existing closeWorkerAfterMs). It would also make sense to have a smart default like 60s (value taken from thin air and needs discussion/research) to make sure threads are always terminated after a certain time.
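
A rough sketch of the grace-period idea, not bree's implementation: send 'cancel', then force termination if no 'exit' arrives in time. stopWithGrace and FakeWorker are hypothetical; a real worker_threads Worker offers the same postMessage(), terminate(), and 'exit' event.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: asks the worker to stop, then force-terminates it if it
// has not exited within `graceMs`.
function stopWithGrace(worker, graceMs) {
  const timer = setTimeout(() => worker.terminate(), graceMs);
  // A clean exit within the grace period cancels the forced termination.
  worker.once('exit', () => clearTimeout(timer));
  worker.postMessage('cancel');
}

// Fake worker that exits as soon as it is asked to cancel.
class FakeWorker extends EventEmitter {
  constructor() {
    super();
    this.terminated = false;
  }
  postMessage(message) {
    if (message === 'cancel') this.emit('exit', 0);
  }
  terminate() {
    this.terminated = true;
  }
}

const worker = new FakeWorker();
stopWithGrace(worker, 60000);
console.log(worker.terminated); // false: the exit cleared the timer
```

A job that ignores 'cancel' would simply never emit 'exit' in time, and the timer would call terminate() once the grace period ran out.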

Feature (PR welcome): Use functions for a job

As I understood the docs the only way to define a job is to have a .js file.
Is it possible or planned in the future, that we can simply pass a function for each job?

Our use-case: we use ncc to compile our node app to a single javascript file, so we cannot have separate .js files for jobs

[docs] Example integration with external persistence layer

Problem

There's a need to track, discuss, and document the best approaches to adding a job persistence layer on top of bree. This issue is a focused continuation of this comment:

An example in the README for using sockets, or redis pubsub to communicate with Bree to add new jobs (?)

and another comment :

I would recommend keeping dynamic jobs actually stored in a queue, with a persistent database, and then have a job that runs every so often to flush the queue (with limited concurrency).

References

How job persistence is approached in other job processing/queuing libraries:

Current use of bree along with MongoDB:

  • forwardemail.net - here jobs are generated based on state of records in the database
  • TBD: expand this list with more examples for future reference

Expected outcome

What I think would be the ideal outcome for this issue is documentation with an example implementation of a persisted job queue on top of bree. Ideally the example would be backend agnostic, so that multiple storage approaches (NoSQL, SQL, or key/value stores like Redis) could utilize it.

Maybe bree could even provide an adapter/plugin system to optionally add persistence for (1) scheduled one-off jobs and (2) failed jobs 🤔

EMFILE: too many open files

I'm running a job every second that instantiates a class, like so

const RedisService = require('../../services/redis-service');
...
const redis = new RedisService()

This job runs for awhile and then eventually goes into a fail loop with the error

EMFILE: too many open files

and points to ...services/redis-service.js as the open that exceeded whatever ulimit -n must be on my machine. I know that there are packages like graceful-fs that might be able to solve this problem, but is this just because I'm using Bree incorrectly? Is there a proper way to close the files that a job opens when it is run by Bree?

[feat] Dynamic jobs

Hi,
is there a way to create a dynamic set of jobs? How can I pass custom parameters to a job?
I was wondering if a similar approach to the following one can be used...

const currencies = ['USD', 'EUR','GBP'];
const currencyJob = currency => () => {
// job
console.log(currency)
};

const jobs = currencies.map(currency => ({task: currencyJob(currency), interval: '2m'}));
const bree = new Bree({jobs});

// start...

Thanks

[feat] user interface

Hello, this library looks great :). I'm currently using agenda, which provides a UI that helps to monitor jobs, retry them, etc. Does Bree provide a user interface like that? What is the best practice or recommendation you have for connecting a UI with this job scheduler?

Better directory resolution for workers

Hi there. This is a great little scheduling library that is much easier to use out of the box compared to some of the alternatives.

One thing that has been a bit of an itch is directory resolution. For apps like express, I don't want to have all my jobs sitting at <repo>/jobs by default; I'd like them to be somewhere in <repo>/src/jobs or <repo>/src/path/to/services/jobs or something. I can avoid <repo>/jobs by setting root: false on the instance, but then I need to individually set the full path of each worker in the jobs array. I also can't use a relative path like ./jobs/worker that is require-esque, I need to do something like '${__dirname}/jobs/test.js', which is just a bit ugly in my opinion and produces repeated code unless I pull out '${__dirname}/jobs/' out into a variable somewhere.

process crashes when worker with timeout exits before timeout

Setting closeWorkerAfterMs causes the worker to be terminated after the specified time. However, if the worker completes and exits normally before closeWorkerAfterMs milliseconds have passed, the timeout created upon worker init is not cleared. This causes a TypeError where this.workers[name] is undefined and thus this.workers[name].terminate cannot be read or called.

See #14

[bug] bthreads require is needed for proper error handling in worker scripts

Problem

When the worker script is defined without any dependencies, it does not emit the 'error' event described in the node documentation.

Reproduction steps

Register a job with bree that points to the following script:

// contents of uncaught-exception.js file
(async () => {
    throw new Error('catch me if you can');
})();

When the job executes, observe the log:

  1. There is an uncaught error in the log (node:37046) UnhandledPromiseRejectionWarning: Error: catch me if you can
  2. Bree logs the worker exit as if it was successful: Worker for job "uncaught-exception" exited with code 0

The expected behavior would be no unhandled exception in the log and exit code 1.

Quick fix

To fix the situation load bthreads module with the job script, e.g.:

// contents of uncaught-exception-corrected.js file
require('bthreads');

(async () => {
    throw new Error('catch me if you can');
})();

Solution

I think it's worth documenting the need to include bthreads in each job script and advising against using the native worker_threads module, as that causes unexpected behavior with unhandled exceptions within job scripts.

Using functions for jobs

Hello,

I have read a part of your documentation that says:

It is highly recommended to use files instead of functions. However, sometimes it is necessary to use functions.

And we really need to use functions now.
Once we set up a function to be executed along with cron:

[screenshot: Screen Shot 2020-11-27 at 10 46 32 PM]

We get this error:
[screenshot: Screen Shot 2020-11-27 at 10 45 38 PM]

Would you please assist us with this?

[perf] get rid of synchronous fs.statSync calls during job validation

Problem

Synchronous calls block the event loop, leading to performance degradation:

The synchronous form blocks the Node.js event loop and further JavaScript execution until the operation is complete.
(reference)

Current state

The codebase uses fs.statSync calls in Bree constructor and during job validation.

Possible solutions

There are two different parts to the problem (1) the constructor call and (2) validateJob with all it's callers (constructor and add method).

(1) Solving constructor sync calls could be approached in couple ways:

  1. Refactor constructor into 2 separate phases - general initialization and job specific initialization. Breaking down constructor into job initalization phase would allow having async job initialization logic in separate method which can be called post construction. Use could look something like this:
const bree = new Bree({...nonJobOptions});
bree.addJobs(jobOptionsArray, optionalRootPath);

Main downside of this approach is not being able to construct Bree instance in one call, and makes use of root parameter somewhat ambiguous.
2. Refactor constructor into async function. Clients could use it like this:

const bree = await new Bree({...options});

I'm not sure this even works, only checked briefly and that's what came up on SO. The downside of this approach is tricky use of super() when using async self invoking functions inside the constructor:

super note: If you need to use super, you cannot call it within the async callback.
(reference)
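
A minimal sketch of how the `await new ...` idea can work in plain JavaScript, not Bree's actual code. A constructor may return an object, so returning a Promise makes `new` evaluate to that Promise, and in a subclass `super()` has to run synchronously before the async function starts. The class name `AsyncScheduler`, its `ready` flag, and the `setImmediate` placeholder for setup work are all hypothetical.

```javascript
const { EventEmitter } = require('events');

// Hypothetical class standing in for a scheduler that extends EventEmitter.
class AsyncScheduler extends EventEmitter {
  constructor(options = {}) {
    // super() must be called here, synchronously, not inside the async callback.
    super();
    this.options = options;
    this.ready = false;

    // Returning an object from a constructor replaces the result of `new`,
    // so callers receive this Promise and have to await it.
    return (async () => {
      // Placeholder for real asynchronous setup work.
      await new Promise((resolve) => setImmediate(resolve));
      this.ready = true;
      return this;
    })();
  }
}

async function main() {
  const scheduler = await new AsyncScheduler({ jobs: [] });
  console.log(scheduler instanceof AsyncScheduler, scheduler.ready); // logs: true true
  return scheduler;
}

const done = main();
```

One caveat with this pattern: calling `new AsyncScheduler()` without `await` yields a Promise rather than an instance, which can surprise callers. A static async factory method is a common alternative that avoids returning a Promise from the constructor.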

(2) As for the validateJob method, I don't see a good reason not to convert it into an async function with an async fs.stat call inside. The constructor problem needs to be solved first, since the constructor uses validateJob internally.
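
As a rough illustration of point (2), a path check based on `fs.promises.stat` could look like the sketch below. This is not Bree's real validateJob: the helper names `validateJobPath` and `validateJobs`, the `<root>/<name>.js` layout, and the error messages are assumptions made for the example.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: checks that a job's file exists without blocking
// the event loop. Assumes jobs live at <root>/<name>.js.
async function validateJobPath(job, root) {
  const filePath = path.join(root, `${job.name}.js`);
  try {
    const stats = await fs.promises.stat(filePath);
    if (!stats.isFile()) {
      throw new Error(`Job "${job.name}" path is not a file: ${filePath}`);
    }
    return filePath;
  } catch (err) {
    // stat rejects with code ENOENT when the path does not exist.
    if (err.code === 'ENOENT') {
      throw new Error(`Job "${job.name}" file missing: ${filePath}`);
    }
    throw err;
  }
}

// Validates all jobs concurrently; rejects as soon as one job is invalid.
function validateJobs(jobs, root) {
  return Promise.all(jobs.map((job) => validateJobPath(job, root)));
}
```

A caller would then `await validateJobs(jobs, root)` during an async initialization step, which is why the constructor question has to be settled first.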

Note: synchronous calls are problematic but not critical. They only affect initialization code, which is rarely called. Nevertheless, I think it's useful to track and solve this.

@niftylettuce @shadowgate15 would love to hear your opinion on this.

Interval at specific time and cron not working

Environment:

  • Windows Server 2012
  • Node.js 12.18.4
  • Task is managed by PM2

I'm using Bree now, but only interval: '30s' works. Neither interval: 'at 09:28 am' nor cron: '18 9 * * *' works. I'm using the following code:

const Bree = require('bree');

const bree = new Bree({
  jobs: [
    {
      name: 'daily-report',
      interval: '30s', // works
      // interval: 'at 09:28 am', // not working
      // cron: '18 9 * * *', // not working
    },
  ]
});

bree.start();

Error: Job "email" is already running

Hey,
After a few days of using and running the job, when I start my dev environment with node bree.js
I get Error: Job "email" is already running. The strange thing is that even after I kill all node processes and restart my machine, it looks like the worker starts running as soon as my PC boots. Any idea how to fix that?

Error: Job "email" is already running at Bree.run (C:\...\node_modules\bree\index.js:571:11)

Question about queues

We want to switch from our own solution to a better queue/scheduler (because ours doesn't scale great). We looked at Bull but the unanswered/unfixed issues aren't a good sign. So, we're considering Bree.

Besides simple queues, cron-like and repeated jobs, we also have queues that start processing either after a certain time has passed or after a certain number of jobs have queued up. One use case is bulk inserts into the database. Is this easily achievable with Bree?
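
Whether Bree covers this is a question for the maintainers. Independent of any scheduler, the flush rule described above (process after a certain time has passed or after a certain number of items have queued up) can be sketched in plain Node.js. The `BatchQueue` class and its `maxSize`, `maxWaitMs`, and `onFlush` options are hypothetical names for this example; `onFlush` stands in for something like a bulk insert.

```javascript
// Hypothetical BatchQueue: collects items and flushes them either when
// maxSize items are waiting or when maxWaitMs has passed since the first
// item of the current batch was added.
class BatchQueue {
  constructor({ maxSize, maxWaitMs, onFlush }) {
    this.maxSize = maxSize;
    this.maxWaitMs = maxWaitMs;
    this.onFlush = onFlush; // e.g. a function performing one bulk insert
    this.items = [];
    this.timer = null;
  }

  add(item) {
    this.items.push(item);
    if (this.items.length >= this.maxSize) {
      this.flush();
    } else if (!this.timer) {
      // Start the countdown when the first item of a batch arrives.
      this.timer = setTimeout(() => this.flush(), this.maxWaitMs);
    }
  }

  flush() {
    clearTimeout(this.timer);
    this.timer = null;
    if (this.items.length === 0) return;
    const batch = this.items;
    this.items = [];
    this.onFlush(batch);
  }
}

const flushed = [];
const queue = new BatchQueue({
  maxSize: 3,
  maxWaitMs: 50,
  onFlush: (batch) => flushed.push(batch)
});

queue.add('a');
queue.add('b');
queue.add('c'); // third item reaches maxSize, so ['a', 'b', 'c'] is flushed immediately
queue.add('d'); // flushed by the timer roughly 50 ms later
```

This keeps the batch in memory of a single process, so items that have not been flushed are lost if the process exits.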

Some question about using this library

Hi, this library seems quite attractive to me. I would like to ask some questions.
Assume I have a Node.js microservice that queues jobs from HTTP requests. The input looks like:

{
     "id": "1234",
     "fireAt" : "15xxxxxxxxxx",
     "name" : "job_name", // refer to job_name.js
     "data" : {"someJSONdata": "data" }
}

And I have 3 instances (A,B,C) of this service in a cluster which also contains Redis and MongoDB.

  1. If instance A queues a job but crashes before the job fires, how can the job be recovered?
    I read that you suggest querying the database in worker.js to prevent duplicate jobs. But what about recovery?
  2. I saw you have an example of a worker that connects to MongoDB. Is it good practice to connect to the database every time a job fires? Is there a worker thread pool to reuse threads (and maybe database connections)?
  3. In my use case, jobs may be canceled frequently; maybe only 20% of jobs ever fire. How can this be implemented? The id in the input is used to overwrite the currently queued job. For example:
    a) Got id:"1234" job with 2s delay to be fired. Enqueue it.
    b) 1s later, got id:"1234" job with 10s delay to be fired. Cancel the job in a), enqueue the new job.
    c) 10s later, fire id:"1234" job queued in b)
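
The replace-by-id flow in steps a) to c) can be sketched with plain timers, outside of Bree. The `schedule` and `cancel` functions and the `pending` map are hypothetical, and the delays are scaled down from seconds to milliseconds so the example finishes quickly.

```javascript
// Hypothetical in-memory registry: at most one pending timer per job id.
const pending = new Map();

function cancel(id) {
  const timer = pending.get(id);
  if (timer === undefined) return false;
  clearTimeout(timer);
  pending.delete(id);
  return true;
}

function schedule(id, delayMs, run) {
  cancel(id); // a newer request with the same id replaces the old one
  const timer = setTimeout(() => {
    pending.delete(id);
    run();
  }, delayMs);
  pending.set(id, timer);
}

const fired = [];
schedule('1234', 20, () => fired.push('first'));  // step a) enqueue
schedule('1234', 60, () => fired.push('second')); // step b) cancels a) and enqueues again
// step c) only 'second' is pushed, roughly 60 ms later
```

Because the timers live in the memory of one process, this sketch does not address question 1: a crash loses every pending job unless the requests are also persisted somewhere, for example in the MongoDB or Redis instances mentioned above.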

Thanks for your patience.

Typescript missing

Hi, I appreciate that you made this awesome library! The one thing I am missing a little is TypeScript support. Would it be possible to add it in the future?

Functional job always running on start, although interval is given

Regardless of the interval, the task is run on start. I don't think I missed an option to skip running on start.

const Bree = require('bree');

function test () { console.log('yay') }

const bree = new Bree({
  root: false,
  jobs: [
    {
      name: 'snapshot',
      path: test,
      interval: '10s'
    }
  ]
});

bree.start()

How to use with multiple instances?

Hi, Bree looks very promising. Thanks for the hard work!

I have hacked together a mail queue which checks for unsent mails every x seconds and sends them out. My problem is: I'm running more than one instance of the application and use Postgres locks to make sure no mail is sent twice (only one instance has access to the queue at the same time).

Is there any way to solve this with Bree? Or would I still need to use locking?
