A Task Queue for Coordinating Varied Tasks Across Multiple HPC Resources and HPC Jobs.
The main interface is the `JobQueue` module, which connects to a Postgres database and lets users submit JSON objects as task descriptions. The database hands these descriptions back to individual resources when they request a task; your software must know how to unpack the JSON and execute the correct job from the description. If the database is configured to be reachable from multiple clusters, the compute nodes at all of those clusters can request tasks from the `JobQueue` and participate in the same overall workflow.
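The dispatch pattern described above can be sketched in a few lines. This is not the `JobQueue` API itself, just a minimal illustration of the idea: a task is a JSON-serializable description, and the worker maps the description's type back to a concrete function. The handler names (`echo`, `add`) and helper functions are hypothetical.

```python
import json

# Hypothetical registry mapping a task "type" to the code that runs it.
HANDLERS = {
    "echo": lambda args: args["message"],
    "add": lambda args: args["a"] + args["b"],
}

def enqueue(task_type, **args):
    """Pack a task description as the JSON string stored in the queue."""
    return json.dumps({"type": task_type, "args": args})

def run_task(payload):
    """Unpack a JSON task description and dispatch to the right handler."""
    task = json.loads(payload)
    return HANDLERS[task["type"]](task["args"])

payload = enqueue("add", a=2, b=3)
print(run_task(payload))  # 5
```

Any worker holding the same handler registry can execute the task, which is what lets nodes on different clusters participate in one workflow.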
We build our environment using conda. Execute the following steps to install the development environment:

```shell
conda env create
source activate jobqueue
python setup.py develop
```
If you'd like to include this module in an existing project, we recommend adding it as a pip dependency in your `environment.yml` conda file. For example:

```
"--editable=git+https://github.com/NREL/E-Queue-HPC.git@master#egg=jobqueue"
```
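In context, such a dependency sits in the `pip:` subsection of a conda environment file. A minimal sketch (the environment name, Python version, and channel are placeholders for your own project's settings):

```yaml
name: myproject          # hypothetical environment name
channels:
  - conda-forge
dependencies:
  - python=3.9           # example version
  - pip
  - pip:
      - "--editable=git+https://github.com/NREL/E-Queue-HPC.git@master#egg=jobqueue"
```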
At the moment, you must have read/write/create privileges on your database. For `jobqueue` to know which databases you have access to, describe them in a hidden file named `.jobqueue.json` in your home directory (i.e. `os.path.join(os.environ['HOME'], ".jobqueue.json")`). The structure should look like the following:
```json
{
    "project1": {
        "host": "HOSTNAME",
        "user": "project1ops",
        "dbname": "project1",
        "password": "*****************",
        "table_name": "jobqueue"
    },
    "project2": {
        "host": "HOSTNAME",
        "user": "project2ops",
        "dbname": "project2",
        "password": "*****************",
        "table_name": "jobqueue"
    }
}
```
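Reading one project's settings out of this file takes only the standard library. The helper below is illustrative, not part of the `jobqueue` API; the demo writes a temporary file in place of the real `~/.jobqueue.json`.

```python
import json
import os
import tempfile

def load_credentials(project, path=None):
    """Return the connection settings for one project from .jobqueue.json.

    The keys (host, user, dbname, password, table_name) mirror the
    structure shown above; this helper is a sketch, not jobqueue itself.
    """
    if path is None:
        path = os.path.join(os.environ["HOME"], ".jobqueue.json")
    with open(path) as f:
        return json.load(f)[project]

# Demo against a temporary file standing in for ~/.jobqueue.json:
config = {"project1": {"host": "HOSTNAME", "user": "project1ops",
                       "dbname": "project1", "password": "secret",
                       "table_name": "jobqueue"}}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(config, f)
creds = load_credentials("project1", path=f.name)
print(creds["dbname"])  # project1
os.unlink(f.name)
```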
Please see the test case for details on using the `JobQueue` module. To run the tests, type:

```shell
pytest -s -v
```