CLI tool for running virtual screening (or other batch processing of chemical structures) on multiple processors by splitting the input into batches and running them in parallel.
```
Usage of batch-vs-runner: batch-vs-runner [FLAGS] [SD|PDB|PDBQT|MOL2|DIRECTORY]...
  -batchEnd int
        end at Nth molecule, 0 means all molecules
  -batchSize int
        batch size (default 100)
  -batchStart int
        start from Nth molecule (cumulative across all input files) (default 1)
  -delay int
        delay a certain amount of time (in ms) between spawning the next process, useful for programs that periodically do heavy IO
  -enableSlurm
        detect slurm allocations based on environment variable and use srun to run jobs (default true)
  -exec string
        command to execute in worker (default "./job.sh")
  -lineBreak string
        linebreak for output structure: unix, dos, or mac (default "unix")
  -np int
        no. of worker processes (does not apply if slurm mode is in use) (default 1)
  -prefix string
        prefix on individual job work directory (default "job")
  -slurmNodeTaskOverride string
        override how many tasks to distribute to each node from the env received from slurm
  -verbose
        pass through worker script output to terminal
  -workspace string
        path to job setup files (can be a directory or single file) (default ".")
  -workspaceOnly
        generate workspace only but do not execute any job, you can use anything to execute the job once the workspace has been compiled
```
- Create a folder as the "template" for each batch's workspace. During runtime, the program will automatically generate a workspace for each batch. Put any files that you want copied to all workspaces (configuration files, batch scripts, etc.) here. Additionally, the batch of molecules for each job is generated automatically by the program, named `job.sd`, `job.sdf`, `job.mol2`, etc., depending on the input file extension. See the Execution Environment section for details on how to write the template workspace.
- Execute `batch-vs-runner` with the corresponding flags; the format is Go standard-library style `-key=value`. Examples:
  - `-workspace=path/to/my_workspace`: workspace template is at `path/to/my_workspace`
  - `-np=20`: 20 parallel processes
  - `-verbose=true`: pass through worker script output to the terminal
  - `-delay=1000`: delay 1000 ms before starting the next process during initialization
  - `-batchSize=10`: override batch size to 10 molecules
  - `-batchEnd=100`: end at the 100th molecule (cumulative across all input files specified)
Full examples:

```
./batch-vs-runner -np=30 -workspace=my_dock_job_template -batchSize=50 my_library.sdf
```

Split `my_library.sdf` into 50-molecule batches and generate a workspace just like the `my_dock_job_template` folder for each batch. Run `job.sh` in each batch with 30 parallel processes.

```
./batch-vs-runner -workspace=my_dock_job_template -batchEnd=100 -batchSize=100 -workspaceOnly=true my_library.sdf
```

Split `my_library.sdf` into 100-molecule batches, ending at the 100th molecule, and generate a workspace just like the `my_dock_job_template` folder for each batch. Only generate the workspaces but do not execute `job.sh`; you can `cd` into a work directory and do whatever you want. Mainly used for testing and debugging.
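The two steps above can be sketched end to end. The template folder name and the `echo` placeholder are illustrative only:

```shell
# Build a minimal template workspace (names are illustrative).
mkdir -p my_dock_job_template
cat > my_dock_job_template/job.sh <<'EOF'
#!/bin/bash
# Placeholder for the real docking command.
echo "processing $(pwd)/job.sdf"
EOF
chmod +x my_dock_job_template/job.sh  # .sh files also get +x automatically

# Then run, e.g. 50 molecules per batch with 30 workers:
# ./batch-vs-runner -np=30 -workspace=my_dock_job_template -batchSize=50 my_library.sdf
```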
- Files preserve their paths relative to the template folder when they are compiled, so `workspace/some_dir/file.txt` will be copied to `job_*_*/some_dir/file.txt` upon execution. File modes are also copied; as an exception, common executable file types such as `.sh`, `.bash`, and `.run` are automatically given executable permission when compiled to the workspace.
- Files with the `.tpl` extension are processed through the Go `text/template` system; they are executed with `.` (the template context) filled with the batch definition for each batch job. See `example/gold/example.txt.tpl` for an example.
- A `job.<ext>` file containing the molecules belonging to the batch will be generated automatically. `<ext>` is `mol2`, `sd`, `sdf`, `pdb`, or `pdbqt`, depending on the input molecule format.
- I recommend not leaving empty folders in the template directory. If you need an empty folder, create it explicitly with `mkdir` in `job.sh`, or keep a placeholder file in it (e.g. `touch template/empty_dir/.keep`).
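To illustrate the `.tpl` mechanism, here is a hypothetical `software.conf.tpl`. The field names below are made up for illustration; check `example/gold/example.txt.tpl` for the fields the batch definition actually exposes:

```
# software.conf.tpl -- rendered once per batch via Go text/template.
# {{ .Index }} is an assumed field name, not a documented one.
batch_name = job_{{ .Index }}
ligands    = ./job.sdf
```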
This program can automatically parse the environment variables set by Slurm and distribute jobs to the allocated nodes (files are not transferred automatically as of now, so it must be run on shared storage). No extra configuration is needed.
To explicitly disable this behavior (i.e. do not use `srun`, and run all job shell files on the master node), use `-enableSlurm=false`.
Use the `-slurmNodeTaskOverride` flag to override how many tasks to distribute to each node. The format is a comma-separated list of task counts; a count may be suffixed with `xN`, where `N` applies the same count to `N` nodes.
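My reading of that format, sketched as a small shell function (the actual parser lives inside `batch-vs-runner`; this is only an illustration of the expansion):

```shell
# Expand a -slurmNodeTaskOverride string into one task count per node.
expand_override() {
  local part tasks n i
  local out=()
  IFS=',' read -ra parts <<< "$1"
  for part in "${parts[@]}"; do
    case "$part" in
      *x*)
        # "8x3" means: 8 tasks on each of 3 nodes
        tasks=${part%x*}
        n=${part#*x}
        for ((i = 0; i < n; i++)); do out+=("$tasks"); done
        ;;
      *)
        out+=("$part")
        ;;
    esac
  done
  echo "${out[@]}"
}

expand_override "4,8x3"   # → 4 8 8 8
```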
The default command executed for each batch job is `bash -c ./job.sh`. Thus, just add a script called `job.sh` to the template folder and it will be run automatically at runtime. Call your docking software in `job.sh` and ask it to dock the file `job.sdf`, `job.mol2`, etc., depending on your input molecule type.
NOTE: The working directory for each batch script is the batch folder, so if you have a `software.conf` or `software.conf.tpl` in your job template, the correct way to refer to that file in the job script is simply `software.conf` or `./software.conf`. If you want to override this behavior, use `cd` in your `job.sh`.
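A minimal `job.sh` sketch following the relative-path convention above. `my_docking_program` and its flags are placeholders, not a real interface:

```shell
#!/usr/bin/env bash
set -euo pipefail

# The runner starts this script with the batch folder as the working
# directory, so the generated ligand file and the config copied from
# the template are both addressed relatively.
ligands=./job.sdf
conf=./software.conf

echo "docking $ligands with $conf in $(pwd)"
# Replace the echo with the real call, for example:
# my_docking_program --conf "$conf" --in "$ligands" --out ./results.sdf
```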
See the `examples/` folder for some example workspace templates.