eternal-flame-ad / batch-vs-runner

Framework for batching sdf|mol2|pdb file(s) for parallelized analysis

License: Apache License 2.0

Go 88.14% Smarty 4.66% Shell 7.20%

batch-vs-runner

Description

CLI tool for running virtual screening (or other batch processing on chemical structures) on multiple processors by creating batches and running them in parallel.

Usage of batch-vs-runner: batch-vs-runner [FLAGS] [SD|PDB|PDBQT|MOL2|DIRECTORY]...
  -batchEnd int
        end at Nth molecule, 0 means all molecules
  -batchSize int
        batch size (default 100)
  -batchStart int
        start from Nth molecule (cumulative across all input files) (default 1)
  -delay int
        delay a certain amount of time (in ms) between spawning the next process, useful for programs that periodically do heavy IO
  -enableSlurm
        detect slurm allocations based on environment variable and use srun to run jobs (default true)
  -exec string
        command to execute in worker (default "./job.sh")
  -lineBreak string
        linebreak for output structure: unix, dos, or mac (default "unix")
  -np int
        no. of worker processes (does not apply if slurm mode is in use) (default 1)
  -prefix string
        prefix on individual job work directory (default "job")
  -slurmNodeTaskOverride string
        override how many tasks to distribute to each node from the env received from slurm
  -verbose
        pass through worker script output to terminal
  -workspace string
        path to job setup files (can be a directory or single file) (default ".")
  -workspaceOnly
        generate workspace only but do not execute any job, you can use anything to execute the job once the workspace has been compiled

Get Started

  1. Create a folder to serve as the template for each batch's workspace. At runtime, the program generates a workspace for each batch from this template. Put any files that should be copied to every workspace (configuration files, batch scripts, etc.) here. The batch of molecules for each job is generated automatically and named "job.sd", "job.sdf", "job.mol2", etc., depending on the input file extension. See the Execution Environment section for details on writing the template workspace.

  2. Execute "batch-vs-runner" with the corresponding flags. Flags use the Go standard library style, -key=value. Examples:

    • -workspace=path/to/my_workspace: the workspace template is at path/to/my_workspace
    • -np=20: run 20 parallel processes
    • -verbose=true: pass worker script output through to the terminal
    • -delay=1000: wait 1000 ms before starting the next process during initialization
    • -batchSize=10: set the batch size to 10 molecules
    • -batchEnd=100: end at the 100th molecule (cumulative across all input files specified)
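
To make the interaction of -batchStart, -batchEnd and -batchSize concrete, here is a minimal sketch of the index arithmetic. It is not taken from the batch-vs-runner source: the names batchRange and planBatches are hypothetical, and it assumes molecule indices are 1-based and that -batchEnd is inclusive.

```go
package main

import "fmt"

// batchRange is a hypothetical type holding the 1-based, inclusive
// molecule indices that fall into one batch.
type batchRange struct {
	First, Last int
}

// planBatches splits molecules start..end into batches of at most size
// molecules. end == 0 means "through the last molecule", mirroring the
// documented meaning of -batchEnd=0. total is the number of molecules
// across all input files.
func planBatches(start, end, size, total int) []batchRange {
	if start < 1 {
		start = 1
	}
	if end == 0 || end > total {
		end = total
	}
	var out []batchRange
	if size < 1 {
		return out
	}
	for first := start; first <= end; first += size {
		last := first + size - 1
		if last > end {
			last = end // the final batch may be shorter
		}
		out = append(out, batchRange{First: first, Last: last})
	}
	return out
}

func main() {
	// Mirrors -batchSize=10 -batchEnd=100 on a library assumed to hold 250 molecules.
	for i, b := range planBatches(1, 100, 10, 250) {
		fmt.Printf("batch %d: molecules %d-%d\n", i+1, b.First, b.Last)
	}
}
```

Under these assumptions the final batch is simply truncated when the range is not a multiple of the batch size.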

Full examples:

  • ./batch-vs-runner -np=30 -workspace=my_dock_job_template -batchSize=50 my_library.sdf
    Split my_library.sdf into 50-molecule batches and generate a workspace from the my_dock_job_template folder for each batch. Run job.sh for each batch, with 30 processes in parallel.
  • ./batch-vs-runner -workspace=my_dock_job_template -batchEnd=100 -batchSize=100 -workspaceOnly=true my_library.sdf
    Split my_library.sdf into 100-molecule batches, ending at the 100th molecule, and generate a workspace from the my_dock_job_template folder. Only the workspace is generated; job.sh is not executed. You can cd into the work directory and run whatever you want. Mainly used for testing and debugging.

Execution Environment

Template folder

  • Files keep their path relative to the template folder when they are compiled, so workspace/some_dir/file.txt will be copied to job_*_*/some_dir/file.txt upon execution. File modes are copied as well, with one exception: common executable files such as .sh, .bash and .run are automatically given executable permission when they are compiled to the workspace.
  • Files with the .tpl extension are processed through the Go text/template system and executed with the . context set to the batch definition of each batch job. See example/gold/example.txt.tpl as an example.
  • A job.<ext> file containing the molecules belonging to the batch is generated automatically. <ext> is mol2, sd, sdf, pdb or pdbqt, depending on the input molecule format.
  • I recommend not leaving empty folders in the template directory. If you need an empty folder, create it with mkdir in job.sh, or keep a placeholder file in it, e.g. touch template/empty_dir/.keep
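
As a rough illustration of how a .tpl file is rendered, the sketch below runs a small template through Go's text/template package. The batchInfo struct and its fields (Index, First, Last, File) are hypothetical stand-ins; the real batch definition is whatever batch-vs-runner passes as the . context, so check example/gold/example.txt.tpl for the actual field names.

```go
package main

import (
	"os"
	"strings"
	"text/template"
)

// batchInfo is a hypothetical stand-in for the batch definition. The real
// field names are defined by batch-vs-runner and may differ.
type batchInfo struct {
	Index int
	First int
	Last  int
	File  string
}

// render parses src as a text/template and executes it with b as the "." context.
func render(src string, b batchInfo) (string, error) {
	tpl, err := template.New("conf").Parse(src)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tpl.Execute(&sb, b); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Contents of an imagined software.conf.tpl.
	src := "ligands = {{.File}}\n# batch {{.Index}}: molecules {{.First}}-{{.Last}}\n"
	out, err := render(src, batchInfo{Index: 3, First: 201, Last: 300, File: "job.sdf"})
	if err != nil {
		panic(err)
	}
	os.Stdout.WriteString(out)
}
```

Referencing a field that does not exist on the context makes Execute return an error, which is a quick way to catch typos in a template.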

HPC environment with slurm

This program can automatically parse the environment variables set by slurm and distribute jobs to the allocated nodes. No extra configuration is needed. Files are not transferred automatically as of now, so the program must be run on shared storage.

To explicitly disable this behavior (i.e. do not use srun and run all job scripts on the master node), use -enableSlurm=false.

Use the -slurmNodeTaskOverride flag to override how many tasks are distributed to each node. The format is a comma-separated list of numbers, where a number may be followed by (xN) to apply the same count to N nodes.
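
The (xN) notation resembles the format slurm itself uses for SLURM_TASKS_PER_NODE. A minimal parser sketch follows, assuming a value such as 4(x2),2 means "4 tasks on each of two nodes, then 2 tasks on one node". The function name parseTasksPerNode is hypothetical, and the tool's own parser may accept or reject inputs differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseTasksPerNode expands a spec like "4(x2),2" into one task count per
// node, here [4 4 2]. Hypothetical helper, written from the documented
// format rather than from the batch-vs-runner source.
func parseTasksPerNode(spec string) ([]int, error) {
	var out []int
	for _, part := range strings.Split(spec, ",") {
		part = strings.TrimSpace(part)
		count, repeat := part, 1
		if i := strings.Index(part, "(x"); i >= 0 {
			if !strings.HasSuffix(part, ")") {
				return nil, fmt.Errorf("missing ')' in %q", part)
			}
			n, err := strconv.Atoi(part[i+2 : len(part)-1])
			if err != nil || n < 1 {
				return nil, fmt.Errorf("bad repeat count in %q", part)
			}
			count, repeat = part[:i], n
		}
		tasks, err := strconv.Atoi(count)
		if err != nil || tasks < 0 {
			return nil, fmt.Errorf("bad task count in %q", part)
		}
		for j := 0; j < repeat; j++ {
			out = append(out, tasks)
		}
	}
	return out, nil
}

func main() {
	perNode, err := parseTasksPerNode("4(x2),2")
	if err != nil {
		panic(err)
	}
	fmt.Println(perNode)
}
```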

job.sh file

The default command executed for each batch job is bash -c ./job.sh. Thus, just add a script called job.sh to the template folder and it will be run automatically. Call your docking software in job.sh and have it dock job.sdf, job.mol2, etc., depending on your input molecule type.

NOTE: The working directory for each batch script is the batch folder, so if you have a software.conf or software.conf.tpl in your job template, the correct way to refer to that file in the job script is just software.conf or ./software.conf. To override this behavior, use cd in your job.sh.

Examples

See the examples/ folder for some example workspace templates.
