bioruby-pipeline's Introduction

bio-pipeline

Common pipeline tasks. This Bio module does not do the work of a job scheduler; for that you can use our simple Ruby Queue (rq) or choose from many other schedulers.

bio-pipeline, meanwhile, addresses do-not-repeat-yourself (DRY) principles for creating tasks at the job level, and aims for convention-over-configuration (CoC). For example, bio-pipeline comes with a library of templates, mostly based on YAML and ERB, for common bioinformatics tasks.

Another feature of bio-pipeline is the run-once command, which caches results and won't calculate the same result twice. This makes the pipeline resilient: when one or more jobs fail, just rerun the pipeline. The pipeline can also be interrupted and will resume where it left off.
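The caching idea can be illustrated with a minimal Ruby sketch. This is not bio-pipeline's implementation: the `run_once` helper, its `cache_dir:` parameter and the choice of a SHA-256 digest as cache key are all assumptions made for illustration.

```ruby
require 'digest'
require 'fileutils'
require 'tmpdir'

# Hypothetical helper (not part of bio-pipeline): run the block only if no
# cached result exists for this key, otherwise return the cached result.
def run_once(key, cache_dir:)
  FileUtils.mkdir_p(cache_dir)
  cache_file = File.join(cache_dir, Digest::SHA256.hexdigest(key))
  return File.read(cache_file) if File.exist?(cache_file)

  result = yield.to_s
  # Write under a temporary name first, so that an interrupted job
  # leaves no partial cache entry behind.
  File.write("#{cache_file}.tmp", result)
  File.rename("#{cache_file}.tmp", cache_file)
  result
end

Dir.mktmpdir do |dir|
  calls = 0
  2.times do
    run_once('muscle -i aa.fa', cache_dir: dir) do
      calls += 1
      'aligned'
    end
  end
  puts calls # the block ran only for the first call
end
```

Because a cache entry only appears after the job has finished, rerunning after a failure or an interrupt repeats just the jobs that never completed.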

You do not need to know Ruby to use bio-pipeline. But it may be interesting to note that other successful tools for cluster deployment use similar ideas. For example, Chef uses Ruby, YAML and ERB for configuring machines. It may be an idea to combine Chef with bio-pipeline.

Note: this software is under active development! Feel free to pitch in.

task files as YAML/erb templates

To describe a job that can be run in a pipeline, we introduce a task file: a data structure in YAML that also acts as a template, preprocessed by erb. An example for running an alignment program would be

    # task file: muscle.yaml
    :inputs:
      - <%= in_file = 'aa.fa' %>      # here we set in_file too!
    :commands:
      - <%= muscle_bin %> -i <%= in_file %> -o <%= output_dir %>/aa-align.fa
    :outputs:
      - <%= output_dir %>             # defaults to ./output

Note that in_file gets defined in the YAML task file, while muscle_bin and output_dir are defined by the calling context. Run this command from the command line with

    ./bin/runner -c muscle.yaml
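How such a task file might be expanded can be sketched with Ruby's standard `erb` and `yaml` libraries. This is a sketch under assumptions, not the runner's actual code: the `render_task` helper and the `/usr/bin/muscle` path are hypothetical, and the template is inlined as a string instead of being read from muscle.yaml.

```ruby
require 'erb'
require 'yaml'

# The muscle.yaml task file from above, inlined to keep the sketch self-contained.
TASK_TEMPLATE = <<~'TASK'
  :inputs:
    - <%= in_file = 'aa.fa' %>
  :commands:
    - <%= muscle_bin %> -i <%= in_file %> -o <%= output_dir %>/aa-align.fa
  :outputs:
    - <%= output_dir %>
TASK

# Hypothetical helper: expand the ERB tags first, then parse the result as YAML.
def render_task(template, muscle_bin:, output_dir: './output')
  # muscle_bin and output_dir are the "calling context": the template
  # sees them as local variables through this method's binding.
  expanded = ERB.new(template).result(binding)
  # Keys such as ":inputs" load as Ruby symbols, so Symbol must be permitted.
  YAML.safe_load(expanded, permitted_classes: [Symbol])
end

task = render_task(TASK_TEMPLATE, muscle_bin: '/usr/bin/muscle')
puts task[:commands]
```

Note that `in_file` needs no entry in the calling context: the assignment inside the first ERB tag defines it for the rest of the template.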

The idea here is to have richer meta-data possibilities, and rather than using commands on the command line we can easily share common tasks, add context, paths, and features like creating and copying the output_dir.

To set/override parameters outside the template, they can also be added on the command line as switches:

    ./bin/runner -c muscle.yaml -output_dir tmp -muscle_bin /opt/muscle/bin/muscle

The runner handles that by copying the switches into the namespace, using some nice Ruby magic.
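One way switches could end up as template variables is sketched below. The runner's actual mechanism is not shown in this document, so the `parse_switches` helper and the strict "switch followed by value" parsing are assumptions; `ERB#result_with_hash` is a standard library method (Ruby 2.5 and later).

```ruby
require 'erb'

# Hypothetical helper: turn ["-output_dir", "tmp", ...] into { output_dir: "tmp", ... }.
# The -c switch (the task file) is kept apart from the template parameters.
def parse_switches(argv)
  params = {}
  argv.each_slice(2) do |switch, value|
    raise ArgumentError, "expected a switch, got #{switch.inspect}" unless switch.start_with?('-')
    raise ArgumentError, "missing value for #{switch}" if value.nil?

    params[switch.delete_prefix('-').to_sym] = value
  end
  task_file = params.delete(:c)
  [task_file, params]
end

argv = %w[-c muscle.yaml -output_dir tmp -muscle_bin /opt/muscle/bin/muscle]
task_file, params = parse_switches(argv)

# result_with_hash exposes every hash key as a local variable in the template.
line = ERB.new('<%= muscle_bin %> -o <%= output_dir %>/aa-align.fa').result_with_hash(params)
puts task_file
puts line
```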

erb executes the Ruby between <% and %> when compiling the template. After this, at runtime, you can run Ruby programs as scripts, but you can also call into the bio-pipeline engine and libraries. Each command is first checked against the methods in the engine's namespace. If the command exists as a method, the rest of the command is executed as Ruby in the local interpreter. For example

    :commands:
      - BioPipeline::report(<%= in_file %>,<%= output_dir %>/aa-align.fa)

Within the task file commands section, commands are simply executed in sequence.
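The "method first, shell otherwise" lookup can be sketched as follows. The `Engine` module and `run_command` are hypothetical stand-ins, not the BioPipeline API. As a deliberate simplification, the sketch splits arguments on whitespace instead of evaluating the rest of the command as Ruby.

```ruby
# Hypothetical stand-in for the engine namespace.
module Engine
  def self.report(input, output)
    "report on #{input} written next to #{output}"
  end
end

# Run each command in sequence: call the engine method if one matches the
# first word of the command, otherwise hand the whole line to the shell.
def run_command(command, engine: Engine, shell: ->(cmd) { system(cmd) })
  name, rest = command.split(' ', 2)
  # Only methods defined on the engine itself count, so inherited methods
  # such as "name" or "display" cannot shadow shell commands.
  if engine.singleton_methods(false).include?(name.to_sym)
    engine.public_send(name, *rest.to_s.split)
  else
    shell.call(command)
  end
end

commands = ['report aa.fa ./output/aa-align.fa']
commands.each { |command| puts run_command(command) }
```

Passing the shell as a lambda keeps the dispatch logic testable without running external programs.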

Chaining task files

Chaining tasks allows modularising work in task files - so each task file represents as few steps as possible. To chain we want

  1. to call the next task file
  2. to pass in new inputs (including output of the current task)

(more soon)

run-once

(coming soon)

map reduce and dependencies

(coming soon)

more documentation

Features describe the behaviour of bio-pipeline. More documentation can also be found

Installation

(sorry, not ready yet!)

    gem install bio-pipeline

Usage

    require 'bio-pipeline'

The API doc is online. For more code examples see the test and feature files in the source tree.

Project home page

For information on the source tree, documentation, examples, issues and how to contribute, see

http://github.com/pjotrp/bioruby-pipeline

The BioRuby community is on IRC server: irc.freenode.org, channel: #bioruby.

Cite

If you use this software, please cite one of

Biogems.info

This Biogem is published at #bio-pipeline

Copyright

Copyright (c) 2012 Pjotr Prins. See LICENSE.txt for further details.

bioruby-pipeline's Issues

gem install does not seem to work

Hi!

    gem install bio-pipeline

    ERROR: Could not find a valid gem 'bio-pipeline' (>= 0) in any repository
    ERROR: Possible alternatives: ruby-pipeline, rake-pipeline, pipeline, pipeliner, empipelines

The above does not work. Was the gem not broadcasted yet?
