This project forked from bayer-group/bayerclaw

BayerCLAW workflow orchestration system for AWS

License: BSD 3-Clause "New" or "Revised" License

Bayer CLoud Automated Workflows (BayerCLAW)

BayerCLAW is a workflow orchestration system targeted at bioinformatics pipelines. A workflow consists of a sequence of computational steps, each of which is captured in a Docker container. Some steps may parallelize work across many executions of the same container (scatter/gather pattern).

A workflow is described in a YAML file. The BayerCLAW compiler uses AWS CloudFormation to transform the workflow description into the AWS resources the workflow needs. These include an AWS Step Functions state machine that represents the sequence of steps in the workflow.

A workflow typically takes several parameters, such as sample IDs or paths to input files. Once the workflow definition has been deployed, the workflow can be executed by copying a JSON file with the execution parameters to a "launcher" S3 bucket, which BayerCLAW creates. The workflow state machine uses AWS Batch to run the Docker containers in the proper order.

Documentation

The doc/ directory of this repo contains the documentation pages.

Key components of BayerCLAW

The workflow definition

The BayerCLAW workflow template is a JSON- or YAML-formatted file describing the processing steps of the pipeline. Here is an example of a very simple, one-step workflow:

Transform: BC_Compiler

params:
  repository: s3://example-bucket/hello-world/${job.SAMPLE_ID}

steps:
  - hello:
      image: docker.io/library/ubuntu
      commands:
        - echo "Hello world! This is job ${job.SAMPLE_ID}!"

The repository

The repository is a path within an S3 bucket where a given workflow stores its output files, such as s3://generic-workflow-bucket/my-workflow-repo/. The repo is typically parameterized with some job-specific unique ID, so that each execution of the workflow is kept separate. For example, s3://generic-workflow-bucket/my-workflow-repo/job12345/
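To illustrate how a parameterized repository path turns into a job-specific one, here is a minimal sketch in Python. It is not BayerCLAW's own substitution code; the function name `resolve_job_fields` and the regular expression are assumptions made for this example, based only on the `${job.SAMPLE_ID}` syntax shown in the workflow definition above.

```python
import re

# Matches placeholders of the form ${job.FIELD_NAME}, as written in the
# example workflow. The exact grammar BayerCLAW accepts is an assumption.
_JOB_FIELD = re.compile(r"\$\{job\.([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_job_fields(template: str, job_data: dict) -> str:
    """Replace each ${job.KEY} in template with job_data[KEY].

    Raises KeyError if the template names a field missing from job_data.
    (Hypothetical helper, for illustration only.)
    """
    def _lookup(match):
        return job_data[match.group(1)]

    return _JOB_FIELD.sub(_lookup, template)


job = {"SAMPLE_ID": "ABC123"}
repo = resolve_job_fields("s3://example-bucket/hello-world/${job.SAMPLE_ID}", job)
print(repo)  # s3://example-bucket/hello-world/ABC123
```

Because the sample ID is part of the path, two jobs with different IDs write to different locations under the same bucket.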

Job data file

The job data file contains data needed for a single pipeline execution. This data must be encoded as a flat JSON object with string keys and string values. Even integer or float values should be quoted as strings.

Copying the job data file into the launcher bucket will trigger an execution of the pipeline. Overwriting the job data file, even with the same contents, will trigger another execution.
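As a rough sketch, the copy can be done from Python with the S3 `put_object` call. The helper name `launch_job`, the bucket name, and the object key below are all hypothetical; use the launcher bucket and key layout from your own BayerCLAW installation. The S3 client is passed in as an argument, so in practice it would be `boto3.client("s3")`.

```python
import json


def launch_job(s3_client, launcher_bucket: str, key: str, job_data: dict) -> None:
    """Write job_data as a JSON object into the launcher bucket.

    s3_client is expected to behave like boto3.client("s3").
    launcher_bucket and key are placeholders chosen by the caller.
    """
    body = json.dumps(job_data, indent=2).encode("utf-8")
    s3_client.put_object(Bucket=launcher_bucket, Key=key, Body=body)


# Example usage (requires boto3 and AWS credentials; names are hypothetical):
#
#   import boto3
#   launch_job(
#       boto3.client("s3"),
#       "my-launcher-bucket",
#       "hello-world/ABC123.json",
#       {"SAMPLE_ID": "ABC123"},
#   )
```

Calling `launch_job` a second time with the same key overwrites the object, which, as described above, triggers another execution.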

Sample job data file

{
  "SAMPLE_ID": "ABC123",
  "READS1": "s3://workflow-bucket/inputs/reads1.fq",
  "READS2": "s3://workflow-bucket/inputs/reads2.fq"
}
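Since every value must be a string, it can help to build the job data file programmatically. The sketch below is one possible approach, not part of BayerCLAW: the helper `to_job_data` is hypothetical, as are the `THREADS` and `CUTOFF` fields. It converts numbers to strings and rejects nested values, which would break the flat-object requirement.

```python
import json


def to_job_data(fields: dict) -> str:
    """Serialize fields as a flat JSON object whose values are all strings.

    Accepts str, int, and float values; anything else (lists, dicts,
    None, booleans) raises ValueError. Hypothetical helper, for illustration.
    """
    flat = {}
    for key, value in fields.items():
        if not isinstance(key, str):
            raise TypeError(f"key {key!r} is not a string")
        if isinstance(value, bool) or not isinstance(value, (str, int, float)):
            raise ValueError(f"value for {key!r} must be a string or a number")
        # Numbers are quoted as strings, as the job data format requires.
        flat[key] = str(value)
    return json.dumps(flat, indent=2)


print(to_job_data({"SAMPLE_ID": "ABC123", "THREADS": 4, "CUTOFF": 0.5}))
```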

Contributors

jetabaska64, jack-e-tabaska