helrick / data-processing-utility-tools

This project forked from icgc-argo-workflows/data-processing-utility-tools

Collection of utility tools for ARGO data processing

License: GNU Affero General Public License v3.0

Languages: Dockerfile 2.43%, Python 53.25%, Common Workflow Language 1.87%, Shell 13.69%, Nextflow 28.76%

Data processing utility tools

This repository keeps a collection of data processing utility tools for ARGO analytic pipelines. All tools are defined using the Nextflow workflow language.

Every tool is self-sufficient and can be independently developed, tested, released and used. This clean isolation allows maximum flexibility, maintainability and portability.

These tools are building blocks for creating multi-step data analysis workflows as needed, such as the workflows here: https://github.com/icgc-argo/dna-seq-processing-wfs and here: https://github.com/icgc-argo/variant-calling-wfs

Development Process

As tools are meant to be independent from each other, arguably a better choice is to develop each tool in its own source control repository with its own container image. In reality, it's undesirable to have to manage too many repositories, so we ended up using one repository for many tools. Despite sharing the same repository, we still want to follow good practices during tool development to keep the tools as independent as possible.

Besides common software development practices such as feature branches, PRs and code reviews, we'd like to follow these additional guidelines.

  1. To start development, start a branch from the current master.
  2. If the planned work affects only one tool, name the branch with the tool's name as a prefix, followed by '.release_version', so that the branch name matches the next release tag (e.g., score-download.0.1.4). During development, do not change any code or file unrelated to this tool.
  3. In the tool's Nextflow process file, update the docker image and tag in 'container' to include the same release version, for example: quay.io/icgc-argo/score-download:score-download.0.1.4.
  4. During development, push local commits to the git server when they are ready; this triggers the automated Travis CI tests. Note that the first test run may fail because the docker image is missing (it is still to be built, or is being built, by Quay.io; this problem may be solved if we use Travis or GitHub to build the docker image).
  5. Repeat #3 until the planned features are done and all tests pass. You should also run the tests locally by invoking pytest -v, because some tests are only executed locally due to security issues (at some point we should come up with a plan to run those tests on Travis). If needed, you can also test the updated version of the tool in the workflows that need it: just update the tool URL to point to the tool's development branch.
  6. Create a PR for review when ready.
  7. Review and update as needed until all tests (Travis + local tests) pass and the PR is approved.
  8. Merge the feature branch to master and delete the branch. Then create a new release from the current master branch, using the same tag as the tool development branch that was just merged and deleted, e.g., score-download.0.1.4.
  9. Note that if the work involves changing the shared base docker image (i.e., /docker/Dockerfile) with new or updated dependencies, all tools that use the base docker image should be updated individually by going through the above process.
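
The naming convention in steps 2, 3 and 8 (branch name, container tag and release tag all sharing the form tool-name.version) can be checked mechanically. The following is a minimal sketch, not part of this repository: the function names, the three-part version pattern and the regular expression used to find the 'container' line are all assumptions made for illustration, and the registry namespace is taken from the example in step 3.

```python
import re

# Assumed pattern: <tool-name>.<major>.<minor>.<patch>, e.g. score-download.0.1.4
RELEASE_RE = re.compile(r"^(?P<tool>[a-z0-9][a-z0-9-]*)\.(?P<version>\d+\.\d+\.\d+)$")

# Registry namespace copied from the example in step 3.
REGISTRY = "quay.io/icgc-argo"


def parse_release(name):
    """Split a branch or tag name into (tool, version); raise ValueError if malformed."""
    match = RELEASE_RE.match(name)
    if match is None:
        raise ValueError(f"not a '<tool>.<version>' release name: {name!r}")
    return match.group("tool"), match.group("version")


def expected_image(name):
    """Build the container reference that step 3 expects for this release name."""
    tool, _version = parse_release(name)
    return f"{REGISTRY}/{tool}:{name}"


def container_matches(nextflow_source, name):
    """Return True if a quoted 'container' value in the process file text
    equals the expected image for this release name."""
    found = re.findall(
        r"""^\s*container\s+['"]([^'"]+)['"]""",
        nextflow_source,
        flags=re.MULTILINE,
    )
    return expected_image(name) in found
```

For example, expected_image("score-download.0.1.4") returns quay.io/icgc-argo/score-download:score-download.0.1.4, matching the image reference shown in step 3. A check like container_matches could be run locally before pushing, so that a mismatch between the branch name and the container tag is caught before the CI run.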

Contributors

baminou, hknahal, junjun-zhang, lindaxiang