scilifelab / taca

This project forked from guillermo-carrasco/taca


Tool for the Automation of Cleanup and Analyses: tools for projects and data management at NGI Stockholm

License: MIT License


taca's Introduction

Tool for the Automation of Cleanup and Analyses


This package contains several tools for project and data management at the National Genomics Infrastructure (NGI) in Stockholm, Sweden.

Run tests in docker

git clone https://github.com/SciLifeLab/TACA.git
cd TACA
docker build -t taca_testing --target testing .
docker run -it taca_testing

Installation

Inside the repo, run pip install .

Development

Run pip install -r requirements-dev.txt to install the packages used for development, and pip install -e . to make the installation editable.
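
That is, from the repo root:

# Install development dependencies, then TACA itself in editable mode
pip install -r requirements-dev.txt
pip install -e .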

Automated linting

This repo is configured for automated linting. Linter parameters are defined in pyproject.toml.

As of now, we use:

  • ruff to perform automated formatting and a variety of lint checks.

    • Run with ruff check . and ruff format .
  • mypy for static type checking and to prevent contradictory type annotations.

    • Run with mypy **/*.py
  • pipreqs to check that the requirement files are up-to-date with the code.

    • This is run with a custom Bash script in GitHub Actions, which only compares the lists of package names:

      # Compare two requirements listings ($1 and $2) by package name only

      # Extract and sort package names
      awk '{print $1}' "$1" | sort -u > "$1".compare
      awk -F'==' '{print $1}' "$2" | sort -u > "$2".compare

      # Compare package lists
      if cmp -s "$1".compare "$2".compare
      then
        echo "Requirements are the same"
        exit 0
      else
        echo "Requirements are different"
        exit 1
      fi
      
  • prettier to format common languages.

    • Run with prettier .
  • editorconfig-checker to enforce .editorconfig rules for all files not covered by the tools above.

    • Run with
      editorconfig-checker $(git ls-files | grep -v '.py\|.md\|.json\|.yml\|.yaml\|.html')
      

The GitHub Actions workflow is configured in .github/workflows/lint-code.yml. It checks all commits in pushes and pull requests, but does not change code or prevent merges.

Pre-commit hooks will prevent local commits that fail the linting checks. They are configured in .pre-commit-config.yaml.

To set up pre-commit checking:

  1. Run pip install pre-commit
  2. Navigate to the repo root
  3. Run pre-commit install

This can be disabled with pre-commit uninstall.
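
For reference, a minimal .pre-commit-config.yaml using the ruff hooks could look like the sketch below (the revision is illustrative, and the repo's actual config may include more hooks):

repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.4  # illustrative revision
    hooks:
      - id: ruff
      - id: ruff-format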

VS Code automation

To enable automated linting in VS Code, go to the user settings.json and include the following lines:

"[python]": {
    "editor.defaultFormatter": "charliermarsh.ruff",
}

This will run the same ruff linting as GitHub Actions and pre-commit every time VS Code is used to format code in the repository.

To run formatting on save, include the lines:

"[python]": {
    "editor.formatOnSave": true,
}
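
Combined, the [python] block in the user settings.json would look like:

"[python]": {
    "editor.defaultFormatter": "charliermarsh.ruff",
    "editor.formatOnSave": true
}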

Git blame suppression

When a non-invasive tool is used to tidy up a lot of code, it is useful to suppress the Git blame for that particular commit, so the original author of each line can still be traced.

To do this, add the hash of the commit containing the changes to .git-blame-ignore-revs, headed by an explanatory comment.
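
For example (the commit hash below is hypothetical), and note that a local git blame only honours the file when pointed at it:

# .git-blame-ignore-revs
# Reformatted codebase with ruff
a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0

# One-time setup per clone, so git blame respects the file
git config blame.ignoreRevsFile .git-blame-ignore-revs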

Deliver command

There is also a plugin for the deliver command. To install this in the same development environment:

# Install taca delivery plugin for development
git clone https://github.com/<username>/taca-ngi-pipeline.git
cd taca-ngi-pipeline
python setup.py develop
pip install -r ./requirements-dev.txt

# Add required config files and env vars for the taca delivery plugin
mkdir -p ~/.ngipipeline
echo "foo:bar" >> ~/.ngipipeline/ngi_config.yaml
mkdir ~/.taca && cp tests/data/taca_test_cfg.yaml ~/.taca/taca.yaml
export CHARON_BASE_URL="http://tracking.database.org"
export CHARON_API_TOKEN="charonapitokengoeshere"

# Check that tests pass:
cd tests && nosetests -v -s

For more detailed documentation, please see the documentation page.

taca's People

Contributors

aanil, alneberg, b97pla, chuan-wang, ewels, franbonath, galithil, guillermo-carrasco, hammarn, ingkebil, jfnavarro, kate-v-stepanova, kedhammar, parlundin, pekrau, remiolsen, robinandeer, senthil10, ssjunnebo, sylvinite, vezzi


taca's Issues

archive functionality not looking at the days

Even though it is in the argument list

def archive_to_swestore(days, run=None)

days is not used in this method (it is in cleanup), so basically it will archive all the runs, regardless of what you specify as old.
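
A minimal sketch of the intended behaviour (run discovery and the archiving helper are placeholders, not taken from the codebase):

import os
import time

def _archive_run(run):
    # Placeholder for the real archiving logic
    print(f"Archiving {run}")

def archive_to_swestore(days, run=None):
    # Archive only runs whose last modification is older than `days` days
    cutoff = time.time() - days * 24 * 3600
    runs = [run] if run else os.listdir(".")
    for r in runs:
        if os.path.getmtime(r) < cutoff:
            _archive_run(r)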

Implement delivery routine

Delivery of analysis data, as outlined in the NGI delivery policies document, should be implemented and managed with TACA.

Remove contributors from README

What do you think? It is implicit in the commit history. Actually, it is available in the "Contributors" tab on the repository, so... one less thing to keep up to date.

Docs docs docs

Hmmm, this is just a question: do you think the package's built-in help is enough?

~/repos_and_code/TACA (master) ~> taca --help
Usage: taca [OPTIONS] COMMAND [ARGS]...

  Tool for the Automation of Storage and Analyses

Options:
  --version                   Show the version and exit.
  -c, --config-file FILENAME  Path to TACA configuration file
  --help                      Show this message and exit.

Commands:
  analysis  Analysis methods entry point
  storage   Storage management methods and utilities

etc. Or do you think we should add a page per subcommand in the documentation? Like one page for taca storage, one page for taca analysis, etc.

I don't want to over-document, that's the thing, but I also don't want subcommands or options to become forgotten. On the other hand... if a subcommand becomes forgotten, it is basically because it is not used, so it shouldn't be there anyway...

what do you think? @senthil10 @vezzi @ewels @mariogiov

PM - Check if run exists in Swestore

Now it will crash if the run already exists in Swestore:

ERROR: putUtil: put error for /ssUppnexZone/proj/a2010002/141120_M01548_0038_000000000-AB8D9.tar.bz2, status = -312000 status = -312000 OVERWRITE_WITHOUT_FORCE_FLAG
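
One possible fix, sketched with iRODS icommands (the path is illustrative): check for the object with ils before calling iput, and skip the upload if it is already there.

# Skip the upload if the tarball already exists in Swestore
if ils /ssUppnexZone/proj/a2010002/run.tar.bz2 >/dev/null 2>&1; then
    echo "Run already in Swestore, skipping"
else
    iput -K run.tar.bz2 /ssUppnexZone/proj/a2010002/
fi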

pm is not logging to a file

Even though it is specified in the configuration file:

# This section overrides the default logging parameters in Cement
log.logging:
    file: /home/hiseq.bioinfo/log/pm.log
    rotate: True

-r option not working properly

(master)hiseq.bioinfo@seq-nas-3:/srv/illumina/hiseq_data/nosync$ taca storage archive-to-swestore -r 150113_D00456_0058_AC6KUBANXX.tar.bz2
Traceback (most recent call last):
  File "/home/hiseq.bioinfo/.anaconda/envs/master/bin/taca", line 5, in <module>
    pkg_resources.run_script('taca==1.0', 'taca')
  File "/home/hiseq.bioinfo/.anaconda/envs/master/lib/python2.7/site-packages/setuptools-3.6-py2.7.egg/pkg_resources.py", line 534, in run_script
  File "/home/hiseq.bioinfo/.anaconda/envs/master/lib/python2.7/site-packages/setuptools-3.6-py2.7.egg/pkg_resources.py", line 1434, in run_script
  File "/home/hiseq.bioinfo/.anaconda/envs/master/lib/python2.7/site-packages/taca-1.0-py2.7.egg/EGG-INFO/scripts/taca", line 38, in <module>
    app.run()
  File "/home/hiseq.bioinfo/.anaconda/envs/master/lib/python2.7/site-packages/cement/core/foundation.py", line 694, in run
    self.controller._dispatch()
  File "/home/hiseq.bioinfo/.anaconda/envs/master/lib/python2.7/site-packages/cement/core/controller.py", line 455, in _dispatch
    return func()
  File "/home/hiseq.bioinfo/.anaconda/envs/master/lib/python2.7/site-packages/cement/core/controller.py", line 461, in _dispatch
    return func()
  File "/home/hiseq.bioinfo/.anaconda/envs/master/lib/python2.7/site-packages/taca-1.0-py2.7.egg/taca/controllers/storage.py", line 56, in archive_to_swestore
    self._archive_run(self.pargs.run)
AttributeError: 'StorageController' object has no attribute 'pargs'

Demultiplexing should be machine agnostic

Basically, taca analysis demultiplex -r <HiSeq run> should work just like taca analysis demultiplex -r <MiSeq run> and taca analysis demultiplex -r <XTen run>, without the user having to specify the run type.

Samplesheets for HAS

That might not be true for the latest versions, but to make the samplesheets HAS-compatible, you need a key named "Workflow" under the [Header] section, and possibly a [Settings] section before [Data].
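
For illustration, a HAS-compatible samplesheet skeleton might look like this (all values are placeholders):

[Header]
Workflow,GenerateFASTQ

[Settings]

[Data]
Sample_ID,Sample_Name,index
Sample_1,Sample_1,ATCACG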

Detach iput command

This command takes ages for a HiSeq/XTen run, and it only uses one core, so I think we could detach it and continue tarballing the next run. So at any given point we would have just one run being compressed (using several cores), but several being sent to Swestore at the same time.

If we don't do it like this, the risk of creating a queue of pm processes is high.
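
A rough sketch of the idea (paths and variables illustrative): compress one run at a time in the foreground, but background each upload so the next tarball can start immediately.

# Compress the current run in the foreground
tar -cjf "$RUN".tar.bz2 "$RUN"

# Detach the single-core upload and move on to the next run
nohup iput -K "$RUN".tar.bz2 /ssUppnexZone/proj/a2010002/ >> iput.log 2>&1 &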
