scilifelab / ngi-neutronstar

This project forked from nf-core/neutronstar


De novo assembly pipeline for 10X linked-reads, used at the SciLifeLab National Genomics Infrastructure.

License: MIT License



NGI-NeutronStar


This pipeline has moved!

This pipeline has been moved to the new nf-core project. You can now find it here:

If you have any problems with the pipeline, please create an issue at the above repository instead.

To find out more about nf-core, visit http://nf-co.re/

This repository will be archived to maintain the released versions for future reruns, in the spirit of full reproducibility.

If you have any questions, please get in touch: [email protected]

// Phil Ewels, 2018-10-05


Table of Contents

  1. Introduction
  2. Installation
  3. Usage instructions
  4. Pipeline overview
  5. Credits

Introduction

NGI-NeutronStar is a bioinformatics best-practice analysis pipeline for de novo assembly and quality control of 10x Genomics Chromium data. It is developed and used at the National Genomics Infrastructure at SciLifeLab Stockholm, Sweden. The pipeline uses Nextflow, a bioinformatics workflow tool.

Disclaimer

This software is in no way affiliated with nor endorsed by 10x Genomics.

Installation

Nextflow runs on most POSIX systems (Linux, Mac OS X, etc.). It can be installed by running the following commands:

# Make sure that Java v7+ is installed:
java -version

# Install Nextflow
curl -fsSL get.nextflow.io | bash

# Add Nextflow binary to your PATH:
mv nextflow ~/bin
# OR system-wide installation:
# sudo mv nextflow /usr/local/bin

You need Nextflow version >= 0.25 to run this pipeline.
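If you want a wrapper script to enforce this minimum, one way is to compare version strings. The sketch below is a minimal example assuming your sort supports -V (GNU coreutils does; not every sort does). version_ge is a hypothetical helper name, and you supply the version string reported by your own Nextflow installation; the sketch does not parse Nextflow's output.

```shell
# version_ge HAVE NEED: succeeds (exit 0) if version HAVE >= version NEED.
# Hypothetical helper; relies on version-aware sorting via `sort -V`.
version_ge() {
    local have="$1" need="$2"
    # The smaller of the two versions sorts first; HAVE is new enough
    # exactly when NEED comes out first (or they are equal).
    [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}

# Example: replace 0.27.0 with the version of your Nextflow install.
if version_ge "0.27.0" "0.25"; then
    echo "Nextflow version is new enough"
else
    echo "Please upgrade Nextflow to >= 0.25" >&2
fi
```

Version-aware sorting matters here: a plain string comparison would wrongly treat 0.9 as newer than 0.25.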

While it is possible to run the pipeline by having Nextflow fetch it directly from GitHub (e.g. nextflow run SciLifeLab/NGI-NeutronStar), on many systems you will need to download it (and configure it) locally:

wget https://github.com/SciLifeLab/NGI-NeutronStar/archive/master.zip
unzip master.zip -d /my-pipelines/
cd /my_data/
nextflow run /my-pipelines/NGI-NeutronStar-master

Singularity

If you run the pipeline with one of the Singularity configurations (see below), Nextflow will automatically fetch the image from Docker Hub. However, if your compute environment does not have internet access, you can build the image elsewhere and run the pipeline using:

# Build image
singularity pull --name "ngi-neutronstar.simg" docker://remiolsen/ngi-neutronstar
# After uploading it to your_hpc:/singularity_images/
nextflow run -with-singularity /singularity_images/ngi-neutronstar.simg /my-pipelines/NGI-NeutronStar-master

BUSCO data

By default, NGI-NeutronStar looks for the BUSCO lineage datasets in the data folder, e.g. /my-pipelines/NGI-NeutronStar-master/data/. If you have these datasets installed in any other path, you can specify it with the option --BUSCOfolder /path/to/lineage_sets/. The pipeline includes a script to download BUSCO data: /my-pipelines/NGI-NeutronStar-master/data/busco_data.py

# Example downloading a minimal, but broad set of lineages
cd /my-pipelines/NGI-NeutronStar-master/data/
# To list the datasets
# Category minimal contains:
#  - bacteria_odb9
#  - eukaryota_odb9
#  - metazoa_odb9
#  - protists_ensembl
#  - embryophyta_odb9
#  - fungi_odb9
python busco_data.py list minimal
# To download them
python busco_data.py download minimal
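If you point --BUSCOfolder at your own copy of the datasets, a quick pre-flight check can catch missing lineages before a long run. This is a minimal sketch assuming that each lineage is unpacked as a sub-directory named after it (for example bacteria_odb9); missing_lineages is a hypothetical helper, not part of the pipeline or of busco_data.py.

```shell
# missing_lineages FOLDER LINEAGE...: print each lineage that has no
# sub-directory in FOLDER. Hypothetical helper; the one-directory-per-lineage
# layout is an assumption.
missing_lineages() {
    local folder="$1"
    shift
    local lineage
    for lineage in "$@"; do
        [ -d "$folder/$lineage" ] || echo "$lineage"
    done
    return 0
}

busco_dir=/path/to/lineage_sets
missing=$(missing_lineages "$busco_dir" bacteria_odb9 eukaryota_odb9)
if [ -n "$missing" ]; then
    echo "Missing BUSCO lineages in $busco_dir:" >&2
    echo "$missing" >&2
else
    echo "All lineages found; you can pass --BUSCOfolder $busco_dir"
fi
```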

Usage instructions

It is recommended that you start the pipeline inside a Unix screen session (or alternatively tmux), so that it keeps running if your terminal disconnects.

Single assembly

To assemble a single sample, the pipeline can be started using the following command:

nextflow run \
    -profile nextflow_profile \
    /path/to/NGI-NeutronStar/main.nf \
    [Supernova options] \
    --genomesize genome_size \
    (--clusterOptions)
  • nextflow_profile is one of the environments defined in the file nextflow.config
  • [Supernova options] are the following Supernova options (run supernova run --help for a more detailed description, or read the documentation provided by 10x Genomics):
    • --fastqs (required)
    • --id (required)
    • --sample
    • --lanes
    • --indices
    • --bcfrac
    • --maxreads
  • --clusterOptions are the options passed to the HPC job manager. For instance, for SLURM: --clusterOptions="-A project -C node-type"
  • --genomesize (required) is the estimated size of the genome(s) to be assembled. It is mainly used by QUAST to compute NGxx statistics, e.g. NG50, which is computed relative to this value rather than to the assembly size.
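To make the difference between NGxx and Nxx concrete: NG50 is the largest contig length L such that contigs of length L or longer add up to at least 50% of the estimated genome size, whereas N50 uses the total assembly length instead. The shell function below is a minimal illustration of that definition; ngx is a hypothetical name, it is not how QUAST implements the statistic, and it reads one contig length per line on standard input.

```shell
# ngx GENOMESIZE X: print the NGx contig length for lengths read on stdin,
# or NA if the contigs never reach X% of GENOMESIZE. Illustrative only.
ngx() {
    local genomesize="$1" x="$2"
    # Sort lengths longest first, then accumulate until X% is reached.
    sort -rn | awk -v g="$genomesize" -v x="$x" '
        { cum += $1
          if (cum * 100 >= g * x) { print $1; found = 1; exit } }
        END { if (!found) print "NA" }'
}

# Example: four contigs totalling 1,000 bp.
printf '%s\n' 400 300 200 100 | ngx 1000 50   # prints 300
printf '%s\n' 400 300 200 100 | ngx 2000 50   # prints 100
```

With the same four contigs, NG50 is 300 when the genome size is 1,000 bp but drops to 100 when it is 2,000 bp, which is why a realistic --genomesize estimate matters. If the assembly covers less than x% of the genome size, NGx is undefined and the sketch prints NA.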

Multiple assemblies

NGI-NeutronStar also supports specifying the above parameters in a .yaml file, which lets you run several assemblies in parallel. The following example file (sample_config.yaml) runs two assemblies of the test data included in the Supernova installation: one with the default parameters, and one with barcode downsampling:

genomesize: 1000000
samples:
  - id: testrun
    fastqs: /sw/apps/bioinfo/Chromium/supernova/1.1.4/assembly-tiny-fastq/1.0.0/
  - id: testrun_bc05
    fastqs: /sw/apps/bioinfo/Chromium/supernova/1.1.4/assembly-tiny-fastq/1.0.0/
    maxreads: 500000000
    bcfrac: 0.5

Run Nextflow using: nextflow run -profile nextflow_profile -params-file sample_config.yaml /path/to/NGI-NeutronStar/main.nf (--clusterOptions)
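When many assemblies share the same FASTQ folder, the params file can also be generated instead of written by hand. This is a minimal sketch assuming every sample uses the same fastqs path and the default Supernova options; write_sample_config is a hypothetical helper, and only the keys genomesize, samples, id and fastqs are taken from the example file above.

```shell
# write_sample_config OUT GENOMESIZE FASTQS ID...: write a params file with
# one sample entry per ID, all sharing FASTQS. Hypothetical helper.
write_sample_config() {
    local out="$1" genomesize="$2" fastqs="$3"
    shift 3
    local id
    {
        echo "genomesize: $genomesize"
        echo "samples:"
        for id in "$@"; do
            echo "  - id: $id"
            echo "    fastqs: $fastqs"
        done
    } > "$out"
}

write_sample_config sample_config.yaml 1000000 /path/to/fastqs/ testrun testrun_repeat
# Then, for example:
# nextflow run -profile nextflow_profile -params-file sample_config.yaml /path/to/NGI-NeutronStar/main.nf
```

Per-sample options such as maxreads or bcfrac would still need to be added to individual entries by hand.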

Advanced usage

If you do not specify the -profile option, the pipeline uses a default profile that is suitable for testing on a typical laptop (using the test dataset included with the Supernova package). In a high-performance computing environment (and with real data) you should specify one of the HPC profiles, for instance hpc_singularity_slurm for a compute cluster with the Slurm job scheduler and Singularity version >= 2.4 installed.
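The profile choice described above can be automated in a wrapper script. The sketch below is only an illustration under stated assumptions: it treats the presence of the sbatch command as a sign of a Slurm cluster, it does not check that Singularity is at least version 2.4, and choose_profile_args is a hypothetical helper. Only the hpc_singularity_slurm profile name comes from this README; check nextflow.config for the other profiles.

```shell
# choose_profile_args: print "-profile hpc_singularity_slurm" when both Slurm
# (detected via sbatch, an assumption) and singularity are on the PATH;
# otherwise print nothing so the pipeline's default profile is used.
choose_profile_args() {
    if command -v sbatch >/dev/null 2>&1 && command -v singularity >/dev/null 2>&1; then
        echo "-profile hpc_singularity_slurm"
    fi
}

profile_args=$(choose_profile_args)
# $profile_args is left unquoted on purpose so it splits into two words,
# or into none when empty. This only prints the command (dry run).
echo nextflow run $profile_args /path/to/NGI-NeutronStar/main.nf
```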


Pipeline overview

[Figure: NGI-NeutronStar pipeline chart]


Credits

These scripts were written for use at the National Genomics Infrastructure at SciLifeLab in Stockholm, Sweden. Written by Remi-Andre Olsen (@remiolsen).





ngi-neutronstar's Issues

Remi's master to-do list

For transparency, here's my "design document"


Main requirements

  • Nextflow? yes
  • Supernova running in $SNIC_TMP (Irma compatible?)
    • 1.20 compatible — multiple input parameter assemblies
    • Use Nextflow publishDir instead of rsync: couldn't make it work. Use rsync!
    • Make this optional
  • Rsync supernova assembly back to workdir
  • supernova mkoutput - pseudohap, megabubbles
    • gunzip
    • parameter for additional outputs; always output .phased.fasta
    • parameter for minimum length
  • QUAST
    • make it run on Irma
  • BUSCO
    • UPPMAX — beforeScript
  • MultiQC
    • Needs testing
  • support for --no-preflight flag
  • Documentation
    • Readme.md
  • dump software versions & commands that were run
  • Send mail when the pipeline is finished
  • Clean up and generalize the configs
    • Common HPC config
    • Common Uppmax config
    • Make a general local run config
  • Release tags

Docker / Singularity

  • Supernova (copyright issues?)
  • Quast
  • BUSCO
  • Script for automatic singularity/docker download / installation

NX script

  • input configuration:
    • id
      • fastqs
      • sample
      • maxreads
      • bcfrac
    • genomesize
  • memory parameter
  • cpu parameter
  • make Longranger / fastqc optional

Input validation

  • id: only numbers, letters, dash, and underscore allowed
  • bcfrac: in the open interval (0, 1)
  • maxreads: numeric
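The three checks above could be written as small shell functions, for instance in a wrapper that reads the params file before launching Nextflow. This is a sketch only: the function names are hypothetical, and reading maxreads "num" as a positive integer without leading zeros is an assumption.

```shell
valid_id() {
    # Non-empty, and no character outside letters, digits, "_" and "-".
    case "$1" in
        ""|*[!A-Za-z0-9_-]*) return 1 ;;
    esac
    return 0
}

valid_bcfrac() {
    # A plain decimal number strictly between 0 and 1, e.g. 0.5 or .25.
    awk -v f="$1" 'BEGIN { exit !(f ~ /^[0-9]*\.?[0-9]+$/ && f + 0 > 0 && f + 0 < 1) }'
}

valid_maxreads() {
    # Assumption: "num" means a positive integer without leading zeros.
    case "$1" in
        ""|*[!0-9]*|0*) return 1 ;;
    esac
    return 0
}

# Example: the testrun_bc05 entry of the example sample_config.yaml.
valid_id testrun_bc05 && valid_bcfrac 0.5 && valid_maxreads 500000000 \
    && echo "entry looks valid"
```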

MultiQC

  • Fix when having empty molecule.yaml files
  • Does having “ASSEMBLER_CS” folders break multiqc?
  • Fix QUAST module. It breaks when running with -s option

Testing

  • Test data from NA12878 run.
  • Travis-CI integration

Could-haves

  • Tigmint evaluation
  • Delivery template mail / output folder structure
  • BWA align
    • picard-tools
    • remove dups
    • collectinsertsize
  • qaTools-singularity
  • FRC-singularity
  • BUSCOv2 datasets in config
    • auto-script to download datasets
