granek / dorado

This project forked from nanoporetech/dorado

A LibTorch Basecaller for Oxford Nanopore Reads

Home Page: https://nanoporetech.com/

License: Other

C++ 84.87% C 0.42% CMake 5.28% Metal 9.43%

Dorado

Dorado is a high-performance, easy-to-use, open source basecaller for Oxford Nanopore reads.

Features

  • One executable with sensible defaults, automatic hardware detection and configuration.
  • Runs on Apple silicon (M1/2 family) and Nvidia GPUs including multi-GPU with linear scaling.
  • Modified basecalling (Remora models).
  • Duplex basecalling.
  • POD5 support for highest basecalling performance.
  • Based on libtorch, the C++ API for pytorch.
  • Multiple custom optimisations in CUDA and Metal for maximising inference performance.

If you encounter any problems building or running Dorado, please report an issue.

Installation

Running

To run Dorado, download a model and point it to POD5 files (Fast5 files are supported but will not be as performant).

$ dorado download --model [email protected]
$ dorado basecaller [email protected] pod5s/ > calls.sam

To call modifications, add --modified-bases followed by the modifications to call.

$ dorado basecaller [email protected] pod5s/ --modified-bases 5mCG_5hmCG > calls.sam

For unaligned BAM output, dorado output can be piped to BAM using samtools:

$ dorado basecaller [email protected] pod5s/ | samtools view -Sb > calls.bam
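
As a quick sanity check on the piped output, the sketch below tests whether a file starts with the gzip magic bytes (1f 8b) that BGZF-compressed BAM files begin with. The function name looks_like_bam is hypothetical and not part of dorado or samtools. It only inspects the first two bytes, so it cannot tell whether the rest of the file is a valid BAM; samtools quickcheck is a more thorough option.

```shell
# Hypothetical helper (not part of dorado or samtools): succeed if FILE
# starts with the gzip magic bytes 1f 8b, as BGZF-compressed BAM files do.
# Plain-text SAM output starts with '@' or a read name and fails the check.
looks_like_bam() {
    # Dump the first two bytes as hex and strip od's padding.
    magic=$(od -An -tx1 -N 2 "$1" | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}
```

For example, `looks_like_bam calls.bam || echo "calls.bam is not compressed"` flags a file that was written as SAM text despite its .bam extension.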

Stereo Duplex Calling:

$ dorado duplex [email protected] pod5s/ --pairs pairs.txt > duplex.sam

Platforms

Dorado has been tested on the following systems:

Platform   GPU/CPU
Windows    (G)V100, A100
Apple      M1, M1 Pro, M1 Max, M1 Ultra
Linux      (G)V100, A100

Systems not listed above that have Nvidia GPUs with at least 8GB of VRAM and a Volta or later architecture have not been widely tested but are expected to work. If you encounter problems running on your system, please report an issue.

Roadmap

Dorado is still in the alpha stage and is not feature-complete. The following features form the core of our roadmap:

  1. DNA Barcode multiplexing
  2. Alignment (output aligned BAMs).
  3. Python API

Performance tips

  1. For optimal performance Dorado requires POD5 file input. Please convert your Fast5 files before basecalling.
  2. Dorado will automatically detect your GPUs' free memory and select an appropriate batch size.
  3. Dorado will automatically run in multi-GPU ('cuda:all') mode. If you have a heterogeneous collection of GPUs, select the faster GPUs using the --device flag (e.g. --device "cuda:0,2"). Not doing this will have a detrimental impact on performance.
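
The --device value in tip 3 can be assembled from a list of GPU indices. The sketch below is one way to do that; device_arg is a hypothetical helper, not a dorado command, and it assumes the "cuda:" prefix followed by comma-separated indices shown in the example above.

```shell
# Hypothetical helper (not part of dorado): join GPU indices into the
# string passed to --device, e.g. "0 2" becomes "cuda:0,2".
device_arg() {
    if [ "$#" -eq 0 ]; then
        # No indices given: fall back to the multi-GPU default.
        printf 'cuda:all\n'
        return 0
    fi
    list=""
    for idx in "$@"; do
        case "$idx" in
            ''|*[!0-9]*)
                # Reject anything that is not a non-negative integer.
                printf 'device_arg: not a GPU index: %s\n' "$idx" >&2
                return 1
                ;;
        esac
        # Append with a comma separator after the first entry.
        list="${list:+$list,}$idx"
    done
    printf 'cuda:%s\n' "$list"
}
```

It could then be used as `dorado basecaller "$MODEL" pod5s/ --device "$(device_arg 0 2)" > calls.sam`, where MODEL is a placeholder for the downloaded model.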

Available basecalling models

To download all available dorado models run:

$ dorado download --model all

The following models are currently available:

Developer quickstart

Linux dependencies

The following packages are necessary to build dorado in a barebones environment (e.g. the official ubuntu:jammy docker image):

apt-get update && \
    apt-get install -y --no-install-recommends \
        curl \
        git \
        ca-certificates \
        build-essential \
        nvidia-cuda-toolkit \
        libhdf5-dev \
        libssl-dev \
        libzstd-dev \
        cmake \
        autoconf \
        automake
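
Before building, it can help to confirm that the command-line tools from these packages are actually on PATH. The sketch below uses a hypothetical helper, missing_tools, that prints each named command it cannot find. It only covers packages that install executables; libraries such as libhdf5-dev are not checked.

```shell
# Hypothetical helper (not part of dorado): print every named command
# that cannot be found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example, `missing_tools curl git cmake nvcc autoconf automake` prints nothing when all six commands are installed.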

Clone and build

The commands below will build dorado and install it in /opt. NUM_THREADS controls the number of threads that cmake uses to compile dorado. It can be set to a value higher than "1", but using too many threads can use all available RAM and cause compilation to fail. Peak memory usage seems to be 1-2GB per thread.

export NUM_THREADS=1
git clone https://github.com/nanoporetech/dorado.git /dorado
cd /dorado
cmake -S . -B cmake-build -DCMAKE_CUDA_COMPILER=nvcc
cmake --build cmake-build --config Release --parallel $NUM_THREADS
ctest --test-dir cmake-build
cmake --install cmake-build --prefix /opt
cd /
rm -rf /dorado
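
Rather than hard-coding NUM_THREADS, a value can be derived from the core count and the available RAM. The sketch below is one possible approach: pick_num_threads is a hypothetical helper that budgets 2GB per thread, the upper end of the estimate above, and never returns less than one thread.

```shell
# Hypothetical helper (not part of dorado): choose a build thread count.
# Usage: pick_num_threads <cores> <available_ram_kb>
pick_num_threads() {
    cores=$1
    ram_kb=$2
    # Assumed budget of 2GB per compile thread, expressed in kB.
    per_thread_kb=$((2 * 1024 * 1024))
    by_ram=$((ram_kb / per_thread_kb))
    # Take the smaller of the two limits.
    n=$cores
    if [ "$by_ram" -lt "$n" ]; then
        n=$by_ram
    fi
    # Always allow at least one thread.
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
}
```

On Linux this could be called as `export NUM_THREADS=$(pick_num_threads "$(nproc)" "$(awk '/MemAvailable/ {print $2}' /proc/meminfo)")`, since /proc/meminfo reports MemAvailable in kB.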

Pre commit

The project uses pre-commit to ensure code is consistently formatted. You can set this up using pip:

$ pip install pre-commit
$ pre-commit install

Licence and Copyright

(c) 2022 Oxford Nanopore Technologies Ltd.

Dorado is distributed under the terms of the Oxford Nanopore Technologies, Ltd. Public License, v. 1.0. If a copy of the License was not distributed with this file, You can obtain one at http://nanoporetech.com

Contributors

0x55555555, blawrence-ont, epislim, gkolling, granek, hiruna72, iiseymour, jon-church-nanoporetech-com, kamaldaniels, malton-ont, markbicknellont, stuartabercrombie, vellamike
