twang18 / connet

This project is forked from hku-bal/connet.

CONNET: Accurate Genome Consensus in Assembling Nanopore Sequencing Data via Deep Learning

Introduction

Single-molecule sequencing technologies produce much longer reads than next-generation sequencing, greatly improving the contiguity of de novo genome assembly. However, the relatively high error rates of long reads make high-quality assemblies hard to obtain, and a computationally intensive consensus step is needed to resolve discrepancies among the reads. Efficient consensus tools based on partial-order alignment have emerged recently. In this study, we found that the spatial relationship within the alignment pileup is crucial to high-quality consensus, and we developed CONNET, a deep learning-based consensus tool that outperforms the fastest partial-order alignment-based tools in both accuracy and speed. We tested CONNET on a 90x E. coli dataset and a 37x human dataset. In addition to producing high-quality consensus results, CONNET can deliver phased diploid genome consensus. On the human assembly above, diploid consensus eliminated a further 12% of the consensus errors remaining in the haploid result.
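To make the consensus problem concrete: once reads are aligned to a draft, every draft position has a pileup column of observed bases, and the simplest way to resolve discrepancies is a per-column majority vote. The sketch below is only that naive baseline, not CONNET's method; CONNET replaces the per-column decision with a deep-learning model that also uses the spatial relationship between neighbouring columns. The pileup representation (one string of bases per column, `-` for a deletion, insertions ignored) and the name `majority_consensus` are illustrative assumptions.

```python
from collections import Counter


def majority_consensus(columns):
    """Naive per-column majority vote over a pileup (illustrative baseline only).

    `columns` is a list of strings; each string holds the bases that the
    aligned reads show at one draft position ('-' marks a deletion).
    Insertions relative to the draft are not modelled here.
    """
    consensus = []
    for column in columns:
        if not column:
            continue  # no read coverage at this position: nothing to vote on
        # Ties go to the base seen first in the column (Counter ordering, Python 3.7+).
        base, _count = Counter(column).most_common(1)[0]
        if base != "-":  # a winning deletion drops the position from the consensus
            consensus.append(base)
    return "".join(consensus)


# Five reads over four draft positions; isolated read errors are outvoted.
pileup = ["AAAAT", "CCGCC", "----G", "TTTTT"]
print(majority_consensus(pileup))  # ACT
```

A per-column vote treats every position independently, which is exactly the limitation the introduction points to: it cannot use the context of neighbouring columns.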


Installation

# make sure the following tools are installed
samtools 
minimap2
parallel
python2

# make sure the following Python packages are installed
tensorflow == 1.13.1
keras == 2.2.4
numpy == 1.16.4

git clone https://github.com/HKU-BAL/CONNET.git
cd CONNET

python2 setup.py build_ext --inplace
# This compiles `parse_pileup.so` in the current folder.

export CONNET=$PWD/connet.py 
export CONNET_DIPLOID=$PWD/diploid.sh
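Because the Python dependencies are pinned to exact versions, a mismatched install (for example a newer TensorFlow) is an easy mistake to make. The helper below is a hypothetical pre-flight check, not part of CONNET: it reads each module's `__version__` and reports every pin listed above that is not met exactly. It uses only the standard library and avoids Python-3-only syntax, so it should run under `python2` as well.

```python
import importlib

# Versions pinned in the installation instructions above.
REQUIRED = {"tensorflow": "1.13.1", "keras": "2.2.4", "numpy": "1.16.4"}


def installed_versions(names):
    """Return {package: version or None}, read from each module's __version__."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            found[name] = getattr(module, "__version__", None)
        except Exception:  # missing or broken installs can fail beyond ImportError
            found[name] = None
    return found


def version_mismatches(installed, required=REQUIRED):
    """List (package, installed, required) for every pin that is not met exactly."""
    return [
        (name, installed.get(name), want)
        for name, want in sorted(required.items())
        if installed.get(name) != want
    ]


if __name__ == "__main__":
    problems = version_mismatches(installed_versions(REQUIRED))
    for name, have, want in problems:
        print("%s: found %s, expected %s" % (name, have, want))
    if not problems:
        print("all pinned packages match")
```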

Quick demo

  • Step 1. Install
  • Step 2. Obtain sample input
bash sample_data/download.sh
  • Step 3. Run
mkdir ecoli_demo
cd ecoli_demo
python2 $CONNET ../models/ecoli.model1 ../models/ecoli.model2 ../sample_data/ecoli_raw_reads.fq ../sample_data/ecoli_draft_assembly.fa
  • Step 4. The result is written to 2.fa

By default, CONNET runs for two iterations; the result from iteration 1 is at 1.fa.
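A quick sanity check after a run is to compare the number of sequences and the total length of 1.fa and 2.fa. The reader below is a minimal standard-library sketch, not part of CONNET; it assumes plain, uncompressed FASTA, and the script name in the usage comment is hypothetical.

```python
import sys


def fasta_stats(path):
    """Return (number_of_sequences, total_bases) for an uncompressed FASTA file."""
    n_seqs = 0
    total = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                n_seqs += 1  # each header line starts a new record
            elif line:
                total += len(line)  # a sequence may be wrapped over many lines
    return n_seqs, total


if __name__ == "__main__":
    # Usage (hypothetical script name): python fasta_stats.py 1.fa 2.fa
    for path in sys.argv[1:]:
        n_seqs, total = fasta_stats(path)
        print("%s\t%d sequences\t%d bp" % (path, n_seqs, total))
```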


Pretrained Models

Included in models/

  • Trained on E. coli: models/ecoli.*
  • Trained on H. sapiens chromosome 1: models/human.chr1.*

N.B. The correction phase and the recovery phase are trained separately: *.model1 is trained for the correction phase and *.model2 for the recovery phase. The two models are not interchangeable, and both are required.
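Since both halves of a pair must be passed, and in the right order, a small guard can derive the two paths from a shared prefix such as `models/ecoli` and fail early if either file is missing. The helpers `model_pair` and `check_model_pair` are hypothetical; they only encode the `*.model1` / `*.model2` naming used in models/.

```python
import os


def model_pair(prefix):
    """Return (correction_model, recovery_model) paths for a model prefix.

    Follows the naming in models/: <prefix>.model1 is the correction-phase
    model and <prefix>.model2 is the recovery-phase model.
    """
    return prefix + ".model1", prefix + ".model2"


def check_model_pair(prefix):
    """Return the pair, or raise IOError listing any missing file (both are required)."""
    pair = model_pair(prefix)
    missing = [path for path in pair if not os.path.isfile(path)]
    if missing:
        raise IOError("missing CONNET model file(s): " + ", ".join(missing))
    return pair
```

For example, `check_model_pair("models/human.chr1")` returns the two human models in the order CONNET expects them on the command line.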

General usage

Haploid Consensus

# haploid consensus
mkdir new_experiment
cd new_experiment
python2 $CONNET model1 model2 raw_reads.fa draft_assembly.fa
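When CONNET is driven from a larger pipeline, the haploid command above can be wrapped with `subprocess`. The sketch below is a hypothetical wrapper, not an official API: it reproduces the argument order shown above (correction model, recovery model, reads, draft), takes the script path from the `$CONNET` variable exported during installation, and runs inside the experiment directory where the iteration outputs are written.

```python
import os
import subprocess


def connet_haploid_cmd(model1, model2, reads, draft, connet=None):
    """Build the haploid command line as an argument list.

    Argument order matters: correction model, recovery model, reads, draft.
    `connet` defaults to $CONNET (raises KeyError if it was never exported).
    """
    connet = connet or os.environ["CONNET"]
    return ["python2", connet, model1, model2, reads, draft]


def run_haploid(workdir, model1, model2, reads, draft):
    """Run CONNET in `workdir`; raises CalledProcessError on a non-zero exit."""
    if not os.path.isdir(workdir):
        os.makedirs(workdir)
    # Absolute paths so the inputs still resolve from inside workdir.
    args = [os.path.abspath(p) for p in (model1, model2, reads, draft)]
    subprocess.check_call(connet_haploid_cmd(*args), cwd=workdir)
```

Passing an argument list rather than a shell string avoids quoting problems with paths that contain spaces.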

Diploid consensus

# make sure whatshap, bgzip, and tabix are installed
mkdir new_experiment
cd new_experiment
bash $CONNET_DIPLOID model1 model2 raw_reads.fa draft_assembly.fa
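Diploid mode depends on three extra executables, so it can help to confirm they are on `PATH` before starting a long run. This is a hypothetical pre-flight sketch, not part of diploid.sh; the same list can be extended with samtools, minimap2 and parallel from the installation section.

```python
try:
    from shutil import which  # Python 3.3+
except ImportError:
    from distutils.spawn import find_executable as which  # Python 2 fallback

# Extra executables required by diploid consensus.
DIPLOID_TOOLS = ("whatshap", "bgzip", "tabix")


def missing_tools(tools=DIPLOID_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("missing: " + ", ".join(missing) if missing else "all diploid tools found")
```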

Notes

CONNET was benchmarked on a 24-core Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz workstation.

  • For machines with fewer processors, reduce T (the number of threads) in connet.py.
  • For machines with limited memory, reduce PHASE1_BATCHSIZE and PHASE2_BATCHSIZE (in bp) in connet.py.
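Because PHASE1_BATCHSIZE and PHASE2_BATCHSIZE are given in bp, smaller values mean less sequence handled per batch. How connet.py actually splits its input is not described here; the sketch below only illustrates the general idea with a hypothetical `batch_regions` helper that cuts each contig into windows of at most `batch_bp` bases.

```python
def batch_regions(contig_lengths, batch_bp):
    """Split contigs into (name, start, end) windows of at most batch_bp bases.

    Coordinates are 0-based and half-open. A smaller batch_bp gives more,
    smaller windows, i.e. less sequence held in memory per batch.
    """
    if batch_bp <= 0:
        raise ValueError("batch_bp must be positive")
    regions = []
    for name, length in contig_lengths:
        for start in range(0, length, batch_bp):
            regions.append((name, start, min(start + batch_bp, length)))
    return regions


print(batch_regions([("ctg1", 250), ("ctg2", 90)], 100))
# [('ctg1', 0, 100), ('ctg1', 100, 200), ('ctg1', 200, 250), ('ctg2', 0, 90)]
```

Halving `batch_bp` roughly doubles the number of windows while halving the size of each, which is the memory-for-overhead trade-off the notes describe.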

Contributors

bbvan