morispi / consent

Scalable long read self-correction and assembly polishing with multiple sequence alignment

Home Page: https://doi.org/10.1038/s41598-020-80757-5

License: GNU Affero General Public License v3.0

consent's Introduction

CONSENT

CONSENT (Scalable long read self-correction and assembly polishing with multiple sequence alignment) is a self-correction method for long reads. It first computes overlaps between the long reads in order to define an alignment pile (i.e. a set of overlapping reads used for correction) for each read. Each read's alignment pile is then divided into smaller windows, which are corrected independently. For each window, a multiple alignment strategy is first used to compute a consensus. This consensus is then further polished with a local de Bruijn graph in order to get rid of the remaining errors. In addition to error correction, CONSENT can also perform assembly polishing.

Requirements

A Linux-based operating system.
Python3.
g++, minimum version 5.5.0.
CMake, minimum version 2.8.2.
Minimap2 available through your path.
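
The minimum versions listed above can be checked before building. The following is a minimal sketch and not part of CONSENT: the helper name `version_ge` is hypothetical, and it assumes GNU `sort` with the `-V` (version sort) option, which is standard on Linux.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with CONSENT): succeeds when version $1 >= version $2.
# Relies on GNU sort -V to order version strings numerically per component.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Report whether the installed g++ meets the 5.5.0 minimum stated above.
if command -v g++ >/dev/null 2>&1; then
    gxx_version="$(g++ -dumpfullversion -dumpversion)"
    if version_ge "$gxx_version" 5.5.0; then
        echo "g++ $gxx_version is recent enough"
    else
        echo "g++ $gxx_version is older than 5.5.0"
    fi
else
    echo "g++ not found on PATH"
fi
```

The same helper can be reused for CMake by comparing its reported version against 2.8.2.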

Installation

Clone the CONSENT repository, along with its submodules, with:

git clone --recursive https://github.com/morispi/CONSENT

Then run the install.sh script:

./install.sh

If you do not already have minimap2 available through your path, you can then run:

export PATH=$PWD/minimap2:$PATH

CONSENT should then be able to run.
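
Before the first run, it may help to confirm that minimap2 really is reachable through the path. This is a small sketch under that assumption; `have_tool` is a hypothetical helper name, not something CONSENT provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the named program can be found through PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool minimap2; then
    echo "minimap2 found at $(command -v minimap2)"
else
    echo "minimap2 not found; consider: export PATH=\$PWD/minimap2:\$PATH"
fi
```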

Getting started

An example dataset (10x of simulated PacBio reads, raw assembly, and reference genome) is provided in the example folder.

Please run the following commands to try out CONSENT on this example.

Self-correction

To perform self-correction on the example dataset, run the following command:

./CONSENT-correct --in example/reads.fasta --out example/correctedReads.fasta --type PB

This should take about 2 min and use up to 750 MB of RAM, using 4 cores.
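
One simple way to inspect the result is to count the records in the input and output files. The sketch below only counts FASTA header lines; `count_fasta_records` is a hypothetical helper, and how the two counts should compare is not stated in this README.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of FASTA records (lines starting with '>').
# grep -c exits non-zero when the count is 0, hence the '|| true'.
count_fasta_records() {
    grep -c '^>' "$1" || true
}

# Report counts for the example files if they exist.
for f in example/reads.fasta example/correctedReads.fasta; do
    if [ -f "$f" ]; then
        echo "$f: $(count_fasta_records "$f") reads"
    fi
done
```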

Polishing

To perform assembly polishing on the example dataset, run the following command:

./CONSENT-polish --contigs example/rawAssembly.fasta --reads example/reads.fasta --out example/polishedAssembly.fasta

This should take about 15 sec and use at most 150 MB of RAM, using 4 cores.

Running CONSENT

Self-correction

To run CONSENT for long reads self-correction, run the following command:

./CONSENT-correct --in longReads.fast[a|q] --out result.fasta --type readsTechnology

longReads.fast[a|q]: fasta or fastq file of long reads to correct.
result.fasta: fasta file where to output the corrected long reads.
readsTechnology: indicates whether the long reads are from PacBio (--type PB) or Oxford Nanopore (--type ONT).
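
As an illustration of the three arguments above, the following sketch builds the command line and rejects any technology other than PB or ONT before anything is launched. The wrapper name `consent_correct_cmd` is hypothetical; only the `--in`, `--out` and `--type` options described above are used, and the command is printed rather than executed.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print the CONSENT-correct command for the given
# reads file, output file and sequencing technology (PB or ONT).
consent_correct_cmd() {
    local reads="$1" out="$2" tech="$3"
    case "$tech" in
        PB|ONT) ;;
        *) echo "unknown technology '$tech' (expected PB or ONT)" >&2; return 1 ;;
    esac
    printf './CONSENT-correct --in %s --out %s --type %s\n' "$reads" "$out" "$tech"
}

consent_correct_cmd longReads.fastq result.fasta ONT
```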

Polishing

To run CONSENT for assembly polishing, run the following command:

./CONSENT-polish --contigs contigs.fast[a|q] --reads longReads.fast[a|q] --out result.fasta

contigs.fast[a|q]: fasta or fastq file of contigs to polish.
longReads.fast[a|q]: fasta or fastq file of long reads to use for polishing.
result.fasta: fasta file where to output the polished contigs.

Options

  --windowSize INT, -l INT:      Size of the windows to process. (default: 500)
  --minSupport INT, -s INT:      Minimum support to consider a window for correction. (default: 4)
  --maxSupport INT, -S INT:      Maximum number of overlaps to include in a pile. (default: 150)
  --maxMSA INT, -M INT:          Maximum number of sequences to include into the MSA. (default: 150)
  --merSize INT, -k INT:         k-mer size for chaining and polishing. (default: 9)
  --solid INT, -f INT:           Minimum number of occurrences to consider a k-mer as solid during polishing. (default: 4)
  --anchorSupport INT, -c INT:   Minimum number of sequences supporting (Ai) - (Ai+1) to keep the two anchors in the chaining. (default: 8)
  --minAnchors INT, -a INT:      Minimum number of anchors in a window to allow consensus computation. (default: 2)
  --windowOverlap INT, -o INT:   Overlap size between consecutive windows. (default: 50)
  --nproc INT, -j INT:           Number of processes to run in parallel (default: number of cores).
  --minimapIndex INT, -m INT:    Split minimap2 index every INT input bases (default: 500M).
  --tmpdir STRING, -t STRING:    Path where to store the temporary overlaps file (default: working directory, as Alignments_dateTimeStamp.paf).
  --help, -h:                    Print this help message.
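
To get a feel for how --windowSize and --windowOverlap interact, the sketch below estimates how many windows a read of a given length would be split into. It assumes windows are laid out with a fixed step of windowSize minus windowOverlap, which is an assumption for illustration and may not match CONSENT's internal tiling exactly; `estimate_windows` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate the number of windows covering a read of length $1,
# given a window size $2 and an overlap $3 between consecutive windows.
# Assumption: each window starts (size - overlap) bases after the previous one.
estimate_windows() {
    local len="$1" size="$2" overlap="$3"
    local step=$(( size - overlap ))
    if [ "$step" -le 0 ]; then
        echo "overlap must be smaller than window size" >&2
        return 1
    fi
    if [ "$len" -le "$size" ]; then
        echo 1
    else
        # One initial window, plus ceil((len - size) / step) further windows.
        echo $(( (len - size + step - 1) / step + 1 ))
    fi
}

estimate_windows 10000 500 50   # prints 23 with the default settings
```

With the defaults (500 and 50), the step is 450 bases, so a 10,000 bp read yields 1 + ceil(9500 / 450) = 23 windows under this assumption.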

Notes

CONSENT has been developed and tested on x86-64 GNU/Linux.
Support for any other platform has not been tested.

Authors

Pierre Morisse, Camille Marchet, Antoine Limasset, Arnaud Lefebvre and Thierry Lecroq.

Reference

Morisse, P., Marchet, C., Limasset, A. et al. Scalable long read self-correction and assembly polishing with multiple sequence alignment. Sci Rep 11, 761 (2021). https://doi.org/10.1038/s41598-020-80757-5

Contact

You can report problems and bugs to pierre[dot]morisse[at]inria[dot]fr

consent's Issues

CONSENT-correct: line 188

When correcting a sample of Sup (super high accuracy) basecalled sequences (2M), I ran into this problem:

/mnt/d/Dropbox/User/Projects/Sequencing/TCR_Analysis/TCR_Analysis_Tools/CONSENT/CONSENT-correct --in /mnt/d/Dropbox/User/Sequencing_Data/TCR/Samp9/Samp9_sup_basecalled/Sup_Sample/Samp9_sample_merge.fastq --out /mnt/d/Dropbox/User/Projects/Sequencing/Samples/SpaTCR/Samp9/CONSENT/Sup_BC_Sample/Samp9_sample_merge.fasta --type ONT
[Thu Jul 8 14:14:19 CEST 2021] Overlapping the long reads (minimap2)
/mnt/d/Dropbox/User/Projects/Sequencing/TCR_Analysis/TCR_Analysis_Tools/CONSENT/CONSENT-correct: line 188: 1581 Killed minimap2 -k15 -w5 -m100 -g10000 -r2000 --max-chain-skip 25 --dual=yes -PD --no-long-join -t"$nproc" -I"$minimapMemory" "$reads" "$reads" > $tmpdir/"$alignments" 2> $tmpdir/"$minimapErrlog"

Do I understand correctly that minimap2 ran out of RAM? Is there any way to limit minimap2's RAM consumption or to work around this?

CONSENT-polish error: "std::out_of_range"

Hello, I tried using CONSENT-polish on an assembly I just made, and the reads mapped successfully, but during the polishing step, I got this output:

[seg nov 18 19:14:22 -03 2019] Aligning the long reads to the contigs (minimap2)
[M::mm_idx_gen::15.945*1.76] collected minimizers
[M::mm_idx_gen::18.335*3.14] sorted minimizers
[M::main::18.335*3.14] loaded/built the index for 127 target sequence(s)
[M::mm_mapopt_update::20.323*2.93] mid_occ = 68
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 127
[M::mm_idx_stat::21.345*2.84] distinct minimizers: 92021864 (66.90% are singletons); average occurrences: 1.726; average spacing: 2.921
[M::worker_pipeline::50.284*7.54] mapped 29353 sequences
[M::worker_pipeline::67.976*9.78] mapped 52649 sequences
[M::worker_pipeline::80.970*10.97] mapped 116369 sequences
[M::worker_pipeline::95.152*11.70] mapped 124348 sequences
[M::worker_pipeline::110.222*12.20] mapped 130275 sequences
[M::worker_pipeline::124.943*12.63] mapped 128853 sequences
[M::worker_pipeline::139.306*12.98] mapped 127235 sequences
[M::worker_pipeline::152.574*13.37] mapped 126839 sequences
[M::worker_pipeline::166.227*13.65] mapped 127379 sequences
[M::worker_pipeline::180.047*13.83] mapped 127576 sequences
[M::worker_pipeline::192.868*14.03] mapped 125926 sequences
[M::worker_pipeline::206.344*14.18] mapped 123109 sequences
[M::worker_pipeline::219.054*14.30] mapped 124745 sequences
[M::worker_pipeline::232.380*14.37] mapped 123676 sequences
[M::worker_pipeline::249.819*14.34] mapped 146939 sequences
[M::worker_pipeline::264.535*14.44] mapped 147196 sequences
[M::worker_pipeline::279.352*14.52] mapped 145912 sequences
[M::worker_pipeline::294.141*14.60] mapped 153104 sequences
[M::worker_pipeline::309.657*14.66] mapped 140964 sequences
[M::worker_pipeline::325.499*14.74] mapped 107461 sequences
[M::worker_pipeline::341.473*14.85] mapped 36040 sequences
[M::worker_pipeline::358.330*14.95] mapped 50015 sequences
[M::worker_pipeline::372.500*15.03] mapped 140480 sequences
[M::worker_pipeline::386.612*15.08] mapped 147703 sequences
[M::worker_pipeline::404.677*15.04] mapped 138115 sequences
[M::worker_pipeline::418.094*15.12] mapped 133603 sequences
[M::worker_pipeline::432.206*15.19] mapped 132248 sequences
[M::worker_pipeline::449.498*15.30] mapped 149423 sequences
[M::worker_pipeline::462.899*15.33] mapped 147904 sequences
[M::worker_pipeline::479.873*15.42] mapped 143469 sequences
[M::worker_pipeline::491.898*15.46] mapped 140355 sequences
[M::worker_pipeline::507.337*15.49] mapped 135256 sequences
[M::worker_pipeline::526.417*15.58] mapped 130704 sequences
[M::worker_pipeline::537.434*15.59] mapped 132700 sequences
[M::worker_pipeline::553.271*15.60] mapped 125887 sequences
[M::worker_pipeline::567.989*15.64] mapped 130033 sequences
[M::worker_pipeline::582.503*15.68] mapped 124767 sequences
[M::worker_pipeline::594.313*15.70] mapped 107302 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 --dual=yes -PD --no-long-join -w5 -g1000 -m30 -n1 -t24 -I1G SRR10150407_1_assembly.fasta tmp/18Gb/SRR10150407_1_merged.fq.gz
[M::main] Real time: 594.691 sec; CPU: 9333.764 sec; Peak RSS: 7.730 GB
[seg nov 18 19:24:17 -03 2019] Sorting the overlaps
[seg nov 18 19:38:09 -03 2019] Polishing the contigs
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::substr: __pos (which is 55) > this->size() (which is 0)
/home/src/clazosky/miniconda3/envs/CONSENT/CONSENT/CONSENT-polish: line 201: 27324 Aborted (core dumped) $LRSCf/bin/CONSENT-polishing -a $tmpdir/"$alignments" -s "$minSupport" -S "$maxSupport" -l "$windowSize" -k "$merSize" -c "$commonKMers" -A "$minAnchors" -f "$solid" -m "$windowOverlap" -j "$nproc" -r "$contigs" -R "$reads" -M "$maxMSA" -p "$LRSCf" >> "$out"

Any ideas on how to solve?

EDIT: The output file is empty, the sorted_exploded is present

HiFi performance?

Hi,

Has CONSENT been tested on HiFi long read data?

Thanks,
Giulia

paf file-size estimate

Hi,

I am excited to test CONSENT with a Nanopore dataset of about 60x coverage of a 600 Mb genome. It's about 2.8 million reads (41 Gb total length). Unfortunately, the all-vs-all alignment file grows very fast, and I had to terminate the run after the PAF file reached 2.1 terabytes.
Is there an estimate of how much temporary storage is needed for such a dataset?

In your bioRxiv publication, you ran CONSENT on 30x human data. What was the file size of the all-vs-all alignments there?

Cheers,
Michel

terminate called after throwing an instance of 'std::out_of_range'

Getting this error on the "correcting the long reads" step:

[M::mm_idx_gen::22.928*1.52] collected minimizers
[M::mm_idx_gen::25.269*2.72] sorted minimizers
[M::main::25.270*2.72] loaded/built the index for 456779 target sequence(s)
[M::mm_mapopt_update::28.654*2.52] mid_occ = 283
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 456779
[M::mm_idx_stat::30.533*2.43] distinct minimizers: 73328105 (64.06% are singletons); average occurrences: 2.303; average spacing: 2.960
[M::worker_pipeline::294.408*7.22] mapped 456772 sequences
[M::worker_pipeline::458.370*4.73] mapped 293826 sequences
[M::mm_idx_gen::474.209*4.63] collected minimizers
[M::mm_idx_gen::475.271*4.67] sorted minimizers
[M::main::475.272*4.67] loaded/built the index for 293819 target sequence(s)
[M::mm_mapopt_update::475.272*4.67] mid_occ = 283
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 293819
[M::mm_idx_stat::475.851*4.67] distinct minimizers: 48724457 (70.21% are singletons); average occurrences: 2.138; average spacing: 2.960
[M::worker_pipeline::749.425*5.49] mapped 456772 sequences
[M::worker_pipeline::926.878*4.49] mapped 293826 sequences
[M::main] Version: 2.14-r894-dirty
[M::main] CMD: /genomics/users/irubia/tools/CONSENT/minimap2/minimap2 -k15 -w5 -m100 -g10000 -r2000 --max-chain-skip 25 --dual=yes -PD --no-long-join -t24 -I500M /genomics/users/irubia/runs/Hopkins_1_new_iso/Hopkins1.gte500.renamed.fa /genomics/users/irubia/runs/Hopkins_1_new_iso/Hopkins1.gte500.renamed.fa
[M::main] Real time: 928.346 sec; CPU: 4164.746 sec; Peak RSS: 10.984 GB
[Tue Feb 19 16:39:57 CET 2019] Correcting the long reads
terminate called after throwing an instance of 'std::out_of_range'
  what():  stoi
/genomics/users/irubia/tools/CONSENT/CONSENT-correct: line 181:  4891 Aborted                 $LRSCf/bin/CONSENT -i $tmpdir/"$PAFIndex" -a $tmpdir/"$alignments" -s "$minSupport" -S "$maxSupport" -l "$windowSize" -k "$merSize" -c "$commonKMers" -A "$minAnchors" -f "$solid" -m "$windowOverlap" -j "$nproc" -r "$reads" -M "$maxMSA" -p "$LRSCf" >> "$out"

Dataset is cDNA from the Nanopore WGS Consortium (https://github.com/nanopore-wgs-consortium/NA12878).
I filtered the dataset so that the shortest read is 500bp. Tried running with default parameters, with --windowSize 500 and --windowSize 499

aborted on example data

Hi,

Installation of CONSENT and dependencies completed without any warnings or issues on a GNU/Linux VM:
Linux ip-172-26-3-204 5.10.0-19-cloud-amd64 #1 SMP Debian 5.10.149-2 (2022-10-21) x86_64 GNU/Linux

but the example failed with the following error:

./CONSENT-correct --in example/reads.fasta --out example/correctedReads.fasta --type PB
[Fri Jan 13 16:19:16 UTC 2023] Overlapping the long reads (minimap2)
[Fri Jan 13 16:19:25 UTC 2023] Correcting the long reads
free(): invalid next size (normal)
./CONSENT-correct: line 202: 321161 Aborted $LRSCf/bin/CONSENT-correction -a $tmpdir/"$alignments" -s "$minSupport" -S "$maxSupport" -l "$windowSize" -k "$merSize" -c "$commonKMers" -A "$minAnchors" -f "$solid" -m "$windowOverlap" -j "$nproc" -r "$reads" -M "$maxMSA" -p "$LRSCf" >> "$out"

Any help would be appreciated!

RAM consumption of CONSENT-correct

Hello,

I am using CONSENT-correct on my Nanopore reads. I have a library of 800k reads and the RAM consumption of the fpa index process gets extremely high when I run CONSENT-correct on it <350GB. However, if I split my library into 4 subsets, I get RAM usage below 20GB. Is this normal?

install.sh error: no explode and merge file in bin

When I tried to install CONSENT, the following error was output.

/usr/include/c++/4.8.2/bits/stl_algo.h:2263:35: error: no match for call to ‘(getAnchors(robin_hood::unordered_map<unsigned int, unsigned int>&, std::string, std::string, unsigned int, unsigned int)::__lambda8) (std::pair<std::basic_string<char>, std::basic_string<char> >&, const std::pair<std::basic_string<char>, std::basic_string<char> >&)’
    while (__comp(*__first, __pivot))

/usr/include/c++/4.8.2/bits/stl_algo.h:2266:34: error: no match for call to ‘(getAnchors(robin_hood::unordered_map<unsigned int, unsigned int>&, std::string, std::string, unsigned int, unsigned int)::__lambda8) (const std::pair<std::basic_string<char>, std::basic_string<char> >&, std::pair<std::basic_string<char>, std::basic_string<char> >&)’
    while (__comp(__pivot, *__last))

As a result, there are no explode and merge binaries in the bin directory.
Any ideas for solving this problem?

* Error in `/..... / CONSENT/bin/CONSENT-correction': free(): invalid pointer: 0x00002aaab0019df0 *

Hi,

Thanks for what appears to be awesome software. I am trying to run Consent-correct on a Pacbio CCS.fastq file. I have installed Consent and its dependencies, and I have tried running the command on both a local Linux system and our HPC.

When I run this command on the HPC:

module load python3/3.7.2
module load minimap2
module load gcc/7.2.0
module load cmake/3.19.4
cd / ..... /CCS
/..... /CONSENT/CONSENT-correct --in CCSQ40.clean.fastq --out CCSQ40.clean.correct.fasta --type PB

I get the following error:
[Thu Nov 25 01:00:07 EST 2021] Overlapping the long reads (minimap2)
[Thu Nov 25 01:00:08 EST 2021] Correcting the long reads
Error in `/GWSPH/home/ndreyer/software/CONSENT/bin/CONSENT-correction': free(): invalid pointer: 0x00002aaab0019df0

The results file is empty obviously.

A snippet of my fastq file:
@m54057_190926_040405/27722242/ccs GAATTGCGATGAATCTAATCAAAACATCAATCGCTCATTCATCCTTGAAATATACTGAATCACTGCTGAGCAATCCACCGGTATACCGTATGCAGTTTCAAACATCGCACACCGTCCCATGGAGGACCATTCACATGGAATTGAGGAACATTGAAACAACAGTCCTCGAACGGTGAGGCCGTGCGCTCTGATATTCGTTGGTCCCGATCTGTCGACGAGAATGTTAGAAGGGAACGAGATTCAGCAGTTCCCACACGCATCAACGTCGCTTCGCTGATGAACTATCGCCACTCCTTGCAACAGGGAACTACCAGAACCCACTGCACCCTGTAGCCCAATTTCACCGCACGCTGGGGCCACGCGACAGAGAGAGGGAGATTTCAATCCTCAAACACACGCATGCATACACCGATGCGACATATTTCAGTATCGGTATTGACCAAAGCTCAATCCCTGACGTATCAGTGTTCGTTTATCATACTGTGTAACGATTCAGCCATTATGCATATGTACTGACTATCGGTATTTATTGAAGCGTTGATTTTATGTAAGTTGTGTCTGTTATTTATTGAAGCGTTTACGTATGCTGTATTCGTTATTTAGACTCTCAATAGAGTTTTGTTGACAGATCTAACAAACGTTTATACGATTACAATTGATATATAGTCAATATAAAGTCATCGAACGAAGAATACTTTCATTGATTCGACACTGGCACTGCATGTTCACTTTTCTATCACGAAACAACAAGTTTCGTTTGAAAAGCGCATGCCGGCGCACACAAATGAACAACAGGTCGCCACTCCATCCATCATTGGATGGCCAACGCCAATGTGTGTTCCGTCTATTTCACATCGACACGGAAGTAGCTTTGAATTTCAAACAGAACGCTCTCCCTGGTTGGGCTGAGGCATGCCACCAGAGGCCTCCATTTGACACTCCCTCCTCCCCCACACGACAATACACAGACTCCCCCCCCCCGCTGAATTTCAACACTGTTCATCGTCCGTGTCGTCTTCGGCGTTGCTGCTCGACGCCTGCGGTGGCAGCTGGAACGGCGGCGCCACAGTCGCCGTCGCCTGGACCTGGAACTGCTGCATCAGCTGCTGCTCCGCCATGGCTTGCTCCTCCCGCGCTTTGGCGAAAAGCTCCTGCTGTTGTCGCCAGAGCTCCTTTTCGGGTATGCCGAGATGTTCCAGGCGGGAGCTGTGCTTCCGGCGCTGCGCGGCCACCGCCTTGCAGTCTTCCACCACCTCCTCGGCGTCCTTCCTGTAGTCTCCGAACCCCAGCCTGTCCAGAGCTGAAAGAACGTGCTCTGCGTTGATGGTCTTCTTCTCCTGTTCATTGCATATGTCGTTCGCCTCGGAGGCTATCAAGTGGATGAACTCGGAGCAGCAGTTCAGTATCACTTCTCGCACTTGGTTCGCCACGCGTATGTTGGGCAGAATCTCTTTAATCATCTTGTTCATGGCCGCCCGCGGTATGGTTAACTCGTCGTCTGCCAGCTCCGGGGCGGTCGCCATGGCAGCGAGGGGGCGAGAACGACGCGCGCTCACGCTCCTAGCATCGACTGTGTGCAACAATACTTGCCTTCATATCGTCACACTCCGTGCGCTTATGAAAAATCCTATACGTCCCCAACCCTGCGACTTCA + 
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ

I get the same error when running it locally on my Linux system. When I run the same command but with Consent as a module I get the error "CONSENT-correct: command not found", but that must be an error that we need to fix with the HPC team.

Any idea whether this is an error related to my reads or maybe an installation problem? Open to any suggestions.

Thanks a lot and best,
Niklas

compilation

Interesting.

Any ideas why it's not picking up zlib.h despite the package being present?


(base) rcug@hpc01:/mnt/ngsnfs/tools/CONSENT$ sudo apt install zlib1g-dev
Reading package lists... Done
Building dependency tree       
Reading state information... Done
zlib1g-dev is already the newest version (1:1.2.8.dfsg-2ubuntu4.1).
0 upgraded, 0 newly installed, 0 to remove and 71 not upgraded.
(base) rcug@hpc01:/mnt/ngsnfs/tools/CONSENT$ bash install.sh 
make: Nothing to be done for 'default'.
make: 'poa' is up to date.
make: Nothing to be done for 'all'.
/mnt/ngsnfs/tools/miniconda3/bin/x86_64-conda_cos6-linux-gnu-cc -c -g -Wall -O2 -Wc++-compat  -DHAVE_KALLOC  bseq.c -o bseq.o
bseq.c:1:10: fatal error: zlib.h: No such file or directory
 #include <zlib.h>
          ^~~~~~~~
compilation terminated.
make: *** [Makefile:29: bseq.o] Error 1

35 days and going - is it normal to take this long?

Hello,

This is my first time running CONSENT. My input file is raw, ONT genome reads from the tick D. reticulatus (27,098,808 reads). Below is my submission script. The script has been running for 35+ days so far. Do you think this is normal?

#!/bin/bash
#SBATCH --job-name=CONSENT
#SBATCH --partition=iob_p
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=500gb
#SBATCH --export=NONE
#SBATCH --time=30-
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err
#SBATCH --mail-user=[email protected]
#SBATCH --mail-type=BEGIN,END,FAIL

ml CMake/3.26.3-GCCcore-12.3.0 minimap2/2.26-GCCcore-12.2.0

raw_reads='/scratch/kcd88651/ticks/Dermacentor_Reticulatus/raw_reads/4_R9R10_G638_combo.fastq'

git clone --recursive https://github.com/morispi/CONSENT

cd CONSENT
./install.sh

cd CONSENT
./CONSENT-correct --in $raw_reads --out Dr_4_R9R10_G638_combo_CONSENTc.fasta --type ONT

Error in bin/explode

Hi,

I am running CONSENT to correct some Nanopore reads and I encountered some errors.

First, I tried running directly in the command line in Linux with 4 cores and 4 GB each with the following command:
./CONSENT-correct -j 8 --in /users/PHS0338/jpac1984/local/src/Porechop/DemulT7BAR.fasta --out demult.fasta --type ONT

and the logs were:
[M::worker_pipeline::831.184*0.97] mapped 93033 sequences
[M::main] Version: 2.17-r974-dirty
[M::main] CMD: minimap2 -k15 -w5 -m100 -g10000 -r2000 --max-chain-skip 25 --dual=yes -PD --no-long-join -t8 -I1G /users/PHS0338/jpac1984/local/src/Porechop/DemulT7BAR.fasta /users/PHS0338/jpac1984/local/src/Porechop/DemulT7BAR.fasta
[M::main] Real time: 831.641 sec; CPU: 806.339 sec; Peak RSS: 8.466 GB
[Tue Dec 8 20:56:10 EST 2020] Sorting the overlaps
./CONSENT-correct: line 189: /fs/scratch/PHS0338/appz/CONSENT/bin/explode: No such file or directory

Then, since I have a bigger dataset, I submitted to the cluster,
#!/bin/bash
#SBATCH --mem=120gb
#SBATCH --ntasks=28
#SBATCH --job-name=CONSENT-ONT-VALID
#SBATCH --time=12:00:00
#SBATCH --account=PHS0338
export PATH=$PWD/minimap2:/fs/scratch/PHS0338/appz/minimap2
./CONSENT-correct -j 28 --in /fs/scratch/PHS0338/data/ONTq_combine.fasta --out ONT-V-corr.fasta --type ONT
and I got the following error:
./CONSENT-correct: line 39: nproc: command not found

I want to use all the cores requested.
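
One possible cause, offered as a guess rather than a confirmed diagnosis: the `export PATH=...` line in the submission script above does not end with `:$PATH`, so the directories holding standard utilities such as `nproc` may be dropped from the search path. The sketch below keeps the existing PATH and takes the core count from `SLURM_NTASKS`, which Slurm sets when `--ntasks` is requested. The helper names `prepend_path` and `pick_cores` are hypothetical, and the final command is only printed, not executed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepend a directory to PATH without discarding existing entries.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Hypothetical helper: use the Slurm task count when available, else fall back to nproc.
pick_cores() {
    echo "${SLURM_NTASKS:-$(nproc)}"
}

prepend_path /fs/scratch/PHS0338/appz/minimap2
echo "./CONSENT-correct -j $(pick_cores) --in /fs/scratch/PHS0338/data/ONTq_combine.fasta --out ONT-V-corr.fasta --type ONT"
```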

Thank you very much;

CONSENT-correct: line 202: 96410 Segmentation fault

Hi,
See below for a segmentation fault that occurred while trying to error correct pacbio reads. Data can be found at ncbi under the sra accession: SRR8499555. Any thoughts or help is appreciated.

./CONSENT-correct \
> --nproc 30 \
> --in /home/jon/Working_Files/sea_cuke_species_data/stichopus_chloronotus/pacbio/SRR8499555.oneline.fastq \
> --out SRR8499555.oneline.error_corrected.fastq \
> --type PB
[Sun 16 Jan 2022 12:08:43 PM PST] Overlapping the long reads (minimap2)
[Sun 16 Jan 2022 12:32:48 PM PST] Processing the overlaps
[Sun 16 Jan 2022 12:37:17 PM PST] Correcting the long reads
./CONSENT-correct: line 202: 96410 Segmentation fault      (core dumped) $LRSCf/bin/CONSENT-correction -a $tmpdir/"$alignments" -s "$minSupport" -S "$maxSupport" -l "$windowSize" -k "$merSize" -c "$commonKMers" -A "$minAnchors" -f "$solid" -m "$windowOverlap" -j "$nproc" -r "$reads" -M "$maxMSA" -p "$LRSCf" >> "$out"

it also fails on the initial test

./CONSENT-correct --in example/reads.fasta --out example/correctedReads.fasta --type PB
[Mon 17 Jan 2022 12:18:31 PM PST] Overlapping the long reads (minimap2)
[Mon 17 Jan 2022 12:18:38 PM PST] Correcting the long reads
free(): invalid pointer
./CONSENT-correct: line 202:  6992 Aborted                 (core dumped) $LRSCf/bin/CONSENT-correction -a $tmpdir/"$alignments" -s "$minSupport" -S "$maxSupport" -l "$windowSize" -k "$merSize" -c "$commonKMers" -A "$minAnchors" -f "$solid" -m "$windowOverlap" -j "$nproc" -r "$reads" -M "$maxMSA" -p "$LRSCf" >> "$out"

problem with CONSENT

Dear Developer, I get this error. Can you please help me?

lfaino@LabReverberiPT /nanopore/data/MLST_3_samples/demultiplex $ ~/bin/CONSENT/CONSENT-correct --in barcode10_flipflop_barcode.fastq --minSupport 100 --out barcode10_flipflop_barcode.fasta --type ONT
[Fri Feb 15 20:50:02 CET 2019] Self-aligning the long reads (minimap2)
error: Found argument '-f' which wasn't expected, or isn't valid in this context

USAGE:
fpa [FLAGS] [OPTIONS] [ARGS]

For more information try --help
[M::mm_idx_gen::0.109*1.01] collected minimizers
[M::mm_idx_gen::0.137*1.87] sorted minimizers
[M::main::0.137*1.87] loaded/built the index for 9600 target sequence(s)
[M::mm_mapopt_update::0.145*1.82] mid_occ = 448
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 9600
[M::mm_idx_stat::0.151*1.79] distinct minimizers: 592279 (89.77% are singletons); average occurrences: 2.130; average spacing: 3.054

Cheers
Luigi

CONSENT/bin/explode: No such file or directory

member questions in bmean.cpp

Hi:
I have followed your installation process. Environment includes python3.8, zlib1.2.12, cmake3.19.1, and g++10.2.0. Then I encountered the following problems during execution of ./install. Have you ever encountered them during your development?
"g++ -o bmean.o -c bmean.cpp -O3 -std=c++11 -lpthread -Ispoa/include -Ispoa/include/ -Lspoa/build/lib/ -lspoa
bmean.cpp: In function ‘std::vector<std::__cxx11::basic_string<char> > consensus_SPOA(std::vector<std::__cxx11::basic_string<char> >&, unsigned int, std::string)’:
bmean.cpp:586:32: error: ‘createAlignmentEngine’ is not a member of ‘spoa’; did you mean ‘AlignmentEngine’?
586 | auto alignment_engine = spoa::createAlignmentEngine(static_cast<spoa::AlignmentType>(0), 5, -10, -4, -4);
| ^~~~~~~~~~~~~~~~~~~~~
| AlignmentEngine
bmean.cpp:588:21: error: ‘createGraph’ is not a member of ‘spoa’
588 | auto graph = spoa::createGraph();
| ^~~~~~~~~~~
make: *** [bmean.o] Error 1"
Looking forward to your reply gratefully!
Wang Rongshu

Inflated size of corrected reads compared to input fasta

Hello Pierre,

I am running CONSENT-correct on a 20x PacBio dataset for a 1Gb genome. The version was cloned from your git repository on the 18th Feb 2021 (i.e. the most current version).

It has yet to complete, but in the process of trying to figure out how close to done it might be, I have been checking the output.
There are 3.2M uncorrected reads in the dataset but over 13M corrected reads so far written to the corrected.fasta. This is not simply a case of reads being split as there are 23Gb of sequence in the input dataset and >82Gb in the output.

I have seen some indications in the issues thread that this behaviour has been seen before but I would value your opinion on if/how I can salvage something from this run (or how to avoid this problem on repeating).

I checked for header uniqueness by sorting output and running uniq and find that the inflation can be explained by this.
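
The uniqueness check described above could look like the following sketch. The helper name `count_duplicate_headers` is hypothetical, and `corrected.fasta` stands for the output file mentioned earlier in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many distinct FASTA headers occur more than once.
count_duplicate_headers() {
    grep '^>' "$1" | sort | uniq -d | wc -l | tr -d ' '
}

if [ -f corrected.fasta ]; then
    echo "headers appearing more than once: $(count_duplicate_headers corrected.fasta)"
fi
```

A non-zero result means some read headers were written several times, which would be consistent with the inflated read count.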

Many thanks,
Annabel

Process takes too long - how to filter PAF

The whole process takes too long with my data. At least for a first try.

What I would do is to filter the PAF file for column 7 (target length, > 20'000bp) and then to process the next steps manually/modifying the script.

Is it correct to filter PAF according to target length? I guess this way I will only polish the reads above my threshold of 20'000. Is this in accordance with the downstream processing? I guess the reads which have no hits in the PAF (as target) will just get dropped?
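
For reference, the filter described above could be sketched as follows. In the PAF format, column 7 is the target sequence length. The helper name `filter_paf_by_target_length` and the file names in the usage comment are hypothetical, and whether CONSENT's later steps behave as hoped on a filtered file is exactly the open question here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep PAF records whose target sequence length (column 7)
# is strictly greater than the threshold given as the second argument.
filter_paf_by_target_length() {
    awk -F '\t' -v min="$2" '($7 + 0) > (min + 0)' "$1"
}

# Example usage (placeholder file names):
#   filter_paf_by_target_length Alignments.paf 20000 > Alignments.filtered.paf
```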

CONSENT-correct problem

Hi,

I ran into a problem when running CONSENT-correct.

My error log is shown below:

Self-aligning the long reads (minimap2)
[M::mm_idx_gen::12.180*1.76] collected minimizers
[M::mm_idx_gen::13.815*3.11] sorted minimizers
[M::main::13.815*3.11] loaded/built the index for 262860 target sequence(s)
[M::mm_mapopt_update::14.927*2.95] mid_occ = 599
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 262860
[M::mm_idx_stat::15.697*2.86] distinct minimizers: 58840350 (70.40% are singletons); average occurrences: 2.953; average spacing: 2.877
[M::worker_pipeline::1241.866*2.38] mapped 262855 sequences
[M::worker_pipeline::2840.264*1.57] mapped 257111 sequences
thread 'main' panicked at 'Trouble during read of input: Error(Deserialize { pos: Some(Position { byte: 135947718566, line: 872754811, record: 872754810 }), err: DeserializeError { field: None, kind: Message("invalid length 6, expected a tuple of size 12") } })', src/libcore/result.rs:1009:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.

Cheers,
Windz

Error during correction

Hi,
Hope you are doing great.
I did a quick Nanopore run, and I got some data that I would like to correct.
I am running CONSENT and it is failing after changing different parameters, like -j or -k.

Error:
"
CONSENT-correct --type ONT --in Bc.04.fasta --out Bc.04.corr.fasta
[Mon Sep 25 13:31:55 EDT 2023] Overlapping the long reads (minimap2)
[Mon Sep 25 13:32:19 EDT 2023] Correcting the long reads
/home/juaguila/.conda/envs/consent/bin/CONSENT-correct: line 202: 106441 Illegal instruction CONSENT-correction -a $tmpdir/"$alignments" -s "$minSupport" -S "$maxSupport" -l "$windowSize" -k "$merSize" -c "$commonKMers" -A "$minAnchors" -f "$solid" -m "$windowOverlap" -j "$nproc" -r "$reads" -M "$maxMSA" -p "$LRSCf" >> "$out"
"

My dataset stats are: min: 43, max: 134,709, average: 6.642, and about 35,464 sequences, size: 238M.

Any reason why CONSENT is failing?
I base-called it with Guppy, then merged the gz files, and then converted the fastq with seqkit to fasta.

Thanks;

Segmentation fault during polishing step

Hi,

I have been recently attempting to polish a draft (human genome) constructed from a PromethION sample. However, at the polishing step, it returns a segmentation fault. Any suggestions on fixing this?

Command used:

CONSENT-polish  --contigs $DRAFT --reads $READS  --out $OUTPUT  --nproc 64 -m 50G

Stdout:

[Wed Sep 16 14:07:21 AEST 2020] Aligning the long reads to the contigs (minimap2)
[Wed Sep 16 16:28:59 AEST 2020] Sorting the overlaps
[Thu Sep 17 20:11:04 AEST 2020] Polishing the contigs

stderr:

[M::mm_idx_gen::84.924*1.86] collected minimizers
[M::mm_idx_gen::92.015*3.89] sorted minimizers
[M::main::92.015*3.89] loaded/built the index for 3855 target sequence(s)
[M::mm_mapopt_update::96.598*3.76] mid_occ = 667
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 3855
[M::mm_idx_stat::99.125*3.69] distinct minimizers: 165963083 (35.72% are singletons); average occurrences: 5.831; average spacing: 2.922
[M::worker_pipeline::174.006*15.81] mapped 87711 sequences
......
[M::worker_pipeline::8492.457*27.15] mapped 25672 sequences
[M::main] Version: 2.14-r883
[M::main] CMD: minimap2 --dual=yes -PD --no-long-join -w5 -g1000 -m30 -n1 -t64 -I50G assembly.fasta pass.fastq
[M::main] Real time: 8495.410 sec; CPU: 230553.598 sec; Peak RSS: 37.940 GB
CONSENT-polish: line 203: 39123 Segmentation fault      (core dumped) $LRSCf/bin/CONSENT-polishing -a $tmpdir/"$alignments" -s "$minSupport" -S "$maxSupport" -l "$windowSize" -k "$merSize" -c "$commonKMers" -A "$minAnchors" -f "$solid" -m "$windowOverlap" -j "$nproc" -r "$contigs" -R "$reads" -M "$maxMSA" -p "$LRSCf" >> "$out"
Command exited with non-zero status 139
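
The exit status 139 reported above follows the usual shell convention of 128 plus the signal number, so it corresponds to signal 11 (SIGSEGV), which matches the "Segmentation fault" message. A minimal Python sketch for decoding such statuses follows; the function name `describe_exit_status` is made up for illustration and is not part of CONSENT.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Describe a shell exit status, decoding the 128 + N signal convention."""
    if status > 128:
        try:
            return f"killed by {signal.Signals(status - 128).name}"
        except ValueError:
            # Not a known signal number on this platform.
            return f"exit status {status}"
    return f"exit status {status}"


print(describe_exit_status(139))  # 139 - 128 = 11, the SIGSEGV signal number
```

The same helper can be used for the other failures reported in these issues, for example a status of 132 (128 + 4) for "Illegal instruction".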

fpa

Hi, I am trying to run CONSENT-correct but I run into the following error:

${consent} --in ${input_directory} --out ${output}/test_consent.fasta --type ONT

Self-aligning the long reads (minimap2)
[ERROR] failed to open file 'i-k15'
error: Found argument '-' which wasn't expected, or isn't valid in this context

USAGE:
fpa [OPTIONS] [SUBCOMMAND]

For more information try --help

CONSENT-polish fails at sort step

CONSENT v1.2.2

Hi,

Thank you for this tool.

I'm struggling to use CONSENT on a cluster, and the latest error I get is at the sort step:

Error log

[M::main] Version: 2.17-r941
[M::main] CMD: /home/nguiglie/Tools/minimap2-2.17_x64-linux/minimap2 --dual=yes -PD --no-long-join -w5 -g1000 -m30 -n1 -t30 -I1G raven_default.min100kb.assembly_1.fasta reads.pacbio.01.fasta
[M::main] Real time: 7419.892 sec; CPU: 179145.556 sec; Peak RSS: 18.149 GB
sort: erreur d'écriture: Erreur d'entrée/sortie (in English: "sort: write error: Input/output error")

Output log

[sam. nov. 23 23:20:46 CET 2019] Aligning the long reads to the contigs (minimap2)
[dim. nov. 24 01:24:34 CET 2019] Sorting the overlaps

Sort version: sort (GNU coreutils) 8.22

Infos about output files:

124G 24 nov. 01:47 exploded_66171_2
121G 24 nov. 02:08 exploded_66171_3
115G 24 nov. 04:00 sorted_exploded_66171_1
19G 24 nov. 06:01 sorted_exploded_66171_2
359G 24 nov. 01:24 tmp_Alignments_66171.paf

Have you got any idea what could be the problem?
Also, it would be convenient to be able to restart the program.

long read files input format problem

I used CONSENT v2.0 to correct nanopore long reads, but I ran into problems: every error was a core dump.
I traced the source code and found that the function indexReads in utils.cpp is the problem.
This function reads the reads file in the pattern "1 header, 1 sequence", e.g.:
>seq001
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAGGAAATAAAGTAAATTTTTGTTGTTGTACTTCGTTCAGTTTGGGTGTTTAACCAGATGTCGCCTACCGTGACAAGAAAGTTGAAAGAAAATAAGAAAATACGGCGCTGTCGCGGTTCGAACCACAGACCTTGACCCCCAGCAATATCAGCACCAACGAAACACAAGACACCGACAACTTTCTTGTC
After I reformatted my FASTA file to follow this pattern, CONSENT worked.
My original FASTA file format (60 characters per line for every sequence):
>seq001
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAGGAAATAAAGTAAATTTTTGTT
GTTGTACTTCGTTCAGTTTGGGTGTTTAACCAGATGTCGCCTACCGTGACAAGAAAGTTG
AAAGAAAATAAGAAAATACGGCGCTGTCGCGGTTCGAACCACAGACCTTGACCCCCAGCA
ATATCAGCACCAACGAAACACAAGACACCGACAACTTTCTTGTC
But when I corrected another nanopore long-read file, a new problem appeared: minimap2 core dumped.
Running minimap2 on the original format worked, but CONSENT then core dumped.
The first file is 1G; with the modified format, everything works.
The second file is 13G; with the modified format, minimap2 core dumps.

Could CONSENT be made available through conda? (My system gcc is too old.)
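
The reformatting described above (joining wrapped sequence lines so that each record is one header line followed by one sequence line) can be sketched in Python. This is only an illustration of the workaround, not part of CONSENT; the function name `unwrap_fasta` and the toy records are made up.

```python
from typing import Iterable, Iterator


def unwrap_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with each sequence joined onto a single line."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                # Emit the previous record before starting a new one.
                yield header
                yield "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        # Emit the final record.
        yield header
        yield "".join(chunks)


wrapped = [">seq001", "TTTTAGG", "AAATAAA", ">seq002", "GTTGTAC"]
print("\n".join(unwrap_fasta(wrapped)))
```

To convert a file, pass an open file handle as `lines` and write each yielded line, followed by a newline, to the output file.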

./install.sh error: undefined reference

Hello. When running ./install.sh I get a very long message, ending with:
collect2: error: ld returned 1 exit status
make: *** [CONSENT-correction] Error 1

When tracing through it, the message includes the following several times:
DBG.cpp:(.text+0x78d): undefined reference to str2num
correctionDBG.cpp:(.text+0xc04): undefined reference to str2num

I downloaded the latest version of CONSENT, and have Python v. 3.8.2 and GCC v. 8.3.0.
What could be causing this issue? Thanks in advance.

Illegal Instruction in CONSENT-polish

Hello Pierre,

I am running CONSENT-polish on a contig of ~10Mbp. I installed CONSENT v2.2.2 with conda. The command was:
CONSENT-polish --contigs $contig.fa --reads $read.fastq --out polished.fa

The alignment and sorting were successful, but at the polishing step I got an "Illegal instruction" error from CONSENT-polishing:

/data/user/maggic/anaconda2/envs/consent/bin/CONSENT-polish: line 197: 23038 Illegal instruction     CONSENT-polishing -a $tmpdir/"$alignments" -s "$minSupport" -S "$maxSupport" -l "$windowSize" -k "$merSize" -c "$commonKMers" -A "$minAnchors" -f "$solid" -m "$windowOverlap" -j "$nproc" -r "$contigs" -R "$reads" -M "$maxMSA" -p "$LRSCf" >> "$out"

Then I checked CONSENT-polishing with CONSENT-polishing -h and noticed that there are no options like -r or -R, and that the -m option does not seem to match:

CONSENT-polishing -h
Usage: CONSENT-polishing [-a alignmentFile.paf] [-k merSize] [-s minSupportForGoodRegions] [-l minLengthForGoodRegions] 
[-f freqThresholdForKMers] [-e maxError] [-p freqThresholdForKPersFreqs] [-c freqThresholdForKPersCons]
 [-m mode (0 for regions, 1 for cluster)] [-j threadsNb]

So I tried starting the polishing step manually with CONSENT-polishing -a Alignments_22661.paf >> polished.fa, and I got a segmentation fault:

slurm_script: line 37: 25624 Segmentation fault      CONSENT-polishing -a Alignments_22661.paf >> polished.fa

The job failed after only 8 seconds and the memory usage was 1.64MB (I requested over 90GB) so I don't think it was a memory issue. Then I tried providing the contig (-r) and read (-R) files to CONSENT-polishing and I got the same illegal instruction error. I also tried converting reads.fastq to reads.fa and it didn't work.

One interesting thing: I have several contigs (all ~10Mbp in length) and corresponding reads.fastq files. I tried to polish them with the same default settings and the same amount of resources. Some completed successfully while others failed with the illegal instruction error. I could not find out what differs between these datasets.

I uploaded 3 failed and 1 successful contig+read files here (read_1.fasta is used for polishing contig_1.fa, etc). These are PacBio CLR reads.

Do you have any idea what might be the reason?
Thank you so much!
Maggi
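
"Illegal instruction" is the SIGILL signal, raised when a binary executes a CPU instruction the processor does not support. One possible cause, which is only a guess here and is not confirmed in this thread, is a prebuilt binary that was compiled for a newer instruction set than some machines provide. A minimal Python sketch for listing the CPU feature flags on an x86 Linux machine, so that a working and a failing node can be compared; the function name `parse_cpu_flags` and the sample text are made up for illustration.

```python
def parse_cpu_flags(cpuinfo_text: str) -> set:
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # The first "flags" line is enough; cores normally report the same set.
            return set(value.split())
    return set()


# Made-up sample in the layout used by /proc/cpuinfo on x86 Linux.
sample = "model name\t: Example CPU\nflags\t\t: fpu sse sse2 avx\n"
flags = parse_cpu_flags(sample)
print(sorted(flags))
```

On a real node, the text can be read with `open("/proc/cpuinfo").read()`; a flag present on the machine where the job succeeds but absent where it fails would support the guess above.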

malloc(): corrupted top size

I am getting the following error message. The input is attached (.txt suffix instead of .fna so that Github lets me attach the file).

niklas@niklas-Zenbook:~/code/CONSENT$ ./CONSENT-correct --in reads.fna --out example/correctedReads.fasta --type ONT
[ti 1.11.2022 14.52.44 +0200] Overlapping the long reads (minimap2)
[ti 1.11.2022 14.52.44 +0200] Correcting the long reads
malloc(): corrupted top size
./CONSENT-correct: line 202: 14609 Aborted                 (core dumped) $LRSCf/bin/CONSENT-correction -a $tmpdir/"$alignments" -s "$minSupport" -S "$maxSupport" -l "$windowSize" -k "$merSize" -c "$commonKMers" -A "$minAnchors" -f "$solid" -m "$windowOverlap" -j "$nproc" -r "$reads" -M "$maxMSA" -p "$LRSCf" >> "$out"

reads.txt

CONSENT-correct: line 202: 34177 Illegal instruction

I'm getting the following error during correction:

CONSENT-correct: line 202: 34177 Illegal instruction $LRSCf/bin/CONSENT-correction -a $tmpdir/"$alignments" -s "$minSupport" -S "$maxSupport" -l "$windowSize" -k "$merSize" -c "$commonKMers" -A "$minAnchors" -f "$solid" -m "$windowOverlap" -j "$nproc" -r "$reads" -M "$maxMSA" -p "$LRSCf" >> "$out"

It gets as far as generating the corrected reads FASTA file, but this file is empty.

I'm running this on RHEL7

Phase retained?

Hi again,

Will the phase of the reads be retained following correction?

I.e., if I want to use the corrected reads for determining phasing and haplotypes, will this information be mixed between the alternative alleles or retained?

I have used Canu before to get nicely corrected reads; sadly, the phase information is not retained.

cheers,
Colin

CONSENT-correction core dumped

Hi, when I use this fastq (https://github.com/lirr-lichen/1_testdata.git samle97.570.fastq) with the command: CONSENT-correct --in samle97.570.fastq --out CONSENT.fasta --tmpdir ./ --nproc 12 --type ONT
I get this:
CONSENT-correct: line 199: 297133 Segmentation fault (core dumped) $LRSCf/bin/CONSENT-correction -a $tmpdir/"$alignments" -s "$minSupport" -S "$maxSupport" -l "$windowSize" -k "$merSize" -c "$commonKMers" -A "$minAnchors" -f "$solid" -m "$windowOverlap" -j "$nproc" -r "$reads" -M "$maxMSA" -p "$LRSCf" >> "$out"

But when I use other fastq files with the same command, I do not get this error, and the run completes.

Is there something wrong with this fastq?

Thank you

Error in conda version

I tried using the conda installation of CONSENT but it was failing to polish the example reads:

CONSENT-polish --contigs example/rawAssembly.fasta --reads example/reads.fasta --out example/polishedAssembly.fasta
[Tue 18 Jan 14:15:53 GMT 2022] Aligning the long reads to the contigs (minimap2)
[Tue 18 Jan 14:15:57 GMT 2022] Processing the overlaps
/mnt/shared/scratch/apps/conda/envs/CONSENT/bin/CONSENT-polish: line 193: reformatPAF: command not found

I looked at the conda bin and noticed that 'reformatPAF' was actually 'CONSENT-reformatPAF'. Renaming it to this on line 193 of CONSENT-polish fixed this error.

(Not sure if this is the right place to report errors in the conda version!)
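
Before editing the wrapper script, it can help to confirm which of the two helper names is actually installed. A minimal Python sketch using the standard `shutil.which`; the function name `first_available` is made up for illustration, and the two candidate names are the ones from the report above.

```python
import shutil
from typing import Optional, Sequence


def first_available(names: Sequence[str], path: Optional[str] = None) -> Optional[str]:
    """Return the full path of the first executable in `names` found on PATH."""
    for name in names:
        found = shutil.which(name, path=path)
        if found:
            return found
    return None


# Which name exists (if any) depends on how CONSENT was installed.
print(first_available(["reformatPAF", "CONSENT-reformatPAF"]))
```

If this prints `None`, neither helper is on the PATH of the active environment.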

Segmentation fault on long-read correction

I'm trying to correct a dataset of real ONT reads and I'm getting a segmentation fault error after the mapping with minimap2:

[irubia@kepler consent]$ /genomics/users/irubia/tools/CONSENT/CONSENT-correct --in /genomics/users/irubia/datasets/ERCC_Mix1_SRR6058582.filtered.fa --out corrected.fa --type ONT
[Wed Feb 13 12:24:42 CET 2019] Self-aligning the long reads (minimap2)
[M::mm_idx_gen::5.006*1.15] collected minimizers
[M::mm_idx_gen::5.682*1.64] sorted minimizers
[M::main::5.682*1.64] loaded/built the index for 116426 target sequence(s)
[M::mm_mapopt_update::6.006*1.60] mid_occ = 473
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 116426
[M::mm_idx_stat::6.203*1.58] distinct minimizers: 8869169 (59.58% are singletons); average occurrences: 2.436; average spacing: 3.014
[M::worker_pipeline::30.410*6.06] mapped 116426 sequences
[M::main] Version: 2.14-r894-dirty
[M::main] CMD: /genomics/users/irubia/tools/CONSENT/minimap2/minimap2 -k15 -w5 -m100 -g10000 -r2000 --max-chain-skip 25 --dual=yes -PD --no-long-join -I100G -t8 /genomics/users/irubia/datasets/ERCC_Mix1_SRR6058582.filtered.fa /genomics/users/irubia/datasets/ERCC_Mix1_SRR6058582.filtered.fa
[M::main] Real time: 30.458 sec; CPU: 184.468 sec; Peak RSS: 1.088 GB
[Wed Feb 13 12:25:14 CET 2019] Correcting the long reads
/genomics/users/irubia/tools/CONSENT/CONSENT-correct: line 173:  9156 Segmentation fault      (core dumped) $LRSCf/bin/CONSENT -a $tmpdir/"$alignments" -s "$minSupport" -S "$maxSupport" -l "$windowSize" -k "$merSize" -c "$commonKMers" -A "$minAnchors" -f "$solid" -m "$windowOverlap" -j "$nproc" -r "$reads" -M "$maxMSA" -p "$LRSCf" >> "$out"

I've tried on two different datasets and I'm getting the same error.

OS info:

[irubia@kepler consent]$ lsb_release -a
LSB Version:	:core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID:	CentOS
Description:	CentOS Linux release 7.4.1708 (Core) 
Release:	7.4.1708
Codename:	Core

[irubia@kepler consent]$ g++ --version
g++ (GCC) 5.4.0
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[irubia@kepler consent]$ python --version
Python 3.6.2

CONSENT on clustered cDNA with small Window Size

Hi, after reconsidering CONSENT-correct,

wouldn't it make sense to use it on small read clusters produced by a clustering algorithm (e.g. isONClust, RATTLE), and then correct the reads using a small window size to retain highly variable splice junctions?

Trying that approach gave me considerable results, but I couldn't set the window size lower than 70.
Might this be related to computing power (I am running it on a laptop with 64 GB of RAM), or might there be a deeper reason?

v2.2.2 install error

I downloaded the latest zip file and unzipped it. When I run install.sh, I get the error shown below:

./install.sh: line 5: ./install.sh: No such file or directory
make: *** No targets specified and no makefile found.  Stop.
g++ -std=c++11 -o src/alignmentPiles.o -c src/alignmentPiles.cpp -Wall -O3 -std=c++11
g++ -std=c++11 -o src/alignmentWindows.o -c src/alignmentWindows.cpp -Wall -O3 -std=c++11
g++ -std=c++11 -o src/reverseComplement.o -c src/reverseComplement.cpp -Wall -O3 -std=c++11
g++ -std=c++11 -o src/utils.o -c src/utils.cpp -Wall -O3 -std=c++11
g++ -std=c++11 -o src/correctionAlignment.o -c src/correctionAlignment.cpp -Wall -O3 -std=c++11
In file included from src/correctionAlignment.cpp:2:0:
src/correctionAlignment.h:4:10: fatal error: ../BMEAN/utils.h: No such file or directory
 #include "../BMEAN/utils.h"
          ^~~~~~~~~~~~~~~~~~
compilation terminated.
Makefile:47: recipe for target 'correctionAlignment.o' failed
make: *** [correctionAlignment.o] Error 1
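
The missing `../BMEAN/utils.h` suggests that the BMEAN directory was not populated. The README asks for `git clone --recursive` so that submodules are fetched; it is an assumption here, not confirmed in this thread, that the downloaded zip archive did not include them. A minimal Python sketch that checks whether the expected directories exist and are non-empty; the function name `empty_or_missing_dirs` is made up, `BMEAN` is taken from the error message, and `minimap2` from the README's PATH instruction.

```python
import os
from typing import List, Sequence


def empty_or_missing_dirs(root: str, names: Sequence[str]) -> List[str]:
    """Return the names under `root` that are missing or contain no entries."""
    problems = []
    for name in names:
        path = os.path.join(root, name)
        if not os.path.isdir(path) or not os.listdir(path):
            problems.append(name)
    return problems


# Run from the CONSENT source directory; an empty list means both are populated.
print(empty_or_missing_dirs(".", ["BMEAN", "minimap2"]))
```

If any directory is reported, cloning the repository again with `git clone --recursive`, as described in the README, may be the simplest fix.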
