loneknightpy / idba

C 0.11% Shell 2.34% C++ 95.84% Perl 0.64% Python 0.65% Makefile 0.39% M4 0.04%

idba's Issues

bin/Makefile.am noinst_PROGRAMS

I am wondering why most of the programs of bin/Makefile.am are tagged as noinst_PROGRAMS, because they will not be installed via make install this way.

Are they not supposed to be installed?

If I execute make install I expect idba, idba_ud, etc. all to be installed.

idba_ud memory problem and crash (killed)

Using 1.1.2 and I still get this problem. I have an interleaved FASTA file of 300 bp reads, so I have to use the "-l" option in 1.1.2. I get the same error with any k-mer length; the run with k = 124 is shown below. The program uses all memory (80 GB), then swaps to disk until it is 'killed' as shown (not by me). Thanks for any help. I can't use idba at all until this is fixed.

idba_ud -l ~/d/031/reads/int_fasta/2937_WT.fasta -o ~/d/031/idba/2937_WT --num_threads 29 --mink 124 --maxk 124 --step 0
number of threads 29
reads 0
long reads 1963614
extra reads 0
read_length 0
kmer 124
kmers 3906398 3897506
merge bubble 92
Killed

Only IDBA_Hybrid available

Dear Yu,

After installing release 1.1.3 according to the instructions, it seems like "idba_hybrid" is the only option available, as all the others ("idba_ud", "idba_tran" etc.) only give "command not found". Why is this? Are all IDBA functions now gathered in "idba_hybrid"?

Kind regards,
Even, PhD student,
University of Oslo, Norway

How to modify the maxk (124) parameter

I want to set --maxk 200, but I don't know how to change the limit on the maxk parameter. Thank you.

idba_ud produces no contigs

Ran idba_ud on a metagenomic dataset but it never produced contigs, only the following files: begin, kmer, log.

Also no error message, the only output:
number of threads 64
reads 48621072
long reads 0
extra reads 0
read_length 101
kmer 60
kmers 113414007 110213772
merge bubble 1873

Does anything else need to be run? Was there an error?

PS. It seems the process just terminated for some reason and didn't run to the end. Since there was no error message, it was hard to figure out the real cause. After I restarted the job, it ran to completion.
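
One way to make a silent termination like this easier to diagnose is to launch the assembler from a small wrapper that records how the process ended. The following is a minimal Python sketch, not part of IDBA. `describe_exit` and `run_and_report` are hypothetical helper names, and the sketch assumes a POSIX system, where a process killed by a signal is reported with a negative return code.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable message."""
    if returncode == 0:
        return "finished normally"
    if returncode < 0:
        # On POSIX, subprocess reports "killed by signal N" as -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"terminated by {name}"
    return f"exited with status {returncode}"


def run_and_report(argv):
    """Run a command (list of arguments) and describe how it ended."""
    completed = subprocess.run(argv)
    return describe_exit(completed.returncode)
```

For example, `run_and_report(["idba_ud", "-r", "reads.fa", "-o", "out"])` would return something like `terminated by SIGKILL` if the job was killed from outside, which a bare log file does not show.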

Dockerfile not building

The links to the repos are broken because they are too old.

new release?

Hi,

I'm attempting to package IDBA for the GNU Guix packaging system. I have 1.1.2 going, but I'm wondering if the commit tagged 1.2.0 should be packaged instead? If so, would it be possible to make a new release on GitHub please?

Thanks,
ben

throw an error when loading fastq

Can you please make idba throw an error when it is given a fastq file instead of a fasta file, rather than loading it into memory and then just sitting there doing nothing? I just forgot about it and was wondering why it was not doing anything.
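
Until something like this exists in the tool itself, the check can be done before starting the assembler. Below is a minimal Python sketch (not IDBA code) that guesses the format from the first non-blank line, assuming FASTA headers start with `>` and FASTQ headers start with `@`. The function names `sniff_sequence_format` and `require_fasta` are made up for this example.

```python
import io


def sniff_sequence_format(handle):
    """Guess 'fasta' or 'fastq' from the first non-blank line of a text stream."""
    for line in handle:
        stripped = line.strip()
        if not stripped:
            continue  # skip leading blank lines
        if stripped.startswith(">"):
            return "fasta"
        if stripped.startswith("@"):
            return "fastq"
        break  # first real line is neither kind of header
    return "unknown"


def require_fasta(handle):
    """Raise ValueError early instead of silently reading a non-FASTA file."""
    fmt = sniff_sequence_format(handle)
    if fmt != "fasta":
        raise ValueError(f"expected FASTA input, but the file looks like {fmt}")


# In-memory demo; a real script would open the read file instead.
print(sniff_sequence_format(io.StringIO(">read1\nACGT\n")))
print(sniff_sequence_format(io.StringIO("@read1\nACGT\n+\nIIII\n")))
```

Calling `require_fasta(open("reads.fq"))` in a wrapper script would then stop with a clear message before the assembler is started.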

Unable to install. cannot find input file: `Makefile.in'

I am trying to install IDBA on a macOS computer. I followed these steps:

git clone https://github.com/loneknightpy/idba.git

cd idba
sudo ./build.sh

Then this is the output. Could you please point out what I am doing wrong or what I need to do to fix the error? Thanks:

./build.sh: line 1: aclocal: command not found
aclocal.m4:21: warning: this file was generated for autoconf 2.68.
You have another version of autoconf.  It may work, but is not guaranteed to.
If you have problems, you may need to regenerate the build system entirely.
To do so, use the procedure documented by the package, typically 'autoreconf'.
./build.sh: line 3: automake: command not found
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... ./install-sh -c -d
checking for gawk... no
checking for mawk... no
checking for nawk... no
checking for awk... awk
checking whether make sets $(MAKE)... yes
checking for g++... g++
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking for style of include used by make... GNU
checking dependency style of g++... gcc3
checking for gcc... gcc
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking dependency style of gcc... gcc3
checking for ranlib... ranlib
checking how to run the C preprocessor... gcc -E
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for stdbool.h that conforms to C99... yes
checking for _Bool... yes
checking for an ANSI C-conforming const... yes
checking for working memcmp... yes
checking for vprintf... yes
checking for _doprnt... no
checking for sqrt... yes
configure: creating ./config.status
config.status: error: cannot find input file: `Makefile.in'
make: *** No rule to make target 'clean'.  Stop.
make: *** No targets specified and no makefile found.  Stop.

terminate called after throwing an instance of 'std::logic_error'

I'm running latest release 1.1.3

Here's the log file. But I noticed during the run:

distance mean -nan sd -nan
invalid insert distance. median -nan sd -nan

And then at the end it terminated with a similar error as other users. If there is any other information that would help troubleshoot please let me know.

idba_ud --num_threads 30 -l ./APAJMS011_PE150.fa -o ./ >& ./log &

Log file:

number of threads 30
reads 0
long reads 11289322
extra reads 0
read_length 0
kmer 20
kmers 27975525 28275531
merge bubble 38171
contigs: 104165 n50: 652 max: 20271 mean: 183 total length: 19092680 n80: 245
aligned 0 reads
confirmed bases: 0 correct reads: 0 bases: 0
distance mean -nan sd -nan
invalid insert distance. median -nan sd -nan
kmer 40
kmers 25034539 25146073
merge bubble 6275
contigs: 23881 n50: 2635 max: 65504 mean: 775 total length: 18511217 n80: 625
aligned 0 reads
confirmed bases: 0 correct reads: 0 bases: 0
distance mean -nan sd -nan
invalid insert distance. median -nan sd -nan
kmer 60
kmers 24223803 24284326
merge bubble 3312
contigs: 16975 n50: 3677 max: 90072 mean: 1088 total length: 18482189 n80: 732
aligned 0 reads
confirmed bases: 0 correct reads: 0 bases: 0
distance mean -nan sd -nan
invalid insert distance. median -nan sd -nan
kmer 80
kmers 22202764 22218073
merge bubble 3017
contigs: 14234 n50: 4202 max: 122653 mean: 1295 total length: 18445899 n80: 802
aligned 0 reads
confirmed bases: 0 correct reads: 0 bases: 0
distance mean -nan sd -nan
invalid insert distance. median -nan sd -nan
kmer 100
kmers 19141521 19137008
merge bubble 981
contigs: 15193 n50: 4065 max: 122693 mean: 1234 total length: 18750941 n80: 778
terminate called after throwing an instance of 'std::logic_error'
what(): SequenceReader::SequenceReader() istream is invalid

docker

The execution files are placed inside /root, which is inaccessible by non-root users, eg: when running from an execution assistant. A quick fix is to add:

RUN chmod 577 root/

to the bottom of the Dockerfile, however adding files to /root is against best practices so I don't want to submit a pull request to do that. Wondering if there are any thoughts?

fa2fq help text is the same as fq2fa

Thank you for creating and publishing this tool. It has been very useful to many people and I'd like to contribute in a small way to your project. I noticed that the help text and arguments for fa2fq are not correct and appear to be copied from fq2fa without any changes:

./bin/fa2fq 
not enough parameters
fq2fa - Convert Fastq sequences to Fasta sequences.
Usage: fq2fa tmp.fq tmp.fa [...] 
       fq2fa --paired tmp.fq tmp.fa
       fq2fa --merge tmp_1.fq tmp_2.fq tmp.fa
Allowed Options: 
      --paired                           if the reads are paired-end in one file
      --merge                            if the reads are paired-end in two files
      --filter                           filter out reads containing 'N'

This is confusing as fa2fq only accepts two positional arguments (a fasta input filepath and a fastq output filepath). None of the options in the help text are used and if you follow the usage help you'll end up with unexpected results.

IDBA-Hybrid

Hi,

When I ran fq2fa to merge two FASTQ read files into a single file, I didn't get any output and didn't receive any error message either. What is the reason for this?
Another thing: does IDBA-Hybrid insert sequences from the reference genome into the assembly that do not exist in the FASTQ reads?

Thanks
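
For checking whether the two input files are the problem, it can help to do the merge step independently. This is a rough Python sketch of what interleaving two paired FASTQ files into one FASTA file means. It is not fq2fa's implementation; it assumes plain four-line FASTQ records in the same order in both files, and `fastq_records` and `interleave_to_fasta` are names invented here.

```python
import io
from itertools import islice


def fastq_records(handle):
    """Yield (name, sequence) from a FASTQ stream with 4 lines per record."""
    while True:
        block = list(islice(handle, 4))
        if len(block) < 4:
            return  # end of file or truncated final record
        header, sequence = block[0].strip(), block[1].strip()
        yield header[1:], sequence  # drop the leading '@'


def interleave_to_fasta(fq1, fq2, out):
    """Write mate 1 and mate 2 alternately as FASTA; return the number of pairs."""
    pairs = 0
    for (name1, seq1), (name2, seq2) in zip(fastq_records(fq1), fastq_records(fq2)):
        out.write(f">{name1}\n{seq1}\n>{name2}\n{seq2}\n")
        pairs += 1
    return pairs


# In-memory demo with one read pair.
demo_out = io.StringIO()
n = interleave_to_fasta(
    io.StringIO("@r1/1\nACGT\n+\nIIII\n"),
    io.StringIO("@r1/2\nTTTT\n+\nIIII\n"),
    demo_out,
)
print(n, "pair(s) written")
```

If this writes zero pairs for the real files, the inputs are probably empty, compressed, or not in four-line FASTQ format.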

How to use IDBA-UD to perform scaffolding on assembled contigs?

Dear IDBA-UD developers,

I came across some literature where people have used IDBA-UD to perform a scaffolding step on assembled contigs from other assemblers such as MEGAHIT. It's not quite clear how it's supposed to be done from reading the IDBA-UD help page, and I am thinking of using the following code to perform the task, but not sure if it's appropriate.
idba --read merged_raw_reads.fa --read_level_2 megahit_contigs.fa --out idba_scaffolds.fa

Could you help with this? Much appreciated!

Rui

How to Install????????

I do not understand how to install this software.

After the ./build script executes, then what should I do?

idba_ud : Segmentation fault

Dear Yu Peng
Thanks for your great tool for assembling metagenome reads. But when I assembled a bigger dataset than before (about 46 Gb, 150 bp paired-end reads), I ran into the trouble shown below:

number of threads 50
reads 166504600
long reads 0
extra reads 0
read_length 150
kmer 95
kmers 4043406679 3973976227
merge bubble 74
contigs: 1643 n50: 144 max: 7843 mean: 167 total length: 274971 n80: 107
aligned 14418 reads
confirmed bases: 27136 correct reads: 7834 bases: 3762
kmer 40
kmers 7586101528 7531112438
merge bubble 1564262
contigs: 3213050 n50: 150 max: 20467 mean: 138 total length: 446525742 n80: 110
aligned 226248 reads
confirmed bases: 2332670 correct reads: 27184 bases: 6318
distance mean -nan sd -nan
invalid insert distance
more kmers
... ...
/opt/pbs/dispatcher/mom_priv/jobs/20145.mgmt.SC: line 14: 151550 Segmentation fault (core dumped) /share/home/yuanqingc/idba-master/bin/idba_ud -r /share/home/yuanqingc/test_data/F7.fa --mink 40 --maxk 150 --step 5 --num_threads 50 --min_contig 1000 --pre_correction -o idba_F7out/

All the local-contig-*.fa files were empty. My server has 96x32 Gb of RAM available, and my hard disk capacity is about 1.5T.
Can you give some suggestions for these problems?

Best regards
chenziwu

IDBA-UD on cluster-server

I want to assemble metagenomic data on a cluster-server.
Is it possible to use IDBA-UD in connection with MPI-Parallelization, Torque PBS or OpenMP?

Thanks,
david

Data that used to work, now killed.

Hi,
A few years ago I used idba successfully. Now, on a new server, every meaningful dataset I have (PE, len = 100 bp) ends with the 'killed' statement. Troubleshooting this is difficult. Would someone mind sharing a dataset that they know works with idba v 1.1.3?
Many thanks
James

Restart Analysis?

Good Day,

Can IDBA be restarted where it last terminated?

Regards,
Allison

Ported Debian package to Python3

Hi,
As you probably know, Python2 reaches end of life on 2020-01-01. Thus Debian will remove all Python2 packages in the next stable release. I used 2to3 to create a patch to port the Python2 scripts in idba. Please feel free to take these over into your next release.
Kind regards, Andreas.

IDBA-hybrid --similar option default 0.95

I have a question about IDBA-hybrid and the way it reconstructs contigs:
The default value of --similar is 95%. Is this number the minimum accepted to reconstruct contigs, or will the software only reconstruct contigs with exactly 95% similarity?

For example, what happens if I decrease it to --similar = 0.80? Will IDBA-hybrid consider only regions with 80% similarity, or regions with 'at least' 80% similarity?

Thanks for the clarification.

Best

IDBA-Tran and IDBA-UD generate segmentation faults and "invalid insert size" error.

Hello, the first time I ran IDBA-Tran I received several "invalid insert size" errors as shown here:

The command I ran:
/home/jcornel3/tools/idba-1.1.1/bin/idba_tran -r /scratch/Xlav/RNA/data_for_build/normalizedk20.fasta -o ./ --num_threads 32 --max_isoforms 1

Log of the output:
number of threads 32
reads 116728374
long reads 0
reads 116728128
long reads 0
read_length 90
kmer 20
kmers 1251934185 1281430428
merge bubble 753628
contigs: 87112541 n50: 34 max: 1340 mean: 33 total length: 2876019569 n80: 21
contigs: 13350365 n50: 91 max: 3603 mean: 51 total length: 691769806 n80: 23
contigs: 321705 n50: 417 max: 3603 mean: 429 total length: 138028498 n80: 337
aligned 16215662 reads
confirmed bases: 116106490 correct reads: 1586511 bases: 185900
distance mean 149.455 sd 171.931
seed contigs 651896 local contigs 26700730
kmer 30
kmers 1352385351 1355369574
merge bubble 261359
contigs: 48240381 n50: 61 max: 5415 mean: 56 total length: 2721032711 n80: 33
contigs: 9744387 n50: 173 max: 8115 mean: 124 total length: 1209648933 n80: 87
contigs: 1384743 n50: 632 max: 14978 mean: 600 total length: 831992867 n80: 395
aligned 41531630 reads
confirmed bases: 325970532 correct reads: 4992476 bases: 483422
distance mean 146.052 sd 444.039
invalid insert distance
kmer 40
kmers 1314925458 1309923734
merge bubble 53515
contigs: 31990658 n50: 89 max: 5415 mean: 79 total length: 2528490281 n80: 48
contigs: 9324460 n50: 180 max: 8125 mean: 150 total length: 1407613381 n80: 92
contigs: 1665614 n50: 787 max: 14936 mean: 689 total length: 1148295112 n80: 430
aligned 49476710 reads
confirmed bases: 390329809 correct reads: 5972961 bases: 183994
distance mean 124.625 sd 330.759
invalid insert distance
kmer 50
kmers 1225819992 1218220098
merge bubble 10262
contigs: 21377438 n50: 112 max: 5415 mean: 101 total length: 2168392996 n80: 63
contigs: 7570936 n50: 206 max: 8135 mean: 180 total length: 1366986053 n80: 118
contigs: 1672231 n50: 851 max: 21594 mean: 720 total length: 1204534803 n80: 445
aligned 54056339 reads
confirmed bases: 427486046 correct reads: 6452478 bases: 106997
distance mean 149.159 sd 415.923
invalid insert distance
kmer 60
kmers 1064552217 1057789314
merge bubble 3040
contigs: 15286308 n50: 140 max: 5415 mean: 123 total length: 1887680552 n80: 76
contigs: 6281749 n50: 228 max: 7992 mean: 205 total length: 1289601622 n80: 138
contigs: 1522085 n50: 798 max: 24294 mean: 691 total length: 1052547585 n80: 431
aligned 56603526 reads
confirmed bases: 447139164 correct reads: 6715696 bases: 58809
distance mean 160.791 sd 427.802
invalid insert distance

On subsequent attempts, I tried some different settings and now I just get segmentation faults:

Command:
/home/jcornel3/tools/idba-1.1.1/bin/idba_tran -r /scratch/Xlav/RNA/data_for_build/normalizedk20.fasta -o ./ --num_threads 32 --pre_correction

Output:
number of threads 32
reads 233456750
long reads 0
reads 224304660
long reads 0
read_length 91
kmer 40
Segmentation fault (core dumped)

Command:
/home/jcornel3/tools/idba-1.1.1/bin/idba_tran -r /scratch/Xlav/RNA/data_for_build/normalizedk20.fasta -o ./ --num_threads 16

Output:
number of threads 16
reads 233456750
long reads 0
reads 224304660
long reads 0
read_length 91
kmer 20
Segmentation fault (core dumped)

Other people have used IDBA-UD and encountered a similar issue, here is a link to the google group thread of the issue:
https://groups.google.com/forum/#!topic/hku-idba/Y-7dLOcMXvU

The other thread noted that the reads need to be formatted correctly. I'm fairly certain mine are formatted correctly since I used this dataset with SOAPdenovo-Trans and it worked well.

basic question: how does idba_ud treat even k-mer sizes (or palindromic kmers)?

Hi, this is just a basic question for understanding how IDBA-UD works in comparison to other assemblers.
Almost all de Bruijn graph assemblers specifically forbid even k-mer lengths to avoid palindromic k-mers. However, I can run IDBA_UD with even or odd k-mer lengths, and the default values seem to be even.
How come IDBA-UD does not seem to have any issue with even or palindromic k-mers?
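
For context on why other assemblers prefer odd k: a k-mer can only equal its own reverse complement when k is even, because with odd k the middle base would have to be its own complement. The short Python sketch below just counts such k-mers by brute force. It says nothing about how IDBA-UD handles them internally, and the function names are chosen for this example only.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return kmer.translate(COMPLEMENT)[::-1]


def count_self_reverse_complement(k):
    """Count k-mers over ACGT that are identical to their reverse complement."""
    count = 0
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        if kmer == reverse_complement(kmer):
            count += 1
    return count


# Small values of k only; the enumeration grows as 4**k.
for k in range(1, 7):
    print(k, count_self_reverse_complement(k))
```

Every odd k gives a count of zero, while even k gives 4^(k/2) such k-mers (for example AT, TA, CG and GC for k = 2).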

IDBA-UD Output Question

I am concerned that my IDBA-UD run had problems. Although the output contained a contigs.fa file, there was no scaffold.fa file generated. I changed the constant kMaxShortSequence in src/sequence/short_sequence.h to support my longer read length. Any suggestions would be appreciated, thanks.

logic_error: ShortSequence: Sequence is too long.

Hello,
I am new to IDBA and the genomics world, so I am hoping to get some help to resolve this error.
My command is:
idba_ud -r db.fa.gz -o /Results/
After a split second I get the following logic error:
terminate called after throwing an instance of 'std::logic_error'
what(): ShortSequence: Sequence is too long.

My sequencing reads are in an interleaved gzipped fasta file; between 100-200bp. In the manual, it is written that the -r accepts fasta read file (<=600). I'm assuming this is 600bp? So, I'm unsure how my reads are too long.

Thanks for any guidance!
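
A quick sanity check for this kind of error is to measure the longest sequence actually present in the input, since a wrapped or mis-parsed file can produce records that are much longer than the expected reads. The Python sketch below is an independent helper, not part of IDBA. It assumes an uncompressed FASTA text stream (a gzipped file would have to be opened with `gzip.open(path, "rt")` first), and `max_sequence_length` is a name made up here.

```python
import io


def max_sequence_length(handle):
    """Return the length of the longest record in a FASTA text stream."""
    longest = 0
    current = 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record.
            longest = max(longest, current)
            current = 0
        else:
            current += len(line)  # sequences may span several lines
    return max(longest, current)


# In-memory demo: the first record spans two lines (4 + 2 bases).
print(max_sequence_length(io.StringIO(">a\nACGT\nAC\n>b\nA\n")))
```

If the reported maximum is far above the expected 100-200 bp, the input format is the likely culprit rather than the reads themselves.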

IDBA_UD 'std::logic_error'

With different datasets and exactly the same parameter set, I sometimes get the error message below, but still get a contig.fa file.

terminate called after throwing an instance of 'std::logic_error' what(): SequenceReader::SequenceReader() istream is invalid.

Do you have suggestions about it?

idba_hybrid caused 'std::logic_error'

Hi
I tried to run idba_hybrid in the following way:

$ fq2fa --paired out_mit.fq out_mit.fa
$ idba_hybrid --reference mit.fasta -l out_mit.fa -o `pwd`/idba-mit --num_threads 10 --pre_correction --maxk 124 --step 5

but I received the following error:

terminate called after throwing an instance of 'std::logic_error'
   what():  SequenceReader::SequenceReader() istream is invalid

What did I do wrong?

Thank you in advance.

Mic

Detailed Docs?

Hi,

Is there any documentation available apart from the output of <idba_hyb/ud> run without parameters?
E.g. how does the '--read_level_n' parameter work? If I had two MP libs, would I use '--read_level_2' for the first and '--read_level_3' for the second? Is the MP insert size relevant for these parameters?

The command output in general is really minimal; a more detailed description of the parameters would be really great.

best,
Sven

the number of scaffold.fa is more than contig.fa

How can it be explained that the number of sequences in scaffold.fa is larger than in contig.fa?

puzzled with the results of the idba_ud

Hello Yu Peng,
The idba_ud tool was published by you. It is such a useful tool that I want to use it in my project. However, I ran into some problems in the process.
My command is "idba_ud -r 01_merge.fa --pre_correction --min_contig 500 --step 10 --seed_kmer 55 --num_threads 9 --mink 52 --maxk 92 -o 02_idba_assem_2". The results include "contig-52.fa, contig-62.fa, contig-72.fa, contig-82.fa, contig-92.fa, contig.fa, scaffold.fa". I do not know which one I should regard as the final contigs in order to predict genes. In other words, under which conditions should I use contig.fa, and under which conditions should I use scaffold.fa?
This has puzzled me for a long time, so I would like to get some help from you.
I hope you can give me some suggestions.
Best wishes to you.
I am looking forward to your reply. Thanks a lot.

Hardware requirements?

Hi,

I'm just wondering what the hardware requirements are for running this algorithm? For example, is there a minimum amount of RAM recommended etc?

low rate of input reads mapping to contigs

Hello,

I'd be grateful to anyone who could give me some advice and/or help with the following. I am running a test dataset through idba_ud. The test dataset consists of ~10 bacterial species (metagenome) broken into 150 bp paired-end reads. I modified my idba install as explained here http://bit.ly/1P2VMlU to accommodate the input. My usage is below, but essentially I make contigs from an input file, then use bowtie2 to align the reads from my input file back to the contig.fa output of idba. When I do so, my alignment rate is roughly 65%, which is lower than I would have imagined.

Is this expected? I cannot figure out or find documentation for whether idba_ud would be throwing out reads. My input data has fake, high quality scores as to ensure that that isn't an issue. I haven't played around with the bowtie2 scoring defaults.

When I do a similar test with real metagenomic data, I get a similar alignment rate. When I do a similar test with genomic data and a different assembler, the alignment rate is ~99% but with idba-ud is ~50%.

I would be happy to hear any and all opinions- thank you!

--Fiona

$fq2fa --merge --filter reads1.fastq reads2.fastq reads.fa
$idba_ud -r reads.fa -o idba_out/ --min_contig 50
number of threads 24
reads 199502
long reads 0
extra reads 0
read_length 150
kmer 20
kmers 7317208 7298126
merge bubble 3751
contigs: 42819 n50: 245 max: 2210 mean: 178 total length: 7638759 n80: 131
aligned 80918 reads
confirmed bases: 688566 correct reads: 1040 bases: 28
distance mean 240.59 sd 93.1852
seed contigs 37131 local contigs 85638
kmer 40
kmers 7340190 7312415
merge bubble 772
contigs: 33047 n50: 340 max: 3134 mean: 253 total length: 8392599 n80: 150
aligned 107120 reads
confirmed bases: 992784 correct reads: 1674 bases: 29
distance mean 264.159 sd 90.9357
seed contigs 32834 local contigs 66094
kmer 60
kmers 6839939 6812671
merge bubble 369
contigs: 28675 n50: 400 max: 3422 mean: 291 total length: 8362177 n80: 150
aligned 113868 reads
confirmed bases: 1086059 correct reads: 1827 bases: 4
distance mean 270.413 sd 89.6104
seed contigs 28675 local contigs 57350
kmer 80
kmers 6241857 6216146
merge bubble 223
contigs: 25270 n50: 435 max: 3422 mean: 320 total length: 8088398 n80: 173
aligned 114768 reads
confirmed bases: 1112629 correct reads: 1899 bases: 0
distance mean 273.898 sd 88.4771
seed contigs 25270 local contigs 50540
kmer 100
kmers 5667844 5644274
merge bubble 157
contigs: 21570 n50: 470 max: 5957 mean: 354 total length: 7642588 n80: 213
reads 199502
aligned 113829 reads
distance mean 277.28 sd 87.3132
expected coverage 6.02324e-07
edgs 27
contigs: 21543 n50: 471 max: 5957 mean: 354 total length: 7638711 n80: 213
$chdir idba_out/
$bowtie2 build contig.fa contig
$bowtie2 -x contig -1 ../reads1.fastq -2 ../reads2.fastq -S test.sam
99762 reads; of these:
  99762 (100.00%) were paired; of these:
    50873 (50.99%) aligned concordantly 0 times
    48148 (48.26%) aligned concordantly exactly 1 time
    741 (0.74%) aligned concordantly >1 times
    ----
    50873 pairs aligned concordantly 0 times; of these:
      7270 (14.29%) aligned discordantly 1 time
    ----
    43603 pairs aligned 0 times concordantly or discordantly; of these:
      87206 mates make up the pairs; of these:
        73290 (84.04%) aligned 0 times
        12294 (14.10%) aligned exactly 1 time
        1622 (1.86%) aligned >1 times
63.27% overall alignment rate
$bowtie2 -x contig -f ../reads.fa -S bowtie.sam
199502 reads; of these:
  199502 (100.00%) were unpaired; of these:
    72937 (36.56%) aligned 0 times
    121259 (60.78%) aligned exactly 1 time
    5306 (2.66%) aligned >1 times
63.44% overall alignment rate
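
For anyone reading the two bowtie2 summaries above, the overall rate can be reproduced from the printed counts: in paired mode each aligned pair contributes two mates, and mates from unaligned pairs that align on their own are counted individually. The Python sketch below redoes that arithmetic with the numbers from this log. The function names are made up for illustration.

```python
def paired_overall_rate(pairs, concordant_once, concordant_multi,
                        discordant_once, mates_once, mates_multi):
    """Fraction of mates aligned, from the counts in a paired-end summary."""
    aligned_mates = (
        2 * (concordant_once + concordant_multi + discordant_once)
        + mates_once
        + mates_multi
    )
    return aligned_mates / (2 * pairs)


def unpaired_overall_rate(reads, aligned_once, aligned_multi):
    """Fraction of reads aligned, from the counts in an unpaired summary."""
    return (aligned_once + aligned_multi) / reads


paired = paired_overall_rate(99762, 48148, 741, 7270, 12294, 1622)
unpaired = unpaired_overall_rate(199502, 121259, 5306)
print(f"{paired:.2%}")    # 63.27%
print(f"{unpaired:.2%}")  # 63.44%
```

So only 48.26% + 0.74% of pairs align concordantly, but discordant and single-mate alignments bring the total to about 63% in both modes.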

Question: Record which reads map to which contigs?

Good morning, and thank you for making IDBA available! I have a quick question which I hope you don't mind me posting here: Is it possible to record which reads map to a particular contig with IDBA_UD? I know that I could use something like BWA to map reads back to the contigs produced with IDBA_UD, but was wondering if it is possible to save this information as the contigs are being produced.

Thanks!
--Fiona

Is there a limit to the amount of data processed by this software?

Dear Yu Peng
Thanks for your great tool for assembling metagenomic reads. But when I assembled a large dataset (about 114 Gb, 150 bp paired-end reads), I ran into a problem.
My command is "idba_ud -r PE.fa --maxk 100 --step 10 -o idba_out/ --num_threads 32 --min_contig 200". There is an error after running it; the only error message is "terminate".
The log is:
number of threads 32
reads 707147590
long reads 0
extra reads 0
read_length 150
kmer 20
kmers 16309389272 17530273423

Does my data size exceed the maximum amount of data that can be assembled by this software?
This has me very puzzled, so I hope you can help me.
I am looking forward to your reply. Thank you very much!
Best wishes!

-o outfile not working with --mink ?

I am using this command:
idba_ud -r ${infile} --mink 55 --maxk 124 --min_contig 500 --num_threads 10 -o ${oufile}
or this:
idba_ud -r ${infile} --mink 55 --min_contig 500 --num_threads 10 -o ${oufile}
idba_ud -r ${infile} --mink 83 --min_contig 500 --num_threads 10 -o ${oufile}
idba_ud -r ${infile} --mink 101 --min_contig 500 --num_threads 10 -o ${oufile}

which creates an unspecified directory for the assembly called 895319?

I don't know where this number comes from. It is not what I tell it.

If I use this command:
idba_ud -r ${infile} -o ${outfile} --min_contig 500 --num_threads 10

I get the actual output directory I specify.

Any thoughts? Thank you.

read_count in output

I ran IDBA-UD recently. My output looks like this

scaffold_0 length_285165 read_count_39166
CGGGCCCTACCGTAGCAGCCGCTACGGTAGGGCCCGTGTCCAGT......
scaffold_1 length_297291 read_count_43258
GAGAGCGTCATCGATCATCGCGACGCAGGTTAACAGCGATTGCCGATGTTTACAACC........

I am wondering if read_count now refers to reads that aligned to a scaffold. I have seen previous discussion that 'read_count' refers to the k-mer count https://groups.google.com/forum/#!topic/hku-idba/5z696QgAooM
Thanks,
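
Whatever read_count turns out to mean, the values are easy to pull out of the headers for comparison against an independent read mapping. The Python sketch below parses header lines of the form shown above. The regular expression is based only on those two example headers, tolerates an optional leading `>`, and `parse_header` is a hypothetical name.

```python
import re

# Pattern taken from the example headers: name, length_<n>, read_count_<n>.
HEADER = re.compile(
    r"^>?(?P<name>\S+) length_(?P<length>\d+) read_count_(?P<reads>\d+)$"
)


def parse_header(line):
    """Return (name, length, read_count) or None if the line does not match."""
    match = HEADER.match(line.strip())
    if match is None:
        return None
    return match["name"], int(match["length"]), int(match["reads"])


print(parse_header("scaffold_0 length_285165 read_count_39166"))
```

Dividing read_count by length for each scaffold gives a quick per-scaffold density that can be compared with the read counts obtained from a mapper.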

Puzzled with the error after running idba_ud

Hi Yu Peng,
The idba_ud tool was published by you. It's a very useful tool that I want to use in my project. However, I ran into a problem in the process.
My command is "idba_ud -l ../PE.fa --mink 27 --maxk 117 --step 10 --num_threads 2 --pre_correction -o idba_scaffold --no_bubble". There is an error after running it: "terminate called after throwing an instance of 'std::logic_error'
what(): SequenceReader::SequenceReader() istream is invalid".
This has me very puzzled, so I hope you can help me.
I am looking forward to your reply. Thank you very much!
Best wishes!

--no_correction question

Hi, I am planning on assembling contigs using idba, and I came across some of the optional flags for the program. Does the --no_correction flag mean that error correction will not be done? I ask because I've already done an error correction step after the quality trimming of my raw sequences.

Best,
Clarisse

IDBA-UD for low coverage eukaryote genome

Hello,
I found that this software was used to assemble a polyploid plant and I am wondering if it could work as well in my case.
I have 4 plant genomes (sequenced at about 20x, PE libraries up to 450 bp, 2x100 or 2x150 bp) that are from 1 to 1.4 Gb in size and are heterozygous. I assembled them with Meraculous (with some scaffolding and gap closing) and I can assemble 50-80% of the size, while the scaffold N50 is between 5 and 12 kb.
Do you think this tool will work well? Will I need to run some error correction? I am a bit reluctant since the coverage is quite low. Shall I still try scaffolding and closing gaps after an IDBA-UD run?
Thanks,
Dario

No Contig Cutoff?

My resulting assembly has quite a few (80 of 180) contigs that are smaller than 1000 bp. Is there an option in IDBA to keep only contigs above a specified length, as in Velvet and SPAdes?

Also, IDBA outputs both scaffold.fa and contigs.fa. Are there any differences between them, even when using IDBA-UD? Thanks
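
Other commands quoted in this issue list pass --min_contig to idba_ud, which may be what is wanted here. Independently of that, an existing assembly can be filtered after the fact. The Python sketch below is a generic FASTA length filter, not an IDBA feature; `read_fasta` and `filter_by_length` are names chosen for this example.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA text stream."""
    name, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line.strip())
    if name is not None:
        yield name, "".join(chunks)


def filter_by_length(records, min_length):
    """Keep only records whose sequence has at least min_length bases."""
    return [(name, seq) for name, seq in records if len(seq) >= min_length]


# In-memory demo with a tiny cutoff; a real run would use e.g. 1000.
demo = io.StringIO(">c1\nACGT\nAC\n>c2\nAC\n")
for name, seq in filter_by_length(read_fasta(demo), 3):
    print(f">{name}\n{seq}")
```

Applied to a contig file with a cutoff of 1000, this would drop the 80 short contigs and keep the rest unchanged.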

what(): ShortSequence: Sequence is too long.

Hi,

I am new to IDBA-UD and to bioinformatics research. I came across the following problem and hope someone can help 👍:

My code is:

#!/bin/bash
#SBATCH -t 80:00:00
#SBATCH -N 1
#SBATCH -c 8
#SBATCH --mem=80G

module load IDBAUD/1.1.0

idba_ud -r A_S1_outmergefile.fa -o A_S1_assemblesample --num_threads 8 --mink 40 --maxk 120 --step 10

scontrol show job $SLURM_JOB_ID

The error message is (from the Slurm file):
terminate called after throwing an instance of 'std::logic_error'
what(): ShortSequence: Sequence is too long.
/var/lib/slurmd/job64877788/slurm_script: line 9: 11279 Aborted

Thank you in advance for your kind help!

Best,
Luke

idba-1.1.1: 'make install' does not install all binaries

Hi,
it seems something is wrong with the install target in Makefile. I see the binaries were compiled:


x86_64-pc-linux-gnu-g++ -Wall -O3 -fopenmp -pthread -O2 -pipe -maes -mpclmul -mpopcnt -mavx -march=native -fopenmp -pthread -Wl,-O1 -Wl,--as-needed -o idba_hybrid idba_hybrid.o  ../lib/libassembly.a
x86_64-pc-linux-gnu-g++ -Wall -O3 -fopenmp -pthread -O2 -pipe -maes -mpclmul -mpopcnt -mavx -march=native -fopenmp -pthread -Wl,-O1 -Wl,--as-needed -o sort_reads sort_reads.o  ../lib/libassembly.a
x86_64-pc-linux-gnu-g++ -Wall -O3 -fopenmp -pthread -O2 -pipe -maes -mpclmul -mpopcnt -mavx -march=native -fopenmp -pthread -Wl,-O1 -Wl,--as-needed -o parallel_rna_blat parallel_rna_blat.o  ../lib/libassembly.a
x86_64-pc-linux-gnu-g++ -Wall -O3 -fopenmp -pthread -O2 -pipe -maes -mpclmul -mpopcnt -mavx -march=native -fopenmp -pthread -Wl,-O1 -Wl,--as-needed -o test test.o  ../lib/libassembly.a
x86_64-pc-linux-gnu-g++ -Wall -O3 -fopenmp -pthread -O2 -pipe -maes -mpclmul -mpopcnt -mavx -march=native -fopenmp -pthread -Wl,-O1 -Wl,--as-needed -o print_graph print_graph.o  ../lib/libassembly.a
x86_64-pc-linux-gnu-g++ -Wall -O3 -fopenmp -pthread -O2 -pipe -maes -mpclmul -mpopcnt -mavx -march=native -fopenmp -pthread -Wl,-O1 -Wl,--as-needed -o validate_rna validate_rna.o  ../lib/libassembly.a
x86_64-pc-linux-gnu-g++ -Wall -O3 -fopenmp -pthread -O2 -pipe -maes -mpclmul -mpopcnt -mavx -march=native -fopenmp -pthread -Wl,-O1 -Wl,--as-needed -o scaffold scaffold.o  ../lib/libassembly.a
x86_64-pc-linux-gnu-g++ -Wall -O3 -fopenmp -pthread -O2 -pipe -maes -mpclmul -mpopcnt -mavx -march=native -fopenmp -pthread -Wl,-O1 -Wl,--as-needed -o shuffle_reads shuffle_reads.o  ../lib/libassembly.a
x86_64-pc-linux-gnu-g++ -Wall -O3 -fopenmp -pthread -O2 -pipe -maes -mpclmul -mpopcnt -mavx -march=native -fopenmp -pthread -Wl,-O1 -Wl,--as-needed -o fa2fq fa2fq.o  ../lib/libassembly.a
x86_64-pc-linux-gnu-g++ -Wall -O3 -fopenmp -pthread -O2 -pipe -maes -mpclmul -mpopcnt -mavx -march=native -fopenmp -pthread -Wl,-O1 -Wl,--as-needed -o validate_contigs_mummer validate_contigs_mummer.o  ../lib/libassembly.a
x86_64-pc-linux-gnu-g++ -Wall -O3 -fopenmp -pthread -O2 -pipe -maes -mpclmul -mpopcnt -mavx -march=native -fopenmp -pthread -Wl,-O1 -Wl,--as-needed -o filter_blat filter_blat.o  ../lib/libassembly.a
x86_64-pc-linux-gnu-g++ -Wall -O3 -fopenmp -pthread -O2 -pipe -maes -mpclmul -mpopcnt -mavx -march=native -fopenmp -pthread -Wl,-O1 -Wl,--as-needed -o sort_psl sort_psl.o  ../lib/libassembly.a
x86_64-pc-linux-gnu-g++ -Wall -O3 -fopenmp -pthread -O2 -pipe -maes -mpclmul -mpopcnt -mavx -march=native -fopenmp -pthread -Wl,-O1 -Wl,--as-needed -o validate_reads_blat validate_reads_blat.o  ../lib/libassembly.a
x86_64-pc-linux-gnu-g++ -Wall -O3 -fopenmp -pthread -O2 -pipe -maes -mpclmul -mpopcnt -mavx -march=native -fopenmp -pthread -Wl,-O1 -Wl,--as-needed -o validate_component validate_component.o  ../lib/libassembly.a
x86_64-pc-linux-gnu-g++ -Wall -O3 -fopenmp -pthread -O2 -pipe -maes -mpclmul -mpopcnt -mavx -march=native -fopenmp -pthread -Wl,-O1 -Wl,--as-needed -o filter_contigs filter_contigs.o  ../lib/libassembly.a
x86_64-pc-linux-gnu-g++ -Wall -O3 -fopenmp -pthread -O2 -pipe -maes -mpclmul -mpopcnt -mavx -march=native -fopenmp -pthread -Wl,-O1 -Wl,--as-needed -o sample_reads sample_reads.o  ../lib/libassembly.a
x86_64-pc-linux-gnu-g++ -Wall -O3 -fopenmp -pthread -O2 -pipe -maes -mpclmul -mpopcnt -mavx -march=native -fopenmp -pthread -Wl,-O1 -Wl,--as-needed -o split_fq split_fq.o  ../lib/libassembly.a
x86_64-pc-linux-gnu-g++ -Wall -O3 -fopenmp -pthread -O2 -pipe -maes -mpclmul -mpopcnt -mavx -march=native -fopenmp -pthread -Wl,-O1 -Wl,--as-needed -o idba_tran_test idba_tran_test.o  ../lib/libassembly.a
x86_64-pc-linux-gnu-g++ -Wall -O3 -fopenmp -pthread -O2 -pipe -maes -mpclmul -mpopcnt -mavx -march=native -fopenmp -pthread -Wl,-O1 -Wl,--as-needed -o split_fa split_fa.o  ../lib/libassembly.a
x86_64-pc-linux-gnu-g++ -Wall -O3 -fopenmp -pthread -O2 -pipe -maes -mpclmul -mpopcnt -mavx -march=native -fopenmp -pthread -Wl,-O1 -Wl,--as-needed -o fq2fa fq2fa.o  ../lib/libassembly.a
x86_64-pc-linux-gnu-g++ -Wall -O3 -fopenmp -pthread -O2 -pipe -maes -mpclmul -mpopcnt -mavx -march=native -fopenmp -pthread -Wl,-O1 -Wl,--as-needed -o raw_n50 raw_n50.o  ../lib/libassembly.a
x86_64-pc-linux-gnu-g++ -Wall -O3 -fopenmp -pthread -O2 -pipe -maes -mpclmul -mpopcnt -mavx -march=native -fopenmp -pthread -Wl,-O1 -Wl,--as-needed -o parallel_blat parallel_blat.o  ../lib/libassembly.a
x86_64-pc-linux-gnu-g++ -Wall -O3 -fopenmp -pthread -O2 -pipe -maes -mpclmul -mpopcnt -mavx -march=native -fopenmp -pthread -Wl,-O1 -Wl,--as-needed -o split_scaffold split_scaffold.o  ../lib/libassembly.a
x86_64-pc-linux-gnu-g++ -Wall -O3 -fopenmp -pthread -O2 -pipe -maes -mpclmul -mpopcnt -mavx -march=native -fopenmp -pthread -Wl,-O1 -Wl,--as-needed -o validate_contigs_blat validate_contigs_blat.o  ../lib/libassembly.a
x86_64-pc-linux-gnu-g++ -Wall -O3 -fopenmp -pthread -O2 -pipe -maes -mpclmul -mpopcnt -mavx -march=native -fopenmp -pthread -Wl,-O1 -Wl,--as-needed -o filterfa filterfa.o  ../lib/libassembly.a
x86_64-pc-linux-gnu-g++ -Wall -O3 -fopenmp -pthread -O2 -pipe -maes -mpclmul -mpopcnt -mavx -march=native -fopenmp -pthread -Wl,-O1 -Wl,--as-needed -o sim_reads sim_reads.o  ../lib/libassembly.a
x86_64-pc-linux-gnu-g++ -Wall -O3 -fopenmp -pthread -O2 -pipe -maes -mpclmul -mpopcnt -mavx -march=native -fopenmp -pthread -Wl,-O1 -Wl,--as-needed -o idba_ud idba_ud.o  ../lib/libassembly.a
x86_64-pc-linux-gnu-g++ -Wall -O3 -fopenmp -pthread -O2 -pipe -maes -mpclmul -mpopcnt -mavx -march=native -fopenmp -pthread -Wl,-O1 -Wl,--as-needed -o sim_reads_tran sim_reads_tran.o  ../lib/libassembly.a
x86_64-pc-linux-gnu-g++ -Wall -O3 -fopenmp -pthread -O2 -pipe -maes -mpclmul -mpopcnt -mavx -march=native -fopenmp -pthread -Wl,-O1 -Wl,--as-needed -o idba_tran idba_tran.o  ../lib/libassembly.a
x86_64-pc-linux-gnu-g++ -Wall -O3 -fopenmp -pthread -O2 -pipe -maes -mpclmul -mpopcnt -mavx -march=native -fopenmp -pthread -Wl,-O1 -Wl,--as-needed -o idba idba.o  ../lib/libassembly.a
make[2]: Leaving directory '/scratch/var/tmp/portage/sci-biology/idba-1.1.1/work/idba-1.1.1/bin'

but copied into ${DESTDIR} were only:

make -j2 DESTDIR=/scratch/var/tmp/portage/sci-biology/idba-1.1.1/image/ install 
Making install in lib
make[1]: Entering directory '/scratch/var/tmp/portage/sci-biology/idba-1.1.1/work/idba-1.1.1/lib'
make[2]: Entering directory '/scratch/var/tmp/portage/sci-biology/idba-1.1.1/work/idba-1.1.1/lib'
make[2]: Nothing to be done for 'install-exec-am'.
make[2]: Nothing to be done for 'install-data-am'.
make[2]: Leaving directory '/scratch/var/tmp/portage/sci-biology/idba-1.1.1/work/idba-1.1.1/lib'
make[1]: Leaving directory '/scratch/var/tmp/portage/sci-biology/idba-1.1.1/work/idba-1.1.1/lib'
Making install in bin
make[1]: Entering directory '/scratch/var/tmp/portage/sci-biology/idba-1.1.1/work/idba-1.1.1/bin'
make[2]: Entering directory '/scratch/var/tmp/portage/sci-biology/idba-1.1.1/work/idba-1.1.1/bin'
make[2]: Nothing to be done for 'install-data-am'.
 /bin/mkdir -p '/scratch/var/tmp/portage/sci-biology/idba-1.1.1/image//usr/bin'
  /usr/lib/portage/python2.7/ebuild-helpers/xattr/install -c idba_hybrid '/scratch/var/tmp/portage/sci-biology/idba-1.1.1/image//usr/bin'
make[2]: Leaving directory '/scratch/var/tmp/portage/sci-biology/idba-1.1.1/work/idba-1.1.1/bin'
make[1]: Leaving directory '/scratch/var/tmp/portage/sci-biology/idba-1.1.1/work/idba-1.1.1/bin'
Making install in test
make[1]: Entering directory '/scratch/var/tmp/portage/sci-biology/idba-1.1.1/work/idba-1.1.1/test'
make[2]: Entering directory '/scratch/var/tmp/portage/sci-biology/idba-1.1.1/work/idba-1.1.1/test'
make[2]: Nothing to be done for 'install-exec-am'.
make[2]: Nothing to be done for 'install-data-am'.
make[2]: Leaving directory '/scratch/var/tmp/portage/sci-biology/idba-1.1.1/work/idba-1.1.1/test'
make[1]: Leaving directory '/scratch/var/tmp/portage/sci-biology/idba-1.1.1/work/idba-1.1.1/test'
Making install in script
make[1]: Entering directory '/scratch/var/tmp/portage/sci-biology/idba-1.1.1/work/idba-1.1.1/script'
make[2]: Entering directory '/scratch/var/tmp/portage/sci-biology/idba-1.1.1/work/idba-1.1.1/script'
make[2]: Nothing to be done for 'install-data-am'.
 /bin/mkdir -p '/scratch/var/tmp/portage/sci-biology/idba-1.1.1/image//usr/bin'
 /usr/lib/portage/python2.7/ebuild-helpers/xattr/install -c scan.py run-unittest.py validate_blat validate_blat_parallel '/scratch/var/tmp/portage/sci-biology/idba-1.1.1/image//usr/bin'
make[2]: Leaving directory '/scratch/var/tmp/portage/sci-biology/idba-1.1.1/work/idba-1.1.1/script'
make[1]: Leaving directory '/scratch/var/tmp/portage/sci-biology/idba-1.1.1/work/idba-1.1.1/script'
make[1]: Entering directory '/scratch/var/tmp/portage/sci-biology/idba-1.1.1/work/idba-1.1.1'
make[2]: Entering directory '/scratch/var/tmp/portage/sci-biology/idba-1.1.1/work/idba-1.1.1'
make[2]: Nothing to be done for 'install-exec-am'.
make[2]: Nothing to be done for 'install-data-am'.
make[2]: Leaving directory '/scratch/var/tmp/portage/sci-biology/idba-1.1.1/work/idba-1.1.1'
make[1]: Leaving directory '/scratch/var/tmp/portage/sci-biology/idba-1.1.1/work/idba-1.1.1'
>>> Completed installing idba-1.1.1 into /scratch/var/tmp/portage/sci-biology/idba-1.1.1/image/

So, I ended up with only these installed:

/usr
/usr/bin
/usr/bin/idba_hybrid
/usr/bin/run-unittest.py
/usr/bin/scan.py
/usr/bin/validate_blat
/usr/bin/validate_blat_parallel
/usr/share
/usr/share/doc
/usr/share/doc/idba-1.1.1
/usr/share/doc/idba-1.1.1/AUTHORS
/usr/share/doc/idba-1.1.1/README.bz2

No scaffold.fa file at the end of the process

Visualizing assembly graph?

I was wondering if there is a way to output the assembly graph generated by IDBA-UD into a format that can be visualized with a tool like Bandage? I am specifically interested in inspecting the edges connecting my contigs. Currently fastg, LastGraph, Trinity.fasta, ASQG, and GFA formats are supported by Bandage.

Thanks!

Modifying "kMaxShortSequence" or using "--long_read arg"

Hi Yu Peng,

What is the difference between modifying "kMaxShortSequence" in src/sequence/short_sequence.h and using the parameter "--long_read arg" for longer (PE) reads, e.g. 300 bp?
Or, what are "long reads" for?

I assume it is not the same, but I couldn't find info on that.

best,
Sven

typo in README

Thanks for your software.

I found a typo in the README file:
"epspacially" instead of "especially", I guess.

Does not build using gcc 6

Hi,

as you can read in the Debian bug report
https://bugs.debian.org/811644
idba does not build with gcc version 6 which has stronger type checks:

g++ -DHAVE_CONFIG_H -I. -I.. -I../src -I../gtest_src -Wdate-time -D_FORTIFY_SOURCE=2 -Wall -O3 -fopenmp -pthread -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -c -o sequence_reader.o `test -f '../src/sequence/sequence_reader.cpp' || echo './'`../src/sequence/sequence_reader.cpp
../src/sequence/sequence_reader.cpp: In member function 'virtual bool FastaReader::ReadRecord(Sequence&, std::__cxx11::string&, std::__cxx11::string&)':
../src/sequence/sequence_reader.cpp:50:40: error: cannot convert 'std::istream {aka std::basic_istream<char>}' to 'bool' in return
return ReadFasta(*is, seq, comment);
^

../src/sequence/sequence_reader.cpp: In member function 'virtual bool FastqReader::ReadRecord(Sequence&, std::__cxx11::string&, std::__cxx11::string&)':
../src/sequence/sequence_reader.cpp:55:49: error: cannot convert 'std::istream {aka std::basic_istream<char>}' to 'bool' in return
return ReadFastq(*is, seq, comment, quality);
^

Makefile:441: recipe for target 'sequence_reader.o' failed

Please note that only the first issue is reported and there might be similar ones later in the build sequence.

Kind regards

   Andreas.
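
For readers hitting the same error: GCC 6 compiles as C++14 by default, and since C++11 the stream's conversion to bool is explicit, so a function declared to return bool can no longer return a stream directly. The sketch below reproduces the pattern with a hypothetical ReadLine helper (not the project's ReadFasta or ReadFastq) and shows one possible fix, an explicit cast. It is an illustration under those assumptions, not the actual patch.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a helper that returns the stream it read from.
std::istream &ReadLine(std::istream &is, std::string &line) {
    return std::getline(is, line);
}

// `return ReadLine(is, line);` is rejected in C++11 and later because
// std::istream::operator bool is explicit. Converting explicitly compiles
// under both the old and the new language standards.
bool ReadRecord(std::istream &is, std::string &line) {
    return static_cast<bool>(ReadLine(is, line));
}
```

Building with -std=gnu++98 would probably also avoid the error, but the explicit cast keeps the code valid under newer standards.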

ShortSequence: Sequence is too long

I combined paired FASTQ files into a single FASTA file with "fq2fa --merge --filter 10.1.fa 10.2.fa 10.fa", then got the error below when I ran "idba_ud -r 10.fa -o ../01assembly/".
The reads are 150-300 bp long.
The 10.fa file looks fine.
The error is:
terminate called after throwing an instance of 'std::logic_error'
what(): ShortSequence: Sequence is too long.
Aborted (core dumped)
I don't know what is wrong. Could you please help me?
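
The message suggests that at least one read is longer than the short-read limit, which other issues on this page tie to "kMaxShortSequence" in src/sequence/short_sequence.h. A quick way to check is to find the longest record in the FASTA file and compare it with that constant. The helper below is a minimal sketch; MaxFastaLength is a hypothetical name and not part of idba.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: returns the length of the longest record in a
// FASTA stream. Sequence lines of one record are summed until the next
// header line (starting with '>') or the end of the stream.
std::size_t MaxFastaLength(std::istream &is) {
    std::size_t longest = 0;
    std::size_t current = 0;
    std::string line;
    while (std::getline(is, line)) {
        // Tolerate Windows line endings.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (!line.empty() && line[0] == '>') {
            longest = std::max(longest, current);
            current = 0;
        } else {
            current += line.size();
        }
    }
    return std::max(longest, current);
}
```

To use it, open the file with std::ifstream, pass the stream to MaxFastaLength, and compare the result with the value of kMaxShortSequence in the source tree that was compiled.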

Using assembly as reference

Hi,

I'm wondering whether there is a way to give idba a previous assembly, built from a subset of the reads, as a backbone/reference for an additional assembly of all reads. Would idba_hybrid provide such functionality, or is it only meant for whole reference genomes?
I'm looking for something like the "--trusted-contigs" option of SPAdes, but SPAdes uses too many computational resources.

Best,
Christ

idba_hybrid : Segmentation fault (kMaxShortSequence=256)

Dear Yu Peng,

I tried to use idba_hybrid to assemble Illumina PE/MP read data (genomic), using scaffolds from an ABySS assembly of the same dataset as the reference.
In one approach I used the MP data for scaffolding (read_level_n); in another I used the MP data together with the PE data as normal reads (-r).

reads 1
long reads 0
extra reads 0
read_length 83
aligned 0 reads
distance mean -nan sd -nan
confirmed bases: 0
reference contigs: 1 n50: 0 max: 0 mean: 0 total length: 0 n80: 0
invalid insert distance
kmer 41
kmers 26 26
merge bubble 0
contigs: 1 n50: 66 max: 66 mean: 66 total length: 66 n80: 66
aligned 0 reads
distance mean -nan sd -nan
invalid insert distance

[more kmers]

Both runs died with a segmentation fault on high-memory machines (1 TiB RAM), without any further error message or hint about what went wrong. A program shouldn't segfault.

What more info can I provide? Any idea?

best,
Sven
