gabaldonlab / redundans

Redundans is a pipeline that assists an assembly of heterozygous/polymorphic genomes.

License: GNU General Public License v3.0

Python 14.75% Shell 3.58% Perl 4.42% Dockerfile 0.10% Makefile 0.88% C++ 73.50% C 0.11% R 0.66% Java 1.99%
genome-assembly python pipeline fasta contigs heterozygous polymorphic docker-image mate-pairs paired-end assembled-contigs gap closing scaffolding bioinformatics assembly genomics

redundans's Issues

Redundans dockerfile

Hello,
Is it possible to get the redundans Dockerfile somewhere? I'm not looking for the image alone, as it is available on Docker Hub, but for the Dockerfile itself.
Our cluster architecture requires us to modify the Dockerfile in order to deploy the tools.
Best regards

Error during execution of redundans

Hi!

I have successfully used redundans on one of my genome assemblies, but on another one I get the error below. Any help with this would be highly appreciated.

Philipp


Traceback (most recent call last):
  File "/home/philipp/src/redundans/redundans.py", line 450, in <module>
    main()
  File "/home/philipp/src/redundans/redundans.py", line 445, in main
    o.verbose, o.log)
  File "/home/philipp/src/redundans/redundans.py", line 311, in redundans
    identity, overlap, minLength, resume)
  File "/home/philipp/src/redundans/redundans.py", line 150, in run_scaffolding
    threads, limit, iters=1, resume=resume, verbose=0, log=log, basename=basename)
  File "/home/philipp/src/redundans/redundans.py", line 242, in run_gapclosing
    fasta_stats(index)
  File "/home/philipp/src/redundans/fasta_stats.py", line 26, in fasta_stats
    A, C, G, T = map(sum, zip(*[stats[-4:] for stats in id2stats.itervalues()]))
ValueError: need more than 0 values to unpack
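
The `ValueError` above is what Python 2 raises when `zip(*...)` receives an empty list, which would happen if `id2stats` held no sequences at all (for example, when the FASTA file being summarised is empty). A minimal sketch of a guard, written for Python 3 and assuming each stats tuple ends with the A, C, G, T counts as the traceback suggests; `base_totals` is a hypothetical helper, not part of redundans:

```python
def base_totals(id2stats):
    """Sum the trailing A, C, G, T counts over all per-sequence stats tuples."""
    rows = [stats[-4:] for stats in id2stats.values()]
    if not rows:
        # No sequences: return zeros instead of failing on the unpacking below.
        return 0, 0, 0, 0
    a, c, g, t = map(sum, zip(*rows))
    return a, c, g, t
```

With a guard like this the caller can report an empty intermediate file explicitly rather than crash during the statistics step.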

Error while installing redundans

Hello!

I've tried to install redundans the easy way. I performed the following steps:

git clone --recursive https://github.com/lpryszcz/redundans.git
cd redundans && bin/.compile.sh

As it is written that every dependency is already included, I did not expect the following error when trying to run:

./redundans.py -v -i /media/loutre/SUZUKII/illumina_reads/*.fastq -l '/media/loutre/SUZUKII/fasta_reads_pacbio/filtered_subreads_clean.fasta' -f '/media/loutre/SUZUKII/assembly/Drosophila-suzukii-p.fasta' -t 30 -o '/media/loutre/SUZUKII/redundans'

Loutre:~/redundans$ ./redundans.py -v -i /media/loutre/SUZUKII/illumina_reads/*.fastq -l '/media/loutre/SUZUKII/fasta_reads_pacbio/filtered_subreads_clean.fasta' -f '/media/loutre/SUZUKII/assembly/Drosophila-suzukii-p.fasta' -t 30 -o '/media/loutre/SUZUKII/redundans'
Options: Namespace(fasta='/media/loutre/SUZUKII/assembly/Drosophila-suzukii-p.fasta', fastq=['/media/loutre/SUZUKII/illumina_reads/SRR1002946_1.fastq', '/media/loutre/SUZUKII/illumina_reads/SRR1002946_1_trimmed.fastq', '/media/loutre/SUZUKII/illumina_reads/SRR1002946_2.fastq', '/media/loutre/SUZUKII/illumina_reads/SRR1002946_2_trimmed.fastq', '/media/loutre/SUZUKII/illumina_reads/SRR942797_1.fastq', '/media/loutre/SUZUKII/illumina_reads/SRR942797_2.fastq', '/media/loutre/SUZUKII/illumina_reads/SRR942798_1.fastq', '/media/loutre/SUZUKII/illumina_reads/SRR942798_2.fastq', '/media/loutre/SUZUKII/illumina_reads/SRR942799_1.fastq', '/media/loutre/SUZUKII/illumina_reads/SRR942799_2.fastq', '/media/loutre/SUZUKII/illumina_reads/SRR942800_1.fastq', '/media/loutre/SUZUKII/illumina_reads/SRR942800_2.fastq', '/media/loutre/SUZUKII/illumina_reads/SRR942801_1.fastq', '/media/loutre/SUZUKII/illumina_reads/SRR942801_2.fastq', '/media/loutre/SUZUKII/illumina_reads/SRR942802_1.fastq', '/media/loutre/SUZUKII/illumina_reads/SRR942802_2.fastq', '/media/loutre/SUZUKII/illumina_reads/SRR942803_1.fastq', '/media/loutre/SUZUKII/illumina_reads/SRR942803_2.fastq', '/media/loutre/SUZUKII/illumina_reads/SRR942804_1.fastq', '/media/loutre/SUZUKII/illumina_reads/SRR942804_2.fastq', '/media/loutre/SUZUKII/illumina_reads/SRR942805_1.fastq', '/media/loutre/SUZUKII/illumina_reads/SRR942805_2.fastq'], identity=0.51, iters=2, joins=5, limit=0.2, linkratio=0.7, log=<open file '<stderr>', mode 'w' at 0x7f73392991e0>, longreads=['/media/loutre/SUZUKII/fasta_reads_pacbio/filtered_subreads_clean.fasta'], mapq=10, minLength=200, nocleaning=True, nogapclosing=True, norearrangements=False, noreduction=True, noscaffolding=True, outdir='/media/loutre/SUZUKII/redundans', overlap=0.8, reference='', resume=False, threads=30, verbose=True)
[ERROR] lastdb: not found

[ERROR] lastal: not found

Looking in the redundans installation, I can see the tool LAST, so I guess it was downloaded, but maybe not compiled correctly.

Did I make a mistake or forget something in the process?

Cheers,

Roxane
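
A downloaded source tree is not enough: the `not found` messages mean no executable with that name could be located. A minimal Python 3 sketch for checking this before a run, using the standard `shutil.which`; `missing_tools` and the `extra_dirs` parameter are hypothetical names introduced here for illustration:

```python
import os
import shutil


def missing_tools(tools, extra_dirs=()):
    """Return the names in `tools` that cannot be found as executables."""
    # Search the extra directories first, then the regular PATH.
    search_path = os.pathsep.join(list(extra_dirs) + [os.environ.get("PATH", "")])
    return [tool for tool in tools if shutil.which(tool, path=search_path) is None]
```

For example, `missing_tools(["lastdb", "lastal"], extra_dirs=["bin/last/src"])` would return an empty list only if both binaries were actually compiled into that directory (the directory name is taken from a log elsewhere on this page and may differ between versions).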

blurfl/quux for blurfl/quux: No such file or directory at ~/redundans/SSPACE/SSPACE_Standard_v3.0.pl

Hi,
I received the error below when executing /home/lorencm/apps/redundans/redundans/redundans.py -i *.fq -f StriDe-contigs.fa -o redundans --sspacebin /home/lorencm/apps/redundans/SSPACE/SSPACE_Standard_v3.0.pl -t 7

redundans/contigs.fa    372614  301     75352   20.22   87      28.90   93.679  0       297773  79.91   214     71.10
   59555 pairs. 35742 passed filtering [60.02%]. 12505 in different contigs [21.00%].
blurfl/quux for blurfl/quux: No such file or directory at /home/lorencm/apps/redundans/SSPACE/SSPACE_Standard_v3.0.pl line 761.
blurfl/quux for blurfl/quux: No such file or directory at /home/lorencm/apps/redundans/SSPACE/SSPACE_Standard_v3.0.pl line 761.
blurfl/quux for blurfl/quux: No such file or directory at /home/lorencm/apps/redundans/SSPACE/SSPACE_Standard_v3.0.pl line 761.
blurfl/quux for blurfl/quux: No such file or directory at /home/lorencm/apps/redundans/SSPACE/SSPACE_Standard_v3.0.pl line 385.
   59555 pairs. 36473 passed filtering [61.24%]. 10051 in different contigs [16.88%].
blurfl/quux for blurfl/quux: No such file or directory at /home/lorencm/apps/redundans/SSPACE/SSPACE_Standard_v3.0.pl line 761.
blurfl/quux for blurfl/quux: No such file or directory at /home/lorencm/apps/redundans/SSPACE/SSPACE_Standard_v3.0.pl line 761.
blurfl/quux for blurfl/quux: No such file or directory at /home/lorencm/apps/redundans/SSPACE/SSPACE_Standard_v3.0.pl line 761.
blurfl/quux for blurfl/quux: No such file or directory at /home/lorencm/apps/redundans/SSPACE/SSPACE_Standard_v3.0.pl line 385.
#fname  contigs bases   GC [%]  contigs >1kb    bases in contigs >1kb   N50     N90     Ns      longest
redundans/contigs.fa    301     372614  34.815  228     320138  1203    978     0       6074
redundans/contigs.reduced.fa    214     297773  33.639  197     281296  1289    1018    0       6074
redundans/_sspace.1.1.fa        188     299268  33.639  175     286585  1416    1027    1495    8690
redundans/_sspace.1.2.fa        188     299268  33.639  175     286585  1416    1027    1495    8690
redundans/scaffolds.fa  188     299268  33.639  175     286585  1416    1027    1495    8690
redundans/_gapcloser.1.1.fa     188     299625  33.656  175     286942  1416    1031    195     8955
redundans/_gapcloser.1.2.fa     188     299644  33.655  175     286961  1416    1031    195     8956
redundans/scaffolds.filled.fa   188     299644  33.655  175     286961  1416    1031    195     8956
#Time elapsed: 0:00:39.005123
redundans/contigs.fa    372614  301     75352   20.22   87      28.90   93.679  0       297773  79.91   214

What did I do wrong?

Thank you in advance
mic

I/O checks

  • PREREQUISITES
    • dependencies installed
  • INPUT FILES
    • catch non-existing files
      • catching [Errno 95] Operation not supported in samba
    • check if supported format
  • LIBRARY QUALITY WARNINGS
    • high fraction of paired-end reads in mate-pair libraries or, more generally, inconsistent read orientation
    • insert size stdev is large
  • OUTPUT FILES
    • check if all output files exist
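
The input-file items in the checklist above could be sketched roughly as follows in Python 3. This is an illustration only: `check_input_files` is a hypothetical helper, and the list of accepted extensions is an assumption rather than the set of formats redundans actually supports.

```python
import os

# Assumed set of accepted extensions; adjust to the formats actually supported.
SUPPORTED_EXTENSIONS = (".fa", ".fasta", ".fq", ".fastq")


def check_input_files(paths):
    """Return a list of human-readable problems found in `paths`."""
    problems = []
    for path in paths:
        # Ignore a trailing .gz when checking the format.
        name = path[:-3] if path.endswith(".gz") else path
        if not os.path.isfile(path):
            problems.append("%s: file does not exist" % path)
        elif not name.endswith(SUPPORTED_EXTENSIONS):
            problems.append("%s: unsupported format" % path)
        elif os.path.getsize(path) == 0:
            problems.append("%s: file is empty" % path)
    return problems
```

Collecting all problems before exiting lets the user fix every bad path in one pass instead of rerunning after each failure.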

redundans failed

Dear all,

I launched the following script to assist my assembly with redundans:

#!/bin/bash
redundans.py -i /home/raw/S04[56]_[12].trimmed.fastq -f /home/contig_k116.fasta -o /data_sra_home/scaffolding_redundans/run1/ -t 40 --resume -m 200 -l /data_sra_home/scaffolding_redundans/SR_subset_1K.fa

Below is the terminal output after the failed attempt:
##################################################
[Sun Mar 18 10:18:32 2018] Resuming previous run from /data_sra_home/scaffolding_redundans/run1/...
[WARNING] numpy or matplotlib missing! Cannot plot histogram
/data_sra_home/scaffolding_redundans/run1/contigs.fa 665152391 83469 141609580 21.29 42509 50.93 88.051 0 523542811 78.71 40960 49.07
Insert size statistics Mates orientation stats
FastQ files read length median mean stdev FF FR RF RR
Traceback (most recent call last):
  File "/home1/software/redundans/redundans.py", line 539, in <module>
    main()
  File "/home1/software/redundans/redundans.py", line 534, in main
    o.norearrangements, o.verbose, o.usebwa, o.log, o.tmp)
  File "/home1/software/redundans/redundans.py", line 321, in redundans
    libraries = get_libraries(fastq, lastOutFn, mapq, threads, verbose, log, usebwa=usebwa)
  File "/home1/software/redundans/redundans.py", line 61, in get_libraries
    libdata = fastq2insert_size(log, fastq, fasta, mapq, threads, limit/100, genomeFrac, stdfracTh, maxcfracTh, usebwa=usebwa)
  File "/home1/software/redundans/bin/fastq2insert_size.py", line 191, in fastq2insert_size
    isstats = get_isize_stats(fq1, fq2, fasta, mapq, threads, limit, verbose, stdfracTh, maxcfracTh)
  File "/home1/software/redundans/bin/fastq2insert_size.py", line 124, in get_isize_stats
    rname, flag, chrom, pos, mapq, cigar, mchrom, mpos, isize, seq = sam.split('\t')[:10]
ValueError: need more than 1 value to unpack

Thank you for your help with this issue!
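
The message "need more than 1 value to unpack" indicates that `sam.split('\t')` returned a single field, i.e. the aligner emitted a line without tabs (an empty line or a plain-text message, for instance) where a SAM record was expected. A minimal Python 3 sketch of a more defensive parse; `parse_sam_fields` is a hypothetical helper and not the code used in `fastq2insert_size.py`:

```python
def parse_sam_fields(line):
    """Return the first 10 SAM columns, or None for headers and malformed lines."""
    if line.startswith("@"):
        # SAM header line, not an alignment record.
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        # Not a complete alignment record (e.g. an aligner message).
        return None
    return fields[:10]
```

Skipping such lines avoids the crash, though logging them is probably worthwhile, since they usually point at a problem with the aligner invocation itself.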

subprocess error

Hi all, I am working on a polymorphic genome and tried to use this software. After installing everything, I keep getting the following error when using the test data, and I am not sure what the reason is. Can anyone help me solve this problem? Thank you so much.

Traceback (most recent call last):
  File "./redundans.py", line 521, in <module>
    main()
  File "./redundans.py", line 516, in main
    o.norearrangements, o.verbose, o.log)
  File "./redundans.py", line 315, in redundans
    libraries = get_libraries(fastq, lastOutFn, mapq, threads, verbose, log)
  File "./redundans.py", line 62, in get_libraries
    genomeFrac, stdfracTh, maxcfracTh)
  File "/home/xinw/software/redundans/bin/fastq2insert_size.py", line 189, in fastq2insert_size
    isstats = get_isize_stats(fq1, fq2, fasta, mapq, threads, limit, verbose, stdfracTh, maxcfracTh)
  File "/home/xinw/software/redundans/bin/fastq2insert_size.py", line 111, in get_isize_stats
    aligner = _get_snap_proc(fq1, fq2, fasta, threads, verbose, alignerlog)
  File "/home/xinw/software/redundans/bin/fastq2sspace.py", line 133, in _get_snap_proc
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=log)
  File "/usr/local/lib/python2.7/subprocess.py", line 390, in __init__
    errread, errwrite)
  File "/usr/local/lib/python2.7/subprocess.py", line 1024, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
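
When `subprocess.Popen` fails with `[Errno 2]`, the file that is missing is normally the executable named in `args[0]`, not an input file. A minimal Python 3 sketch that turns this into a more explicit message; `start_process` is a hypothetical wrapper, not a function from redundans:

```python
import subprocess


def start_process(args):
    """Start `args` as a child process, naming the executable if it is missing."""
    try:
        return subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except FileNotFoundError as exc:
        # In Python 3, a missing executable surfaces as FileNotFoundError (errno 2).
        raise RuntimeError(
            "Cannot start %r: executable not found (%s)" % (args[0], exc)
        ) from exc
```

Printing `args[0]` this way shows which aligner binary the pipeline tried to launch, which is usually the quickest route to finding the dependency that was not compiled or is not on the PATH.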

Please make the installation relocatable

While I appreciate the automation provided with INSTALL.sh, it assumes several things:

  • You're installing for your own user only.
  • You're installing in your home directory.
  • All required libraries are present.
  • None of the dependencies are installed already.

In a typical HPC environment, multiple users want to use a single installation of the same tool on multiple servers, typically on a shared volume over NFS or similar. Most such environments also manage environment variables dynamically through Environment Modules, rather than relying on ~/.bashrc or the like.

Why not provide a standard ./configure and Makefile so that system administrators can choose where to install (via --prefix)?
Alternatively, you could let end users install the dependencies themselves but still check for their presence in the environment PATH (the way you do now after compiling them).

Thanks

Bug: automake-1.15 needed

Regarding your installation instructions:

(cd bin/parallel && make clean && ./configure && make)

I believe make clean should be after the configure command.

Additionally, for some reason the configure script inside bin/parallel requires the executable automake-1.15 instead of automake. Is it important that we use specifically version 1.15 of automake? If not, perhaps you can fix this? My current workaround is to create a symbolic link: ln -s /bin/automake automake-1.15.

optimize `sort` for large data sets

Hi Leszek,

I am currently trying to use redundans with a 3 Gbp diploid fragmented plant genome. I have 39 .psl files totalling 138 GB, and I am now stuck sorting them. Do you have an idea how to optimize the sorting? Currently it uses only 5 MB tmp files and does not run in parallel.

Cheers
Thomas
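
One possible direction, assuming the pipeline shells out to GNU `sort`: recent GNU coreutils versions accept `--buffer-size`, `--parallel` and `--temporary-directory`, which address the small temporary files and the single-threaded run. The Python 3 sketch below only builds such a command line; `build_sort_command` and the sort key shown in the usage note are hypothetical, since how redundans actually invokes `sort` is not shown here.

```python
def build_sort_command(keys, buffer_size="4G", threads=8, tmpdir=None):
    """Build a GNU sort command line with a larger buffer and parallel sorting."""
    cmd = ["sort", "--buffer-size=%s" % buffer_size, "--parallel=%d" % threads]
    if tmpdir:
        # Put temporary files on a volume with enough free space.
        cmd.append("--temporary-directory=%s" % tmpdir)
    for key in keys:
        cmd += ["-k", key]
    return cmd
```

For example, `build_sort_command(["1,1nr"], buffer_size="32G", threads=16, tmpdir="/scratch")` yields a list that can be passed to `subprocess.Popen` in place of a plain `sort` call.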

How is scaffolds.reduced.fa created?

I ran redundans.py using the options --noscaffolding and --nogapclosing.
Below is the full run command I used.

python redundans.py --noscaffolding --nogapclosing -t 64 -f draft.contig.fa --identity 0.90 --overlap 0.95 --log draft.contig.log -o draft.contig

Although I used --noscaffolding, scaffolds.reduced.fa was written as output in addition to contigs.reduced.fa.
I did not use paired-end reads as input. What are the criteria for writing scaffolds.reduced.fa?
What is the difference between contigs.reduced.fa and scaffolds.reduced.fa?

fasta2homozygous.py error

Dear Leszek,
Your Redundans assembly pipeline looks very promising. Unfortunately I got stuck when running the test run:

./redundans.py -v -i test/*.fq.gz -f test/contigs.fa -o test/run1

I got the following error message:
  File "/home/mgalland/workspace/wild_tomato_genomes/scr/redundans/bin/fasta2homozygous.py", line 126
    contig2skip = {c: 0 for c in faidx}
SyntaxError: invalid syntax

The for looks like the problem here.

Can you help there?
Thanks in advance,
Marc
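
A likely explanation, assuming the script was run with an interpreter older than Python 2.7: dict comprehensions were only added in Python 2.7 and 3.0, so older interpreters stop at the `for`. The sketch below shows the failing form next to an equivalent that also parses on Python 2.6; the contig names are made up for illustration.

```python
faidx = ["contig1", "contig2", "contig3"]  # hypothetical contig names

# Dict comprehension: requires Python 2.7 or newer.
contig2skip = {c: 0 for c in faidx}

# Equivalent form that also parses on Python 2.6.
contig2skip_compat = dict((c, 0) for c in faidx)
```

Running `python --version` shows which interpreter the script is picking up.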

Maximum sequences limited to 327068

Is there an upper limit on the number of sequences that redundans can store (first step)? I have a few assemblies with more than 327,068 contigs, but by default redundans reads only that many sequences for processing.

Thanks for any input!

Not multiprocessing?

Hi,
This didn't seem to have been covered before. I'm running redundans with '-t 32' yet when I do a 'htop' to look at the processor usages, I see 32 instances of lastal being put through one processor. So they are being threaded through the one processor.

I did some speed tests with -t2 and -t32 and got no improvement in run time. I've tried two different systems with different linux os. Same problem.

When I run the lastal command at the commandline, I see the expected behaviour. My python skills are not particularly strong. I spoke with a colleague who said that getting python to run multiprocessor jobs can be a pain. Is there a problem?

Thanks

James
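
One thing worth ruling out, and this is only a guess at the cause: on Linux, child processes inherit the CPU affinity mask of their parent, so if the Python process is pinned to a single core, every `lastal` it starts will share that core. A minimal Python 3 sketch for checking this; `cpu_report` is a hypothetical helper, and `os.sched_getaffinity` is only available on some platforms (Linux among them):

```python
import os


def cpu_report():
    """Return (CPUs this process may run on, CPUs present in the machine)."""
    total = os.cpu_count()
    if hasattr(os, "sched_getaffinity"):
        # Number of CPUs in the affinity mask of the current process.
        allowed = len(os.sched_getaffinity(0))
    else:
        # Platform without affinity support: assume no restriction.
        allowed = total
    return allowed, total
```

If the first number is 1 on a 32-core machine, the limitation comes from the affinity mask rather than from how the jobs are launched.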

Docker - OSError: [Errno 2] No such file or directory

Hi,
I used your latest Docker container, but for some reason I got OSError: [Errno 2] No such file or directory

$ docker run -v `pwd`:/test:rw -it lpryszcz/redundans

root@7bf273fe0c52:/# head /test/BAFB.contigs.fasta
>tig00000003 len=68902 reads=220 covStat=35.22 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
AAAAGTTTTCAGAGAAGTAAGCTTCTGTGGTTTTATCATGGATAGTGCAGTTATGTCCAGCGACAATAGATCTCTATCTTATTGAAGGAATTTTGTGTTA
TTCACACTCTTGTTTTTGGCCTTCAGGAAGACTGATCGAGTCATCTTCAGCTAGAGTGGTAATTCTTCTGTCTGTGGCATCTAGCCATTCTTCCAACCTG
TTTTCTTTTCTTCTTCTTTTTCTTGTTACATTACACTGTAGTTGATGCGTCGATAAGAAGCTAAGATTTTATTTGCTGTCATTGCTTAATGTCTTTGTCA
TGTTCAAACTGACTGTAAGAAAATACTTTGTTGGAAATTCATGCTTCTATGCTTCAAGTCACACCTTATATAGATTCTGATCAAAAGCTTTAAAGGTAAA
AGATGATATAAGGCTTATAGAACGACTCTGATAGTTTCTGGGAGCAATACTGTACCAAGACATCTTATTACTTGCCAACAGTTTAATCCATTCATATATT
CAGCTAGTGCATTGCTATGGTCTGGTAAATTTTCTTTTTCTTTTGAGTAAACAGTTTTGTCTAAAGGCTCTTCTGTTTTTTATTGTAATGTCAGATGCTA
AGTTGAGAAATAACACTAAATTTATGCTGCATCAATTCTAACTCATTCTAGCTTGCTGCCTGTTATCTTATATCAAAAGGCTGATATGGAGGAGACGGAG
AGAGACTTGAAAAAGAACTACGAGAGTATGGGTATTTCATAGACATGCATATTTCAGTGTGATATAGTCAGGTAAATGCATATTGCAAGTGGACTTGAAC
GACAATTTTAAGAATATTAAATTATGTCTTGCAAATGAACATATTCTAAAGTATCAAAAATCAAAGTTGTCAAGCTAAAGAGAAAGAAACTAAAGGTTCT

root@7bf273fe0c52:/# /root/src/redundans/bin/fasta2homozygous.py -i /test/BAFB.contigs.fasta -t 25
Homozygous assembly/ies will be written with input name + '.homozygous.fa.gz'
#file name      genome size     contigs heterozygous size       [%]     heterozygous contigs    [%]     identity [%]    possible joins  homozygous size [%]     homozygous contigs      [%]
Traceback (most recent call last):
  File "/root/src/redundans/bin/fasta2homozygous.py", line 238, in <module>
    main()
  File "/root/src/redundans/bin/fasta2homozygous.py", line 232, in main
    o.threads, o.verbose)
  File "/root/src/redundans/bin/fasta2homozygous.py", line 150, in fasta2homozygous
    contig2skip = fasta2skip(out, fasta, faidx, threads, identity, overlap, verbose)
  File "/root/src/redundans/bin/fasta2homozygous.py", line 63, in fasta2skip
    for i, (score, t, q, algLen, identity, overlap) in enumerate(hits, 1):
  File "/root/src/redundans/bin/fasta2homozygous.py", line 36, in fasta2hits
    last = run_last(fasta.name, identityTh, threads, verbose)
  File "/root/src/redundans/bin/fasta2homozygous.py", line 30, in run_last
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=sys.stderr)        
  File "/usr/lib/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1343, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

What did I miss?

Thank you in advance.

Michal

lastal: can't interpret TAB

Hello,

Ran into this error:

@BioPower3-IBM ~/programs/redundans $ ./redundans.py -v -i /shared/reads/Pseudoloma/GDR-24_R*_trim100.fastq -f ~/Pseudoloma/reads/mira_merged_sample50X_try1_out.unpadded_above500.fasta -o test/Pseudo_mira_100bp_Scaffold_try1 --sspacebin ~/programs/SSPACE-STANDARD-3.0_linux-x86_64/SSPACE_Standard_v3.0.pl -t 8
Options: Namespace(fasta='/home/adrian/Pseudoloma/reads/mira_merged_sample50X_try1_out.unpadded_above500.fasta', fastq=['/shared/reads/Pseudoloma/GDR-24_R1_trim100.fastq', '/shared/reads/Pseudoloma/GDR-24_R2_trim100.fastq'], identity=0.51000000000000001, iters=2, joins=5, limit=0.20000000000000001, linkratio=0.69999999999999996, log=<open file '', mode 'w' at 0x7f84863ad1e0>, mapq=10, minLength=200, nocleaning=True, nogapclosing=True, noreduction=True, noscaffolding=True, outdir='test/Pseudo_mira_100bp_Scaffold_try1', overlap=0.66000000000000003, sspacebin='/home/adrian/programs/SSPACE-STANDARD-3.0_linux-x86_64/SSPACE_Standard_v3.0.pl', threads=8, verbose=True)
Aligning 3594477 mates per library...
Insert size statistics Mates orientation stats
FastQ files median mean stdev FF FR RF RR
/shared/reads/Pseudoloma/GDR-24_R1_trim100.fastq /shared/reads/Pseudoloma/GDR-24_R2_trim100.fastq 264 265.46 14.71 1 9985 13 1

[Sat Mar 26 14:51:11 2016] Reduction...

file name genome size contigs heterozygous size [%] heterozygous contigs [%] identity [%] possible joins homozygous size [%] homozygous contigs [%]

lastal: can't interpret: TAB
test/Pseudo_mira_100bp_Scaffold_try1/contigs.fa 17972387 5935 0 0.00 0 0.00 0.000 0 17972387 100.00 5935 100.00
Aligning 3594477 mates per library...

[Sat Mar 26 14:51:40 2016] Scaffolding...
iteration 1.1 ...

It doesn't look like it removed redundancies. Any idea what is causing this?
Adrian

fasta2homozygous fails for --identity 0.33 --overlap 0.51

~/src/redundans/fasta2homozygous.py -i redundans/MAGSPI*/contigs.fa --identity 0.33 --overlap 0.33

Homozygous assembly/ies will be written with input name + '.homozygous.fa.gz'

file name genome size contigs heterozygous size [%] heterozygous contigs [%] identity [%] possible joins homozygous size [%] homozygous contigs [%]

[ERROR] seq21018_len7789_cov189 (('seq1_len104_cov121', 104, 0, 70, 'seq9735_len101_cov148', 101, 31, 101, '+', 0.8285714285714286, 0.693069306930693, 46)) not in contigs!
[ERROR] seq26477_len7324_cov148 (('seq1_len104_cov121', 104, 0, 81, 'seq9735_len101_cov148', 101, 20, 101, '+', 0.7777777777777778, 0.801980198019802, 45)) not in contigs!
[ERROR] seq20577_len9015_cov141 (('seq3_len238_cov168', 238, 0, 238, 'seq10_len238_cov189', 238, 0, 238, '+', 0.9831932773109243, 1.0, 230)) not in contigs!
[ERROR] seq19649_len8979_cov175 (('seq3_len238_cov168', 238, 0, 84, 'seq16825_len172_cov141', 172, 88, 172, '+', 1.0, 0.4883720930232558, 84)) not in contigs!
[ERROR] seq20482_len6873_cov155 (('seq8_len6349_cov128', 6349, 588, 743, 'seq27924_len149_cov114', 149, 0, 149, '+', 0.8456375838926175, 1.0, 103)) not in contigs!
[ERROR] seq20999_len4731_cov189 (('seq8_len6349_cov128', 6349, 5653, 5935, 'seq5386_len269_cov141', 269, 0, 269, '+', 0.6505576208178439, 1.0, 81)) not in contigs!
[ERROR] seq24053_len5120_cov168 (('seq8_len6349_cov128', 6349, 4540, 4646, 'seq27404_len104_cov189', 104, 0, 104, '+', 0.8653846153846154, 1.0, 77)) not in contigs!
[ERROR] seq24131_len10134_cov155 (('seq8_len6349_cov128', 6349, 6249, 6349, 'seq7853_len154_cov296', 154, 0, 100, '+', 0.84, 0.6493506493506493, 68)) not in contigs!
[ERROR] seq25312_len13069_cov148 (('seq8_len6349_cov128', 6349, 419, 679, 'seq5386_len269_cov141', 269, 0, 269, '+', 0.620817843866171, 1.0, 65)) not in contigs!
[ERROR] seq18534_len7476_cov148 (('seq8_len6349_cov128', 6349, 5849, 5999, 'seq27924_len149_cov114', 149, 0, 149, '+', 0.6912751677852349, 1.0, 57)) not in contigs!
[ERROR] seq24935_len5793_cov162 (('seq8_len6349_cov128', 6349, 493, 613, 'seq14809_len119_cov94', 119, 0, 119, '+', 0.6890756302521008, 1.0, 45)) not in contigs!
Traceback (most recent call last):
  File "/home/lpryszcz/src/redundans/fasta2homozygous.py", line 234, in <module>
    main()
  File "/home/lpryszcz/src/redundans/fasta2homozygous.py", line 228, in main
    libraries, limit, o.threads, o.joinOverlap, o.endTrimming, o.verbose)
  File "/home/lpryszcz/src/redundans/fasta2homozygous.py", line 122, in fasta2homozygous
    contig2skip, identity = hits2skip(hits, faidx, verbose)
  File "/home/lpryszcz/src/redundans/fasta2homozygous.py", line 74, in hits2skip
    if contig2skip[q]:
KeyError: 'seq18807_len4442_cov148'
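
The `KeyError` shows that a hit refers to a contig name that is not a key of `contig2skip`. A minimal Python 3 sketch of filtering such hits before the lookup; `known_hits` is a hypothetical helper, and the hit layout `(score, t, q, algLen, identity, overlap)` is borrowed from a traceback in another report on this page, so it may not match the version that failed here:

```python
def known_hits(hits, contig2skip):
    """Yield only the hits whose target and query are both keys of contig2skip."""
    for hit in hits:
        t, q = hit[1], hit[2]  # assumed positions of target and query names
        if t in contig2skip and q in contig2skip:
            yield hit
```

This only avoids the crash; why the names go missing at such low identity and overlap thresholds would still need to be investigated.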

Canu assembly (PacBio)

Hi,
I have got a PacBio assembly produced by Canu. It has created 3 output files (BAFB.contigs.fasta, BAFB.unassembled.fasta, BAFB.unitigs.fasta). The meaning of the output files is described here.

Which of the files should I use for redundans?

Thank you in advance.

Michal

error running test and personal data set

I initially posted a comment on BioStar here, but for convenience, and because of a recent error, I'll continue the dialogue here.
I've downloaded all of the dependencies for redundans, and after assembling with SPAdes I attempted to run:

./redundans.py -i /home/molecularecology/Desktop/zcpb/CPBassembly/Ldec_180bp_male_1CLEAN.fq /home/molecularecology/Desktop/zcpb/CPBassembly/Ldec_180bp_male_2CLEAN.fq -f /home/molecularecology/Desktop/zcpb/CPBassembly/CPB_spades/CPBscaffolds.fasta -o redundansCPB

but received the error:

[ERROR] GapCloser: not found


Make sure you have installed all dependencies from https://github.com/lpryszcz/redundans#manual-installation !

and I'm fairly certain I have GapCloser. In the src directory:

41SSPACE-STANDARD-3.0_linux-x86_64.tar.gz  last-714
bwa                                        last-714.zip
bwa-0.7.12                                 redundans
bwa-0.7.12.tar.bz2                         redundans.install.log
GapCloser                                  redundans.tgz
GapCloser-bin-v1.12-r6.tgz                 SSPACE
GapCloser_Manual.pdf                       SSPACE-STANDARD-3.0_linux-x86_64
last

and redundans directory:

CHANGELOG.md          fastq2insert_size.py   GapFiller_v1-10_linux-x86_64
docs                  fastq2insert_size.pyc  INSTALL.sh
fasta2homozygous.py   fastq2mates.py         LICENSE
fasta2homozygous.pyc  fastq2sspace.py        pyScaf.py
FastaIndex.py         fastq2sspace.pyc       README.md
FastaIndex.pyc        filterReads.py         redundans.py
fasta_stats.py        filterReads.pyc        SSPACE-STANDARD-3.0_linux-x86_64
fasta_stats.pyc       GapCloser              test

So, any advice you could give would be greatly appreciated!

lastal: Input/output error

Redundans is working fine on most of my samples but for some I'm getting a lastal: Input/output error during scaffolding. Redundans seems to go ahead with gap closing without error, but it never terminates.

Here's an example log including the error. I terminated this job early but typically jobs with this error get through the second gap closing iteration also. I assigned the job 160 Gb of memory.

Options: Namespace(fasta='output/assembly3/GBSC.fasta', fastq=['output/filter_contamination2/GBSC.R1.fastq.gz', 'output/filter_contamination2/GBSC.R2.fastq.gz'], identity=0.51, iters=2, joins=5, limit=0.2, linkratio=0.7, log=<open file '<stderr>', mode 'w' at 0x7f47d9b0a1e0>, longreads=[], mapq=10, minLength=200, nocleaning=True, nogapclosing=True, norearrangements=True, noreduction=True, noscaffolding=True, outdir='output/redundans/GB', overlap=0.8, reference='input/ref/magna.nuc.fa', resume=False, threads=16, verbose=True)

##################################################
[Tue Apr 10 14:43:19 2018] Reduction...
#file name	genome size	contigs	heterozygous size	[%]	heterozygous contigs	[%]	identity [%]	possible joins	homozygous size	[%]	homozygous contigs	[%]
[WARNING] numpy or matplotlib missing! Cannot plot histogram
output/redundans/GB/contigs.fa	109746840	27555	7715577	7.03	20565	74.63	89.048	0	102031263	92.97	6990	25.37

##################################################
[Tue Apr 10 14:45:42 2018] Estimating parameters of libraries...
 Aligning 20406252 mates per library...
Insert size statistics				Mates orientation stats
FastQ files	read length	median	mean	stdev	FF	FR	RF	RR
output/filter_contamination2/GBSC.R1.fastq.gz output/filter_contamination2/GBSC.R2.fastq.gz	139	344	358.25	70.94	17	9114	859	10

##################################################
[Tue Apr 10 14:45:42 2018] Scaffolding...
 iteration 1.1: output/redundans/GB/contigs.reduced.fa	6990	102031263	40.535	6705	101849352	28763	6567	1007	201015
   20406253 pairs. 14761812 passed filtering [72.34%]. 207904 in different contigs [1.02%].
    1544643 pairs. 1476486 in different contigs [95.59%].
 iteration 1.2: output/redundans/GB/_sspace.1.1.fa	5923	102046669	40.535	5753	101939137	35144	7779	30809	272440
   20406253 pairs. 14868644 passed filtering [72.86%]. 184593 in different contigs [0.90%].

/usr/bin/bash: /mnt/lfs2/schaack/fmacrae/megadaph.private/megadaph/pipe/util/redundans/bin/last/src/lastal: Input/output error

    1476033 pairs. 1406284 in different contigs [95.27%].

##################################################
[Tue Apr 10 17:05:56 2018] Scaffolding based on reference...

##################################################
[Tue Apr 10 17:08:11 2018] Gap closing...
 iteration 1.1: output/redundans/GB/scaffolds.ref.fa	5781	119263174	40.535	5638	119173859	44959	9000	17253219	1690113

Thanks for building Redundans!

new features

  • AUTOMATISATION
    • estimate read limit
    • refine insert size estimation for each iteration
    • include optional genome assembly step, is it really needed?
    • include --resume option
    • installer (to assist less experienced users)
    • INSTALL.sh: UNIX installer
    • Docker image
    • Alpine Linux: strip numpy and biopython
    • official conda package #27 (unofficial one)
  • PERFORMANCE
  • SENSITIVITY
    • better similarity search algorithm, i.e. LAST
    • include depth of coverage
    • calculate insert size on major orientation only
  • Features
    • Support for long reads (PacBio, Nanopore)
    • Logging to file
    • FastaIndex
    • fasta2homozygous.py: report removed contigs
  • Exceptions
    • check LAST version
    • no read aligned due to very fragmented assembly
  • Long-read scaffolding
    • production version of pyScaf #51
    • excessive gaps when scaffolding pacbio reads
  • Code
    • Python3
    • remove nonsense #81
    • make sure single TMP is used across the pipeline #81
  • Output
    • gfa support (low priority) #72

sspace

Is there any way around using SSPACE?

test fails after new install

Greetings. On CentOS (64 bit), I installed automake 1.15 and updated autoconf to 2.65 in /usr/local, then successfully installed redundans (according to the log file), but the test failed. Suggestions? The devtoolset-4 is needed for a compiler recent enough to understand all the g++ command line switches. Typically, programs built this way need no special treatment when they are run.

cd ~/src
scl enable devtoolset-4 'source <(curl -Ls http://bit.ly/redundans_installer)' 2>&1 | tee redundans_installer.log
#worked!  Says to test it like so:
cd redundans
./redundans.py -v -i test/*.fq.gz -f test/contigs.fa -o test/run1
Options: Namespace(fasta='test/contigs.fa', fastq=['test/5000_1.fq.gz', 'test/5000_2.fq.gz', 'test/600_1.fq.gz', 'test/600_2.fq.gz', 'test/pacbio.fq.gz'], identity=0.51, iters=2, joins=5, limit=0.2, linkratio=0.7, log=<open file '<stderr>', mode 'w' at 0x7fad26fc2270>, longreads=[], mapq=10, minLength=200, nocleaning=True, nogapclosing=True, norearrangements=False, noreduction=True, noscaffolding=True, outdir='test/run1', overlap=0.8, reference='', resume=False, threads=4, verbose=True)

##################################################
[Tue Jan  9 15:17:39 2018] Reduction...
#file name      genome size     contigs heterozygous size       [%]     heterozygous contigs    [%]     identity [%]    possible joins  homozygous size    [%]     homozygous contigs      [%]
test/run1/contigs.fa    163897  245     66377   40.50   221     90.20   94.854  0       97520   59.50   24      9.80

##################################################
[Tue Jan  9 15:17:40 2018] Estimating parameters of libraries...
 Aligning 19504 mates per library...
Insert size statistics                          Mates orientation stats
FastQ files     read length     median  mean    stdev   FF      FR      RF      RR
test/5000_1.fq.gz test/5000_2.fq.gz     50      4986    4981.70 692.22  0       4067    14      0
test/600_1.fq.gz test/600_2.fq.gz       100     599     598.74  47.22   0       10000   0       0

##################################################
[Tue Jan  9 15:17:42 2018] Scaffolding...
 iteration 1.1: test/run1/contigs.reduced.fa    24      97520   39.355  17      94157   7321    2195    0       29603
   19505 pairs. 17302 passed filtering [88.71%]. 1627 in different contigs [8.34%].
    1526 pairs. 558 in different contigs [36.57%].
 iteration 1.2: test/run1/_sspace.1.1.fa        3       97626   39.344  3       97626   87536   6063    821     87536
   19505 pairs. 17607 passed filtering [90.27%]. 182 in different contigs [0.93%].
    1077 pairs. 124 in different contigs [11.51%].
 iteration 2.1: test/run1/_sspace.1.2.fa        3       97626   39.344  3       97626   87536   6063    821     87536
   19505 pairs. 15112 passed filtering [77.48%]. 1295 in different contigs [6.64%].
    3417 pairs. 396 in different contigs [11.59%].
 iteration 2.2: test/run1/_sspace.2.1.fa        1       99115   39.344  1       99115   99115   99115   2310    99115
   19505 pairs. 15151 passed filtering [77.68%]. 0 in different contigs [0.00%].
    3398 pairs. 0 in different contigs [0.00%].

##################################################
[Tue Jan  9 15:17:48 2018] Gap closing...
 iteration 1.1: test/run1/scaffolds.fa  1       99115   39.344  1       99115   99115   99115   2310    99115

##################################################
[Tue Jan  9 15:17:49 2018] Final reduction...
#file name      genome size     contigs heterozygous size       [%]     heterozygous contigs    [%]     identity [%]    possible joins  homozygous size    [%]     homozygous contigs      [%]
Traceback (most recent call last):
  File "./redundans.py", line 521, in <module>
    main()
  File "./redundans.py", line 516, in main
    o.norearrangements, o.verbose, o.log)
  File "./redundans.py", line 391, in redundans
    info = fasta2homozygous(out, open(lastOutFn), identity, overlap,  minLength, threads, verbose=0, log=log)
  File "/home/mathog/src/redundans/bin/fasta2homozygous.py", line 207, in fasta2homozygous
    contig2skip = fasta2skip(out, fasta, faidx, threads, identity, overlap, minLength, verbose)
  File "/home/mathog/src/redundans/bin/fasta2homozygous.py", line 136, in fasta2skip
    plot_histograms(out.name, contig2skip, identities, sizes)
  File "/home/mathog/src/redundans/bin/fasta2homozygous.py", line 160, in plot_histograms
    for i, isize in zip(np.digitize(best, bins, right=1), bestalgsizes):
ValueError: Both x and bins must have non-zero length
rm -rf test/run1
scl enable devtoolset-4 './redundans.py -v -i test/*.fq.gz -f test/contigs.fa -o test/run1'
#fails exactly the same way

Suggestions?

Thanks.

lastdb invalid argument

In Ubuntu 14.04 Trusty the correct command is:
lastdb -v

While the script uses --:

aln@notik:~/Science/schools/ngschool2016/ngschool2016-materials/src/redundans$ python redundans.py -v -i test/*.fq.gz -f test/contigs.fa -o test/run1 --sspacebin $SSPACEBIN
Options: Namespace(fasta='test/contigs.fa', fastq=['test/5000_1.fq.gz', 'test/5000_2.fq.gz', 'test/600_1.fq.gz', 'test/600_2.fq.gz'], identity=0.51, iters=2, joins=5, limit=0.2, linkratio=0.7, log=<open file '<stderr>', mode 'w' at 0x7f72fd1cc1e0>, mapq=10, minLength=200, nocleaning=True, nogapclosing=True, noreduction=True, noscaffolding=True, outdir='test/run1', overlap=0.66, resume=False, sspacebin='/home/aln/Science/schools/ngschool2016/ngschool2016-materials/src/SSPACE/SSPACE_Standard_v3.0.pl', threads=4, verbose=True)
[WARNING] Problem checking lastdb version: lastdb: invalid option -- '-'
lastdb: bad option
Traceback (most recent call last):
  File "redundans.py", line 450, in <module>
    main()
  File "redundans.py", line 438, in main
    _check_dependencies(dependencies)
  File "redundans.py", line 370, in _check_dependencies
    if int(curver)<version:
ValueError: invalid literal for int() with base 10: 'option'
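
The traceback shows `int(curver)` being handed the word `option`, taken from lastdb's error message, rather than a version number. A minimal sketch of a more defensive parse follows. It assumes the version is the first run of digits in the tool's output; `parse_version` is a hypothetical helper, not a function from redundans.

```python
import re

def parse_version(output):
    """Return the first integer found in `output`, or None if there is none."""
    match = re.search(r"\d+", output)
    return int(match.group()) if match else None

# The message reported in this issue contains no digits at all.
print(parse_version("lastdb: invalid option -- '-'\nlastdb: bad option"))
# An illustrative (made-up) well-formed version string.
print(parse_version("lastdb 700"))
```

Returning None lets the caller print a warning and decide whether to continue, instead of failing on the conversion.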

FastaIndex.py and fasta_stats.py missing?

I just installed redundans both through the git repository as well as with the UNIX installer, but both versions give me the following error when I try to run the test:

~/redundans$ ./redundans.py -v -i test/*.fq.gz -f test/contigs.fa -o test/run1
Traceback (most recent call last):
  File "./redundans.py", line 34, in <module>
    from fasta2homozygous import fasta2homozygous
  File "/home/pepijn/redundans/bin/fasta2homozygous.py", line 17, in <module>
    from FastaIndex import FastaIndex

When I look at the bin folder it seems that the symlinks for both python scripts are present, but they seem empty. Any idea what the problem might be?
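
The report says the symlinks exist but "seem empty". One possibility, assumed here rather than confirmed, is that they point at files that are not present. A small sketch for listing such dangling links in a directory; `broken_symlinks` is a hypothetical helper name.

```python
import os

def broken_symlinks(directory):
    """Return names in `directory` that are symlinks whose target does not exist."""
    broken = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # islink() inspects the link itself; exists() follows it to the target.
        if os.path.islink(path) and not os.path.exists(path):
            broken.append(name)
    return broken
```

Calling `broken_symlinks("bin")` from the redundans directory would list any links whose targets are missing.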

How is multiple redundancy handled?

How does the reduction step handle redundancy that involves more than 2 sequences? When mapped to itself, my assembly had some contigs that aligned to multiple other shorter contigs at high confidence (>99% identity, longer than 250nt, as shown in the attached image, which is actually after running fasta2homozygous.py). I want to remove all but the longest contig. When I ran redundans (only fasta2homozygous.py), I saw messages like "matched already removed contig". Does this mean that redundans removes only the first matched sequence?

Thanks,

Takeo

IndexError: list index out of range

Hi,
Thanks for the useful software. I am having an issue where redundans returns an index error.
Python 2.7.11; I also installed all programs manually from versions/links listed in INSTALL.sh. The test case ran without error.

redundans.py -v -i ../../Anf.1.fastq.gz ../../Anf.2.fastq.gz -f redundans.in.fasta -t 5 -o run1 --sspacebin ~/SSPACE-STANDARD-3.0_linux-x86_64/SSPACE_Standard_v3.0.pl --noscaffolding --nogapclosing

Options: Namespace(fasta='redundans.in.fasta', fastq=['../../Anf.1.fastq.gz', '../../Anf.2.fastq.gz'], identity=0.51, iters=2, joins=5, limit=0.2, linkratio=0.7, log=<open file '', mode 'w' at 0x7f47e31fe1e0>, mapq=10, minLength=200, nocleaning=True, nogapclosing=False, noreduction=True, noscaffolding=False, outdir='run1', overlap=0.66, sspacebin='~/SSPACE-STANDARD-3.0_linux-x86_64/SSPACE_Standard_v3.0.pl', threads=5, verbose=True)
Aligning 69476742 mates per library...
Insert size statistics Mates orientation stats
FastQ files median mean stdev FF FR RF RR
../../Anf.1.fastq.gz ../../Anf.2.fastq.gz 403 399.70 135.09 4 9974 22 0

[Wed Jul 6 15:45:19 2016] Reduction...

file name genome size contigs heterozygous size [%] heterozygous contigs [%] identity [%] possible joins homozygous size [%] homozygous contigs [%]

run1/contigs.fa 347383710 121440 88802556 25.56 49234 40.54 79.471 0 260148921 74.89 72206 59.46
Traceback (most recent call last):
  File "/afs/crc.nd.edu/user/s/ssmall2/programs_that_work/redundans/redundans.py", line 403, in <module>
    main()
  File "/afs/crc.nd.edu/user/s/ssmall2/programs_that_work/redundans/redundans.py", line 398, in main
    o.verbose, o.log)
  File "/afs/crc.nd.edu/user/s/ssmall2/programs_that_work/redundans/redundans.py", line 263, in redundans
    limit = get_read_limit(reducedFname, readLimit, verbose, log)
  File "/afs/crc.nd.edu/user/s/ssmall2/programs_that_work/redundans/redundans.py", line 99, in get_read_limit
    stats = fasta_stats(open(fasta))
  File "/afs/crc.nd.edu/user/s/ssmall2/programs_that_work/redundans/fasta_stats.py", line 18, in fasta_stats
    faidx = FastaIndex(handle)
  File "/afs/crc.nd.edu/user/s/ssmall2/programs_that_work/redundans/FastaIndex.py", line 37, in __init__
    self._generate_index()
  File "/afs/crc.nd.edu/user/s/ssmall2/programs_that_work/redundans/FastaIndex.py", line 70, in _generate_index
    stats = self.get_stats(header, seq, offset)
  File "/afs/crc.nd.edu/user/s/ssmall2/programs_that_work/redundans/FastaIndex.py", line 186, in get_stats
    linebases, linebytes = len(seq[0].strip()), len(seq[0])
IndexError: list index out of range
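
The failing statement indexes `seq[0]`, which raises IndexError whenever `seq` is empty. One input that would cause this, assumed here and not confirmed by the report, is a FASTA header with no sequence lines after it. A minimal sketch of a guard; `line_layout` is a hypothetical stand-in for the relevant part of `get_stats`.

```python
def line_layout(seq_lines):
    """Return (bases per line, bytes per line) for a record, or None if it has no sequence."""
    if not seq_lines:
        # Empty record: nothing to measure, let the caller skip or report it.
        return None
    first = seq_lines[0]
    return len(first.strip()), len(first)

print(line_layout(["ACGT\n", "AC\n"]))
print(line_layout([]))
```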

code/syntax improvement

  • threading
    • fasta2homozygous.py
  • code polishing
    • fasta2homozygous.py
      • removed unnecessary sort
      • memory-optimised (generator instead of list)
    • fasta_stats.py
      • headers in .stats
    • fastq2sspace.py
    • sspace intermediate files/folders in cur dir instead of in outdir
    • make sure subprocesses are closed if no longer needed i.e. bwa
  • deprecate
    • fasta2diverged.py
    • Biopython & SQLite: causing problems
    • scipy, numpy

Help info

  • README
    • I/O paragraph
    • program parameters
    • performance info
  • test/README
    • simulations
    • de novo genome assembly
    • redundans pipeline
      • run statistics
      • method description
    • accuracy estimation

Scaffolding without Illumina reads

Hello,

First, I want to thank you for this remarkable tool.

I used Redundans with only PacBio reads since I do not have Illumina reads. Here is the command:

../redundans/redundans/redundans.py -v -t 6 --log LOG.redundans -l Reads_?.fastq -f contigs.fasta -o redundans_results

I was surprised to find that Redundans can generate scaffolds without the pairing information provided by Illumina reads. Could you explain how this works? How does it determine the number of "N"s?

Thank you

Use for iontorrent data

I want to modify this for use with iontorrent data. Is that feasible, or is it a no-go because some dependencies work only with paired-end data?

Feature: Log file

Hello,

Just thought I would suggest a main log file that starts with the command issued and all parameters used. The log file would then be followed by all standard output the program generates.

As a little extra, you could also add the basic stats (n50, # contigs, longest contig) for both the input contigs and the output generated.

Adrian
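
The request above can be sketched as follows: a log header that records the command and parameters, plus the three summary statistics mentioned. The function names and the header layout are illustrative assumptions, not redundans code.

```python
def n50(lengths):
    """Return the length L such that contigs of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs at all

def summary(lengths):
    """Basic assembly statistics: contig count, longest contig and N50."""
    return {"contigs": len(lengths), "longest": max(lengths, default=0), "N50": n50(lengths)}

def log_header(argv, params):
    """Format the issued command followed by one 'name = value' line per parameter."""
    lines = ["command: " + " ".join(argv)]
    lines += ["%s = %s" % (key, params[key]) for key in sorted(params)]
    return "\n".join(lines)

print(log_header(["redundans.py", "-f", "contigs.fa"], {"threads": 4, "identity": 0.51}))
print(summary([10, 20, 30, 40]))
```

Computing `summary` once for the input contigs and once for the output would give the before and after comparison suggested in the request.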

How to continue the redundans command?

Hi,
My redundans.py run had reached the "closing gaps ..." step, but the process was killed after running for nearly a month. How can I continue the run from where it stopped? I would really prefer not to rerun it from the start.

Thanks.

reduce Pacbio assembly ?

Hi,

I have a fasta file after a pacbio assembly. I would like to run redundans but only the reducing option (no scaffolding and no gapclosing) with that command:

redundans.py -i reads.fastq -f assembly.fasta -o output --identity 0.9 --overlap 0.9 --minLength 200 --noscaffolding --nogapclosing -t 10 2> redundans.log

Does it work with PacBio? Does redundans need the input files (-i) when only the reduction step is run?

Some questions

Hi,

  1. I'm not sure I understand how redundans works.
    a) For 2 contigs of the same length: it makes a consensus: ok
    b) For contigs of different lengths: does it make a consensus using the longest as a seed?

  2. Can redundans be applied to scaffolds instead of contigs?

Run reduction only - no scaffolding libraries

If I understand correctly, Illumina libraries are not required for the reduction step. I am running redundans with --noscaffolding --nogapclosing because I want to scaffold the reduced assembly with PacBio data. However, redundans always seems to require -i/--fastq and then checks insert sizes for the libraries. Is there a way around that?

Unix install failure

I ran the suggested Unix install command (on Ubuntu 16.04) and got an install error from LASTal:

'
Redundans and its dependencies will be installed in: /media/data/software/redundans

Installation will take 5-10 minutes.
To track the installation status execute in the new terminal:
tail -f /media/data/software/redundans/install.log

Do you want to proceed with installation (y/n)? y

Fri Feb 23 08:41:57 CST 2018 Checking dependencies...
Everything looks good :) Let's proceed...
Fri Feb 23 08:41:57 CST 2018 Downloading Redundans...
Fri Feb 23 08:41:58 CST 2018 Updating submodules...
Fri Feb 23 08:41:59 CST 2018 Compiling dependencies...
=== You can find log in: /media/data/software/redundans/install.log ===
I'll use 63 thread(s) for compiling
Fri Feb 23 08:41:59 CST 2018 BWA
Fri Feb 23 08:42:04 CST 2018 LASTal
[2] ERROR!
^
lastal.cc:761:33: error: no matching function for call to ‘cbrc::Alignment::makeXdrop(cbrc::Centroid&, cbrc::GreedyXdropAligner&, bool&, const uchar*&, const uchar*&, int&, const int (&)[64], int&, cbrc::GeneralizedAffineGapCosts&, int&, int&, size_t&, const int (&)[64], const cbrc::TwoQualityScoreMatrix&, const uchar*&, const uchar*&, cbrc::Alphabet&, cbrc::AlignmentExtras&, double&, int&)’
args.gamma, args.outputType );
^
lastal.cc:761:33: note: candidate is:
In file included from AlignmentPot.hh:8:0,
from lastal.cc:16:
Alignment.hh:80:8: note: void cbrc::Alignment::makeXdrop(cbrc::GappedXdropAligner&, cbrc::Centroid&, const uchar*, const uchar*, int, const int (*)[64], int, const cbrc::GeneralizedAffineGapCosts&, int, int, size_t, const int (*)[64], const cbrc::TwoQualityScoreMatrix&, const uchar*, const uchar*, const cbrc::Alphabet&, cbrc::AlignmentExtras&, double, int)
void makeXdrop( GappedXdropAligner& aligner, Centroid& centroid,
^
Alignment.hh:80:8: note: candidate expects 19 arguments, 20 provided
lastal.cc: In function ‘void eraseWeakAlignments(LastAligner&, cbrc::AlignmentPot&, size_t, char, const uchar*)’:
lastal.cc:775:12: error: ‘struct cbrc::Alignment’ has no member named ‘hasGoodSegment’
if (!a.hasGoodSegment(dis.a, dis.b, args.minScoreGapped, dis.m, gapCosts,
^
makefile:100: recipe for target 'lastal.o8' failed
make[1]: *** [lastal.o8] Error 1
make[1]: Leaving directory '/media/data/software/redundans/bin/last/src'
makefile:3: recipe for target 'all' failed
make: *** [all] Error 2
'

error: unrecognized arguments: /mnt/md1/REFERENCE.fa

I would like to try a reference-guided assembly to compare with my de novo assembly.

I am doing something wrong and would like any guidance that you can offer.

My command is:
/home/rob/src/redundans/redundans.py -v -i Pcri_800bp_12repaired_R1.fastq Pcri_800bp_12repaired_R2.fastq Pcri_3kb_12repaired_R1.fastq Pcri_3kb_12repaired_R2.fastq Pcri_5kb_12repaired_R1.fastq Pcri_5kb_12repaired_R2.fastq Pcri_10kb_12repaired_R1.fastq Pcri_10kb_12repaired_R2.fastq Pcri_470bp_40pct_R1.fastq Pcri_470bp_40pct_R2.fastq -r /mnt/md1/Mguttatus.fa -f /mnt/md1/redundans_1Kcut_u.5_iden0.91/scaffolds.filled.fa -o /mnt/md1/redundans_1Kcut_u.5_iden0.90_overlap0.66_REF --noscaffolding --threads 80 --identity 0.90 --overlap 0.66 --minLength 1000 --mapq 15 --iters 4 --sspacebin /home/rob/src/SSPACE/SSPACE_Standard_v3.0.pl

Which throws the following error:
redundans.py: error: unrecognized arguments: /mnt/md1/Mguttatus.fa

Mguttatus.fa is the fasta file for the reference assembly.

Why does it not recognize my Mguttatus.fa file?

Rob
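
One way argparse produces this message is when an option is defined as a boolean switch: it consumes no value, so the path that follows it is left over and reported as unrecognized. Whether that is what happened here depends on how `-r` is defined in the installed version, which the report does not show. The parser below is a hypothetical stand-in and uses `parse_known_args` so the leftover argument can be inspected instead of aborting.

```python
import argparse

parser = argparse.ArgumentParser(prog="demo.py")
parser.add_argument("-i", "--fastq", nargs="+")
parser.add_argument("-f", "--fasta")
# Assumption for illustration: -r is a boolean switch, so it takes no value.
parser.add_argument("-r", "--resume", action="store_true")

args, extra = parser.parse_known_args(
    ["-i", "a_R1.fastq", "a_R2.fastq", "-r", "ref.fa", "-f", "contigs.fa"]
)
# `extra` holds whatever no option claimed; parse_args() would exit with
# "unrecognized arguments" for the same input.
print(args.resume, extra)
```

Running the script with `-h` lists how each option is defined in the installed version.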

confusion with contigs.reduced.fa.hetero.tsv file

Hello,

I've tried redundans before, but now I need it for a different purpose, so I downloaded latest version.

I assume the first column of this auxiliary file lists all the contigs that were removed because they met the --identity and --overlap criteria. I confirmed this by checking that they are absent from the contigs.reduced.fa file. My question concerns the longer contigs listed in column 3. Not all of them appear in the final reduced FASTA file, which suggests that some contigs that are the longer match for a contig in column 1 are themselves the shorter match of another contig in column 3, and so are also removed. Is that right?

Just want to clear this up before I proceed with my analysis.
Thanks in advance,
Pedro
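
The hypothesis in the question can be checked directly from the file. The sketch below assumes a tab-separated layout with the removed contig in column 1 and its longer match in column 3, as the question describes; the remaining columns and the example rows are made up for illustration.

```python
import csv
import io

def chained_contigs(tsv_handle):
    """Return contigs that appear both as removed (column 1) and as a longer match (column 3)."""
    removed, longer = set(), set()
    for row in csv.reader(tsv_handle, delimiter="\t"):
        if len(row) < 3 or row[0].startswith("#"):
            continue  # skip header, comment or malformed lines
        removed.add(row[0])
        longer.add(row[2])
    return removed & longer

# Made-up example: c3 was removed in favour of c2, and c2 in favour of c1.
example = io.StringIO("c3\t500\tc2\t900\nc2\t900\tc1\t2000\n")
print(sorted(chained_contigs(example)))
```

A non-empty result on the real file, opened with `open(path)`, would be consistent with the chained removal described in the question.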

TypeError: 'generator' object has no attribute '__getitem__'

Hi,
I ran:

$ python fasta2homozygous.py -i /scratch/fruit-data/BAFB.contigs.fasta -t 25
Homozygous assembly/ies will be written with input name + '.homozygous.fa.gz'
#file name      genome size     contigs heterozygous size       [%]     heterozygous contigs    [%]     identity [%]    possible joins  homozygous size [%]     homozygous contigs      [%]
Traceback (most recent call last):
  File "fasta2homozygous.py", line 239, in <module>
    main()
  File "fasta2homozygous.py", line 233, in main
    o.threads, o.verbose)
  File "fasta2homozygous.py", line 151, in fasta2homozygous
    contig2skip = fasta2skip(out, fasta, faidx, threads, identity, overlap, verbose)
  File "fasta2homozygous.py", line 71, in fasta2skip
    sys.stderr.write(' [ERROR] `%s` (%s) not in contigs!\n'%(q, str(hits[i-1])))
TypeError: 'generator' object has no attribute '__getitem__'
ubuntu@waterhouse-1:/mnt/apps/redundans/bin$ lastal: write error

Any idea what I did wrong?

Thank you in advance.

Michal
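
The TypeError arises because a generator cannot be indexed, so an expression like `hits[i-1]` fails even though iterating over `hits` works. A minimal sketch of one way to reach the previous item without indexing; `hits_with_previous` is a hypothetical helper, not the fix applied in redundans.

```python
def hits_with_previous(hits):
    """Yield (previous, current) pairs from any iterable, including generators."""
    previous = None
    for current in hits:
        yield previous, current
        previous = current

gen = (h for h in ["hit1", "hit2", "hit3"])
pairs = list(hits_with_previous(gen))
print(pairs)
```

Materialising the generator with `list(hits)` would also allow indexing, at the cost of holding every hit in memory.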

Issue in Scaffolding

Hi,

When running redundans on the test dataset it passes reduction and library parameters fine, but then returns an error during scaffold construction:

Traceback (most recent call last):
  File "./redundans.py", line 512, in <module>
    main()
  File "./redundans.py", line 507, in main
    o.norearrangements, o.verbose, o.log)
  File "./redundans.py", line 314, in redundans
    identity, overlap, minLength, resume)
  File "./redundans.py", line 121, in run_scaffolding
    sspacebin, verbose=0, log=log)
  File "/home/sb36g09/software/redundans/bin/fastq2sspace.py", line 191, in fastq2sspace
    tabFnames = get_tab_files(out, fasta, libnames, libFs, libRs, libIS, libISStDev, libreadlen, cores, mapq, upto, verbose, log)
  File "/home/sb36g09/software/redundans/bin/fastq2sspace.py", line 144, in get_tab_files
    proc = _get_aligner_proc(f1.name, f2.name, ref, cores, verbose, bwalog)
  File "/home/sb36g09/software/redundans/bin/fastq2sspace.py", line 117, in _get_snap_proc
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=log)
  File "/local/software/python/2.7.5/lib/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/local/software/python/2.7.5/lib/python2.7/subprocess.py", line 1308, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

Command line was just the one for running the test data:

./redundans.py -v -i test/*.fq.gz -f test/contigs.fa -o test/run1

Cheers,
Steve
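
`subprocess.Popen` raises `OSError: [Errno 2]` when the program it is asked to start cannot be found, so a missing or uncompiled aligner binary is one likely cause (an assumption, since the report does not show which command was launched). A sketch of a pre-flight check follows. It uses `shutil.which`, which requires Python 3.3 or later, whereas the report uses Python 2.7, and the tool names are illustrative only.

```python
import shutil

def missing_executables(names):
    """Return the subset of `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]

# Illustrative tool names; substitute whatever the pipeline actually launches.
print(missing_executables(["bwa", "snap-aligner"]))
```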
