Hangs with no feedback (cenote-taker2, closed issue)

mtisza1 commented on July 28, 2024
Hangs with no feedback

from cenote-taker2.

Comments (11)

DarrenObbard commented on July 28, 2024

Tried it again with a slightly different (larger) test file. Again it seems to hang, but at a different point. I say 'hang' because although it hasn't thrown an error, there appears to be nothing happening (no tasks running, no memory being allocated).

File with .fasta extension detected, attempting to keep contigs over 1000 nt and find circular sequences with apc.pl
WebsterMelRebuild1021.fasta has DTRs/circularity
WebsterMelRebuild1022.fasta has DTRs/circularity
WebsterMelRebuild1066.fasta has DTRs/circularity
WebsterMelRebuild1502.fasta has DTRs/circularity
WebsterMelRebuild1591.fasta has DTRs/circularity
WebsterMelRebuild1757.fasta has DTRs/circularity
WebsterMelRebuild1758.fasta has DTRs/circularity
WebsterMelRebuild1964.fasta has DTRs/circularity
WebsterMelRebuild2062.fasta has DTRs/circularity
WebsterMelRebuild2440.fasta has DTRs/circularity
WebsterMelRebuild2522.fasta has DTRs/circularity
WebsterMelRebuild2523.fasta has DTRs/circularity
WebsterMelRebuild2524.fasta has DTRs/circularity
WebsterMelRebuild2525.fasta has DTRs/circularity
WebsterMelRebuild2526.fasta has DTRs/circularity
WebsterMelRebuild2742.fasta has DTRs/circularity
WebsterMelRebuild3594.fasta has DTRs/circularity
WebsterMelRebuild3595.fasta has DTRs/circularity
WebsterMelRebuild3596.fasta has DTRs/circularity
WebsterMelRebuild3671.fasta has DTRs/circularity
WebsterMelRebuild378.fasta has DTRs/circularity
WebsterMelRebuild4581.fasta has DTRs/circularity
WebsterMelRebuild4643.fasta has DTRs/circularity
WebsterMelRebuild4835.fasta has DTRs/circularity
WebsterMelRebuild4861.fasta has DTRs/circularity
WebsterMelRebuild649.fasta has DTRs/circularity
WebsterMelRebuild651.fasta has DTRs/circularity
WebsterMelRebuild885.fasta has DTRs/circularity
no reads provided or reads not found
Circular fasta file(s) detected

Putting non-circular contigs in a separate directory
time update: running IRF for ITRs in non-circular contigs 03-11-21---09:25:34
time update: running prodigal on linear contigs  03-11-21---09:25:42
time update: running linear contigs with hmmscan against virus hallmark gene database: standard  03-11-21---09:27:10
time update: Calling ORFs for circular/DTR sequences with prodigal  03-11-21---09:27:55
time update: running hmmscan on circular/DTR contigs  03-11-21---09:27:56
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
 Grabbing ORFs wihout RPS-BLAST hits and separating them into individual files for HHsearch
time update: running HHsearch or HHblits  03-11-21---09:27:57
 Combining tbl files from all search results AND fix overlapping ORF module
No ITR contigs with minimum hallmark genes found.
Annotating linear contigs
time update: running BLASTX, annotate linear contigs  03-11-21---09:27:57
time update: running Prodigal, annotate linear contigs  03-11-21---09:31:18
time update: running hmmscan1, annotating linear contigs  03-11-21---09:31:20
time update: running hmmscan2, annotating linear contigs  03-11-21---09:31:22

It has been sitting at this point for 2 hours, with no tasks being executed (as far as I can tell from htop).

mtisza1 commented on July 28, 2024

Hi Darren,

Thanks for reaching out, and I'm sorry that it's hanging on you. I'm working to figure out what's happening. Just to be sure that it's not an issue with your input fasta files (weird headers?), can you run the test contigs that are provided with the repo (e.g. testcontigs_DNA_ct2.fasta)? In the meantime I'll try to replicate this error.

Mike

DarrenObbard commented on July 28, 2024

Hi! Thanks for getting back to me so fast!

I'm hoping that cenote-taker2 will revolutionize my workflow (or perhaps just replace a post-doc)

My input is Trinity output from a few years ago ... [my understanding is that FASTA makes no stipulation except that each name starts with a ">" followed by any characters at all, then a newline before the sequence, and the sequence continues until the next '>']
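
As a minimal sketch of that reading of the FASTA format (not how Cenote-Taker 2 parses its input), the reader below treats everything after ">" as the name and joins all following lines until the next ">" into one sequence. The function name read_fasta is hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines.

    The name is everything after '>' on the header line, with no
    restriction on which characters it contains.
    """
    name = None
    chunks = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            # Sequence lines are concatenated until the next '>'.
            chunks.append(line.strip())
    if name is not None:
        yield name, "".join(chunks)
```

A parser like this accepts the old Trinity names unchanged; whether every downstream tool in a pipeline does the same is a separate question.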

The test file turns up a new error, suggesting a library problem. I'm using the supplied conda environment on a pretty clean new Linux install (Scientific Linux, a Red Hat derivative).

I recently ran into this in another context -
merenlab/anvio#1479

when I was trying to set up a conda environment for the newest Trinity and Samtools, and it took an age to resolve - possibly because of a version conflict?

time update: running IRF for ITRs in non-circular contigs 03-11-21---14:07:15
time update: running prodigal on linear contigs  03-11-21---14:07:15
time update: running linear contigs with hmmscan against virus hallmark gene database: standard  03-11-21---14:07:17
time update: Calling ORFs for circular/DTR sequences with prodigal  03-11-21---14:07:20
time update: running hmmscan on circular/DTR contigs  03-11-21---14:07:20
Annotating DTR contigs
Traceback (most recent call last):
  File "/data/home/dobbard/miniconda3/envs/cenote-taker2_env/bin/circlator", line 57, in <module>
    exec('import circlator.tasks.' + task)
  File "<string>", line 1, in <module>
  File "/data/home/dobbard/miniconda3/envs/cenote-taker2_env/lib/python3.6/site-packages/circlator/__init__.py", line 26, in <module>
    from circlator import *
  File "/data/home/dobbard/miniconda3/envs/cenote-taker2_env/lib/python3.6/site-packages/circlator/bamfilter.py", line 2, in <module>
    import pysam
  File "/data/home/dobbard/miniconda3/envs/cenote-taker2_env/lib/python3.6/site-packages/pysam/__init__.py", line 5, in <module>
    from pysam.libchtslib import *
ImportError: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "/data/home/dobbard/miniconda3/envs/cenote-taker2_env/bin/circlator", line 57, in <module>
    exec('import circlator.tasks.' + task)
  File "<string>", line 1, in <module>
  File "/data/home/dobbard/miniconda3/envs/cenote-taker2_env/lib/python3.6/site-packages/circlator/__init__.py", line 26, in <module>
    from circlator import *
  File "/data/home/dobbard/miniconda3/envs/cenote-taker2_env/lib/python3.6/site-packages/circlator/bamfilter.py", line 2, in <module>
    import pysam
  File "/data/home/dobbard/miniconda3/envs/cenote-taker2_env/lib/python3.6/site-packages/pysam/__init__.py", line 5, in <module>
    from pysam.libchtslib import *
ImportError: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory
 Grabbing ORFs wihout RPS-BLAST hits and separating them into individual files for HHsearch
time update: running HHsearch or HHblits  03-11-21---14:07:24
 Combining tbl files from all search results AND fix overlapping ORF module
No ITR contigs with minimum hallmark genes found.
Annotating linear contigs
time update: running BLASTX, annotate linear contigs  03-11-21---14:07:24
time update: running PHANOTATE, annotate linear contigs  03-11-21---14:07:52
time update: running Prodigal, annotate linear contigs  03-11-21---14:07:56
time update: running hmmscan1, annotating linear contigs  03-11-21---14:07:57
time update: running hmmscan2, annotating linear contigs  03-11-21---14:07:57
time update: running BLASTN, linear contigs  03-11-21---14:08:00
Internal3.blastn.out not found
Internal4.fna  is closely related to a virus that has already been deposited in GenBank nt.
time update: running RPSBLAST, annotating linear contigs  03-11-21---14:11:47
/data/home/dobbard/scratch/test_cenote/Internal/no_end_contigs_with_viral_domain/COMBINED_RESULTS.rotate.AA.rpsblast.out
time update: running tRNAscan-SE  03-11-21---14:12:04
 Grabbing ORFs wihout RPS-BLAST hits and separating them into individual files for HHsearch
time update: running HHsearch or HHblits  03-11-21---14:12:05
/data/home/dobbard/scratch/test_cenote/Internal/no_end_contigs_with_viral_domain/Internal.rotate.out_all.hhr
 Combining tbl files from all search results AND fix overlapping ORF module, linear contigs
finalizing taxonomy for linear contigs
time update: finished annotating linear contigs  03-11-21---14:12:27
time update: running tbl2asn  03-11-21---14:12:28
[tbl2asn] This copy of tbl2asn is more than a year old.  Please download the current version.
[tbl2asn] Flatfile Internal3

[tbl2asn] Validating Internal3

[tbl2asn] Flatfile Internal4

[tbl2asn] Validating Internal4

Making gtf tables from final feature tables
removing ancillary files

time update: Finishing  03-11-21---14:12:28
Virus prediction summary:
4 virus contigs were detected/predicted. 2 contigs had DTRs/circularity. 0 contigs had ITRs. 2 were linear/had no end features
grep: DTR_contigs_with_viral_domain/DTR_seqs_for_phanotate.txt: No such file or directory
grep: DTR_contigs_with_viral_domain/DTR_seqs_for_phanotate.txt: No such file or directory
output directory: Internal
 >>>>>>CENOTE-TAKER 2 HAS FINISHED TAKING CENOTES<<<<<<

mtisza1 commented on July 28, 2024

Hmmm. OK, based on the Anvio issue you referenced, maybe something is bugging out with circlator. Could you try reinstalling it like this?

conda install -c bioconda circlator=1.5.5 --force-reinstall

I sometimes regret having so many packages installed with Cenote-Taker 2 because if one of them breaks, the whole thing breaks. But I also didn't want to reinvent the wheel...

On the other hand, it seems like the error regarding "line 547: s/#/ /g" is no longer occurring with the provided test contigs, making me believe that Cenote-Taker 2 was mishandling the fasta header from your original runs. Could you do me a big favor and send some of the fasta headers from these files:

grep ">" LongWebster.fasta | head

grep ">" LongWebster.over_1000nt.fasta | head

DarrenObbard commented on July 28, 2024

Without fixing the libcrypto.so.1.0.0 problem, I have cleaned up my sequence titles (no funny characters at all!) and it hangs in the same place as before.

It seems to die during

time update: running hmmscan2, annotating linear contigs  03-11-21---14:30:43

And this seems to be the last sequence it was looking at when it stops:

>CleanWebster15 TR29739c0_g2_i1_len5765
CAGAGCTAGATTTTATTGCGGTACAATATTATTATCACGAATGTTTAAACAAGATTTACAATTTGAAGAAAATGGAATTAAACCCTATGTGTGTGTTAGAAATGGAGGAAATCGAGCATGTTGCCGATTTTACCCCTTTTCATATAGAGAATACGCCACATTTGAGTATAAATTCCCTTTTGACCCGGAGGTTGAAGAGGAAAATGAAGAAGTGGTATCAACGAACTATTTTGCCATGTTGGCAGAATTTGTCTTGGGTATCAGTTATTTTTGTGTGCTTTATCAGCTACTTACTATATCGTCGTGGAAGAAGATTT
ATAGTGGGTATGCCCAGTGCGCAAGCAAAGAGAGATGTTACAAACTCGTGGTCGCATCTGAGAAAGGAGTTTTAGAAGTAGAAGTGAAGGGATATCATAAGCGTACTATTAAGCAATTTGCTAATTATATGGCTGTGGTAATTTTGAAGGAATATTTGACTAAAGAACAGGTACAGCAAATGTTATTTTATTATTCTAATATATTTGCATATGATGATGATATTTGTGAAGTGCAGGCAGAAAATTCTCACCCGAAAGAATCGGTTCAGGGTGAGGAAGTTTTGACAGGTACAAAACATAGTAATACTATTTTAACT
AATAGTACAGGAGATACAGAGAGTATACCTCTAGCAATTAGAGATGATACTTTGAATTACGCCTCGAGCGAAGCCTTACATCAATTTGATAGTTTAACTGATAGATGGATGCCGTTAGAAACAATAACAGTTACTACATCACAGATTTCTGGTACACTATTAAAGGAATGGTATTTACCATATGATTTGTTGCAATCTCATATTATAAATCCGAGTTTAGCTCCATTTATGCTATTTCGCTACGGTGCTTTATCAATAGAGATGAAATTTGTAGTGAACGCTCACAAATTTCAATCCGGTAAAGCCTTAGCGAGCAT
TAAGTATGATCCAGTCGGTTTAACAGATTTTGGTGATTCATTACCTACATGTTTGCAACGAGAGCACGTGATGTTAGACTTATCTACTAATAATCAAGGAACATTGCAAATTCCTTTTATTTACCATCGTTCGTTCTTGCATTTAAATTTGCAGCAAGGTACAGATCAAACCATGGTACCATCCACATATGCTAGAGTACAGTTACACATCCTGGCCAATTTATTAACAGGAACTAATCAAGCAGTTAGCATGAACATCCGTCCTTATTATCGCTTCTCGAAAGCTTCATTTGCTGGAATGGAAGCAGTTCATACTG
TCCAGATGGATGTGGATGCAGTTGTAAAGGGATTAATACCAACAAAATCATTGAAAGCGGTGTTAGTTGGCGCAGAGGCTCTTATAGATCAATTAGGGAAGACTTGCAACCAGGACAAGCCTACAATTACTTCTTCCACTCAAATTGTTCCGAAACCCCGCAGTCAGTTTGCATCAGGAAAGGGGATTTTCGGAGGAACAGTTCTGAGATTAAATCCGCAGGTAATCACGTCTGCAGTTGAAGTGAAACAATCATCACGTACCCCTAGAACTGTACTGGATATAGCTAGAGTATGGGGATTGAAGAAAATTATGACG
TGGACTACGAATGCTAAACCAGATGAGCACCTTGATGATATTGTGGTTGATTTGCACCATAATTTTAAAGGGGGTAATGATCGTATTGAAGCAAATATATTGACTCCAGTTGAATATATAGCGTCTTTATATGGATTTTGGTCAGGGACATTAGAATGTAGGTTGGACTTTATATCCAATCAATTTCACACTGGTGCTATTATGATCAGTATACAAGTATCAAATCAAGAGACAAAATTTCAAAAGGCGGCTTGTGTATATACTAAAATTTTCCATTTGGGGGGTCAGAAAAGCGTCACATTCACCATTCCTTATAT
ATACGATACTATATGGCGTCGTAACACAGCTCAAATATTTACACCTTACACGTTTGAGCAAGATAATAAACTCCCTGTAGATCATATATTTACACTCGGTACGAATGATTTTATGAGAATCCAATTTTATGTTGTTAATGAATTACGAGCTCCAGATACAGTAGCGAATGTAGTTCAAATATTAGCTTATGTACGTGCGGGGACTAGTTTTATGTTACATTCTTTAAAACCGTCGCATTTGGAAGTTATACAGGACATAGCTCTTTTTAGAGACATACCTATGTTTAATGTACCTCATTTGGCACCTAAATCTTATA
TAACTAAGTCTGAGGAAAAACACATCAAGTTAACGAAAGAACTAACACTGGAGTATAAAGAAATCAAGTTTCAGATGGAAGGCTCCTTAGCTGAGAATCCAGATGAAACTCCTGATTTTAGTGCGGGTTTGAATGCTTTGCATATACAAACTTTAGATTCTCAAGTTAATATAAAGGATATTTTAAGGCGTCCTATACAGTTAACAAAAGCTATATCTTTTAGTAATACTGAAATAAAGAATCATGTATCTCTTTTTATCCCTTTAATGGTCCCATCTCATAATATGGTATATTCGGATAGTTATGAAACCATATAT
GCGGATGGAGTTTCCCTTACACCAACCGCTATGCTAATGAATTTATTTCGTTTTTGGCGAGGTAGTATGCGTTTTACCTTTGTTGTAAACGATAATGTATCCAAGAATTGTACACATTGGATAACTCACATGCCCCATTCGGGAGTTCGGAAAATTGGAAAGATTGAATTTCCAAAAGGTCCGAGTTTAGTTGGATCATCATTTGCTAGTGTCCCACTAGTCGCCAACATCAACGCGACGGAATGTGTCGAGGTACCCTATGATACGGAATTAAACTGGACGCTGTGTCATTCAGCTCGAAATAACCAAATCTTATC
AGTAAGAGATCAAACAGATACTAATGCAGGACATATAGTATTTACACCATCTGGTACATGTGATGTTACAGTGTGGTGGGAAGCTGGGGACGATTTTGAATATGAGAATTTCTTAGGAGTTCCGGCTACCATCACACGGGATCGTTTGCACGGTGTATACGAAACGGAAATTAAATTCCAAGCAGAAACATCAATGTATTCCAAAACCCTTGCGAAAGTGAATACTATAATAAATTTGCCAGAGCAGATAGCAGATACATTAACGAATGCTAATAATGTTGGTGACGCTATTATAGCGAGTTCTACGAAAGCAGAAA
AATTATTAGTCAAAGGGTTAGAAGTGTGCGAGAATGCATCAGCTATGTTAGATAATATTTCTCCTTTGATGGAATCTTTAGAGGAAAAAATTCGGGAATCCTTAAAATCATTTCCTGGAAGTATTTATAATTCTACAATGTTTATTCAAAATGGGGTTGAAATTATAATGGATTTAGTTGTCGCTTGGTTATCTGAATCGTGGGCCGTACTTGGTAATATTTTCGTCAAAGCTATAGCACGGTTGCTGGGATTTAGTGCCATACAAACTATTTTGAAGTACGGTTCCCAAATAGCCGCTGCTATTCGTAATCTGGTG
AACCCACAAATAGTAGTTCAGGCTCCATCGCAAAATGTCACATTATTGGGAGTATTATGTGGTTTAGTAGGTACAGTAGTGGGTGTATCTCTGGAAACCCAAAATTATTCTAAGTTTATTTATAAATTGTCTGAAAGATTTGTGACAACTGGGGGTATAGCTTATCTTAATCAAGTCTTACGGTTTGTGCAGAGTACCTTTGAAGTTATTCGTGACTTGGTGATGGATGCCCTTGGTTACGCTGATCCTAATGTAAAGGCTTTACAGATGCTCAGTAAAGATACAGGTGTAATTAGCACATTTGTAAAGGAGGCTAA
TGTCATATTAAGTGAAGCGAACGCCTCATTATTGTCAGATCCCGGTTTTCGTAAACGTTTTTGGTACACTGTGTCTCAGGCATACCAAATTCAATCAATTCTAGCCGTGAGTCCTGCGAATGTAGTTTCACCCATTGTGACTCGTTTATGTACCGATGTCATAAAAGCATCGAGTGAAAAGTTCATGGACTTATCGTGTAGTCCTTGTCGCTACGAACCATTTGTGATTTGTATAGAGGGTGAACCTGGTATAGGAAAATCTTTTATGACAGAGACCATGGTTTCCGAATTGCTTGGATCAATTGGTTTCGATCGTC
CATCCAGTGGCTTAATTTACACTCGGCCTCCTGGAGCACGATTCTGGTCAGGATATAAAAATCAGCCTGTAGTTGTTTATGATGATTGGATGAATTTGAACGATTCAGACCAAATACTGAGTCAGTTAAGTGAATTGTACCAGATGAAATCAACTAGTGATTTCATTCCAGAAATGGCTCACTTAGAAGAAAAGAAAATCAAAGCGAACCCTTTAATTGTCGTGCTATTGTGTAATGGTGCATTCCCCTCGTGTATAGGTCAAAAAGCGATTTATCCTGATGCTATTTTCAGACGTCGAGACTTAGTTTTGCGAGCC
TCTCTGAAGGAAGAATGGGTAGGAAAAGATTTACGCGACCTAACTGATAGTGAATCAGCTGAGTGTGGACATCTATTGTTTCAACGATATACTAGTGCGAAAATTGAGAATAGTTTAACCACAGCTCAAAAGACCTGGTCTGAAGTAAAACCTTGGTTGTGTGCCACATATAAACGCTACCACCAACAAGAAACACTTTTAGTACGTAAAAGAATTAAAAAGTTTCAAACTCAGATGCGTTTAAATAGTGAGAATTATCTAGACTATTCAGATCCTTTTTCTCTATTCTACACTAGCACCATTGATGTTATGGAAGA
CTCTGAGTGTAATCCTAATGGGTGGTTACCTAGTGAACAATTGGAGGCAGCTGTGTTGAGAGTTGTTGATATAATAAAGGAGAAGAAGGACGAAGTATTGGAATTTCATATAGATTCTAAACCTGAAAACGTCTTTCAGGGCTTTCCGGTGGGATGGGAAGATCTATCAATGAGCTTAACTAGTGGTATACTTTTTAGTGGAGGTGTTATGGCGCAAGTTTTAGACTGGACCGCTCAGGGTATAGGAGCTTTCATGAAACCACTATTAGAAAGTACGGGTCAGAGTATAGAACACGAGTGTATGACATGTCTTGAGC
AAATGCCCTGTTACTACGTATGTGGAGGTGTGCGTTCCCACTCTAACCCCAAAGCTCATCATTACATGTGCATGGATTGTATGATTCGCATGAAGCGAGCTAATATGGGTTCTCACTGTCCCATGTGTCGTGTAGAGCCTATGCTAGCTTGTTTACCTAAACATCTAACTCGCTTGTATATAGTGTTACGTTGGGCGTTGGTTAATGTTAGTGATAGATTAGTATGGATTTTTGCATTCTTTAGGGATTTTCTCCGTTCAAGGTCTATGGTAAATTCACGCTTATTATTATCTACCCTGGCATCATTAACTGCATTC
TTACAGGGCGATGGTATTACAACTACCATTGCTGCTTCATATGTAGGGGCAAGTGTGGTAGATGCTATATATGATCCAGAATTATTTACTAATGTAGCACAATCCTGGATATTTAACCCCTTGGATATGTTAGTTCCTTCAGAAGAATATTACACGCCTCCTTCGGAAATAATAAACGCTAGCGTGCAATGCATGCAGTTTGAAAGTCTTGGGCAGAGAGAGGTTGGTTGTAGCAACCTTGAGCCGGAGAAAGATTCATGGGATGTACTTACTCCTAAAGAAGAGGCTATACTTCGTTGTGAACGCAATAAGAACAA
AATGGATACTGCCTTAGTTATAAACAAAGCAGAACTCGAAAATATTCGAAAGAAGCGGG

after successfully writing a blank file called "CleanWebster15.all_called_hmmscans.txt"

I'm trying a run on this sequence alone ...

DarrenObbard commented on July 28, 2024

Part I - sequence names

The old-style Trinity headers had a nasty '|', but also '=', '[', ']' and ' ':

TR29739|c0_g2_i1 len=5765 path=[11551:0-1439 11555:1440-3480 11548:3481-5764] [-1, 11551, 11555, 11548, -2]

but I've cleaned this to

TR29739c0_g2_i1_len5765

Run on its own, the sequence above is OK, so maybe that wasn't the cause ...
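
For anyone wanting to do the same clean-up, here is a minimal sketch. It assumes old-style Trinity headers shaped like the example above, keeps only the ID and length fields, and drops every character outside letters, digits and underscore. The function name clean_trinity_header is hypothetical, and this is not part of Cenote-Taker 2.

```python
import re


def clean_trinity_header(header: str) -> str:
    """Reduce an old-style Trinity header to letters, digits and underscores.

    Only the first two whitespace-separated fields (ID and len=...) are kept;
    the path=[...] and trailing node-list fields are discarded.
    """
    fields = header.lstrip(">").split()
    kept = fields[:2]
    # Strip '|', '=', brackets and anything else outside [A-Za-z0-9_].
    return "_".join(re.sub(r"[^A-Za-z0-9_]", "", field) for field in kept)


if __name__ == "__main__":
    old = ("TR29739|c0_g2_i1 len=5765 path=[11551:0-1439 11555:1440-3480 "
           "11548:3481-5764] [-1, 11551, 11555, 11548, -2]")
    print(clean_trinity_header(old))
```

Applied to the header shown above, this gives the same cleaned name, TR29739c0_g2_i1_len5765.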

Part II - circlator

looks promising:

Solving environment: done

## Package Plan ##

  environment location: /data/home/dobbard/miniconda3/envs/cenote-taker2_env

  added / updated specs:
    - circlator=1.5.5


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2020.12.5          |   py36h5fab9bb_1         143 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         143 KB

The following packages will be UPDATED:

  certifi            pkgs/main::certifi-2020.12.5-py36h06a~ --> conda-forge::certifi-2020.12.5-py36h5fab9bb_1

The following packages will be SUPERSEDED by a higher-priority channel:

  ca-certificates    pkgs/main::ca-certificates-2021.1.19-~ --> conda-forge::ca-certificates-2020.12.5-ha878542_0

but no,

ImportError: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory

mtisza1 commented on July 28, 2024

OK, I believe I've figured out at least one issue. Thank you for bearing with me here.

The circlator issue may actually be a pysam issue per: this issue

Can you check your pysam version (should be 0.15.3) and update if necessary?

$ conda list | grep "pysam"
pysam                     0.15.3           py36hda2845c_1    bioconda

conda install -c conda-forge -c bioconda pysam==0.15.3
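
A quick way to confirm whether the import itself works, independent of the pipeline, is to try it from the environment's Python. The helper below is a minimal sketch (try_import is a hypothetical name); a missing shared library such as libcrypto.so.1.0.0 surfaces as an ImportError, which it catches and reports.

```python
import importlib


def try_import(module_name):
    """Return (True, "ok") if the module imports, else (False, error text)."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # Covers both a missing module and a missing shared library
        # that the module links against.
        return False, str(exc)
    return True, "ok"


if __name__ == "__main__":
    # Run inside the activated cenote-taker2_env environment.
    for name in ("pysam", "circlator"):
        print(name, try_import(name))
```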

The other issue may have to do with a problem on my end that I've possibly fixed. The Trinity headers were not the issue. You've got RNA virus contig(s) where the whole contig is covered by an ORF that may not have a start and stop codon. I had incorrectly coded prodigal to use -c for closed genomes for this step, which requires start/stop codons. The program expects at least 1 ORF, and it's not there due to this setting. I should have tested these types of contigs before releasing the update! If you run cd Cenote-Taker2 and then git pull, I think this should fix it. If you forgo the blastn step when you test this, you should get quicker results.
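
To illustrate why a closed-ends setting can leave such a contig with zero ORFs, here is a toy sketch. It is not Prodigal's algorithm: it only scans reading frame 0 of the forward strand, and the function name orfs_in_frame and its min_codons parameter are made up for illustration. With closed=True an ORF must begin at ATG and end at a stop codon inside the sequence; with closed=False an ORF may run off either edge.

```python
STOPS = {"TAA", "TAG", "TGA"}


def orfs_in_frame(seq, closed, min_codons=3):
    """Return (start, end) spans of ORFs in frame 0 of seq (toy model)."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    spans = []
    # Open ends: an ORF may already be running at the left edge.
    start = None if closed else 0
    for idx, codon in enumerate(codons):
        if codon in STOPS:
            if start is not None and idx - start >= min_codons:
                spans.append((start * 3, (idx + 1) * 3))
            start = None
        elif start is None and codon == "ATG":
            start = idx
    # Open ends: an ORF may run off the right edge without a stop codon.
    if not closed and start is not None and len(codons) - start >= min_codons:
        spans.append((start * 3, len(codons) * 3))
    return spans


if __name__ == "__main__":
    # A fragment that is coding throughout but has no ATG and no stop codon.
    fragment = "GCTGCAGCCGCTGCAGCC"
    print("closed ends:", orfs_in_frame(fragment, closed=True))
    print("open ends:  ", orfs_in_frame(fragment, closed=False))
```

In this toy model the fragment yields no ORF with closed ends and one ORF spanning the whole fragment with open ends, which mirrors the situation described above: a step that expects at least one ORF has nothing to work with.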

Let me know if this helps.

DarrenObbard commented on July 28, 2024

Hi! Fantastic, thank you.

The pysam was indeed the issue, and the test file now runs happily!

My own trial dataset (with the long ORF that lacks a start or stop, and the nasty headers) now runs to completion!

But there are still some things that worry me ...:

This still happens:

/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory

And when running blastn, what do lines like this imply?

MediumWebster1462.blastn.out not found

Is it just a virus / phage not in nt?

Then I get some hits that report like this:

cellular organisms; Eukaryota; Opisthokonta; Metazoa; Eumetazoa; Bilateria; Protostomia; Ecdysozoa; Panarthropoda; Arthropoda; Mandibulata; Pancrustacea; Hexapoda; Insecta; Dicondylia; Pterygota; Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha; Eremoneura; Cyclorrhapha; Schizophora; Acalyptratae; Ephydroidea; Drosophilidae; Drosophilinae; Drosophilini; Drosophila; Sophophora; melanogaster group
; PREDICTED: Drosophila bipectinata twitchin (LOC108134366), transcript variant X2, mRNA
; PREDICTED: Drosophila bipectinata twitchin (LOC108134366), transcript variant X1, mRNA
; PREDICTED: Drosophila ananassae twitchin (LOC6501771), transcript variant X6, mRNA

What's the cause of this?

Then at the end I get a lot of this:

Virus prediction summary:
50 virus contigs were detected/predicted. 0 contigs had DTRs/circularity. 0 contigs had ITRs. 50 were linear/had no end features
grep: no_end_contigs_with_viral_domain/LIN_seqs_for_phanotate.txt: No such file or directory
grep: no_end_contigs_with_viral_domain/LIN_seqs_for_phanotate.txt: No such file or directory
grep: no_end_contigs_with_viral_domain/LIN_seqs_for_phanotate.txt: No such file or directory
grep: no_end_contigs_with_viral_domain/LIN_seqs_for_phanotate.txt: No such file or directory

What does this indicate?

Thanks!

Darren

mtisza1 commented on July 28, 2024

Darren, I again thank you for raising these issues, and I apologize that my testing wasn't as thorough as I thought. Please do git pull again. Everything should be fixed, and I have 2 questions for you.

I fixed the error with this s/#/ /g

As you thought, MediumWebster1462.blastn.out not found implies that it doesn't have a strong BLASTN hit in your database. I changed the message to say sequence.blastn.out not found, no close BLASTN hits for this sequence.

Regarding the BLAST reports, you have the phylogeny of the top hit on the first line, then the description of the top 3 hits. The description of the top hit is also in the note in the ".gbf" and ".fsa" files in the sequin_and_genome_maps directory. I don't really know exactly what users want to do with BLASTN info. What are your thoughts? Should it inform taxonomy in the output?

I also fixed the error with grep: no_end_contigs_with_viral_domain/LIN_seqs_for_phanotate.txt

My other question is, I know your lab has found some interesting segmented RNA viruses. You could of course use Cenote-Taker 2 with -am True on a multifasta of segments from the same virus, but it might be confusing to have a separate ".gbf" for each output. I haven't looked into generating combined outputs for segmented viruses. I could possibly add this feature if you have some insight into the formatting, etc.

DarrenObbard commented on July 28, 2024

Hi!

Thank you for the pipeline! I have played around with several virus finders, and I have never previously found one that I thought worked well enough to use. I'm thinking we might start to use this routinely - so you're going to have to keep maintaining it!

> Regarding the BLAST reports, you have the phylogeny of the top hit on the first line, then the description of the top 3 hits. The description of the top hit is also in the note in the ".gbf" and ".fsa" files in the sequin_and_genome_maps directory. I don't really know exactly what users want to do with BLASTN info. What are your thoughts? Should it inform taxonomy in the output?

So, as you might imagine, I have some opinions to share! I think this blastn screen (I'm using nt at the moment) is really useful, but I think you should make more use of it for the taxonomy. It looks like your taxonomy might be based on RefSeq? For viruses, RefSeq is always so out of date as to be of relatively little use for spotting 'known' viruses.

I think that, where the blastn result is currently reported, it could be done more cleanly, purely as taxonomic information. So, leave out the gene/segment etc. and just report the top hit with "Sequence identity 98% to ". This would be a really clear sign that the user might consider it a previously reported virus, or not (they can choose the threshold). I think this should be in all the outputs it can be, including the overall summary table. In fact, if you have a 90%-plus blastn hit over the whole length, I would replace any proposed taxonomy based on more sophisticated approaches.
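
The rule suggested here could be sketched roughly as follows. This is only an illustration of the proposal, not anything Cenote-Taker 2 does: the function name final_taxonomy, its parameters, and the 95% coverage cut-off standing in for "over the whole length" are all assumptions.

```python
def final_taxonomy(predicted, top_hit_name, pident, aln_len, query_len,
                   min_identity=90.0, min_coverage=0.95):
    """Prefer a near-full-length, high-identity BLASTN hit over the predicted taxonomy.

    pident is percent identity of the top hit, aln_len its alignment length,
    and query_len the contig length. Thresholds are user-adjustable assumptions.
    """
    coverage = aln_len / query_len if query_len else 0.0
    if top_hit_name and pident >= min_identity and coverage >= min_coverage:
        return "Sequence identity {:.0f}% to {}".format(pident, top_hit_name)
    # Otherwise keep whatever the more sophisticated approach proposed.
    return predicted


if __name__ == "__main__":
    # "known virus X" and "predicted family Y" are placeholders, not real results.
    print(final_taxonomy("predicted family Y", "known virus X", 98.0, 5700, 5765))
    print(final_taxonomy("predicted family Y", "known virus X", 72.0, 5700, 5765))
```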

Even better than the HSP identity would be a quick pairwise alignment between the new contig and its top blastn hit, and report the overall sequence identity for the shared length.
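
Given an alignment produced by whatever pairwise aligner is to hand, the identity over the shared length is simple to compute. The sketch below assumes the two sequences are already aligned to equal length with "-" for gaps, and counts only columns where both have a base; the function name is hypothetical and the alignment step itself is out of scope.

```python
def identity_over_shared_length(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    shared = [(a, b)
              for a, b in zip(aligned_a.upper(), aligned_b.upper())
              if a != "-" and b != "-"]
    if not shared:
        return 0.0
    matches = sum(1 for a, b in shared if a == b)
    return 100.0 * matches / len(shared)
```

Ignoring gapped columns means an overhang at either end of the contig does not lower the figure, which seems to be the point of reporting identity "for the shared length".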

> My other question is, I know your lab has found some interesting segmented RNA viruses. You could of course use Cenote-Taker 2 with -am True on a multifasta of segments from the same virus, but it might be confusing to have a separate ".gbf" for each output. I haven't looked into generating combined outputs for segmented viruses. I could possibly add this feature if you have some insight into the formatting, etc.

I think this would be great! I think the GenBank files could literally just be concatenated, as could the gtf files to go with the fsa files. I don't know if it's too ugly, but folders could be created to hold the un-concatenated versions, and the concatenated file names could then match the folders.
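
A minimal sketch of that concatenation, assuming one file per segment: GenBank flat files can hold several records in one file, each ending with a "//" line, so joining the per-segment files end to end is enough. The function names and the file names in the usage comment are hypothetical.

```python
from pathlib import Path


def combine_records(record_texts):
    """Join per-segment record texts, making sure each ends with a newline."""
    parts = []
    for text in record_texts:
        parts.append(text if text.endswith("\n") else text + "\n")
    return "".join(parts)


def combine_segment_files(segment_paths, combined_path):
    """Write the concatenation of the per-segment files to combined_path."""
    texts = [Path(path).read_text() for path in segment_paths]
    Path(combined_path).write_text(combine_records(texts))


# Example with hypothetical file names:
# combine_segment_files(["virusA/seg1.gbf", "virusA/seg2.gbf"], "virusA.gbf")
```

The same helper would work for the gtf tables, since those are also line-oriented text.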

I have a number of other questions / suggestions. Would you like them here, or by email?

mtisza1 commented on July 28, 2024

Thanks for the feedback. Let's discuss further by email, and I'll make sure to include any changes that get made into the change log for the next update.
[email protected]
