hyattpd / prodigal
Prodigal Gene Prediction Software
License: GNU General Public License v3.0
Would prodigal malfunction with eukaryotic genomes?
My FASTA file contains more than 32,000,000 bp.
Is there a better method than splitting the fasta?
Cheers.
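If splitting turns out to be the only option, it can at least be scripted. The following is a minimal sketch, not a Prodigal feature: `split_record`, `window`, and `overlap` are hypothetical names and values chosen for illustration. Genes that span a cut will still be truncated, so the overlap only reduces that problem.

```python
def split_record(header, sequence, window=30_000_000, overlap=10_000):
    """Yield (header, chunk) pairs covering `sequence` in overlapping windows.

    `window` and `overlap` are illustrative values, not Prodigal settings.
    """
    if overlap >= window:
        raise ValueError("overlap must be smaller than window")
    step = window - overlap
    start = 0
    while True:
        end = min(start + window, len(sequence))
        # Keep 1-based coordinates in the name so calls can be mapped back.
        yield f"{header}_{start + 1}_{end}", sequence[start:end]
        if end == len(sequence):
            break
        start += step


# Tiny demonstration with a 20 bp sequence and 8 bp windows.
for name, chunk in split_record("chr1", "ACGT" * 5, window=8, overlap=2):
    print(name, chunk)
```

Gene coordinates reported for each chunk would need to be shifted by the chunk's start offset before merging results.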
Hello,
I am trying to predict viral ORFs from metagenomic viral contigs using Prodigal. However, as you know, Prodigal does not know viral genome-specific rules, so I would like to predict viral genes using a training file that I built in advance from other, similar viral metagenomic contigs. Prodigal does not seem to accept meta mode together with a training file. Can I predict the ORFs using normal mode instead?
I downloaded the installation file for Mac, but it has no extension and the computer doesn't know what to do with it.
As far as I understand there is no specific lower length cut-off for gene calling with prodigal, is that correct?
But you penalize genes shorter than 250 bp, as far as I can see in the code. Based on this method, what are the shortest genes which get called in general? How did you get to the specific length of 250 bp?
Hi, I'm trying to run Prodigal on some assembled genomes and I get this message:
Error: Sequence must be 20000 characters (only 13949 read). (Consider running with the -p meta option or finding more contigs from the same genome.)
Can you please explain what it means? I'm using complete whole genomes; wouldn't it be inadvisable to use the meta parameter?
Thank you for any assistance you can provide.
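To check an input before running, the total sequence length can be compared against the 20000-character figure quoted in the error message. This is a rough sketch under assumptions: `total_bases` and `suggested_procedure` are hypothetical helpers, and counting across all records in the file is my reading of the "more contigs from the same genome" hint, not confirmed behavior.

```python
MIN_SINGLE_MODE_BASES = 20000  # threshold quoted in the error message above


def total_bases(fasta_lines):
    """Count sequence characters across all records, skipping header lines."""
    return sum(len(line.strip()) for line in fasta_lines if not line.startswith(">"))


def suggested_procedure(fasta_lines):
    """Return 'single' when the threshold is met, otherwise 'meta'."""
    if total_bases(fasta_lines) >= MIN_SINGLE_MODE_BASES:
        return "single"
    return "meta"


# Two made-up contigs totalling 13949 bases, matching the count in the error.
demo = [">contig1\n", "ACGT" * 3000 + "\n", ">contig2\n", "ACGT" * 487 + "A\n"]
print(total_bases(demo), suggested_procedure(demo))
```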
V
Doug
Sean Jackman has alerted me to your git version of prodigal (aka 2.70) breaking backward compatibility of command line options with the long standing 2.60 version:
Prodigal is very popular (fast and reliable) and it is not customary to break interfaces for minor versions.
I can see from your notes that most of this is in preparation for 3.00. Are you planning any official/stable releases for 2.70?
Torsten
Hi. I'm annotating a plant plastid using Prodigal, and I found it was missing a number of short but real genes ranging in size from 90 bp to 189 bp. I found that reducing the short CDS penalty size from 250 bp to 100 bp helped rescue 6 of 9 missing short genes. Could the 250 bp size please be made a command line parameter?
See…
Hi,
I am using Prodigal_v2.6.2 in meta version.
In my output files there are no 'partial' tags, why is that?
Best, Johanna
prodigal -p meta -a $OUTDIR/soil.contigs.genes.faa -d $OUTDIR/soil.contigs.genes.fna -f gff -o $OUTDIR/soil.contigs.genes.gff -i $INDIR/soil.contigs.fa
Hi,
I've noticed that the performance of Prodigal is greatly impacted by the word wrap length of the input FASTA file. An input file without any word wrapping takes almost double the time of one wrapped under 1000 characters.
I've observed this on multiple genomes; however, these results are from GCF_000754115.1_ASM75411v1 with n=20 replicates at word wrap lengths of [1, 5, 10, 20, 40, 80, 150, 300, 500, 1000, 3000, 5000, 10000, 15000, 20000, 50000, 100000, 200000, 300000, 400000, 500000]. Input was read from stdin.
I'd like to suggest forcefully word-wrapping user input, or adding a note to the documentation that the input should be strictly wrapped at 80 characters.
Thanks,
Aaron
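Until something like that exists, the input can be re-wrapped before it reaches Prodigal. Here is a minimal sketch, assuming a hypothetical helper `rewrap_fasta`; the default width of 80 is taken from the suggestion above, not from any measured optimum.

```python
def rewrap_fasta(lines, width=80):
    """Yield FASTA lines with each sequence re-wrapped to `width` columns."""
    buffer = []

    def flush():
        # Join the collected sequence lines and cut them into fixed-width rows.
        sequence = "".join(buffer)
        buffer.clear()
        for i in range(0, len(sequence), width):
            yield sequence[i:i + width]

    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            yield from flush()
            yield line
        elif line:
            buffer.append(line)
    yield from flush()


# In-memory demonstration; a real script would iterate over a file handle.
for out in rewrap_fasta([">a\n", "ACGTAC\n", "GT\n", ">b\n", "AA\n"], width=4):
    print(out)
```

The function accepts any iterable of lines, so it can be fed an open file or `sys.stdin` and its output piped to Prodigal.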
The Mac download for 2.6.3 does not have an extension that is easily discernible. I don't know what program to use to open it.
Hi,
I noticed that the prodigal website is not available. Is the service still available, or does this indicate that prodigal should no longer be used? Thank you!
Hi,
I have a segmentation fault with some genomes, as described here: https://code.google.com/p/prodigal/issues/detail?id=4
version used : Prodigal V2.6.1: July, 2013
Downloaded from : https://github.com/hyattpd/Prodigal/releases/download/v2.6.1/prodigal.macosx
computer used: Mac Pro mid 2012 with OSX 10.8.5
Download fasta file from:
rename files KJ484629.fst and KC340960.fst respectively
run the following command:
prodigal -i KJ484629.fst
it outputs:
-------------------------------------
PRODIGAL v2.6.1 [July, 2013]
Univ of Tenn / Oak Ridge National Lab
Doug Hyatt, Loren Hauser, et al.
-------------------------------------
Request: Single Genome, Phase: Training
Reading in the sequence(s) to train...112671 bp seq created, 50.70 pct GC
Locating all potential starts and stops...5798 nodes
Looking for GC bias in different frames...frame bias scores: 1.73 0.34 0.93
Building initial set of genes to train from...done!
Creating coding model and scoring nodes...done!
Examining upstream regions and training starts...Segmentation fault: 11
Same for KC340960.fst
NB:
- Both sequences are composed of ATGC characters exclusively
- when running the following command, it works:
prodigal -p meta -i KJ484629.fst
Is there something I can do to make it work? (It works for most of the genomes I'm using, and some are about the same size.)
Or is this a known bug that should be fixed?
Thanks
It is not a critical issue, but some parts of the code, such as argument parsing in main.c, may be unsafe due to the use of strcmp and atoi, as well as strcpy during memory allocation. Safer alternatives already exist and, in my opinion, should be used instead.
D.
I'm using translation table 11 to annotate a plant plastid genome. Is it possible to limit the possible start codons to only AUG?
Vertebrate mitochondrial genomes use genetic code 2. See https://en.wikipedia.org/wiki/Vertebrate_mitochondrial_code
They're also only 16 kbp, less than Prodigal's threshold of 20 kbp, so they must be run in -p meta mode. The -p meta mode uses a database that includes genetic codes 4 and 11, but not 2. See also the related issues #19 and #22.
Is there any workaround to use genetic code 2 on small sequences? One thought is that I could download a bunch of vertebrate mitochondrial genomes, add them to my FASTA file, and annotate the whole thing.
After binning an assembled contigs file, I get a set of bins. I want to run gene prediction on each bin.
However, I'm confused about whether to use -p single or -p meta. It seems right to use -p single, because each bin can be thought of as a draft genome. On the other hand, each bin includes many contigs that were assembled from metagenomic reads, so it also seems right to use -p meta.
Which parameter should I use for each bin?
Thanks in advance!
This might actually be behaviour 'as designed' but I noticed a gene gets qualified as 'partial' even if the full sequence is there when it starts at the edge of a contig. I noticed this after searching for dnaA and rearranging a circular chromosome to start at dnaA.
Maybe a flag for circularity would help such that prodigal would know that the promoter of the first gene might be at the 'end' of given sequence
prediction on the raw contig: dnaA sits somewhere in the middle:
unitig_0_quiver Prodigal_v2.6.2 CDS 872920 874320 357.5 + 0 ID=1_808;partial=00;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.633;conf=99.99;score=357.50;cscore=357.42;sscore=0.08;rscore=-3.89;uscore=0.47;tscore=4.15;
prediction on the rearranged contig where dnaA starts at first nucleotide.
Chrom1 Prodigal_v2.6.2 CDS 1 1401 360.8 + 0 ID=1_1;partial=10;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.633;conf=99.99;score=360.75;cscore=357.54;sscore=3.22;rscore=0.00;uscore=3.22;tscore=0.00;
Both predictions contain the exact same sequence (of course)
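For anyone filtering on this flag downstream, the attribute column shown above can be parsed directly. This is a small sketch with hypothetical helper names; the reading of the two-character code (first character for the left edge, second for the right, '1' meaning unfinished) is my understanding of the output documentation and matches the two records above.

```python
def parse_attributes(column9):
    """Split an attribute string like 'ID=1_1;partial=10;' into a dict."""
    fields = (f for f in column9.strip().split(";") if "=" in f)
    return dict(f.split("=", 1) for f in fields)


def describe_partial(code):
    """Interpret the two-character partial flag; '1' marks an unfinished edge."""
    left, right = code[0] == "1", code[1] == "1"
    if left and right:
        return "incomplete at both edges"
    if left:
        return "incomplete at left edge"
    if right:
        return "incomplete at right edge"
    return "complete"


# Attribute strings shortened from the two GFF records above.
raw = parse_attributes("ID=1_808;partial=00;start_type=ATG;rbs_motif=None;")
rearranged = parse_attributes("ID=1_1;partial=10;start_type=Edge;rbs_motif=None;")
print(raw["ID"], describe_partial(raw["partial"]))
print(rearranged["ID"], describe_partial(rearranged["partial"]))
```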
In the output FASTA file from prodigal, one entry has 122 million null characters inserted into the middle of the called gene:
[('\x00', 122748928), ('V', 7), ('A', 6), ('L', 5), ('S', 5)]
ENVKHIWGRAQWLMPVIPALWEAKAGGSPEVR\x00...\x00ASSLWGLALVSYVEMNESHTPVCV
I wanted to take prodigal out for a spin, just to see what it can do. This is what I typed:
$ wget https://github.com/hyattpd/Prodigal/releases/download/v2.6.2/prodigal.linux
$ chmod +x prodigal.linux
$ ./prodigal.linux -help
./prodigal.linux: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by ./prodigal.linux)
Hello,
I'm trying to download prodigal from github here: https://github.com/hyattpd/Prodigal/releases
The Mac and Linux release links throw back Amazon AWS errors. I'm happy to build from source, but getting the binaries would be much more convenient for me.
Thanks!
Is it possible to compute something like (number of protein-coding genes / number of contigs in a metagenome) * 100 to get the percentage of genes predicted in a metagenome?
Would this help make sense of how many genes were predicted in total, and compare it among samples?
The website link in the repo description is broken: http://prodigal.ornl.gov
Hi,
This is a feature proposal (not a heavy one).
For a replicon named myreplicon we'll get proteins named:
myreplicon_1 # xxx # yyy # 1 # ID=1_1
myreplicon_2 # xxx # yyy # 1 # ID=1_2
...
myreplicon_153 # xxx # yyy # 1 # ID=1_153
myreplicon_154 # xxx # yyy # 1 # ID=1_154
Is it possible to format the output as follows:
myreplicon_001 # xxx # yyy # 1 # ID=1_001
myreplicon_002 # xxx # yyy # 1 # ID=1_002
...
myreplicon_153 # xxx # yyy # 1 # ID=1_153
myreplicon_154 # xxx # yyy # 1 # ID=1_154
So that when doing some analysis afterwards we can sort on the protein name ?
Currently we'd have after sorting:
myreplicon_1
myreplicon_153
myreplicon_154
myreplicon_2
You could either pad with as many 0s as needed for the number of proteins annotated, or assume that a replicon will never have more than, e.g., 999,999 proteins and number them like _000001 or _000154. I tend to favor the second method, to have a homogeneous format across different replicons.
Thanks
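In the meantime, the same effect can be had after the fact. Below is a minimal sketch with two hypothetical helpers: `natural_key` sorts the existing names numerically, and `zero_pad` rewrites a name with a fixed-width suffix (the width of 6 follows the _000154 example above).

```python
import re

# Matches names that end in an underscore followed by digits.
_ID_SUFFIX = re.compile(r"^(.*)_(\d+)$")


def natural_key(name):
    """Sort key that orders 'myreplicon_2' before 'myreplicon_153'."""
    match = _ID_SUFFIX.match(name)
    if match is None:
        return (name, -1)
    return (match.group(1), int(match.group(2)))


def zero_pad(name, width=6):
    """Return the name with its numeric suffix padded, e.g. _154 -> _000154."""
    match = _ID_SUFFIX.match(name)
    if match is None:
        return name
    return f"{match.group(1)}_{int(match.group(2)):0{width}d}"


names = ["myreplicon_1", "myreplicon_153", "myreplicon_154", "myreplicon_2"]
print(sorted(names, key=natural_key))
print([zero_pad(n) for n in names])
```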
Hi,
I use bioconda to keep my packages up to date. Prodigal is still showing version 2.6.0. Can you update please ?
Thanks,
david
I noticed that when you have a partial gene and the beginning of that gene is unknown, the first amino acid is translated to a methionine. I noticed this because the EMBL validator started complaining. Now I am wondering whether this is biology or a bug.
Plant mitochondrial genomes use the standard genetic code 1, mostly. There are two exceptions.
Is there an option to specify a list of start codons?
I'm trying to use translation table 25 (-g 25). However, prodigal (2.6.1) says: "Invalid translation table specified.".
Doug
I wanted to check whether you are developing code to handle metagenomic samples that contain a mixture of organisms with different translation tables.
Torsten
Hi, I am writing unit tests for a script that calls Prodigal. I need to compare the contents of the output files from different runs on the same input data. However, I can't do this because the output differs between executions of Prodigal, even with the same input. How can I fix this?
I get the following message when compiling from the git repo on Debian 9 (Stretch).
$ gcc --version
gcc (Debian 6.3.0-18) 6.3.0 20170516
$ make install
gcc -pedantic -Wall -O3 -c -o bitmap.o bitmap.c
gcc -pedantic -Wall -O3 -c -o dprog.o dprog.c
gcc -pedantic -Wall -O3 -c -o gene.o gene.c
gcc -pedantic -Wall -O3 -c -o main.o main.c
gcc -pedantic -Wall -O3 -c -o metagenomic.o metagenomic.c
gcc -pedantic -Wall -O3 -c -o node.o node.c
gcc -pedantic -Wall -O3 -c -o sequence.o sequence.c
sequence.c: In function ‘shine_dalgarno_mm’:
sequence.c:766:27: warning: assuming signed overflow does not occur when assuming that (X + c) >= X is always true [-Wstrict-overflow]
if(match[k] < 0.0 && (k <= j+1 || k >= j+i-2)) cur_ctr -= 10.0;
~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
gcc -pedantic -Wall -O3 -c -o training.o training.c
gcc -pedantic -Wall -O3 -o prodigal bitmap.o dprog.o gene.o main.o metagenomic.o node.o sequence.o training.o -lm
install -d -m 0755 /usr/local/bin
Everything seems to work afterwards, but I thought you might want to check that.
Cheers,
Nils
Hi, I've posted about this issue before and I'm hoping someone can help clarify this recurring issue.
My team modified node.h to better support peptide prediction (we simply redefined the MIN_EDGE_GENE and MIN_GENE macros). Please see the diff output below for details, with our changes at the bottom.
#define MIN_GENE 30
#define MIN_EDGE_GENE 30
When running the following command on the attached FASTA file (please note that GitHub does not support the .fasta extension, so I uploaded the file as problem_chunk.04.fasta.txt), we run into an "invalid pointer" error (screenshot attached). It is important to note that 1) this error only occurs with our modified node.h, and does not occur with the original, unmodified codebase, and 2) the attached FASTA file is only a chunk of a larger file, but we were able to identify this particular chunk as the cause of the error. Output files are created, and I have attached them below.
command: /path/to/modifiedprodigal -i problem_chunk.04.fasta -a 4.tmp.faa -d 4.tmp.fna -o 4.tmp.gff -f gff -p meta -c -m -q
Please let me know if anyone has anything that can help us verify the root of this issue. Thanks in advance.
I'm trying to run Prodigal on a very large genome and I keep getting this error. Is there a way to run the algorithm on very large contigs?
GCF_000470005.1_KK113_genomic.fna.gz
Hi, I am trying to run prodigal on the file attached below (GCF_000470005.1_KK113_genomic.fna.gz) and I ran into the error (screenshot attached below). Could you please clarify the issue behind this error? I've seen multiple occasions of this error now and I cannot understand what is going on.
Good Evening,
I tried to install prodigal using the steps described within the wiki, and I had this error:
==> Tapping hyattpd/prodigal
Cloning into '/usr/local/Homebrew/Library/Taps/hyattpd/homebrew-prodigal'...
remote: Enumerating objects: 4, done.
remote: Counting objects: 100% (4/4), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 4 (delta 0), reused 3 (delta 0), pack-reused 0
Unpacking objects: 100% (4/4), done.
Error: Invalid formula: /usr/local/Homebrew/Library/Taps/hyattpd/homebrew-prodigal/prodigal.rb
prodigal: undefined method `sha1' for #<Class:0x00007ff0399f9e18>
Error: Cannot tap hyattpd/prodigal: invalid syntax in tap!
Here is my output to brew doctor in case this helps give away what the problem might be:
Please note that these warnings are just used to help the Homebrew maintainers
with debugging if you file an issue. If everything you use Homebrew for is
working fine: please don't worry or file an issue; just ignore this. Thanks!
Warning: Unbrewed dylibs were found in /usr/local/lib.
If you didn't put them there on purpose they could cause problems when
building Homebrew formulae, and may need to be deleted.
Unexpected dylibs:
/usr/local/lib/libtcl8.6.dylib
/usr/local/lib/libtk8.6.dylib
Warning: Unbrewed header files were found in /usr/local/include.
If you didn't put them there on purpose they could cause problems when
building Homebrew formulae, and may need to be deleted.
Unexpected header files:
/usr/local/include/fakemysql.h
/usr/local/include/fakepq.h
/usr/local/include/fakesql.h
/usr/local/include/itcl.h
/usr/local/include/itcl2TclOO.h
/usr/local/include/itclDecls.h
/usr/local/include/itclInt.h
/usr/local/include/itclIntDecls.h
/usr/local/include/itclMigrate2TclCore.h
/usr/local/include/itclTclIntStubsFcn.h
/usr/local/include/mysqlStubs.h
/usr/local/include/odbcStubs.h
/usr/local/include/pqStubs.h
/usr/local/include/tcl.h
/usr/local/include/tclDecls.h
/usr/local/include/tclOO.h
/usr/local/include/tclOODecls.h
/usr/local/include/tclPlatDecls.h
/usr/local/include/tclThread.h
/usr/local/include/tclTomMath.h
/usr/local/include/tclTomMathDecls.h
/usr/local/include/tdbc.h
/usr/local/include/tdbcDecls.h
/usr/local/include/tdbcInt.h
/usr/local/include/tk.h
/usr/local/include/tkDecls.h
/usr/local/include/tkPlatDecls.h
Warning: Unbrewed .pc files were found in /usr/local/lib/pkgconfig.
If you didn't put them there on purpose they could cause problems when
building Homebrew formulae, and may need to be deleted.
Unexpected .pc files:
/usr/local/lib/pkgconfig/tcl.pc
/usr/local/lib/pkgconfig/tk.pc
Warning: Unbrewed static libraries were found in /usr/local/lib.
If you didn't put them there on purpose they could cause problems when
building Homebrew formulae, and may need to be deleted.
Unexpected static libraries:
/usr/local/lib/libtclstub8.6.a
/usr/local/lib/libtkstub8.6.a
Thank you in advance for your help!
Hi!
Is anyone running the rc.1 version of prodigal 3 successfully on metagenomes?
It works fine for me on isolates (first training and then normal mode).
But when I run it on a FASTA file containing a metagenomic assembly and switch to --mode anon (I also tried -p meta), no genes get predicted at all.
Prodigal 2.6.3 predicts plenty of genes though (with -p meta).
Here's an example with the shortest sequence out of the file for testing purposes:
marcelh@dint05 -> cat test.fna
>test_seq
GGATGAGCCAGTGGCTGATCAGCGGTTGGCGCGGGTACCGACCACTTCGATCTCGACGCG
GCGGTTCTTGGCGCGACCTTCTTTGGTGCGGTTGTCAGCGATGGGCTGCTTCTCGCCCTT
GCCTTCGGTGTAGATGCGGTTTTTCTCGATGCCCTTGCTGACCAAATAGGCCTTCACGGC
TTCGGAACGACGCACCGACA
marcelh@dint05 -> $OMICS/bin/prodigal -p meta -i test.fna
-------------------------------------
PRODIGAL v2.6.3 [February, 2016]
Univ of Tenn / Oak Ridge National Lab
Doug Hyatt, Loren Hauser, et al.
-------------------------------------
Request: Metagenomic, Phase: Training
Initializing training files...done!
-------------------------------------
Request: Metagenomic, Phase: Gene Finding
Finding genes in sequence #1 (200 bp)...done!
DEFINITION seqnum=1;seqlen=200;seqhdr="test_seq";version=Prodigal.v2.6.3;run_type=Metagenomic;model="25|Marinobacter_aquaeolei_VT8|B|57.3|11|1";gc_cont=57.30;transl_table=11;uses_sd=1
FEATURES Location/Qualifiers
CDS complement(19..>198)
/note="ID=1_1;partial=01;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.600;conf=99.91;score=30.71;cscore=27.49;sscore=3.22;rscore=0.00;uscore=0.00;tscore=3.22;"
//
marcelh@dint05 -> $OMICS/bin/prodigal3 --mode anon -i test.fna
-------------------------------------
PRODIGAL v3.0.0-rc.1 [February, 2016]
Univ of Tenn / Oak Ridge National Lab
Doug Hyatt, Loren Hauser, et al.
-------------------------------------
Mode: Anonymous, Phase: Training
Initializing preset training files...done.
-------------------------------------
Mode: Anonymous, Phase: Gene Finding
Finding genes in sequence #1 (200 bp)...done.
DEFINITION seqnum=1;seqlen=200;seqhdr="test_seq";version=Prodigal.v3.0.0-rc.1;run_type=Anonymous;model="0|Mycoplasma_bovis_PG45|B|29.3|0|0";gc_cont=0.00;transl_table=0;uses_sd=0
FEATURES Location/Qualifiers
//
It basically looks like this for every contig in the file. Not a single gene gets predicted.
Is anyone experiencing the same issue?
Thanks,
Marcel
Hi there,
I've been using Prodigal V2.6.3 on a cluster to predict CDS in parallel on a large number of genomes (>10k, 50 jobs in parallel). Unfortunately, after some time, one of the jobs inevitably fails with the error:
Could not delete tmp file tmp.prodigal.stdin.xxxxx.
where xxxxx is a seemingly random number. This file is indeed no longer present when the job exits. I looked at how this filename is chosen in the source code, and it seems to use the PID of the process:
https://github.com/hyattpd/Prodigal/blob/GoogleImport/main.c#L95-L97
/* Filename for input copy if needed */
pid = getpid();
sprintf(input_copy, "tmp.prodigal.stdin.%d", pid);
It is hard to confirm, but I suspect this problem arises when two jobs happen to have the same PID at the same time on two different nodes: the tmp file is silently overwritten and then deleted by the first job to finish, and the second job fails to delete it afterwards. I can just ignore those errors and relaunch the analysis for the few failed jobs, but if this file is indeed silently overwritten, the collision could cause a bigger problem. If the timing is unfortunate, some of the Prodigal results might not correspond to the original FASTA that was piped into the command (if the job is the first to finish, it exits normally, and I have no way to know a filename collision occurred).
Would it be possible to add a command line option to allow the user to specify the location or name of the temporary file? This should also improve performance on clusters by using local filesystem on each node.
Alternatively, Prodigal could check if the tmp file already exists, and exit with an error before transparently overwriting it.
Cheers,
Nils
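A possible workaround on the user side is to avoid piping altogether: stage the input in a uniquely named file and pass it with -i. The sketch below uses Python's `tempfile.mkstemp`, which creates the file exclusively rather than deriving its name from the PID; `stage_stdin_copy` is a hypothetical helper, and whether exclusive creation holds on a given shared filesystem is an assumption to verify.

```python
import os
import tempfile


def stage_stdin_copy(data, directory=None):
    """Write `data` (bytes) to a uniquely named file and return its path.

    mkstemp opens the file with exclusive creation, so two callers sharing a
    directory get different names instead of silently reusing one.
    """
    fd, path = tempfile.mkstemp(prefix="prodigal.input.", suffix=".fna", dir=directory)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


# Two calls with identical content still receive distinct paths.
path_a = stage_stdin_copy(b">seq\nACGT\n")
path_b = stage_stdin_copy(b">seq\nACGT\n")
print(path_a != path_b)
os.remove(path_a)
os.remove(path_b)
```

Pointing `directory` at node-local scratch space would also keep the copy off the shared filesystem.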
Prodigal V2.6.2: February, 2015
Command line and error:
SEBASTIANs-MacBook-Pro:~ FLFLFLLF$ prodigal -i /Users/FLFLFLLF/spades_3rd_lf10sc_notmp/contigs.fna -o lf10spades_coords.gbk -a lf10proteins.faa -p anon
Invalid meta/single genome type specified.
The format of my assembly fasta file:
NODE_4604_length_1093_cov_92.9644_ID_9355
ATCGCTGTAGCATCTCCGCCCCAGGCGGAGTTTGGTGGGTTGCCGCACCCGAGCATCTTG
GCGCTGTGTCATCTACTGGGTGTCAATGAGGTCTACGCCGTCGGCGGTGCCCAGGCGATC
G
This is my first time using Prodigal. I don't see anything wrong with my FASTA file. Thanks for your help.
Hi, I'd like to suggest a mode that converts between the output formats, like the genbank-like format, gff3, and so on.
I have the following sequence with an alternate start codon (TTG). The sequence is from E. coli.
Sample
ATGGTCCAGCCGTGTACATGGTTCAAACACGCCAGGCATTCGAGCGAACACGCAGTGATGCCTAACCCTTCCATCGAGGGGGACGTCCAAGGGCTGGCGCCCTTGGCCGCCCCTCATGTCAAACGTTGGGCGAACCCGGAGCCTCATTAATTGTTAGCCGTTAAAATTAAGCCCTTTACCAAACCAATACTTATTATGAAAAACACAATACATATCAACTTCGCTATTTTTTTAATAATTGCAAATATTATCTACAGCAGCGCCAGTGCATCAACAGATATCTCTACTGTTGCATCTCCATTATTTGAAGGAACTGAAGGTTGTTTTTTACTTTACGATGCATCCACAAACGCTGAAATTGCTCAATTCAATAAAGCAAAGTGTGCAACGCAAATGGCACCAGATTCAACTTTCAAGATCGCATTATCACTTATGGCATTTGATGCGGAAATAATAGATCAGAAAACCATATTCAAATGGGATAAAACCCCCAAAGGAATGGAGATCTGGAACAGCAATCATACACCAAAGACGTGGATGCAATTTTCTGTTGTTTGGGTTTCGCAAGAAATAACCCAAAAAATTGGATTAAATAAAATCAAGAATTATCTCAAAGATTTTGATTATGGAAATCAAGACTTCTCTGGAGATAAAGAAAGAAACAACGGATTAACAGAAGCATGGCTCGAAAGTAGCTTAAAAATTTCACCAGAAGAACAAATTCAATTCCTGCGTAAAATTATTAATCACAATCTCCCAGTTAAAAACTCAGCCATAGAAAACACCATAGAGAACATGTATCTACAAGATCTGGATAATAGTACAAAACTGTATGGGAAAACTGGTGCAGGATTCACAGCAAATAGAACCTTACAAAACGGATGGTTTGAAGGGTTTATTATAAGCAAATCAGGACATAAATATGTTTTTGTGTCCGCACTTACAGGAAACTTGGGGTCGAATTTAACATCAAGCATAAAAGCCAAGAAAAATGCGATCACCATTCTAAACACACTAAATTTATAAAAAATCTAATGGCAAAATCGCCCAACCCTTCAATCAAGTCGGGACGGCCAAAAGCAAGCTTTTGGCTCCCCTCGCTGGCGCTCGGCGCCCCTTATTTCAAACGTTAGACGGCAAAGTCACAGACCGCGGGATCTCTTATGACCAACTACTTTGATAGCCCCTTCAAAGGCAAGCTGCTTTCTGAGCAAGTGAAGAACCCCAATATCAAAGTTGGGCGGTACAGCTATTACTCTGGCTACTATCATGGGCACTCATTCGATGACTGCGCACGGTATCTGTTTCCGGACCGTGATGACGTTGATAAGTTGATCATCGGTAGTTTCTGCTCTATCGGGAGTGGGGCTTCCTTTATCATGGCTGGCAATCAGGGGCATCGGTACGACTGGGCATCATCTTTCCCGTTCTTTTATATGCAGGAAGAACCTGCATTCTCAAGCGCACTCGATGCCTTCCAAAAAGCAGGTAATACTGTCATTGGCAATGACGTTTGGATCGGCTCTGAGGCAATGGTCATGCCCGGAATCAAGATCGGGCACGGTGCGGTGATAGGCAGCCGCTCGTTGGTGACAAAAGATGTGGGGCACTGTTGCAAAGTTAGCGATGAGGCAGCCTTTTGTCTTATTCAAAGGCCTTACATTTCAAAAACTCTGCTTACCAGGCGCATTTCGCCCAGGGGATCACCATAATAAAATGCTGAGGCCTGGC
Prodigal only picks up the ATG start codon. But when I train on an E. coli genome, the alternate start codon TTG is picked up correctly and I get the complete gene. Is there a way to detect alternate start codons without training on genomes related to the query sequence?
The Prodigal help text needs to be updated to include the newly named 'normal' and 'anon' modes.
$ prodigal -v
Prodigal V2.6.2: January, 2015
$ prodigal -h
Usage: prodigal [-a trans_file] [-c] [-d nuc_file] [-f output_type]
[-g tr_table] [-h] [-i input_file] [-m] [-n] [-o output_file]
[-p mode] [-q] [-s start_file] [-t training_file] [-v]
-a: Write protein translations to the selected file.
-c: Closed ends. Do not allow genes to run off edges.
-d: Write nucleotide sequences of genes to the selected file.
-f: Select output format (gbk, gff, or sco). Default is gbk.
-g: Specify a translation table to use (default 11).
-h: Print help menu and exit.
-i: Specify input file (default reads from stdin).
-m: Treat runs of n's as masked sequence and do not build genes across
them.
-n: Bypass the Shine-Dalgarno trainer and force the program to scan
for motifs.
-o: Specify output file (default writes to stdout).
-p: Select procedure (single or meta). Default is single.
-q: Run quietly (suppress normal stderr output).
-s: Write all potential genes (with scores) to the selected file.
-t: Write a training file (if none exists); otherwise, read and use
the specified training file.
-v: Print version number and exit.
Some bugs have been fixed; would a new release be possible?
Hello,
I can't start prodigal.windows.exe on my Windows 10 (86bit). No messages appear. The window opens and then closes.
I'm running prodigal (Prodigal V2.6.3: February, 2016) and have contigs with some regions masked with 'N' where infernal cmscan has predicted non-coding RNAs. However, although I run prodigal with the '-m' option to "Treat runs of N as masked sequence; don't build genes across them." the output has genes predicted across those regions with protein sequences translated into stretches of 'X'. I saw that the stretches of 'N' need to be at least 50 characters and they all are but nevertheless it doesn't seem to act as a mask.
I know the '-m' option was used originally by JGI for masking, but isn't it implemented anymore in prodigal? Or am I using it wrong?
Sincerely,
John
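To rule out the input as the cause, the masked runs can be listed and their lengths checked against the 50-character minimum mentioned above. This is a minimal sketch with a hypothetical helper `n_runs`; it only inspects the sequence and says nothing about how Prodigal treats the runs internally.

```python
import re


def n_runs(sequence, min_length=50):
    """Return 1-based inclusive (start, end) spans of N runs >= min_length."""
    spans = []
    for match in re.finditer(r"[Nn]+", sequence):
        if match.end() - match.start() >= min_length:
            spans.append((match.start() + 1, match.end()))
    return spans


# A 50 bp run qualifies; the trailing 10 bp run does not.
print(n_runs("ACGT" + "N" * 50 + "AC" + "N" * 10))
```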
Doug,
I managed to compile 2.6.2 on Linux for table 25 support for Prokka.
Will you eventually provide a Mac OS X and Linux binary and a 2.6.2 tagged release?
Torsten
In future versions, are there plans to set an option to input custom translation tables?
Possibly in the format of:
1. The Standard Code (transl_table=1)
By default all transl_table in GenBank flatfiles are equal to id 1, and this is not shown. When transl_table is not equal to id 1, it is shown as a qualifier on the CDS feature.
AAs = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Starts = ---M------**--*----M---------------M----------------------------
Base1 = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2 = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3 = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
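To illustrate what reading such a table would involve, the five rows map onto a codon dictionary column by column. The following is only a sketch of a parser for the layout quoted above, with a hypothetical function name; it is not how Prodigal stores its tables. Only 'M' in the Starts row is treated as a start flag.

```python
def parse_ncbi_table(aas, starts, base1, base2, base3):
    """Build codon -> amino acid, plus the sets of start and stop codons."""
    rows = (aas, starts, base1, base2, base3)
    if any(len(row) != 64 for row in rows):
        raise ValueError("all five rows must have 64 columns")
    # Column i of the three Base rows spells the codon for column i of AAs.
    codons = ["".join(triple) for triple in zip(base1, base2, base3)]
    table = dict(zip(codons, aas))
    start_codons = {c for c, flag in zip(codons, starts) if flag == "M"}
    stop_codons = {c for c, aa in table.items() if aa == "*"}
    return table, start_codons, stop_codons


# Rows of the standard code (transl_table=1), in the layout quoted above.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STARTS = "---M------**--*----M---------------M" + "-" * 28
BASE1 = "T" * 16 + "C" * 16 + "A" * 16 + "G" * 16
BASE2 = ("T" * 4 + "C" * 4 + "A" * 4 + "G" * 4) * 4
BASE3 = "TCAG" * 16

table, start_codons, stop_codons = parse_ncbi_table(AAS, STARTS, BASE1, BASE2, BASE3)
print(sorted(start_codons), sorted(stop_codons))
```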
Hi,
Are there any plans to make a CUDA version of Prodigal?
Thanks
When compiling Prodigal with GCC versions 4 to 6 the output produced is missing some information in the header of each sequence when compared with a Prodigal version compiled with GCC 7.
The incorrect output is (GCC versions 4.9.3 and 6.4.0 (alpine linux)):
>0_1 # 1 # 1344 # 1 # ;gc_cont=0.612
GTGGATAGCGGAGAGGTGAATCACTTGGAAGACCTGAAGGTCCTGTGGAACGAGATACTGGACAAGGGCA
...
whereas with GCC 7.2.0 (ubuntu-17.10) the output is:
>0_1 # 1 # 1344 # 1 # ID=1_1;partial=10;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.612
GTGGATAGCGGAGAGGTGAATCACTTGGAAGACCTGAAGGTCCTGTGGAACGAGATACTGGACAAGGGCA
...
The source of the problem is:
Line 312 in 004218f
According to some sources, reading and writing to the same array can lead to undefined behavior.
The following code demonstrates the issue.
#### code.c ####
#include <stdio.h>

int main() {
    FILE *ptr = stdout;
    char data[500] = {0};
    /* data is both the destination and a source argument: undefined behavior */
    sprintf(data, "%sNOTVISIBLE ", data);
    sprintf(data, "%sVISIBLE", data);
    fprintf(ptr, "%s\n", data);
    return 1;
}
#### Makefile ####
SHELL = /bin/sh
CC = gcc
CFLAGS += -pedantic -Wall -O3 -static
LFLAGS = -lm $(LDFLAGS)
TARGET = test
SOURCES = $(shell echo *.c)
OBJECTS = $(SOURCES:.c=.o)

all: $(TARGET)

$(TARGET): $(OBJECTS)
	$(CC) $(CFLAGS) -o $@ $^ $(LFLAGS)

%.o: %.c
	$(CC) $(CFLAGS) -c -o $@ $<

clean:
	-rm -f $(OBJECTS)
	-rm -f $(TARGET)

.PHONY: all clean
When compiled with GCC versions 4.9.3 and 6.4.0 I get VISIBLE, whereas with GCC 7.2.0 I get NOTVISIBLE VISIBLE.
Hi Team,
Sorry if I have overlooked it, but there seems to be no frame (1-6) information associated with the translated file when I run Prodigal on a nucleotide file using the command below.
Command:
prodigal -i sample_input.fasta -a sample_input_tarnslated.fasta -g 11
Thanks for help
-vijay