prodigal's People

Contributors

althonos, dnbaker, hyattpd, mr-c, neurotensin, unode

prodigal's Issues

upper limit

The nucleotide count for my fasta file is larger than 32000000 bp long.
Is there a better method than splitting the fasta?
Cheers.

Can I use a training file for metagenomic contigs?

Hello,

Right now, I am trying to predict viral ORFs from metagenomic viral contigs using Prodigal. However, as you know, Prodigal does not know viral genome-specific rules, so I would like to predict viral genes using a training file that I built in advance from other, similar viral metagenomic contigs. However, Prodigal does not seem to accept meta mode together with a training file. Can I predict the ORFs using normal mode instead?

osx executable?

I downloaded the installation file for Mac, but it has no extension, and the computer doesn't know what to do with it.

Minimum length of gene

As far as I understand there is no specific lower length cut-off for gene calling with prodigal, is that correct?

But you penalize genes shorter than 250 bp, as far as I can see in the code. Based on this method, what are the shortest genes which get called in general? How did you get to the specific length of 250 bp?

Error: Sequence must be 20000 characters (only 13949 read).

Hi, I'm trying to run Prodigal on some assembled genomes and I get this message:

Error: Sequence must be 20000 characters (only 13949 read). (Consider running with the -p meta option or finding more contigs from the same genome.)

Can you please explain what it means? I'm using complete whole genomes, so wouldn't it be inadvisable to use the meta parameter?

Thank you for any assistance you can provide.

V
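
One way to catch this before launching a run is to count the bases in the FASTA file and compare the total with the 20000-character threshold quoted in the error message. The sketch below is a hypothetical helper, not part of Prodigal. It assumes the limit applies to the total number of sequence characters across all records in the file, which is how the hint about "finding more contigs from the same genome" reads, and it only suggests a value for -p rather than deciding whether meta mode is appropriate for the data.

```python
MIN_TRAINING_BASES = 20000  # threshold quoted in the error message above


def total_bases(fasta_lines):
    """Count sequence characters in a FASTA file, skipping header lines."""
    return sum(len(line.strip()) for line in fasta_lines
               if not line.startswith(">"))


def suggested_mode(fasta_lines):
    """Suggest a value for -p: 'single' if enough sequence is present, else 'meta'."""
    if total_bases(fasta_lines) >= MIN_TRAINING_BASES:
        return "single"
    return "meta"
```

With an open file handle, suggested_mode(handle) returns "meta" for an input like the 13949-character one reported here.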

Handling broken backward compatibility between 2.60 and 2.70

Doug

Sean Jackman has alerted me to your git version of prodigal (aka 2.70) breaking backward compatibility of command line options with the long standing 2.60 version:

Prodigal is very popular (fast and reliable) and it is not customary to break interfaces for minor versions.

I can see from your notes that most of this is in preparation for 3.00. Are you planning any official/stable releases for 2.70?

Torsten

Request: Parameter for short (< 250 bp) CDS penalty

Hi. I'm annotating a plant plastid using Prodigal, and I found it was missing a number of short but real genes ranging in size from 90 bp to 189 bp. I found that reducing the short CDS penalty size from 250 bp to 100 bp helped rescue 6 of 9 missing short genes. Could the 250 bp size please be made a command line parameter?

See…

No partial tag using metaProdigal

Hi,
I am using Prodigal_v2.6.2 in meta version.
In my output files there are no 'partial' tags, why is that?
Best, Johanna

prodigal -p meta -a $OUTDIR/soil.contigs.genes.faa -d $OUTDIR/soil.contigs.genes.fna -f gff -o $OUTDIR/soil.contigs.genes.gff -i $INDIR/soil.contigs.fa

Performance impacted by word wrap length

Hi,

I've noticed that the performance of Prodigal is greatly impacted by the word wrap length of the input FASTA file. An input file without any word wrapping takes almost double the time of one wrapped under 1000 characters.

I've observed this on multiple genomes, however, these results are from GCF_000754115.1_ASM75411v1 with n=20 replicates at word wrap lengths of: [1, 5, 10, 20, 40, 80, 150, 300, 500, 1000, 3000, 5000, 10000, 15000, 20000, 50000, 100000, 200000, 300000, 400000, 500000]. Input was read from stdin.

I'd like to suggest forcefully word-wrapping user input, or adding a note to the documentation that the input should be strictly wrapped at 80 characters.

Thanks,
Aaron
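
Until the wrapping behaviour changes, the input can be re-wrapped before it reaches Prodigal. The following is a minimal sketch of such a pre-processing step, assuming plain FASTA input and the 80-column width suggested above. The function name is made up for illustration, and whether re-wrapping removes the slowdown depends on its actual cause, which this sketch does not investigate.

```python
def rewrap_fasta(lines, width=80):
    """Yield FASTA lines with each record's sequence re-wrapped to `width` columns."""
    chunks = []  # sequence pieces of the record currently being read

    def flush():
        # Join the buffered pieces and emit them in fixed-width slices.
        seq = "".join(chunks)
        chunks.clear()
        for i in range(0, len(seq), width):
            yield seq[i:i + width]

    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            yield from flush()   # finish the previous record first
            yield line           # headers are passed through unchanged
        elif line:
            chunks.append(line)
    yield from flush()           # emit the last record
```

Only one record is held in memory at a time, so the generator can sit in a pipe that reads the original file and writes the re-wrapped lines to Prodigal's stdin.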

Mac Download

The Mac download for 2.6.3 does not have an extension that is easily discernible. I don't know what program to use to open it.

Is Prodigal Still Running?

Hi,

I noticed that the prodigal website is not available. Is the service still available, or does this indicate that prodigal should no longer be used? Thank you!

Segmentation fault 11

Hi,

I have a segmentation fault issue with some genomes as in here : https://code.google.com/p/prodigal/issues/detail?id=4

version used : Prodigal V2.6.1: July, 2013
Downloaded from : https://github.com/hyattpd/Prodigal/releases/download/v2.6.1/prodigal.macosx

computer used: Mac Pro mid 2012 with OSX 10.8.5

Steps to reproduce the error:

  1. Download fasta file from:

  2. rename files KJ484629.fst and KC340960.fst respectively

  3. run the following command:

    prodigal -i KJ484629.fst

it outputs:

-------------------------------------
PRODIGAL v2.6.1 [July, 2013]         
Univ of Tenn / Oak Ridge National Lab
Doug Hyatt, Loren Hauser, et al.     
-------------------------------------
Request:  Single Genome, Phase:  Training
Reading in the sequence(s) to train...112671 bp seq created, 50.70 pct GC
Locating all potential starts and stops...5798 nodes
Looking for GC bias in different frames...frame bias scores: 1.73 0.34 0.93
Building initial set of genes to train from...done!
Creating coding model and scoring nodes...done!
Examining upstream regions and training starts...Segmentation fault: 11

Same for KC340960.fst

NB:
- Both sequences are composed of ATGC characters exclusively
- when running the following command, it works:

prodigal -p meta -i KJ484629.fst

Is there something I can do to make it work? (given that it works for most of the genomes I'm using, and some are about the same size)
Or is this a known bug that should be fixed ?

Thanks

Unsafe code

It is not a critical issue, but some parts of the code, such as the argument parsing in main.c, may be unsafe because they use strcmp and atoi, and strcpy is used during memory allocation. Safer alternatives already exist and, in my opinion, should be used instead.

D.

Restrict start codons to AUG

I'm using translation table 11 to annotate a plant plastid genome. Is it possible to limit the possible start codons to only AUG?

Annotating vertebrate mitochondrial genomes

Vertebrate mitochondrial genomes use the genetic code 2. See https://en.wikipedia.org/wiki/Vertebrate_mitochondrial_code
They're also only 16 kbp, which is less than Prodigal's threshold of 20 kbp, so they must be run in -p meta mode. -p meta mode uses a database that includes genetic codes 4 and 11, but not 2. See also the related issues #19 and #22.
Is there any workaround to use genetic code 2 on small sequences? One thought is that I could download a bunch of vertebrate mitochondrial genomes, add them to my FASTA file, and annotate the whole thing.

Use -p single or -p meta for metagenomic bins

After the binning of assembled contigs file, I get a bin set. I want to make gene prediction on each bin.

However, I'm confused by the option -p single or meta when I add it to the command. It seems right to use -p single, because each bin could be thought of as a draft genome. But on the other hand, each bin includes many contigs which were assembled from metagenomic reads, so it also seems right to use -p meta.

Which parameter should I use for each bin?

Thanks in advance!

gene prediction labeled as partial if it starts at contig edge

This might actually be behaviour 'as designed' but I noticed a gene gets qualified as 'partial' even if the full sequence is there when it starts at the edge of a contig. I noticed this after searching for dnaA and rearranging a circular chromosome to start at dnaA.
Maybe a flag for circularity would help such that prodigal would know that the promoter of the first gene might be at the 'end' of given sequence

prediction on the raw contig: dnaA sits somewhere in the middle:
unitig_0_quiver Prodigal_v2.6.2 CDS 872920 874320 357.5 + 0 ID=1_808;partial=00;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.633;conf=99.99;score=357.50;cscore=357.42;sscore=0.08;rscore=-3.89;uscore=0.47;tscore=4.15;

prediction on the rearranged contig where dnaA starts at first nucleotide.
Chrom1 Prodigal_v2.6.2 CDS 1 1401 360.8 + 0 ID=1_1;partial=10;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.633;conf=99.99;score=360.75;cscore=357.54;sscore=3.22;rscore=0.00;uscore=3.22;tscore=0.00;

Both predictions contain the exact same sequence (of course)
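
To compare such records programmatically, the attribute column can be split into key/value pairs. The sketch below is an illustration only. It assumes the attribute column has the semicolon-separated key=value layout shown in the two records above, and that the first digit of partial refers to the left edge of the sequence and the second to the right edge, which is inferred from partial=10 on a gene starting at position 1. The helper names are hypothetical.

```python
def parse_attributes(column9):
    """Turn 'ID=1_1;partial=10;...' into a dict, skipping empty fields."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def edge_flags(attrs):
    """Return (incomplete_at_left_edge, incomplete_at_right_edge)."""
    left, right = attrs["partial"]  # two characters, e.g. "10"
    return left == "1", right == "1"
```

For the rearranged contig above, edge_flags gives (True, False) and start_type is "Edge", whereas the raw contig gives (False, False) with start_type "ATG".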

Millions of null characters (\x00) inserted into called gene

In the output FASTA file from prodigal, one entry has 122 million null characters inserted into the middle of the called gene:

[('\x00', 122748928), ('V', 7), ('A', 6), ('L', 5), ('S', 5)]

ENVKHIWGRAQWLMPVIPALWEAKAGGSPEVR\x00...\x00ASSLWGLALVSYVEMNESHTPVCV
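
A count like the one above can be reproduced for any record with a few lines of Python. This is a sketch for locating affected entries, with hypothetical function names. Removing the null characters only hides the symptom in the output file and does not address whatever inserted them.

```python
from collections import Counter


def null_report(sequence, top=5):
    """Return the number of NUL characters and the `top` most common characters."""
    counts = Counter(sequence)
    return counts["\x00"], counts.most_common(top)


def strip_nulls(sequence):
    """Return the sequence with every NUL character removed."""
    return sequence.replace("\x00", "")
```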

Unix binary provided doesn't work (wrong GLIBC)

I wanted to take prodigal out for a spin, just to see what it can do. This is what I typed:

$ wget https://github.com/hyattpd/Prodigal/releases/download/v2.6.2/prodigal.linux
$ chmod +x prodigal.linux
$ ./prodigal.linux -help
./prodigal.linux: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by ./prodigal.linux)

percent of genes predicted in a metagenome?

Is it possible to compute something like (number of protein-coding genes / number of contigs in a metagenome) * 100 to get the percentage of genes predicted in a metagenome?

Would this help make sense of how many genes were predicted in total, and compare it among samples?

numbering proteins

Hi,

It's a feature proposition. (not a heavy one)
For a replicon named myreplicon we'll get proteins named:

myreplicon_1 # xxx # yyy # 1 # ID=1_1
myreplicon_2 # xxx # yyy # 1 # ID=1_2
...
myreplicon_153 # xxx # yyy # 1 # ID=1_153
myreplicon_154 # xxx # yyy # 1 # ID=1_154

Is it possible to format the output as follows:

myreplicon_001 # xxx # yyy # 1 # ID=1_001
myreplicon_002 # xxx # yyy # 1 # ID=1_002
...
myreplicon_153 # xxx # yyy # 1 # ID=1_153
myreplicon_154 # xxx # yyy # 1 # ID=1_154

So that when doing some analysis afterwards we can sort on the protein name ?

Currently we'd have after sorting:

myreplicon_1
myreplicon_153
myreplicon_154
myreplicon_2

You could either pad with as many zeros as the number of annotated proteins requires, or assume that a replicon will never have more than, e.g., 999,999 proteins and number them like _000001 or _000154. I tend to favor the second method, to have a homogeneous format across different replicons.

Thanks
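
In the meantime, the ordering can be fixed downstream without changing Prodigal's output. The sketch below assumes every name ends in an underscore followed by an integer, as in the examples above. Both helper names are made up for illustration.

```python
def protein_sort_key(name):
    """Split 'myreplicon_153' into ('myreplicon', 153) so the suffix sorts numerically."""
    prefix, _, index = name.rpartition("_")
    return (prefix, int(index))


def zero_pad(name, width=6):
    """Rewrite 'myreplicon_154' as 'myreplicon_000154'."""
    prefix, _, index = name.rpartition("_")
    return f"{prefix}_{int(index):0{width}d}"
```

Passing protein_sort_key as the key argument of sorted() gives myreplicon_1, myreplicon_2, myreplicon_153, myreplicon_154, and zero_pad produces the second format proposed above.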

Bioconda

Hi,
I use bioconda to keep my packages up to date. Prodigal is still showing version 2.6.0. Can you update please ?
Thanks,
david

Partial protein Methionine

I noticed that when you have a partial gene and the beginning of that gene is unknown, the first amino acid is translated to a methionine. I noticed this as the EMBL validator started complaining. Now I am wondering whether this is biology or a bug?

Alternative start codons (annotating plant mitochondrial genomes)

Plant mitochondrial genomes use the standard genetic code 1, mostly. There are two exceptions.

  • AUG start codons are used primarily
  • ACG codon is frequently edited to AUG via C-to-U RNA editing
  • GUG alternative start codon is used by two genes (rpl16 and rps19)
  • GCG codon may be edited to a GUG alternative start codon via C-to-U RNA editing

Is there an option to specify a list of start codons?

Compare outputs of different runs with same input data

Hi, I am writing unit tests for a script which calls Prodigal. I need to compare the contents of the output files of different runs (using the same input data). However, I can't do this, since the contents of the outputs differ between executions of Prodigal on the same input data. How can I rectify this issue?

Warning when compiling on Linux

I get the following message when compiling from the git repo on Debian 9 (Stretch).

$ gcc --version
gcc (Debian 6.3.0-18) 6.3.0 20170516
$ make install
gcc -pedantic -Wall -O3 -c -o bitmap.o bitmap.c
gcc -pedantic -Wall -O3 -c -o dprog.o dprog.c
gcc -pedantic -Wall -O3 -c -o gene.o gene.c
gcc -pedantic -Wall -O3 -c -o main.o main.c
gcc -pedantic -Wall -O3 -c -o metagenomic.o metagenomic.c
gcc -pedantic -Wall -O3 -c -o node.o node.c
gcc -pedantic -Wall -O3 -c -o sequence.o sequence.c
sequence.c: In function ‘shine_dalgarno_mm’:
sequence.c:766:27: warning: assuming signed overflow does not occur when assuming that (X + c) >= X is always true [-Wstrict-overflow]
         if(match[k] < 0.0 && (k <= j+1 || k >= j+i-2)) cur_ctr -= 10.0;
            ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
gcc -pedantic -Wall -O3 -c -o training.o training.c

gcc -pedantic -Wall -O3 -o prodigal bitmap.o dprog.o gene.o main.o metagenomic.o node.o sequence.o training.o -lm 
install -d -m 0755 /usr/local/bin

Everything seems to work afterwards, but I thought you might want to check that.

Cheers,
Nils

free(): invalid pointer: 0x00007f8b8602b010

Hi, I've posted about this issue before and I'm hoping someone can help clarify this recurring issue.

My team modified node.h in order to better support peptide prediction (we simply redefined the MIN_EDGE_GENE and MIN_GENE variable), please see below the diff output to see details, with our changes at the bottom.


Changes made by TW on December 14th 2016

Reduced the min gene length to support peptide prediction

diff ../2.6.3/node.h ./node.h
30,31c30,31
< #define MIN_GENE 90
< #define MIN_EDGE_GENE 60
---
> #define MIN_GENE 30
> #define MIN_EDGE_GENE 30


When running the following command on the fasta file attached (please make note that github does not support the .fasta extension, so I uploaded this file as a .txt file)
problem_chunk.04.fasta.txt
, we run into this "invalid pointer" error (screenshot attached). It is important to note that 1) this error only occurs when we run our modified node.h, and it does work when we use the original,unmodified codebase and 2) the fasta file attached is only a chunk of a larger file, but we were able to identify that this particular chunk (file attached) is the cause of the error. Output files are created, and I have attached them below.

command: /path/to/modifiedprodigal -i problem_chunk.04.fasta -a 4.tmp.faa -d 4.tmp.fna -o 4.tmp.gff -f gff -p meta -c -m -q

Please let me know if anyone has anything that can help us verify the root of this issue. Thanks in advance.

screen shot 2018-06-22 at 4 18 00 pm

4.tmp.faa.txt
4.tmp.fna.txt
4.tmp.gff.txt

Invalid Pointer Error

screen shot 2018-01-22 at 2 25 06 pm

GCF_000470005.1_KK113_genomic.fna.gz
Hi, I am trying to run prodigal on the file attached below (GCF_000470005.1_KK113_genomic.fna.gz) and I ran into the error (screenshot attached below). Could you please clarify the issue behind this error? I've seen multiple occasions of this error now and I cannot understand what is going on.

Issue installing Prodigal through Brew on Mac Mojave 10.14

Good Evening,

I tried to install prodigal using the steps described within the wiki, and I had this error:

==> Tapping hyattpd/prodigal
Cloning into '/usr/local/Homebrew/Library/Taps/hyattpd/homebrew-prodigal'...
remote: Enumerating objects: 4, done.
remote: Counting objects: 100% (4/4), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 4 (delta 0), reused 3 (delta 0), pack-reused 0
Unpacking objects: 100% (4/4), done.
Error: Invalid formula: /usr/local/Homebrew/Library/Taps/hyattpd/homebrew-prodigal/prodigal.rb
prodigal: undefined method `sha1' for #<Class:0x00007ff0399f9e18>
Error: Cannot tap hyattpd/prodigal: invalid syntax in tap!

Here is my output to brew doctor in case this helps give away what the problem might be:

Please note that these warnings are just used to help the Homebrew maintainers
with debugging if you file an issue. If everything you use Homebrew for is
working fine: please don't worry or file an issue; just ignore this. Thanks!

Warning: Unbrewed dylibs were found in /usr/local/lib.
If you didn't put them there on purpose they could cause problems when
building Homebrew formulae, and may need to be deleted.

Unexpected dylibs:
  /usr/local/lib/libtcl8.6.dylib
  /usr/local/lib/libtk8.6.dylib

Warning: Unbrewed header files were found in /usr/local/include.
If you didn't put them there on purpose they could cause problems when
building Homebrew formulae, and may need to be deleted.

Unexpected header files:
  /usr/local/include/fakemysql.h
  /usr/local/include/fakepq.h
  /usr/local/include/fakesql.h
  /usr/local/include/itcl.h
  /usr/local/include/itcl2TclOO.h
  /usr/local/include/itclDecls.h
  /usr/local/include/itclInt.h
  /usr/local/include/itclIntDecls.h
  /usr/local/include/itclMigrate2TclCore.h
  /usr/local/include/itclTclIntStubsFcn.h
  /usr/local/include/mysqlStubs.h
  /usr/local/include/odbcStubs.h
  /usr/local/include/pqStubs.h
  /usr/local/include/tcl.h
  /usr/local/include/tclDecls.h
  /usr/local/include/tclOO.h
  /usr/local/include/tclOODecls.h
  /usr/local/include/tclPlatDecls.h
  /usr/local/include/tclThread.h
  /usr/local/include/tclTomMath.h
  /usr/local/include/tclTomMathDecls.h
  /usr/local/include/tdbc.h
  /usr/local/include/tdbcDecls.h
  /usr/local/include/tdbcInt.h
  /usr/local/include/tk.h
  /usr/local/include/tkDecls.h
  /usr/local/include/tkPlatDecls.h

Warning: Unbrewed .pc files were found in /usr/local/lib/pkgconfig.
If you didn't put them there on purpose they could cause problems when
building Homebrew formulae, and may need to be deleted.

Unexpected .pc files:
  /usr/local/lib/pkgconfig/tcl.pc
  /usr/local/lib/pkgconfig/tk.pc

Warning: Unbrewed static libraries were found in /usr/local/lib.
If you didn't put them there on purpose they could cause problems when
building Homebrew formulae, and may need to be deleted.

Unexpected static libraries:
  /usr/local/lib/libtclstub8.6.a
  /usr/local/lib/libtkstub8.6.a

Thank you in advance for your help!

prodigal 3 not predicting any genes in anon mode

Hi!

Is anyone running the rc.1 version of prodigal 3 successfully on metagenomes?
It works fine for me on isolates (first training and then normal mode).
But when I run it on a FASTA file containing a metagenomic assembly and switch to --mode anon (also tried -p meta), no genes get predicted at all.
Prodigal 2.6.3 predicts plenty of genes though (with -p meta).
Here's an example with the shortest sequence out of the file for testing purposes:

marcelh@dint05 -> cat test.fna 
>test_seq
GGATGAGCCAGTGGCTGATCAGCGGTTGGCGCGGGTACCGACCACTTCGATCTCGACGCG
GCGGTTCTTGGCGCGACCTTCTTTGGTGCGGTTGTCAGCGATGGGCTGCTTCTCGCCCTT
GCCTTCGGTGTAGATGCGGTTTTTCTCGATGCCCTTGCTGACCAAATAGGCCTTCACGGC
TTCGGAACGACGCACCGACA

marcelh@dint05 -> $OMICS/bin/prodigal -p meta -i test.fna 
-------------------------------------
PRODIGAL v2.6.3 [February, 2016] 
Univ of Tenn / Oak Ridge National Lab
Doug Hyatt, Loren Hauser, et al.     
-------------------------------------
Request:  Metagenomic, Phase:  Training
Initializing training files...done!
-------------------------------------
Request:  Metagenomic, Phase:  Gene Finding
Finding genes in sequence #1 (200 bp)...done!
DEFINITION  seqnum=1;seqlen=200;seqhdr="test_seq";version=Prodigal.v2.6.3;run_type=Metagenomic;model="25|Marinobacter_aquaeolei_VT8|B|57.3|11|1";gc_cont=57.30;transl_table=11;uses_sd=1
FEATURES             Location/Qualifiers
     CDS             complement(19..>198)
                     /note="ID=1_1;partial=01;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.600;conf=99.91;score=30.71;cscore=27.49;sscore=3.22;rscore=0.00;uscore=0.00;tscore=3.22;"
//

marcelh@dint05 -> $OMICS/bin/prodigal3 --mode anon -i test.fna 
-------------------------------------
PRODIGAL v3.0.0-rc.1 [February, 2016]         
Univ of Tenn / Oak Ridge National Lab
Doug Hyatt, Loren Hauser, et al.     
-------------------------------------
Mode: Anonymous, Phase: Training
Initializing preset training files...done.
-------------------------------------
Mode: Anonymous, Phase: Gene Finding
Finding genes in sequence #1 (200 bp)...done.
DEFINITION  seqnum=1;seqlen=200;seqhdr="test_seq";version=Prodigal.v3.0.0-rc.1;run_type=Anonymous;model="0|Mycoplasma_bovis_PG45|B|29.3|0|0";gc_cont=0.00;transl_table=0;uses_sd=0
FEATURES             Location/Qualifiers
//

It basically looks like this for every contig in the file. Not a single gene gets predicted.
Is anyone experiencing the same issue?

Thanks,
Marcel

tmp filename collision

Hi there,

I've been using Prodigal V2.6.3 on a cluster to predict CDS in parallel on a large number of genomes (>10k, 50 jobs in parallel). Unfortunately, after some time, one of the jobs inevitably fails with the error:

Could not delete tmp file tmp.prodigal.stdin.xxxxx.

with xxxxx a random number. This file is indeed no longer present when the job exits. I looked at how this filename is chosen in the source code, and it seems to use the PID of the process:

https://github.com/hyattpd/Prodigal/blob/GoogleImport/main.c#L95-L97

/* Filename for input copy if needed */
pid = getpid();
sprintf(input_copy, "tmp.prodigal.stdin.%d", pid);

It is hard to confirm, but I suspect this problem arises when two jobs happen to use the same PID at the same time on two different nodes: the tmp file is silently overwritten and then deleted by the first job to finish, and the second job then fails to delete it. I can just ignore those errors and relaunch the analysis for the few failed jobs, but if this file is indeed silently overwritten, there could be a bigger problem due to the collision. If the timing is unfortunate, some of the Prodigal results might not correspond to the original FASTA that was piped into the command (if the job is the first to finish, it exits normally, and I have no way to know a filename collision occurred).

Would it be possible to add a command line option to allow the user to specify the location or name of the temporary file? This should also improve performance on clusters by using local filesystem on each node.

Alternatively, Prodigal could check if the tmp file already exists, and exit with an error before transparently overwriting it.

Cheers,
Nils
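
A possible workaround on the caller's side, pending a fix, is to give each job its own working directory. The sketch below assumes the temporary copy is created relative to the current working directory, which is what the bare filename in the sprintf call above suggests but is not confirmed here. The wrapper name is hypothetical and nothing in it is specific to Prodigal.

```python
import subprocess
import tempfile


def run_in_private_dir(cmd, stdin_bytes):
    """Run cmd in a fresh, uniquely named working directory and return its stdout."""
    # TemporaryDirectory picks a unique name, so two jobs never share a
    # working directory even if their PIDs happen to match.
    with tempfile.TemporaryDirectory(prefix="prodigal_job_") as workdir:
        result = subprocess.run(
            cmd,
            input=stdin_bytes,      # FASTA data piped to the child's stdin
            cwd=workdir,            # relative temp files land in this directory
            stdout=subprocess.PIPE,
            check=True,             # raise if the child exits with an error
        )
    return result.stdout
```

Because the working directory changes and is deleted afterwards, the executable and any output files named on the command line would need absolute paths. The directory is created under the system temp location, which on many clusters is local to the node.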

Spades assembly being recognized as Invalid meta/single genome type

Prodigal V2.6.2: February, 2015

Command line and error:

SEBASTIANs-MacBook-Pro:~ FLFLFLLF$ prodigal -i /Users/FLFLFLLF/spades_3rd_lf10sc_notmp/contigs.fna -o lf10spades_coords.gbk -a lf10proteins.faa -p anon

Invalid meta/single genome type specified.

The format of my assembly fasta file:
NODE_4604_length_1093_cov_92.9644_ID_9355

ATCGCTGTAGCATCTCCGCCCCAGGCGGAGTTTGGTGGGTTGCCGCACCCGAGCATCTTG
GCGCTGTGTCATCTACTGGGTGTCAATGAGGTCTACGCCGTCGGCGGTGCCCAGGCGATC
G

It's my first time using Prodigal. I didn't see anything wrong with my fasta file. Thanks for your help.

Convert between output formats

Hi, I'd like to suggest a mode that converts between the output formats, like the genbank-like format, gff3, and so on.

Alternate start codons

@hyattpd

I have the following sequence with an alternate start codon (TTG). The sequence is from E-coli.

Sample
ATGGTCCAGCCGTGTACATGGTTCAAACACGCCAGGCATTCGAGCGAACACGCAGTGATGCCTAACCCTTCCATCGAGGGGGACGTCCAAGGGCTGGCGCCCTTGGCCGCCCCTCATGTCAAACGTTGGGCGAACCCGGAGCCTCATTAATTGTTAGCCGTTAAAATTAAGCCCTTTACCAAACCAATACTTATTATGAAAAACACAATACATATCAACTTCGCTATTTTTTTAATAATTGCAAATATTATCTACAGCAGCGCCAGTGCATCAACAGATATCTCTACTGTTGCATCTCCATTATTTGAAGGAACTGAAGGTTGTTTTTTACTTTACGATGCATCCACAAACGCTGAAATTGCTCAATTCAATAAAGCAAAGTGTGCAACGCAAATGGCACCAGATTCAACTTTCAAGATCGCATTATCACTTATGGCATTTGATGCGGAAATAATAGATCAGAAAACCATATTCAAATGGGATAAAACCCCCAAAGGAATGGAGATCTGGAACAGCAATCATACACCAAAGACGTGGATGCAATTTTCTGTTGTTTGGGTTTCGCAAGAAATAACCCAAAAAATTGGATTAAATAAAATCAAGAATTATCTCAAAGATTTTGATTATGGAAATCAAGACTTCTCTGGAGATAAAGAAAGAAACAACGGATTAACAGAAGCATGGCTCGAAAGTAGCTTAAAAATTTCACCAGAAGAACAAATTCAATTCCTGCGTAAAATTATTAATCACAATCTCCCAGTTAAAAACTCAGCCATAGAAAACACCATAGAGAACATGTATCTACAAGATCTGGATAATAGTACAAAACTGTATGGGAAAACTGGTGCAGGATTCACAGCAAATAGAACCTTACAAAACGGATGGTTTGAAGGGTTTATTATAAGCAAATCAGGACATAAATATGTTTTTGTGTCCGCACTTACAGGAAACTTGGGGTCGAATTTAACATCAAGCATAAAAGCCAAGAAAAATGCGATCACCATTCTAAACACACTAAATTTATAAAAAATCTAATGGCAAAATCGCCCAACCCTTCAATCAAGTCGGGACGGCCAAAAGCAAGCTTTTGGCTCCCCTCGCTGGCGCTCGGCGCCCCTTATTTCAAACGTTAGACGGCAAAGTCACAGACCGCGGGATCTCTTATGACCAACTACTTTGATAGCCCCTTCAAAGGCAAGCTGCTTTCTGAGCAAGTGAAGAACCCCAATATCAAAGTTGGGCGGTACAGCTATTACTCTGGCTACTATCATGGGCACTCATTCGATGACTGCGCACGGTATCTGTTTCCGGACCGTGATGACGTTGATAAGTTGATCATCGGTAGTTTCTGCTCTATCGGGAGTGGGGCTTCCTTTATCATGGCTGGCAATCAGGGGCATCGGTACGACTGGGCATCATCTTTCCCGTTCTTTTATATGCAGGAAGAACCTGCATTCTCAAGCGCACTCGATGCCTTCCAAAAAGCAGGTAATACTGTCATTGGCAATGACGTTTGGATCGGCTCTGAGGCAATGGTCATGCCCGGAATCAAGATCGGGCACGGTGCGGTGATAGGCAGCCGCTCGTTGGTGACAAAAGATGTGGGGCACTGTTGCAAAGTTAGCGATGAGGCAGCCTTTTGTCTTATTCAAAGGCCTTACATTTCAAAAACTCTGCTTACCAGGCGCATTTCGCCCAGGGGATCACCATAATAAAATGCTGAGGCCTGGC

Prodigal is only picking up ATG start codon. But when I train using E-coli genome then the alternate start codon TTG is picked up correctly and I get the complete gene. Is there a way to get the alternate start codons without training on genomes related to the query sequence?

command-line help text still states 'single'/'meta' for mode

The Prodigal help text needs to be updated to include the newly named 'normal' and 'anon' modes.

$ prodigal -v
Prodigal V2.6.2: January, 2015
$ prodigal -h
Usage:  prodigal [-a trans_file] [-c] [-d nuc_file] [-f output_type]
                 [-g tr_table] [-h] [-i input_file] [-m] [-n] [-o output_file]
                 [-p mode] [-q] [-s start_file] [-t training_file] [-v]

         -a:  Write protein translations to the selected file.
         -c:  Closed ends.  Do not allow genes to run off edges.
         -d:  Write nucleotide sequences of genes to the selected file.
         -f:  Select output format (gbk, gff, or sco).  Default is gbk.
         -g:  Specify a translation table to use (default 11).
         -h:  Print help menu and exit.
         -i:  Specify input file (default reads from stdin).
         -m:  Treat runs of n's as masked sequence and do not build genes across 
              them.
         -n:  Bypass the Shine-Dalgarno trainer and force the program to scan
              for motifs.
         -o:  Specify output file (default writes to stdout).
         -p:  Select procedure (single or meta).  Default is single.
         -q:  Run quietly (suppress normal stderr output).
         -s:  Write all potential genes (with scores) to the selected file.
         -t:  Write a training file (if none exists); otherwise, read and use
              the specified training file.
         -v:  Print version number and exit.

prodigal.windows.exe

Hello,

I can't start prodigal.windows.exe on my Windows 10 (86bit). No messages appear. The window opens and then closes.

Running with '-m' still predicts genes across regions of N

I'm running prodigal (Prodigal V2.6.3: February, 2016) and have contigs with some regions masked with 'N' where infernal cmscan has predicted non-coding RNAs. However, although I run prodigal with the '-m' option to "Treat runs of N as masked sequence; don't build genes across them." the output has genes predicted across those regions with protein sequences translated into stretches of 'X'. I saw that the stretches of 'N' need to be at least 50 characters and they all are but nevertheless it doesn't seem to act as a mask.

I know the '-m' option was used originally by JGI for masking, but isn't it implemented anymore in prodigal? Or am I using it wrong?

Sincerely,
John
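
To check which masked stretches meet the 50-character minimum mentioned above, the runs of N in each contig can be listed and compared with the predicted gene coordinates. This is a small diagnostic sketch with a hypothetical function name. It reports 1-based inclusive coordinates and treats lower-case n as masked too, which is an assumption about the input rather than about Prodigal.

```python
import re


def n_runs(sequence, min_len=50):
    """Return (start, end, length) for each run of N/n at least min_len long.

    Coordinates are 1-based and inclusive.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_len)
    return [(m.start() + 1, m.end(), m.end() - m.start())
            for m in pattern.finditer(sequence)]
```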

Binaries for version 2.6.2

Doug,

I managed to compile 2.6.2 on Linux for table 25 support for Prokka.

Will you eventually provide a Mac OS X and Linux binary and a 2.6.2 tagged release?

Torsten

Feature Request: custom translation table

In future versions, are there plans to set an option to input custom translation tables?

Possibly in the format of:

1. The Standard Code (transl_table=1)

By default all transl_table in GenBank flatfiles are equal to id 1, and this is not shown. When transl_table is not equal to id 1, it is shown as a qualifier on the CDS feature.

    AAs  = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts = ---M------**--*----M---------------M----------------------------
  Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
  Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi
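
For reference, the five-row layout above is straightforward to read by machine. The sketch below parses it into a codon-to-amino-acid mapping and a set of start codons. It illustrates the format only and does not describe how Prodigal stores its tables. The function and constant names are made up for this example.

```python
NCBI_TABLE_1 = """
    AAs  = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts = ---M------**--*----M---------------M----------------------------
  Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
  Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
"""


def parse_ncbi_table(text):
    """Return (codon -> amino acid dict, set of start codons) from the five rows."""
    fields = {}
    for line in text.splitlines():
        if "=" in line:
            name, value = line.split("=", 1)
            fields[name.strip()] = value.strip()
    columns = [fields[k] for k in ("Base1", "Base2", "Base3", "AAs", "Starts")]
    if {len(c) for c in columns} != {64}:
        raise ValueError("every row must have exactly 64 columns")
    table, starts = {}, set()
    # Each column position describes one codon: three bases, its amino
    # acid, and an 'M' in the Starts row if it can initiate translation.
    for b1, b2, b3, aa, start in zip(*columns):
        codon = b1 + b2 + b3
        table[codon] = aa
        if start == "M":
            starts.add(codon)
    return table, starts
```

For the table shown above, this yields 64 codons with TTG, CTG and ATG marked as starts.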

CUDA?

Hi,

Are there any plans to make a CUDA version of Prodigal?

Thanks

Undefined behavior leads to different program outputs when compiled with GCC 4-6 vs GCC 7

When compiling Prodigal with GCC versions 4 to 6 the output produced is missing some information in the header of each sequence when compared with a Prodigal version compiled with GCC 7.

The incorrect output is (GCC versions 4.9.3 and 6.4.0 (alpine linux)):

>0_1 # 1 # 1344 # 1 # ;gc_cont=0.612
GTGGATAGCGGAGAGGTGAATCACTTGGAAGACCTGAAGGTCCTGTGGAACGAGATACTGGACAAGGGCA
...

whereas with GCC 7.2.0 (ubuntu-17.10) the output is:

>0_1 # 1 # 1344 # 1 # ID=1_1;partial=10;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.612
GTGGATAGCGGAGAGGTGAATCACTTGGAAGACCTGAAGGTCCTGTGGAACGAGATACTGGACAAGGGCA
...

The source of the problem is:

Prodigal/gene.c

Line 312 in 004218f

sprintf(genes[i].gene_data, "%s;gc_cont=%.3f", genes[i].gene_data,

According to some sources, reading and writing to the same array can lead to undefined behavior.

The following code demonstrates the issue.

#### code.c ####

#include <stdio.h>

int main() {

  FILE *ptr = stdout;
  char data[500] = {0};

  sprintf(data, "%sNOTVISIBLE ", data);
  sprintf(data, "%sVISIBLE", data);

  fprintf(ptr, "%s\n", data);

  return 1;

}

#### Makefile ####

SHELL   = /bin/sh
CC      = gcc

CFLAGS  += -pedantic -Wall -O3 -static
LFLAGS = -lm $(LDFLAGS)

TARGET  = test
SOURCES = $(shell echo *.c)
OBJECTS = $(SOURCES:.c=.o)

all: $(TARGET)

$(TARGET): $(OBJECTS)
	$(CC) $(CFLAGS) -o $@ $^ $(LFLAGS)

%.o: %.c
	$(CC) $(CFLAGS) -c -o $@ $<

clean:
	-rm -f $(OBJECTS)
	-rm -f $(TARGET)

.PHONY: all clean

When compiled with GCC versions 4.9.3 and 6.4.0 I get VISIBLE whereas with GCC 7.2.0 I get NOTVISIBLE VISIBLE.

Translated peptide frame information

Hi Team,

Sorry if I have missed it, but there seems to be no frame (1-6) information associated with the translated file when I run Prodigal on a nucleotide file using the command below.
Command:
prodigal -i sample_input.fasta -a sample_input_tarnslated.fasta -g 11

Thanks for help
-vijay
