Comments (9)
Since posting, I went back and tried a couple of previous versions. The last version that works is v1.1.2; when I tried running v1.2.0, I got the same error as reported above. I guess I'll stick with the older version for now, but it would be nice to know what changed in the GFFReader that makes stringtie unable to find transcripts in my gff files.
from stringtie.
The example you've shown here seems to be just a single-exon transcript. The format seems to be GFF3-like, but there are no well-formed child features (exons/CDS) belonging to a transcript parent feature. Indeed, I have tightened the requirements for GFF3 parsing in the last version, so a parent transcript feature is now expected, with well-defined child feature(s) (i.e. having a Parent attribute with the same value as the ID attribute of its parent). I did that because the parsing was way too loose before, leading to confusion and loss of transcript data in some cases. All major annotation sources nowadays use either this kind of GFF3 format (with matching ID/Parent attribute values) or the older GTF format, with all features using just transcript_id.
Assuming that all tabs are where they should be (even though they look like spaces here), you could try a combination of the 3rd line followed by the 4th line as shown in your attempts, in order to represent this transcript, but make sure you replace ID= with Parent= in the CDS feature line, like this:
Xam668_contig195 Prodigal:2.6 mRNA 700 1398 . - . ID=xam668_04238;gene=xam668_04238;inference=ab initio prediction:Prodigal:2.6,similar to AA sequence:RefSeq:WP_007974205.1;locus_tag=xam668_04238;product=partition protein
Xam668_contig195 Prodigal:2.6 CDS 700 1398 . - 0 Parent=xam668_04238
There is some info (and a simple example) about the minimal GFF3 format expected by stringtie and other programs, that you might find helpful, on this page: http://ccb.jhu.edu/software/stringtie/gff.shtml#format
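If many records need the same change, the ID-to-Parent rewrite described above could be scripted. Here is a minimal Python sketch, assuming tab-separated input and standalone CDS lines that carry an ID attribute. The function name `add_parent_lines` is made up for illustration and is not part of stringtie:

```python
def add_parent_lines(gff_lines):
    """Yield GFF3 lines in which every parentless CDS gets an mRNA parent.

    Hypothetical helper: assumes 9 tab-separated columns and key=value
    attributes separated by ';'. All other lines pass through unchanged.
    """
    for line in gff_lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        # Comments, FASTA lines, and non-CDS features are left alone.
        if line.startswith("#") or len(fields) != 9 or fields[2] != "CDS":
            yield line
            continue
        attrs = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        # Nothing to do if the CDS already has a parent, or has no ID to reuse.
        if "Parent" in attrs or "ID" not in attrs:
            yield line
            continue
        # Parent line: same coordinates, type mRNA, no phase, original attributes.
        yield "\t".join(fields[:2] + ["mRNA"] + fields[3:7] + [".", fields[8]])
        # Child line: the CDS now refers to the parent through Parent=.
        yield "\t".join(fields[:8] + ["Parent=" + attrs["ID"]])
```

Passing an open file object to this function and writing the yielded lines to a new file should be enough for a file with one CDS per gene.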
from stringtie.
I'm currently working with a bacterial genome, and since these genes rarely have introns, I don't need parent lines. This file was created by a bacterial genome de novo annotation program called prokka. So are you saying that there is no way I can run this new version without creating parent lines for my gff? Or are you saying that I can use the GTF format for one-line genes?
from stringtie.
It seems a more informative error message here would have addressed this issue. Would it be difficult to throw a more detailed warning or error describing what in particular is causing the problem? For instance: "Error: missing Parent attribute".
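To illustrate the kind of message I mean, here is a rough Python sketch of a stand-alone pre-check. It is not stringtie's code or behavior; the function name `find_parentless_features` and the message wording are my own assumptions:

```python
def find_parentless_features(gff_lines, child_types=("exon", "CDS")):
    """Return one message per exon/CDS line that lacks a Parent attribute.

    Hypothetical checker: assumes 9 tab-separated columns and key=value
    attributes separated by ';'.
    """
    problems = []
    for number, line in enumerate(gff_lines, start=1):
        fields = line.rstrip("\n").split("\t")
        # Skip comments and anything that is not a 9-column feature line.
        if line.startswith("#") or len(fields) != 9:
            continue
        keys = {item.split("=", 1)[0].strip() for item in fields[8].split(";")}
        if fields[2] in child_types and "Parent" not in keys:
            problems.append(
                "line {}: {} feature has no Parent attribute".format(
                    number, fields[2]
                )
            )
    return problems
```

Running something like this on a gff file before assembly would at least point to the offending lines.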
from stringtie.
Yes, in this case you might find it easier to use the GTF format instead, with just the CDS (or exon) features and transcript_id - but don't forget the double quotes for this format! It would look something like this:
Xam668_contig195 Prodigal:2.6 CDS 700 1398 . - 0 transcript_id "xam668_04238"; gene_id "xam668_04238"; inference "ab initio prediction:Prodigal:2.6,similar to AA sequence:RefSeq:WP_007974205.1"; locus_tag "xam668_04238"; product "partition protein"
Please keep in mind that stringtie does not care about any other attributes there, so if you really want to save space, just transcript_id would be enough.
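If you would rather convert the whole file than edit it by hand, something along these lines might work. This is a minimal Python sketch, assuming tab-separated GFF3 input where each gene is a single CDS line with an ID attribute. The function name `gff3_cds_to_gtf` is hypothetical, and gene_id is simply set to the same value as transcript_id:

```python
def gff3_cds_to_gtf(gff_lines):
    """Yield GTF lines for the CDS features found in GFF3 input.

    Hypothetical converter: keeps the first 8 columns as they are and
    replaces the attributes with quoted transcript_id and gene_id values
    taken from the GFF3 ID attribute. Everything else is dropped.
    """
    for line in gff_lines:
        fields = line.rstrip("\n").split("\t")
        # Only 9-column CDS lines are converted.
        if line.startswith("#") or len(fields) != 9 or fields[2] != "CDS":
            continue
        attrs = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        feature_id = attrs.get("ID")
        if feature_id is None:
            continue
        # GTF attribute values must be enclosed in double quotes.
        gtf_attrs = 'transcript_id "{0}"; gene_id "{0}";'.format(feature_id)
        yield "\t".join(fields[:8] + [gtf_attrs])
```

The other attributes are left out on purpose, since only transcript_id matters here.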
(snipped wrong opinion comment)
from stringtie.
As a side note, I find it intriguing that you're using something like StringTie, which is mainly a sophisticated isoform assembler/resolver, on bacterial genomes and transcripts. I guess you are using it mostly for the abundance estimation, but I am pretty sure there are more suitable tools out there for doing this on prokaryotic genomes.
from stringtie.
OK, it seems I was plain wrong in my assumption that matching ID/Parent values are somehow required by the GFF3 format. I just saw the Bacteriophage f1 example at http://www.sequenceontology.org/gff3.shtml and it is written exactly the way you described prokka writing it: just a CDS with an ID attribute should be enough to represent the whole gene. I guess I've been focusing on multi-exon transcripts for so long that I forgot that such a GFF3 record is perfectly OK.
So I am going to mark this down as a GFF3 parsing issue (a regression bug!) that should be fixed in my code. Thank you for bringing this problem to my attention.
from stringtie.
Thank you for addressing my issues. I'll use the older version of stringtie until I get around to writing a script to reformat my gff files.
from stringtie.
This should have been fixed in an earlier commit, and the fix should make it into the next release.
from stringtie.