Comments (2)

gpertea commented on September 28, 2024

Indeed, StringTie currently shows only the matching reference transcripts, in the reference_id attribute of the output GTF. It seems that you'd like to see recognizable gene IDs or gene names from the annotation; in other words, you'd like StringTie to better "annotate" its own output. This is a valid suggestion and we would consider adding it in a future version. We hesitated to add it because it could get complicated for assemblies that span multiple gene regions, but for those that exactly match a reference transcript (hence the reference_id), it is easy to add the gene_name and/or gene_id attributes as well. I'll add these in the next version, likely as a new ref_gene_id attribute. I hope this helps.
Unfortunately, I cannot assume that the annotation file even has gene_id attributes: if it is an NCBI GFF3 or comes from another annotation source, what you call "gene_id" could be named differently. The transcript ID (called "ID" in GFF3) is the most stable identifier that we can expect to find in the reference annotation file, which is why we have provided only that reference ID.
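
To make the naming difference concrete, here is a minimal sketch of parsing the ninth (attributes) column in both formats. The function names and the sample attribute strings are made up for illustration; this is not StringTie code, and real annotation files have edge cases (escaped characters, repeated keys) that it ignores.

```python
import re

# GTF attributes look like:   gene_id "G1"; transcript_id "T1";
# GFF3 attributes look like:  ID=rna0;Parent=gene0;gene=ABC
GTF_PAIR = re.compile(r'(\S+)\s+"([^"]*)"')
GFF3_PAIR = re.compile(r'([^=;\s]+)=([^;]*)')
GFF3_START = re.compile(r'\s*[^\s=;"]+=')


def parse_attributes(column9):
    """Parse a GTF or GFF3 attributes column into a dict (last key wins)."""
    if GFF3_START.match(column9):
        # First token is key=value, so treat the column as GFF3.
        return dict(GFF3_PAIR.findall(column9))
    return dict(GTF_PAIR.findall(column9))


def transcript_id(attrs):
    """Return the transcript identifier of a transcript feature.

    Assumes GTF stores it as transcript_id and GFF3 stores it as ID.
    """
    return attrs.get("transcript_id") or attrs.get("ID")


gtf = parse_attributes('gene_id "G1"; transcript_id "T1"; gene_name "ABC";')
gff3 = parse_attributes('ID=rna0;Parent=gene0;gene=ABC')

print(transcript_id(gtf), gtf.get("gene_id"))    # gene_id is present in GTF
print(transcript_id(gff3), gff3.get("gene_id"))  # no gene_id key in this GFF3 line
```

In the GFF3 sample the gene is reachable only through `Parent` or a source-specific key such as `gene`, which is why a fixed `gene_id` lookup cannot be relied on across annotation sources.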

Meanwhile, you can probably use other programs or scripts that quickly annotate StringTie transcripts based on another GTF/GFF file (e.g. cuffcompare or gffread from the Cufflinks suite). In your case, though, what you are doing now (looking up the reference_id in the annotation) is probably still the easiest way; it just needs to be scripted.
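
The reference_id lookup could be scripted roughly as follows. This is only a sketch under several assumptions: the reference annotation is a GTF that does carry gene_id (and optionally gene_name) attributes, the function names are hypothetical, and the appended attribute names follow the ref_gene_id idea above, with ref_gene_name added purely for illustration.

```python
import re

# Matches GTF attribute pairs of the form: key "value";
ATTR = re.compile(r'(\S+)\s+"([^"]*)"')


def load_reference(gtf_lines):
    """Map transcript_id -> (gene_id, gene_name) from reference GTF lines."""
    ref = {}
    for line in gtf_lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) < 9:
            continue
        attrs = dict(ATTR.findall(cols[8]))
        tid = attrs.get("transcript_id")
        if tid:
            # Keep the first record seen for each transcript.
            ref.setdefault(tid, (attrs.get("gene_id"), attrs.get("gene_name")))
    return ref


def annotate(stringtie_lines, ref):
    """Yield StringTie GTF lines with reference gene attributes appended."""
    for line in stringtie_lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        if line.startswith("#") or len(cols) < 9:
            yield line
            continue
        attrs = dict(ATTR.findall(cols[8]))
        gene_id, gene_name = ref.get(attrs.get("reference_id"), (None, None))
        extra = ""
        if gene_id:
            extra += f' ref_gene_id "{gene_id}";'
        if gene_name:
            extra += f' ref_gene_name "{gene_name}";'
        if extra and not line.rstrip().endswith(";"):
            line = line.rstrip() + ";"
        yield line + extra
```

Both functions accept any iterable of lines, so open file handles for the reference annotation and the StringTie output can be passed directly. Lines without a reference_id, or whose reference_id is not found in the annotation, pass through unchanged.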


sahyoun commented on September 28, 2024

The idea is that when using Cufflinks, the gene IDs found in the
reference transcriptome GTF are carried over into the Cufflinks output
files, so I could easily grep for them, since I expect to see these
genes downregulated in a knockdown (KD) or similar experiment. I'm not
sure how this is done in Cufflinks, but it is certainly more convenient
than having to look up the reference ID.

In reply to Geo Pertea's message of 3/27/15, 4:42 AM.

Dr. Abdullah H. Sahyoun
Computational Biologist
Chromatin Regulation/Bioinformatics and Deep Sequencing Unit
Max Planck Institute of Immunobiology and Epigenetics
Stübeweg 51, D-79108 Freiburg, Germany
Tel.: +49 761 5108 707
