Comments (6)

Magdoll commented on June 21, 2024

Hi @vkkodali,

SQANTI2 actually expects GFF3 format. You can convert your input with gffread:

gffread -T test.gtf > test.gff3

After the conversion you can see the difference: it basically removes the "gene" records.

NC_000001.11    TALON   transcript  14404   20079   .   -   .   transcript_id "TALONT000214958"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11    TALON   exon    14404   14829   .   -   .   transcript_id "TALONT000214958"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11    TALON   exon    14970   15038   .   -   .   transcript_id "TALONT000214958"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11    TALON   exon    15796   15947   .   -   .   transcript_id "TALONT000214958"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11    TALON   exon    16607   16765   .   -   .   transcript_id "TALONT000214958"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11    TALON   exon    16858   17055   .   -   .   transcript_id "TALONT000214958"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11    TALON   exon    17233   17742   .   -   .   transcript_id "TALONT000214958"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11    TALON   exon    17915   18061   .   -   .   transcript_id "TALONT000214958"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11    TALON   exon    18268   18369   .   -   .   transcript_id "TALONT000214958"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11    TALON   exon    18501   18554   .   -   .   transcript_id "TALONT000214958"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11    TALON   exon    18913   20079   .   -   .   transcript_id "TALONT000214958"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11    TALON   transcript  14404   20274   .   -   .   transcript_id "TALONT000214910"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11    TALON   exon    14404   14829   .   -   .   transcript_id "TALONT000214910"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11    TALON   exon    14970   15038   .   -   .   transcript_id "TALONT000214910"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11    TALON   exon    15796   15947   .   -   .   transcript_id "TALONT000214910"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11    TALON   exon    16607   16765   .   -   .   transcript_id "TALONT000214910"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11    TALON   exon    16858   17055   .   -   .   transcript_id "TALONT000214910"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11    TALON   exon    17233   17742   .   -   .   transcript_id "TALONT000214910"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11    TALON   exon    17915   18061   .   -   .   transcript_id "TALONT000214910"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11    TALON   exon    18268   20274   .   -   .   transcript_id "TALONT000214910"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
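
Here is a minimal Python sketch of the "removes the gene records" point. It approximates only that one effect of the conversion: it drops rows whose feature column (the third tab-separated field) is "gene" and keeps everything else. Note that gffread's -T option writes GTF2-style output, which matches the records above. This filter is not a substitute for gffread, which also rewrites records. The function name drop_gene_records and the sample rows are hypothetical.

```python
def drop_gene_records(gtf_text: str) -> str:
    """Return gtf_text without 'gene' feature rows (hypothetical helper).

    Approximates only one effect of `gffread -T`; comment lines and all
    other feature types (transcript, exon, CDS, ...) are kept unchanged.
    """
    kept = []
    for line in gtf_text.splitlines():
        fields = line.split("\t")
        # Column 3 (index 2) of a GTF/GFF row is the feature type.
        if not line.startswith("#") and len(fields) >= 3 and fields[2] == "gene":
            continue
        kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    # Illustrative input only: a made-up gene row plus one transcript row.
    sample = "\n".join([
        'chr1\tsrc\tgene\t100\t500\t.\t+\t.\tgene_id "g1";',
        'chr1\tsrc\ttranscript\t100\t500\t.\t+\t.\ttranscript_id "t1"; gene_id "g1";',
    ])
    print(drop_gene_records(sample))  # only the transcript row remains
```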

from sqanti2.

CrazyHsu commented on June 21, 2024

Hi Liz,
I still get the same error (assert raw[2] == 'transcript') after converting my GTF to GFF3 with gffread -T test.gtf > test.gff3. gffread was installed via conda; the version is 0.11.7.
Here is my GTF file:

1       iFLAS   transcript      44297   49138   .       +       .       gene_id "transcript/30091"; transcript_id "transcript/30091"; 
1       iFLAS   exon    44297   44947   .       +       .       gene_id "transcript/30091"; transcript_id "transcript/30091"; exon_number "1"; exon_id "transcript/30091.1";
1       iFLAS   CDS     44297   44947   .       +       0       gene_id "transcript/30091"; transcript_id "transcript/30091"; exon_number "1"; exon_id "transcript/30091.1";
1       iFLAS   start_codon     44297   44299   .       +       0       gene_id "transcript/30091"; transcript_id "transcript/30091"; exon_number "1"; exon_id "transcript/30091.1";
1       iFLAS   transcript      44297   49139   .       +       .       gene_id "transcript/31099"; transcript_id "transcript/31099"; 
1       iFLAS   exon    44297   44947   .       +       .       gene_id "transcript/31099"; transcript_id "transcript/31099"; exon_number "1"; exon_id "transcript/31099.1";
1       iFLAS   CDS     44297   44947   .       +       0       gene_id "transcript/31099"; transcript_id "transcript/31099"; exon_number "1"; exon_id "transcript/31099.1";
1       iFLAS   start_codon     44297   44299   .       +       0       gene_id "transcript/31099"; transcript_id "transcript/31099"; exon_number "1"; exon_id "transcript/31099.1";

And this is my GFF3 file after conversion:

1       iFLAS   transcript      44297   49138   .       +       .       transcript_id "transcript/30091"; gene_id "transcript/30091";
1       iFLAS   exon    44297   44947   .       +       .       transcript_id "transcript/30091"; gene_id "transcript/30091";
1       iFLAS   exon    45666   45803   .       +       .       transcript_id "transcript/30091"; gene_id "transcript/30091";
1       iFLAS   exon    45888   46133   .       +       .       transcript_id "transcript/30091"; gene_id "transcript/30091";
1       iFLAS   exon    46229   46342   .       +       .       transcript_id "transcript/30091"; gene_id "transcript/30091";
1       iFLAS   exon    46451   46633   .       +       .       transcript_id "transcript/30091"; gene_id "transcript/30091";
1       iFLAS   exon    47045   47262   .       +       .       transcript_id "transcript/30091"; gene_id "transcript/30091";
1       iFLAS   exon    47650   49138   .       +       .       transcript_id "transcript/30091"; gene_id "transcript/30091";
1       iFLAS   CDS     44297   44947   .       +       0       transcript_id "transcript/30091"; gene_id "transcript/30091";
1       iFLAS   CDS     45666   45803   .       +       0       transcript_id "transcript/30091"; gene_id "transcript/30091";
1       iFLAS   CDS     45888   46133   .       +       0       transcript_id "transcript/30091"; gene_id "transcript/30091";
1       iFLAS   CDS     46229   46342   .       +       0       transcript_id "transcript/30091"; gene_id "transcript/30091";
1       iFLAS   CDS     46451   46633   .       +       0       transcript_id "transcript/30091"; gene_id "transcript/30091";
1       iFLAS   CDS     47045   47262   .       +       0       transcript_id "transcript/30091"; gene_id "transcript/30091";
1       iFLAS   CDS     47650   49138   .       +       1       transcript_id "transcript/30091"; gene_id "transcript/30091";

It seems that the assert raw[2] == 'transcript' line only accepts lines whose feature field is "transcript", so an exon line fails the check and the error is raised.
What do you think?
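
Here is a small, hypothetical Python reader that illustrates the failure mode described above. It is not Cupcake's actual code. The reader assumes each transcript starts with a "transcript" row followed by its own rows, and that rows are well-formed with nine columns. An exon is accepted only if it belongs to the transcript block that is currently open. Other feature types, such as CDS and start_codon, are skipped.

A stricter reader that asserted raw[2] == 'transcript' on every row would stop at the first exon row with an AssertionError. That is consistent with the symptom reported here. The names read_transcripts and TID_RE are illustrative.

```python
import re
from typing import Dict, List, Optional, Tuple

# Order-independent lookup of the transcript_id attribute in column 9.
TID_RE = re.compile(r'transcript_id "([^"]+)"')


def read_transcripts(lines: List[str]) -> Dict[str, List[Tuple[int, int]]]:
    """Group exon (start, end) pairs by transcript_id.

    Hypothetical block reader: a 'transcript' row opens a block, 'exon'
    rows are attached to the open block, other features are skipped.
    """
    transcripts: Dict[str, List[Tuple[int, int]]] = {}
    current: Optional[str] = None
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        raw = line.rstrip("\n").split("\t")
        match = TID_RE.search(raw[8])
        if match is None:
            continue  # e.g. a 'gene' row without a transcript_id
        tid = match.group(1)
        if raw[2] == "transcript":
            current = tid
            transcripts[current] = []
        elif raw[2] == "exon":
            # An exon must belong to the transcript block that is open.
            assert tid == current, f"exon of {tid} outside its transcript block"
            transcripts[current].append((int(raw[3]), int(raw[4])))
        # CDS, start_codon and other features are ignored in this sketch.
    return transcripts
```

Feeding it rows from the converted file above groups the exons under transcript/30091 and ignores the CDS rows. Passing it the same rows without the leading transcript row raises the AssertionError.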

Magdoll commented on June 21, 2024

@CrazyHsu ,
Oh, actually, I may have updated Cupcake to a new version (v11.0.0) that deals with this. It was related to the order of the last (attributes) column, i.e., whether transcript_id or gene_id is listed first. Can you please update Cupcake (which is used to read the GFF3 file) and report back?
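
Here is one way attribute order can matter. A parser that reads the ninth column by position (for example, assuming gene_id always comes first) would mis-assign IDs on the gffread output above, which lists transcript_id first. This is an assumption for illustration, not a description of Cupcake's code.

The minimal Python sketch below parses the column into a dict, so key order does not matter. It assumes the quoted key "value"; attribute syntax used throughout this thread. The name parse_attributes is hypothetical and not part of Cupcake's API.

```python
import re
from typing import Dict

# One GTF attribute: a key, whitespace, then a double-quoted value.
# Unquoted values (allowed by some GTF writers) are skipped in this sketch.
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_attributes(column9: str) -> Dict[str, str]:
    """Parse a GTF attribute column into a dict, independent of key order.

    If a key repeats, the last value wins (a simplification).
    """
    return {key: value for key, value in ATTR_RE.findall(column9)}


if __name__ == "__main__":
    a = parse_attributes('gene_id "transcript/30091"; transcript_id "transcript/30091";')
    b = parse_attributes('transcript_id "transcript/30091"; gene_id "transcript/30091";')
    print(a == b)  # True: same keys and values, different order
```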

CrazyHsu commented on June 21, 2024

Yes, Liz, I have tried Cupcake (v11.0.0) and it worked as expected. But it runs under Python 3; can I use SQANTI2 with Cupcake (Py2_v8.7.x)?

Magdoll commented on June 21, 2024

Hi @CrazyHsu, the latest SQANTI2 versions are all Python 3 only. I do recommend switching to Py3 completely, as I have stopped supporting Py2.

CrazyHsu commented on June 21, 2024

OK, Liz, I will switch to Py3. Thanks for your quick reply!