Comments (6)
Hi @vkkodali ,
SQANTI2 actually expects GFF3 format. You can convert your input with gffread, as shown below:
gffread -T test.gtf > test.gff3
You can see the difference after the conversion: it basically removes the "gene" records.
NC_000001.11 TALON transcript 14404 20079 . - . transcript_id "TALONT000214958"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11 TALON exon 14404 14829 . - . transcript_id "TALONT000214958"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11 TALON exon 14970 15038 . - . transcript_id "TALONT000214958"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11 TALON exon 15796 15947 . - . transcript_id "TALONT000214958"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11 TALON exon 16607 16765 . - . transcript_id "TALONT000214958"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11 TALON exon 16858 17055 . - . transcript_id "TALONT000214958"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11 TALON exon 17233 17742 . - . transcript_id "TALONT000214958"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11 TALON exon 17915 18061 . - . transcript_id "TALONT000214958"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11 TALON exon 18268 18369 . - . transcript_id "TALONT000214958"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11 TALON exon 18501 18554 . - . transcript_id "TALONT000214958"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11 TALON exon 18913 20079 . - . transcript_id "TALONT000214958"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11 TALON transcript 14404 20274 . - . transcript_id "TALONT000214910"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11 TALON exon 14404 14829 . - . transcript_id "TALONT000214910"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11 TALON exon 14970 15038 . - . transcript_id "TALONT000214910"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11 TALON exon 15796 15947 . - . transcript_id "TALONT000214910"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11 TALON exon 16607 16765 . - . transcript_id "TALONT000214910"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11 TALON exon 16858 17055 . - . transcript_id "TALONT000214910"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11 TALON exon 17233 17742 . - . transcript_id "TALONT000214910"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11 TALON exon 17915 18061 . - . transcript_id "TALONT000214910"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
NC_000001.11 TALON exon 18268 20274 . - . transcript_id "TALONT000214910"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";
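To confirm that the "gene" records are really gone after conversion, a quick tally of the feature column can help. The following is a minimal sketch, not part of SQANTI2 or gffread; the helper name count_feature_types is hypothetical, and it assumes the file is tab-separated with the feature type in the third column (the records above appear space-separated only because of how they were pasted).

```python
from collections import Counter


def count_feature_types(lines):
    """Count the values of column 3 (feature type) in GTF/GFF-style records."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A well-formed record has nine tab-separated columns.
        if len(fields) < 9:
            continue
        counts[fields[2]] += 1
    return counts


# Three records taken from the converted output above, re-joined with tabs.
sample = [
    'NC_000001.11\tTALON\ttranscript\t14404\t20079\t.\t-\t.\ttranscript_id "TALONT000214958"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";',
    'NC_000001.11\tTALON\texon\t14404\t14829\t.\t-\t.\ttranscript_id "TALONT000214958"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";',
    'NC_000001.11\tTALON\texon\t14970\t15038\t.\t-\t.\ttranscript_id "TALONT000214958"; gene_id "ENSG00000227232.5"; gene_name "WASH7P";',
]

feature_counts = count_feature_types(sample)
print(dict(feature_counts))
```

For a real file, the same function can be fed an open file handle, e.g. count_feature_types(open("test.gff3")), and the result should contain no "gene" key.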
from sqanti2.
Hi Liz
I still hit the same issue, the failing assert raw[2] == 'transcript', after converting my GTF to GFF3 with gffread -T test.gtf > test.gff3. gffread was installed via conda, and the version is 0.11.7.
Here is my GTF file:
1 iFLAS transcript 44297 49138 . + . gene_id "transcript/30091"; transcript_id "transcript/30091";
1 iFLAS exon 44297 44947 . + . gene_id "transcript/30091"; transcript_id "transcript/30091"; exon_number "1"; exon_id "transcript/30091.1";
1 iFLAS CDS 44297 44947 . + 0 gene_id "transcript/30091"; transcript_id "transcript/30091"; exon_number "1"; exon_id "transcript/30091.1";
1 iFLAS start_codon 44297 44299 . + 0 gene_id "transcript/30091"; transcript_id "transcript/30091"; exon_number "1"; exon_id "transcript/30091.1";
1 iFLAS transcript 44297 49139 . + . gene_id "transcript/31099"; transcript_id "transcript/31099";
1 iFLAS exon 44297 44947 . + . gene_id "transcript/31099"; transcript_id "transcript/31099"; exon_number "1"; exon_id "transcript/31099.1";
1 iFLAS CDS 44297 44947 . + 0 gene_id "transcript/31099"; transcript_id "transcript/31099"; exon_number "1"; exon_id "transcript/31099.1";
1 iFLAS start_codon 44297 44299 . + 0 gene_id "transcript/31099"; transcript_id "transcript/31099"; exon_number "1"; exon_id "transcript/31099.1";
And this is my GFF3 file after conversion:
1 iFLAS transcript 44297 49138 . + . transcript_id "transcript/30091"; gene_id "transcript/30091";
1 iFLAS exon 44297 44947 . + . transcript_id "transcript/30091"; gene_id "transcript/30091";
1 iFLAS exon 45666 45803 . + . transcript_id "transcript/30091"; gene_id "transcript/30091";
1 iFLAS exon 45888 46133 . + . transcript_id "transcript/30091"; gene_id "transcript/30091";
1 iFLAS exon 46229 46342 . + . transcript_id "transcript/30091"; gene_id "transcript/30091";
1 iFLAS exon 46451 46633 . + . transcript_id "transcript/30091"; gene_id "transcript/30091";
1 iFLAS exon 47045 47262 . + . transcript_id "transcript/30091"; gene_id "transcript/30091";
1 iFLAS exon 47650 49138 . + . transcript_id "transcript/30091"; gene_id "transcript/30091";
1 iFLAS CDS 44297 44947 . + 0 transcript_id "transcript/30091"; gene_id "transcript/30091";
1 iFLAS CDS 45666 45803 . + 0 transcript_id "transcript/30091"; gene_id "transcript/30091";
1 iFLAS CDS 45888 46133 . + 0 transcript_id "transcript/30091"; gene_id "transcript/30091";
1 iFLAS CDS 46229 46342 . + 0 transcript_id "transcript/30091"; gene_id "transcript/30091";
1 iFLAS CDS 46451 46633 . + 0 transcript_id "transcript/30091"; gene_id "transcript/30091";
1 iFLAS CDS 47045 47262 . + 0 transcript_id "transcript/30091"; gene_id "transcript/30091";
1 iFLAS CDS 47650 49138 . + 1 transcript_id "transcript/30091"; gene_id "transcript/30091";
It seems that the assert raw[2] == 'transcript' line only passes lines whose feature field is transcript, so the exon lines cannot satisfy the check and the error is thrown.
What do you think?
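One way to test this idea on the converted file, without relying on SQANTI2 or Cupcake internals, is to check that every exon or CDS record is preceded by a transcript record with the same transcript_id. This is only a diagnostic sketch: the helper name find_orphan_records is hypothetical, it assumes tab-separated records with quoted transcript_id attributes, and it says nothing about how the actual reader is implemented.

```python
import re

# Matches transcript_id "..." anywhere in the attribute column.
TRANSCRIPT_ID = re.compile(r'transcript_id\s+"([^"]+)"')


def find_orphan_records(lines):
    """Return (line_number, feature) for records that appear before
    any transcript record carrying the same transcript_id."""
    seen = set()
    orphans = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        # Ignore comments and lines without the nine expected columns.
        if line.startswith("#") or len(fields) < 9:
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue  # no transcript_id attribute; not checked here
        tid = match.group(1)
        if fields[2] == "transcript":
            seen.add(tid)
        elif tid not in seen:
            orphans.append((number, fields[2]))
    return orphans


well_formed = [
    '1\tiFLAS\ttranscript\t44297\t49138\t.\t+\t.\ttranscript_id "transcript/30091"; gene_id "transcript/30091";',
    '1\tiFLAS\texon\t44297\t44947\t.\t+\t.\ttranscript_id "transcript/30091"; gene_id "transcript/30091";',
    '1\tiFLAS\tCDS\t44297\t44947\t.\t+\t0\ttranscript_id "transcript/30091"; gene_id "transcript/30091";',
]
missing_parent = well_formed[1:]  # same records without the transcript line

print(find_orphan_records(well_formed))
print(find_orphan_records(missing_parent))
```

If the first call reports no orphans for the whole file, the exon lines follow their transcript line as expected, and the cause of the assertion is more likely elsewhere in the record.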
from sqanti2.
@CrazyHsu ,
Oh, actually, I may have updated Cupcake to a new version (v11.0.0) that deals with this. It was related to the order of the attributes in the last column, i.e. whether transcript_id or gene_id is listed first. Can you please update Cupcake (which is used to read the GFF3 file) and report back?
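To illustrate why attribute order can matter, here is a minimal sketch of order-independent parsing of the last column. It is not Cupcake's code; the function name parse_attributes is hypothetical, and it assumes GTF-style key "value"; pairs as in the records posted in this thread.

```python
import re

# One key "value" pair; keys are word characters, values are quoted.
ATTRIBUTE = re.compile(r'(\w+)\s+"([^"]*)"')


def parse_attributes(column):
    """Parse GTF-style attributes into a dict, regardless of their order."""
    return {key: value for key, value in ATTRIBUTE.findall(column)}


# The same record's attributes in the two orders seen in this thread.
gene_first = 'gene_id "transcript/30091"; transcript_id "transcript/30091";'
transcript_first = 'transcript_id "transcript/30091"; gene_id "transcript/30091";'

a = parse_attributes(gene_first)
b = parse_attributes(transcript_first)
print(a["transcript_id"], b["transcript_id"])
```

Looking values up by key, rather than by position in the split string, gives the same result whether transcript_id or gene_id comes first.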
from sqanti2.
Yes, Liz, I have tried Cupcake (v11.0.0) and it worked as expected. But it runs under Python 3; can I use SQANTI2 with Cupcake (Py2_v8.7.x)?
from sqanti2.
Hi @CrazyHsu, the latest SQANTI2 versions are all Python 3 only. I recommend switching to Py3 completely, as I have stopped supporting Py2.
from sqanti2.
OK, Liz, I will switch to Py3. Thanks for your quick reply!
from sqanti2.