GithubHelp home page GithubHelp logo

Questions about the results about sqanti2 HOT 5 CLOSED

magdoll avatar magdoll commented on May 27, 2024
Questions about the results

from sqanti2.

Comments (5)

Magdoll avatar Magdoll commented on May 27, 2024

Hi @Mel66 ,

(1) Sorry, my mind is blanking on the _dup suffix and I cannot find in my code where I would've added that. Can you be more specific? Where did the _dup suffix show up? In which files where they not present and in which files did they show up?

(2) Only FSM_class should be ignored for now since I have not implemented them yet. The other fields 28-31 about ORF and CDS prediction are valid and can be used, unless you specifically used the --skipORF option in which case they will all be zero.

--Liz

from sqanti2.

melissavantilborg avatar melissavantilborg commented on May 27, 2024

The suffix _dup appears in classification.txt, junctions.txt, .renamed_corrected.genePred and .renamed_corrected.gtf.
It doesn't appear in renamed_corrected.fasta, but this file contains additional sequences. These added sequences are named the same as some of the original sequences. For example, renamed_corrected.fasta contains 4 sequences named transcript/4922, while the original fasta and renamed.fasta only have 1 sequence with this name. The four previously mentioned files contain transcript/4922_dup2, transcript/4922_dup3, and transcript/4922_dup4. According to classifications.txt, these 3 transcripts are encoded by different genes.

Thank you for your help.

Melissa

from sqanti2.

Magdoll avatar Magdoll commented on May 27, 2024

Hi @Mel66 ,

I'm sorry for the late response as I was on vacation last week.

If renamed_corrected.fasta already had 4 duplicate sequences, I'm going to guess this is because when it was mapped back to the genome (for the genome-based error correction which is SQANTI's first step), it got mapped to multiple places. This is entirely possible when different aligners (or even different versions of the same aligner) was used. For example, GMAP might decide to map a sequence continuously while minimap2 might decide to break it up.

Are you willing to share the input data confidentially? Is this on human? To reproduce your results, I would need the input fasta, the ref genome, and the ref annotation. If you can share, please give me your email so I can request a file upload.

--Liz

from sqanti2.

melissavantilborg avatar melissavantilborg commented on May 27, 2024

Hi Magdoll,

sorry for the late response, I didn’t see you had answered already. You were right, some of the sequences mapped to multiple genes on the genome, which caused the dups. I worked around this problem by using a gtf file for classification instead of the fasta file.

Sqanti_qc2.py didn’t seem to work with gtf files because it has some bugs. I made some changed for myself, and my results seem fine now. I added the errors I got below, maybe they can be of use to you.

Melissa.


I got this error while running SQANTI with a gtf (-g). I "fixed" this by adding "if not args.gtf:" to line 1377.

R scripting front-end version 3.5.0 (2018-04-23)
Cleaning up isoform IDs...
Cleaned up isoform fasta file written to: /home/mvantilborg/hq.renamed.fasta
Traceback (most recent call last):
File "SQANTI2/sqanti_qc2.py", line 1395, in
main()
File "SQANTI2/sqanti_qc2.py", line 1384, in main
if args.aligner_choice == "minimap2":
AttributeError: 'Namespace' object has no attribute 'aligner_choice'


Then I got the error below. I worked around it by commenting out line 1369, because it looked like the function rename_isoform_seqids was trying to read my gtf like it was a fasta or fastq file, and the resulting renamed fasta file (hq.renamed.fasta) was empty.

R scripting front-end version 3.5.0 (2018-04-23)
Cleaning up isoform IDs...
Cleaned up isoform fasta file written to: /home/mvantilborg/hq.renamed.fasta
**** Running SQANTI...
**** Parsing provided files....
Reading genome fasta /usr/local/Genomes/species/H.sapiens/GRCh38_no_alt_analysis_set/reference.fa....
Skipping aligning of sequences because gtf file was provided.

ERROR: gtf has not annotation lines.


Then I got this error below. I worked around this by using the –skipORF prediction because I was doing an ORF prediction myself. But it was probably caused by skipping the renaming.

R scripting front-end version 3.5.0 (2018-04-23)
Cleaning up isoform IDs...
Cleaned up isoform fasta file written to: /home/mvantilborg/hq.gff
**** Running SQANTI...
**** Parsing provided files....
Reading genome fasta /usr/local/Genomes/species/H.sapiens/GRCh38_no_alt_analysis_set/reference.fa....
Skipping aligning of sequences because gtf file was provided.

Indels will be not calculated since you ran SQANTI without alignment step (SQANTI with gtf format as transcriptome input).
**** Predicting ORF sequences...
Expected GMST output IDs to be of format ' gene_4|GeneMark.hmm|_aa||<cds_start>|<cds_end>' but instead saw: PB.1.1 gene=PB.1 gene_1|GeneMark.hmm|138_aa|+|3|419! Abort!

from sqanti2.

Magdoll avatar Magdoll commented on May 27, 2024

Close unless further notice.

from sqanti2.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.