Hello, I've used collapse_isoforms_by_sam.py to remove redundant tra

collapsed.gff file is not consistent with collapsed.rep.fa file about cdna_cupcake HOT 7 CLOSED

magdoll commented on June 22, 2024

from cdna_cupcake.

Comments (7)

Magdoll commented on June 22, 2024

Hi @XiaoyuZhan520 ,

Can you confirm that, in the collapsed.group.txt file, the PBID PB.10113.15 has multiple members? If so, this discrepancy comes from collapsing multiple redundant sequences that have the same exonic structure but may vary slightly in their exact 5'/3' ends. The GFF file displays the longest 5' and 3' sites (which may be contributed by different collapsed members), whereas rep.fa shows just one of the sequences from that collapsed set of members. Hence the small difference.
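For anyone wanting to check the member count quickly, here is a minimal sketch. It assumes each line of collapsed.group.txt holds a PBID, a tab, and a comma-separated list of member IDs; the helper name `read_group_members` and the read IDs in the example are hypothetical.

```python
import io


def read_group_members(handle):
    """Map each PBID to the list of member IDs collapsed into it.

    Assumes one group per line: PBID, a tab, then comma-separated members.
    """
    groups = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        pbid, members = line.split("\t")
        groups[pbid] = members.split(",")
    return groups


# Hypothetical file content, for illustration only.
example = io.StringIO(
    "PB.10113.15\tread_a,read_b,read_c\n"
    "PB.10113.16\tread_d\n"
)
groups = read_group_members(example)
print(len(groups["PB.10113.15"]))  # number of members behind this PBID
```

With a real file, pass `open("collapsed.group.txt")` instead of the in-memory example.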

Another possible cause of the difference - especially in the case of the sequence you posted - is a difference between the transcript sequence and its mapped alignment. The end of the sequence is all "A"s, which may be untrimmed polyA sequence. Those bases would not be mapped to the genome, which results in an exonic length shorter than the sequence itself.
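As a rough way to see how much of the length difference an untrimmed tail could explain, the sketch below counts the run of "A"s at the 3' end of a sequence. The function name `trailing_a_length` is hypothetical, and the example sequence is made up.

```python
def trailing_a_length(seq):
    """Return the length of the uninterrupted run of 'A' at the end of seq."""
    seq = seq.upper()
    return len(seq) - len(seq.rstrip("A"))


# Made-up sequence: 8 non-tail bases followed by 12 "A"s.
tail = trailing_a_length("ACGTACGT" + "A" * 12)
print(tail)
```

Comparing this count with the difference between the rep.fa sequence length and the summed exon lengths in the GFF indicates whether the tail alone accounts for the gap.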

--Liz

XiaoyuZhan520 commented on June 22, 2024

Hello,

Thanks for the quick answer.

Yes, indeed, PB.10113.15 was collapsed from multiple transcripts (around 20), and I can find an exactly identical one among the non-collapsed transcripts.

Based on the information above, can I use the collapsed.gff file to extract the transcript sequence from the genome and get the matching sequence?

Besides, could you please tell me the difference between the collapsed.gff and collapsed.gff.unfuzzy files, as well as between the collapsed.group.txt and collapsed.group.txt.unfuzzy files?

Magdoll commented on June 22, 2024

Hi @XiaoyuZhan520 ,

> can I make use of the information from collapsed.gff file and capture transcripts sequence from genome and get the matching sequence?

Do you mean: if you reconstitute the sequence from the genome according to the GFF, will it match the representative sequence exactly (minus the small difference in length)?

Yes and no. It will match exactly, except that the PacBio transcript may contain SNPs not present in the reference genome. There may also be a very small number of residual errors.
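To illustrate the reconstitution step, here is a minimal sketch, not part of cupcake. It assumes the exon coordinates are taken from the GFF as 1-based inclusive (start, end) pairs and that the chromosome sequence is already loaded as a string; all function names and the toy sequences are hypothetical.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (handles N and lowercase)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def transcript_from_exons(chrom_seq, exons, strand):
    """Concatenate exon sequences; exons are 1-based inclusive (start, end)."""
    pieces = [chrom_seq[start - 1:end] for start, end in sorted(exons)]
    seq = "".join(pieces)
    return reverse_complement(seq) if strand == "-" else seq


def mismatch_positions(a, b):
    """0-based positions where two sequences differ over their shared length."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Toy chromosome and exons, for illustration only.
chrom = "AACCGGTTAACC"
plus = transcript_from_exons(chrom, [(1, 4), (9, 12)], "+")
minus = transcript_from_exons(chrom, [(1, 4), (9, 12)], "-")
diffs = mismatch_positions(plus, "AACTAACC")  # one SNP-like difference
print(plus, minus, diffs)
```

Any positions reported by `mismatch_positions` against the rep.fa sequence would be candidates for the SNPs or residual errors mentioned above.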

Re: what the ".unfuzzy" files are
The collapse script has a --fuzzy_junction parameter. That is, two transcripts that have the same mapped exonic structure, except that their junctions differ by a small amount (default: 5 bp), will still be considered identical and collapsed. Small junction differences (5 bp or less) are most likely caused by residual errors that GMAP or other aligners did not align well.
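The fuzzy comparison described here can be sketched as follows. This is an illustration of the idea only, not cupcake's actual implementation; the function names and the `max_fuzzy` argument are hypothetical, and exons are assumed to be (start, end) pairs on the same chromosome and strand.

```python
def junctions(exons):
    """Return (donor, acceptor) coordinate pairs between consecutive exons."""
    exons = sorted(exons)
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]


def same_structure_fuzzy(exons_a, exons_b, max_fuzzy=5):
    """True if both transcripts have the same number of junctions and each
    junction coordinate differs by at most max_fuzzy bp."""
    ja, jb = junctions(exons_a), junctions(exons_b)
    if len(ja) != len(jb):
        return False
    return all(
        abs(a_donor - b_donor) <= max_fuzzy and abs(a_acc - b_acc) <= max_fuzzy
        for (a_donor, a_acc), (b_donor, b_acc) in zip(ja, jb)
    )


# Toy transcripts: b differs from a by 3 bp and 2 bp at the junction,
# c differs by 10 bp at the donor site.
a = [(100, 200), (300, 400)]
b = [(90, 203), (298, 410)]
c = [(100, 210), (300, 400)]
print(same_structure_fuzzy(a, b), same_structure_fuzzy(a, c))
```

Note that only the junction coordinates are compared here; the outer 5'/3' ends are allowed to differ, in line with the earlier explanation of why the GFF and rep.fa lengths can disagree.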

--Liz

XiaoyuZhan520 commented on June 22, 2024

Thank you very much. I really appreciate your help.

XiaoyuZhan520 commented on June 22, 2024

Hello Liz,

I find the way some transcripts are collapsed quite confusing, and I hope you can give me some ideas.

For example, transcript PB.6923.1 is 1324 nucleotides long. It was mapped to the genome with 124 nucleotides perfectly mapped and 1200 nucleotides mapped to N (a gap). This transcript passed the -c 0.85 -i 0.85 thresholds and was collapsed into one final transcript.

Do you think this behavior is reasonable?
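One way to flag cases like this is to measure how much of the mapped reference region consists of N. The sketch below is only an illustration under that assumption; the function name `n_fraction` is hypothetical, and the example string simply mimics the 124 + 1200 split described above.

```python
def n_fraction(ref_seq):
    """Fraction of bases in a reference-derived sequence that are N."""
    if not ref_seq:
        return 0.0
    return ref_seq.upper().count("N") / len(ref_seq)


# Mimic the case above: 124 resolved bases followed by 1200 gap bases.
region = "ACGT" * 31 + "N" * 1200
frac = n_fraction(region)
print(round(frac, 3))
```

Transcripts whose mapped region is mostly N could then be listed for manual inspection before or after collapsing.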

Magdoll commented on June 22, 2024

Hi @XiaoyuZhan520 ,

This is an odd case. Can you please share the following information:

the collapsed (.rep.fq) fasta/fastq sequence for PB.6923.1
the GFF annotation for PB.6923.1
the line that starts with PB.6923.1 in *.collapsed.group.txt

You may share this privately by sending it to [email protected].

Thanks,
--Liz

Magdoll commented on June 22, 2024

Closing unless there is further information.
