Comments (7)

Magdoll commented on June 22, 2024

Hi @XiaoyuZhan520 ,

Can you confirm that PBID PB.10113.15 has multiple members in the collapsed.group.txt file? If so, the discrepancy comes from collapsing multiple redundant sequences that share the same exonic structure but vary slightly in their exact 5'/3' ends. The GFF file reports the longest 5' and 3' sites (which may come from different collapsed members), whereas rep.fa shows just one sequence from that collapsed set of members. Hence the small difference.

Another possible source of the difference, especially for the sequence you posted, is a mismatch between the transcript sequence and its mapped alignment. The end of the sequence is all "A"s, which may be an untrimmed polyA tail. Those bases would not be mapped to the genome, resulting in an exonic length shorter than the sequence itself.
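To check whether an untrimmed polyA tail could account for the length gap, one could compare the trailing run of "A"s in the representative sequence with the difference between the sequence length and the exonic length from the GFF. The following is only a minimal sketch: the function names, the toy sequence, and the exon coordinates are made up for illustration and are not part of cDNA_Cupcake.

```python
def trailing_a_run(seq: str) -> int:
    """Length of the run of 'A' at the 3' end of seq (case-insensitive)."""
    seq = seq.upper()
    return len(seq) - len(seq.rstrip("A"))


def exonic_length(exons) -> int:
    """Total exonic length for 1-based, inclusive (start, end) pairs, as in GFF."""
    return sum(end - start + 1 for start, end in exons)


# Hypothetical data: a 10 nt mapped body followed by 12 untrimmed A's.
rep_seq = "ACGTACGTAC" + "A" * 12
exons = [(101, 104), (201, 206)]  # made-up coordinates: 4 + 6 = 10 nt

length_gap = len(rep_seq) - exonic_length(exons)
polya = trailing_a_run(rep_seq)
print(f"length gap = {length_gap}, trailing A run = {polya}")
```

If the two numbers are close, the unmapped polyA tail is a plausible explanation for most of the gap; any remainder may come from the 5'/3' end differences among collapsed members described above.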

--Liz

from cdna_cupcake.

XiaoyuZhan520 commented on June 22, 2024

Hello,

Thanks for the quick answer.

Yes, indeed, PB.10113.15 was collapsed from multiple transcripts (around 20), and I can find an exactly matching one among the non-collapsed transcripts.

Given the information above, can I make use of the information from the collapsed.gff file to capture the transcript sequence from the genome and get the matching sequence?

Also, could you please explain the difference between collapsed.gff and collapsed.gff.unfuzzy, as well as between collapsed.group.txt and collapsed.group.txt.unfuzzy?

Magdoll commented on June 22, 2024

Hi @XiaoyuZhan520 ,

> can I make use of the information from collapsed.gff file and capture transcripts sequence from genome and get the matching sequence?

Do you mean: if you reconstitute the sequence from the genome according to the GFF, will it exactly match the representative sequence (apart from the small difference in length)?

Yes and no. It will match exactly, except that the PacBio transcript may contain SNPs not present in the reference genome. There may also be a very small number of residual errors.
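As a rough illustration of reconstituting a transcript from the genome using GFF exon coordinates, here is a minimal sketch. It assumes the chromosome sequence is already loaded as a string and that exons are given as 1-based, inclusive (start, end) pairs; the helper names and toy data are hypothetical. The position-by-position comparison is naive: it assumes both sequences start at the same base and contain no indels.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string (A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def transcript_from_exons(chrom_seq: str, exons, strand: str) -> str:
    """Concatenate exon slices in genomic order; flip for minus-strand transcripts."""
    # GFF coordinates are 1-based and inclusive, so slice [start - 1 : end].
    pieces = [chrom_seq[start - 1:end] for start, end in sorted(exons)]
    seq = "".join(pieces)
    return reverse_complement(seq) if strand == "-" else seq


def mismatch_positions(a: str, b: str):
    """0-based positions where a and b differ over their shared prefix length."""
    n = min(len(a), len(b))
    return [i for i in range(n) if a[i].upper() != b[i].upper()]


# Toy chromosome and exons (made up for illustration).
chrom = "AACCGGTTAACC"
exons = [(1, 4), (9, 12)]

genomic = transcript_from_exons(chrom, exons, "+")
rep_seq = "AACTAACC"  # hypothetical representative sequence carrying one SNP
print(genomic, mismatch_positions(genomic, rep_seq))
```

Positions reported by `mismatch_positions` would correspond to candidate SNPs or residual errors; a real comparison would use an aligner rather than this direct scan.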

re: what the ".unfuzzy" files are
The collapse script has a --fuzzy_junction parameter. With it, two transcripts whose mapped exonic structures are the same except that the junctions differ by a small amount (default: 5 bp) are still considered identical and collapsed. Small junction differences (5 bp or less) are most likely caused by residual errors that GMAP or other aligners did not align well.
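The idea of a fuzzy junction comparison can be sketched as follows. This is not the cDNA_Cupcake implementation, only an illustration under simplifying assumptions: exons are 1-based (start, end) pairs on the same chromosome and strand, only internal junctions are compared, and the 5'/3' ends and any overlap checks are ignored. The names `junctions`, `same_structure_fuzzy`, and `max_fuzzy` are hypothetical.

```python
def junctions(exons):
    """Internal junctions as (end of exon i, start of exon i+1) pairs."""
    exons = sorted(exons)
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]


def same_structure_fuzzy(exons_a, exons_b, max_fuzzy=5):
    """True if both transcripts have the same number of junctions and each
    junction coordinate differs by at most max_fuzzy bp."""
    ja, jb = junctions(exons_a), junctions(exons_b)
    if len(ja) != len(jb):
        return False
    return all(
        abs(donor_a - donor_b) <= max_fuzzy and abs(acc_a - acc_b) <= max_fuzzy
        for (donor_a, acc_a), (donor_b, acc_b) in zip(ja, jb)
    )


# Made-up transcripts: b's junction is off by 3 bp and 2 bp; c's is off by 10 bp.
a = [(100, 200), (300, 400)]
b = [(90, 203), (298, 420)]
c = [(100, 210), (300, 400)]

print(same_structure_fuzzy(a, b))               # within the 5 bp tolerance
print(same_structure_fuzzy(a, c))               # exceeds the tolerance
print(same_structure_fuzzy(a, b, max_fuzzy=0))  # exact junction match required
```

With a tolerance of 0 only exact junction matches count as identical, which is the contrast between the fuzzy and strict comparisons in this sketch.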

--Liz

XiaoyuZhan520 commented on June 22, 2024

Thank you very much. I really appreciate your help.

XiaoyuZhan520 commented on June 22, 2024

Hello Liz,

I find the transcript collapsing step quite confusing, and I hope you can give me some ideas.

For example, transcript PB.6923.1 is 1324 nucleotides long. It was mapped to the genome with 124 nucleotides perfectly mapped and 1200 nucleotides mapped to N (gap). This transcript passed the -c 0.85 -i 0.85 thresholds and was collapsed into one final transcript.

Do you think this behavior is reasonable?

Magdoll commented on June 22, 2024

Hi @XiaoyuZhan520 ,

This is an odd case. Can you please share the following information:

  1. the collapsed (.rep.fq) fasta/fastq sequence for PB.6923.1
  2. the GFF annotation for PB.6923.1
  3. the line that starts with PB.6923.1 in *.collapsed.group.txt

You may share this privately by sending it to [email protected].

Thanks,
--Liz

Magdoll commented on June 22, 2024

Closing unless there is further information.