Comments (6)

Magdoll commented on July 20, 2024

Hi @WayneForever ,

Yes, you are correct that the estimated number of genes (technically, just "loci") changes.

This is because the collapse script doesn't actually know where genes are, since no annotation is used. It simply creates "loci", where a locus is defined as "a collection of overlapping transcripts on the same strand, even if they only overlap by 1 bp".

So, with more transcripts, there can be more overlaps: a new transcript may bridge two loci that were previously separate, and hence you get fewer "genes".
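To illustrate the locus definition above, here is a minimal sketch, not the actual code in the collapse script. It assumes each transcript is reduced to a hypothetical `(name, chrom, strand, start, end)` tuple with 0-based, half-open coordinates, and that overlap is tested on the full transcript span; the real script may handle this differently.

```python
from collections import defaultdict

def group_into_loci(transcripts):
    """Group transcripts into loci: same chromosome, same strand,
    and overlapping by at least 1 bp (transitively).

    transcripts: iterable of (name, chrom, strand, start, end),
    with 0-based, half-open coordinates (an assumption of this sketch).
    """
    by_key = defaultdict(list)
    for name, chrom, strand, start, end in transcripts:
        by_key[(chrom, strand)].append((start, end, name))

    loci = []
    for key in sorted(by_key):
        current, current_end = [], None
        # Sweep left to right; extend the current locus while spans overlap.
        for start, end, name in sorted(by_key[key]):
            if current and start < current_end:
                current.append(name)
                current_end = max(current_end, end)
            else:
                if current:
                    loci.append(current)
                current, current_end = [name], end
        if current:
            loci.append(current)
    return loci

# Two non-overlapping transcripts form two loci.
few = [("t1", "chr1", "+", 100, 200), ("t2", "chr1", "+", 300, 400)]
# A third transcript overlapping each of them by 1 bp merges everything.
more = few + [("t3", "chr1", "+", 199, 301)]

print(len(group_into_loci(few)))   # 2
print(len(group_into_loci(more)))  # 1
```

Adding `t3` bridges the two earlier loci, so the locus count drops from 2 to 1 even though the number of transcripts went up.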

--Liz

from cdna_cupcake.

WayneForever commented on July 20, 2024

Thanks for your reply, Liz.

WayneForever commented on July 20, 2024

Hi, @Magdoll

Sorry for bothering you again. There is another thing that I don't understand.

This project follows the Iso-Seq2 pipeline from PacBio, with 16 cells sequenced. In total, I got 1,267,862 reads and 288,773 isoforms from the 'collapse_isoforms_by_sam.py' script. The resulting file 'all.collapsed.group.txt' shows that some loci (the "genes" I mentioned above) have hundreds of isoforms, such as PB.1.259 or PB.63.242.

In most circumstances this should be rare, because a single gene shouldn't have that many isoforms. Can this happen even if the input to the script is correct?

I ran the script as follows; the file 'rm_fusion.mapped.sorted.sam' is the GMAP SAM output with fusion genes removed:
python collapse_isoforms_by_sam.py --input NGS_corrected.fasta -s rm_fusion.mapped.sorted.sam -o all

Sorry again for bothering you, and thank you very much.

Magdoll commented on July 20, 2024

Hi @WayneForever ,

So you had 288,773 HQ isoforms (after running the Iso-Seq2 pipeline) and then ran the collapse script?

I would look at the before (rm_fusion.mapped.sorted.sam) and after (all.collapsed.gff) in a browser (e.g., IGV) and check whether the high splicing diversity for those genes is really there. It's entirely possible, especially for neural genes in human, to have a lot of isoforms. One of the first Iso-Seq papers found 247 neurexin 1 isoforms: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3977267/
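To decide which loci to open in the browser first, one option is to tally isoforms per locus from the group file. The following is a rough sketch, not part of cDNA_Cupcake. It assumes each line of 'all.collapsed.group.txt' is tab-separated with an ID of the form `PB.<locus>.<isoform>` in the first column; `isoforms_per_locus` is a hypothetical helper name.

```python
from collections import Counter

def isoforms_per_locus(lines):
    """Count collapsed isoforms per locus.

    Assumes (not verified against the tool) that each line starts with an
    ID like "PB.1.259", followed by a tab and the supporting read names.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        pbid = line.split("\t", 1)[0]
        parts = pbid.split(".")
        if len(parts) != 3 or parts[0] != "PB":
            continue  # skip anything that does not look like PB.X.Y
        counts["PB." + parts[1]] += 1
    return counts

# Made-up example lines in the assumed format.
example = [
    "PB.1.1\tread_a,read_b",
    "PB.1.2\tread_c",
    "PB.63.1\tread_d",
]
counts = isoforms_per_locus(example)
print(counts.most_common(2))  # [('PB.1', 2), ('PB.63', 1)]
```

On the real file, passing an open file handle (for example `with open("all.collapsed.group.txt") as fh: isoforms_per_locus(fh)`) and looking at `most_common(20)` would list the loci with the most isoforms, which are the ones worth checking in IGV.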

--Liz

WayneForever commented on July 20, 2024

The 288,773 isoforms are the result of collapsing. My input is the quivered FASTA from the Iso-Seq2 pipeline. The species for this project is Macaca.
Now I think this result may be caused by high redundancy in the quivered FASTA, and I'm trying to reduce that redundancy. Thanks anyway. :)

Magdoll commented on July 20, 2024

Good to hear. Let me know what you find out.
