
Comments (7)

armintoepfer commented on May 27, 2024
  1. The cluster step trims polyA tails. It also identifies concatemers by searching for the SMRTbell hairpin sequence.
  2. UMIs are not supported. You must trim them before running the isoseq pipeline. Clustering with UMIs is beyond the scope of this bioconda support channel.
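
For readers who need to strip UMIs themselves, here is a minimal sketch of the idea. It assumes a fixed-length UMI at the very start of each read, which is a hypothetical layout; `trim_umi` and `umi_len` are illustrative names and not part of isoseq or lima. Adapt the offsets to your actual library design.

```python
from typing import Tuple


def trim_umi(seq: str, qual: str, umi_len: int) -> Tuple[str, str, str]:
    """Split a read into (umi, trimmed_seq, trimmed_qual).

    Assumption: the UMI occupies the first `umi_len` bases of the read.
    This layout is hypothetical and chosen only for illustration.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if umi_len < 0 or umi_len > len(seq):
        raise ValueError("umi_len must be between 0 and the read length")
    # Keep the UMI so it can be stored elsewhere (for example in the read name).
    return seq[:umi_len], seq[umi_len:], qual[umi_len:]


if __name__ == "__main__":
    umi, seq, qual = trim_umi("ACGTACGTTTGGCC", "IIIIIIIIIIIIII", 8)
    print(umi, seq, qual)
```

Returning the UMI alongside the trimmed read keeps the information available for any downstream grouping you do outside the pipeline.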

from pbbioconda.

wyzhangMPI commented on May 27, 2024

armintoepfer commented on May 27, 2024

1] We trained an HMM for polyA detection; residual errors are allowed.
2a] SMRTbell hairpin sequence detection is performed across the whole read.
2b] Reads in the wrong orientation are not written to the output file by lima, so they never make it into the clustering step.
3] Yes, at least 20 As, but the read has to start with a polyA. The Viterbi decoding may allow a few residual errors, but likely not your full UMI.
4] no-polish takes the subreads, creates a partial-order alignment, and calls a consensus sequence. The result is noisier than polished CCS, which takes much longer to generate, but it is of sufficient quality. The motivation is purely speed. You can also take fully polished CCS as input.
5] I don't trust a single molecule. I'd rather be concerned about a ton of FPs than about your FNs. To counter the FN argument, sequence more. If you use polished CCS as input, there is no need to polish the FLNCs. In the upcoming version 3.1, we will introduce a pre-processing step that takes CCS as input and generates FLNCs that will be used for clustering.
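
To make point 3 concrete, here is a rough sketch of error-tolerant polyA detection anchored at a read end. It is not the HMM/Viterbi approach described above; it swaps in a much simpler score-and-drop scan. The function name, the scoring values, and the choice of the 3' end as the anchor are all assumptions made for illustration.

```python
from typing import Optional


def find_polya_tail(seq: str, min_as: int = 20, match: int = 1,
                    mismatch: int = -2, max_drop: int = 4) -> Optional[int]:
    """Return the start index of a terminal polyA tail, or None.

    Simplified stand-in for an HMM: walk inward from the read end,
    reward each A, penalize each non-A, and stop once the running
    score has fallen `max_drop` below the best score seen so far.
    All scoring parameters are hypothetical.
    """
    score = 0
    best_score = 0
    best_start = len(seq)
    for i in range(len(seq) - 1, -1, -1):
        score += match if seq[i] == "A" else mismatch
        if score > best_score:
            best_score = score
            best_start = i
        elif best_score - score >= max_drop:
            break
    # Require at least `min_as` A bases inside the candidate tail.
    if seq[best_start:].count("A") >= min_as:
        return best_start
    return None


if __name__ == "__main__":
    read = "GATTACAGGC" + "A" * 12 + "G" + "A" * 10  # one error inside the tail
    print(find_polya_tail(read))
```

A single non-A base inside the tail is absorbed by the scan, whereas a longer non-A stretch such as a UMI sitting past the tail would stop the scan before any A is seen, which matches the behavior described in point 3.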


wyzhangMPI commented on May 27, 2024

armintoepfer commented on May 27, 2024

That's exactly the reason I don't trust a single molecule. What are the chances that you detect the same chimeric read twice? The human genome contains homopolymer A stretches longer than 20 bp, so looking for polyA in the middle of the read is not the best approach. One could look for primers, but I don't because, as I said initially, those molecules are unlikely to form clusters. My goal is to create meaningful clustering results. If your goal is to refine FLNCs without running clustering, then you are on your own for now.
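
The ambiguity of internal polyA hits can be illustrated with a small sketch. It only lists exact homopolymer A runs and flags those that do not reach the read end; it cannot tell a genomic A stretch from a chimera junction, which is the point made above. The helper names and the 20-base threshold as a default are assumptions for illustration.

```python
import re
from typing import List, Tuple


def a_runs(seq: str, min_len: int = 20) -> List[Tuple[int, int]]:
    """Return (start, end) for every exact run of at least `min_len` As."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    pattern = re.compile("A{%d,}" % min_len)
    return [(m.start(), m.end()) for m in pattern.finditer(seq)]


def internal_a_runs(seq: str, min_len: int = 20) -> List[Tuple[int, int]]:
    """Keep only runs that stop before the end of the read."""
    return [(s, e) for s, e in a_runs(seq, min_len) if e < len(seq)]


if __name__ == "__main__":
    read = "CGT" + "A" * 22 + "GGCT" + "A" * 5 + "TTC" + "A" * 30
    print(a_runs(read))
    print(internal_a_runs(read))
```

An internal hit from this kind of scan is only a candidate: without a second, independent observation of the same molecule there is no way to decide whether it marks a junction or native sequence.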


wyzhangMPI commented on May 27, 2024

armintoepfer commented on May 27, 2024

If you happen to find a massive amount of chimeric reads, go and find your sample prep person :)

