Hello, We are trying to use isoseq to detect new isoforms, and would like to know

Detection of FLNC reads with Isoseq3 (pacificbiosciences/pbbioconda)

Comments (7)

armintoepfer commented on May 27, 2024

The cluster step trims polyA tails. It also identifies concatemers by searching for the SMRT Bell hairpin sequence.
UMIs are not supported. You must trim them before going into the isoseq pipeline. Clustering with UMIs is beyond this bioconda support channel.
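Since UMIs have to be removed before the reads enter the isoseq pipeline, here is a minimal sketch of what such a trimming step could look like on a plain sequence string. The function name `trim_umi`, the fixed UMI length, and the choice of read end are all assumptions for illustration; they depend on the library design and are not part of any PacBio tool. On real BAM input, the per-base qualities would need to be trimmed by the same amount.

```python
from typing import Tuple


def trim_umi(seq: str, umi_len: int, end: str = "3p") -> Tuple[str, str]:
    """Split a fixed-length UMI off one end of a read.

    Returns (trimmed_sequence, umi). Both `umi_len` and `end` are
    assumptions about the library design, not values taken from isoseq3.
    """
    if umi_len < 0 or umi_len > len(seq):
        raise ValueError("umi_len must be between 0 and the read length")
    if end not in ("5p", "3p"):
        raise ValueError("end must be '5p' or '3p'")
    if umi_len == 0:
        return seq, ""
    if end == "5p":
        # UMI sits at the start of the read
        return seq[umi_len:], seq[:umi_len]
    # UMI sits at the end of the read
    return seq[:-umi_len], seq[-umi_len:]
```

Keeping the removed UMI as a second return value lets it be stored alongside the read name for later use outside the pipeline.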

from pbbioconda.

wyzhangMPI commented on May 27, 2024

Thanks for the explanation. Just a few more related questions:

1) Would it be possible for the cluster step to trim polyA tails that contain occasional sequencing errors, e.g. AAAAAAAAAAAAAGAAAAAAAAAAAAA?
2) The search for concatemers (SMRT Bell hairpin sequence) is done in the middle of CCS reads, is that correct? The purpose of this option is to detect and remove chimeric reads, right? Also, is the detection and removal of wrongly oriented reads (3p--3p or 5p--5p) in the lima step an alternative way of removing chimeric reads?
3) If I understood correctly, polyA detection requires at least 20 consecutive As, and no more than 10 overhanging bases are allowed. For instance, AAAAAAAAAAAAAAAAAAAAAAAAAAAAAATCGCGATTGT can still be considered polyA. Most of the UMIs seem to be fine, but I will also trim the UMIs on my own.
4) For the initial CCS step, a "no-polish" option is recommended. Does that mean a subread is randomly assigned to represent the CCS? Is any error correction done at this point? If the "polish" option is applied, what additional procedures are performed?
5) isoseq3 cluster only outputs transcripts with at least 2 supporting CCS reads, and may thus introduce false negatives (i.e., transcripts with only one supporting CCS read). We would therefore prefer to use the FLNC.bam data for further analysis to get more results. Would it be possible to run "isoseq3 polish" on the FLNC.bam data (all full-length non-chimeric reads) instead of the clustered BAM data?


armintoepfer commented on May 27, 2024

1] We trained an HMM for polyA detection, residual errors are allowed.
2a] SMRT Bell hairpin sequence detection is performed across the whole read.
2b] Wrong orientations are not written to the output file by lima, so they don't even make it into the clustering step.
3] Yes, at least 20 As, but the read has to start with a polyA. The Viterbi decoding may allow a few residual bases, but likely not your full UMI.
4] no-polish takes the subreads, creates a partial-order alignment, and calls a consensus sequence. This is noisier than polished CCS, which takes much longer, but of sufficient quality. Motivation is purely speed. You can also take fully polished CCS as input.
5] I don't trust a single molecule. I'd rather be concerned by a ton of FPs than your FNs. To counter the FN argument, sequence more. If you use polished CCS as input, there is no need to polish the FLNCs. In the upcoming version 3.1, we will introduce a pre-processing step that has CCS as input and generates FLNCs that will be used for clustering.
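To make the polyA discussion in points 1] and 3] concrete, here is a toy tail finder that tolerates a few residual errors and a short overhang. It is explicitly not the HMM/Viterbi approach described above: it swaps in a simple score-and-extend heuristic, and the function name `find_polya_tail` as well as the `mismatch_penalty` and `x_drop` values are assumptions chosen for illustration. Only the 20-A minimum and the 10-base overhang limit come from the thread.

```python
from typing import Optional, Tuple


def find_polya_tail(
    seq: str,
    min_a: int = 20,           # minimum number of As (from the thread)
    max_overhang: int = 10,    # allowed bases after the tail (from the thread)
    mismatch_penalty: int = 3, # assumed scoring value
    x_drop: int = 6,           # assumed cut-off for giving up on extension
) -> Optional[Tuple[int, int]]:
    """Return (start, end) of a 3' polyA tail, or None if none is found."""
    seq = seq.upper()
    n = len(seq)
    best = None  # (score, start, end) of the best tail seen so far
    for overhang in range(0, min(max_overhang, n) + 1):
        end = n - overhang
        if end == 0 or seq[end - 1] != "A":
            continue  # a tail must end on an A
        score = best_score = 0
        best_start = end
        # Extend leftwards: reward As, penalise everything else.
        for i in range(end - 1, -1, -1):
            score += 1 if seq[i] == "A" else -mismatch_penalty
            if score > best_score:
                best_score, best_start = score, i
            elif best_score - score > x_drop:
                break  # too many errors in a row, stop extending
        a_count = seq.count("A", best_start, end)
        if a_count >= min_a and (best is None or best_score > best[0]):
            best = (best_score, best_start, end)
    return None if best is None else (best[1], best[2])
```

With these assumed parameters, a tail interrupted by a single G is still reported as one tail, while a run of only 10 As or a tail followed by more than 10 other bases is rejected.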


wyzhangMPI commented on May 27, 2024

Thanks for the clear response. Just one more question for now: regarding the removal of chimeric transcripts, the lima step only searches for the 5' and 3' primers at both ends, right? I mean, if there are chimeric reads (not biologically meaningful), there might be multiple primers or polyAs (from before the SMRT Bell hairpin adapter was added) in the middle of the reads. Is there any way to remove such chimeric reads?


armintoepfer commented on May 27, 2024

That's exactly the reason I don't trust a single molecule. What are the chances that you detect the same chimeric read twice? The human genome contains homopolymer A stretches >20 bp, so looking for polyA in the middle of the read is not the best approach. One could look for primers, but I don't, because as I said initially, those molecules are unlikely to form clusters. My goal is to create meaningful clustering results. If your goal is to refine FLNCs without running clustering, then you are on your own for now.

wyzhangMPI commented on May 27, 2024

Yes, I understand your logic. Thanks!
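The remark earlier in this exchange about genomic homopolymer A stretches longer than 20 bp can be illustrated with a small sketch that lists internal A runs in a read. Every hit it reports could be either a chimera junction or a genuine A-rich region of the transcript, which is why such a scan alone is a weak chimera signal. The function name `internal_a_runs` and the `flank` parameter are hypothetical and not part of isoseq3 or lima.

```python
import re
from typing import List, Tuple


def internal_a_runs(seq: str, min_len: int = 20, flank: int = 50) -> List[Tuple[int, int]]:
    """Return (start, end) of A homopolymers of at least `min_len` bases
    that lie at least `flank` bases away from both read ends.

    `flank` is an assumed margin used to exclude the real 3' polyA tail.
    """
    pattern = re.compile(r"A{%d,}" % min_len)
    n = len(seq)
    runs = []
    for match in pattern.finditer(seq.upper()):
        if match.start() >= flank and n - match.end() >= flank:
            runs.append((match.start(), match.end()))
    return runs
```

Note that this exact-match scan does not tolerate sequencing errors inside the run, so it would under-report on noisy reads.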

armintoepfer commented on May 27, 2024

If you happen to find a massive amount of chimeric reads, go and find your sample prep person :)
