Alignment mode for RRBS (gembs issue, status: open)

heathsc commented on August 13, 2024
Alignment mode for RRBS

from gembs.

Comments (10)

heathsc commented on August 13, 2024

berguner commented on August 13, 2024

Hi Simon,

I am not really an expert on this topic, but I will try to explain a bit more. We have been using BSMAP, which restricts the genome to the regions that start with the restriction pattern. It therefore does not map outside these regions, and reads that were not enriched are reported as unmapped. This is not an ideal solution, because it reports no mappings outside the restriction sites and does not indicate whether a read is a multimap. However, it performs better because it searches only a fraction of the genome, and we can evaluate how well the enrichment worked from the number of reads aligned to the restriction sites.
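
To make the idea of restricting the genome to restriction sites concrete, here is a minimal sketch of locating cut sites in a reference sequence. It is only an illustration and not BSMAP's actual implementation; the function name `find_cut_sites` and the toy reference are hypothetical, and only the forward strand is considered.

```python
def find_cut_sites(sequence, site="CCGG", cut_offset=1):
    """Return 0-based positions where a digested fragment would start.

    Assumption: MspI recognises CCGG and cuts after the first C (C-CGG),
    so a fragment starts one base into the recognition site.
    Only the forward strand is scanned in this sketch.
    """
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        # Continue searching just past the current match.
        start = sequence.find(site, start + 1)
    return positions


# Toy reference with two MspI sites.
reference = "ATCCGGTTACCGGA"
print(find_cut_sites(reference))  # [3, 10]
```

A mapper working in this mode would only consider alignments that begin at the returned positions, which is why it searches a fraction of the genome and cannot report mappings elsewhere.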

In my opinion a good solution should be able to:
1- Differentiate reads starting with the restriction patterns and map them accordingly,
2- Also map un-enriched (non-restricted) reads,
3- Correctly score/resolve multimaps,
4- Report enrichment rates based on the number of reads that meet the enrichment criteria (starting with the restriction pattern and also mapping onto the restriction sites of the genome).
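
As a rough sketch of point 4, the enrichment rate could be computed from the read prefix and the mapped position. Everything here is an assumption for illustration: the names `classify_read` and `enrichment_report`, the input format (read sequence plus a 0-based mapped start, or `None` if unmapped), and the restriction to forward-strand reads. It is not how gemBS or BSMAP report enrichment.

```python
from collections import Counter

# Read prefixes expected after MspI digestion: CGG if the first C is
# methylated, TGG if it was converted by bisulfite treatment.
MSPI_READ_PREFIXES = ("CGG", "TGG")


def classify_read(sequence, mapped_start, cut_sites, prefixes=MSPI_READ_PREFIXES):
    """Label a read as 'unmapped', 'enriched' or 'mapped_other'."""
    if mapped_start is None:
        return "unmapped"
    has_prefix = sequence.upper().startswith(prefixes)
    if has_prefix and mapped_start in cut_sites:
        return "enriched"
    return "mapped_other"


def enrichment_report(reads, cut_sites):
    """Return per-class counts and the fraction of enriched reads."""
    counts = Counter(classify_read(seq, pos, cut_sites) for seq, pos in reads)
    total = sum(counts.values())
    rate = counts["enriched"] / total if total else 0.0
    return counts, rate


# Hypothetical reads: (sequence, 0-based mapped start or None).
reads = [("CGGTTAC", 3), ("TGGAATT", 10), ("AATTCCA", 42), ("CGGAAAA", None)]
counts, rate = enrichment_report(reads, cut_sites={3, 10})
print(dict(counts), rate)
```

Reads in the `mapped_other` class correspond to point 2: they are still mapped, but they do not count towards the enrichment rate.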

We can imagine that reads starting with the restriction pattern (MspI: CGG/TGG, TaqI: CGA/TGA) carry an extra base at the beginning (MspI: C-CGG/C-TGG, TaqI: T-CGA/T-TGA) and map them together with this imaginary base. These patterns should be given as program arguments. I believe adding this imaginary base should help prioritize the restriction sites as the correct mapping regions and should not disrupt the resolution of multimaps.
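
A minimal sketch of the imaginary-base idea, assuming the enzyme patterns are passed in as a small table. The table `IMAGINARY_BASE` and the function `add_imaginary_base` are hypothetical names used only for illustration and are not part of gemBS or gem3-mapper.

```python
# Hypothetical table: enzyme -> (imaginary base, read prefixes that trigger it).
IMAGINARY_BASE = {
    "MspI": ("C", ("CGG", "TGG")),  # C-CGG / C-TGG
    "TaqI": ("T", ("CGA", "TGA")),  # T-CGA / T-TGA
}


def add_imaginary_base(read, enzyme="MspI"):
    """Prepend the imaginary base if the read starts with the pattern.

    Returns the (possibly extended) read and a flag saying whether the
    base was added, so it can be removed again after mapping.
    """
    base, prefixes = IMAGINARY_BASE[enzyme]
    if read.upper().startswith(prefixes):
        return base + read, True
    return read, False


print(add_imaginary_base("CGGTTA"))          # ('CCGGTTA', True)
print(add_imaginary_base("AATTGG"))          # ('AATTGG', False)
print(add_imaginary_base("TGAACC", "TaqI"))  # ('TTGAACC', True)
```

In a real pipeline the quality string would need a matching extra value, and the reported alignment position would have to be shifted back by one base for reads where the flag is set.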

In our facility we primarily use MspI, so most of the reads carry the CGG/TGG pattern at their start. In a good sample, where the source DNA was not degraded, more than 95% of the reads carry this pattern, so we can assume that they come from digested DNA fragments. You can see this on the FastQC per-base sequence content plot:
[image: FastQC per-base sequence content plot]

I hope this helps a little.

Best, Bekir

heathsc commented on August 13, 2024

berguner commented on August 13, 2024

Hello Simon,

Thank you for looking into this. I think your solution will solve most of our problems.

BSMAP requires less CPU time and less memory in digestion/restriction mode than in whole-genome mode. In my experience, gem3-mapper was faster than BSMAP even when BSMAP was running in digestion mode. gem3-mapper requires more memory to search the whole genome, but this is not really a problem on our computing infrastructure. I would much prefer whole-genome mapping capability to a smaller memory footprint, so I am not really concerned about the performance of gemBS.

Best,
Bekir

berguner commented on August 13, 2024

Hello Simon,

Did you get the chance to implement this feature?

heathsc commented on August 13, 2024

berguner commented on August 13, 2024

That's great! If you need more data for testing, I can always provide it.

Best,
Bekir

heathsc commented on August 13, 2024

berguner commented on August 13, 2024

You can use the data here:
test_data
The paired-end reads are 75 bp and the single-end reads are 50 bp. The digestion enzyme is MspI, so the digestion pattern is C-CGG.

Best, Bekir

berguner commented on August 13, 2024

Hi,

Is there any progress on this feature request? There are some large RRBS projects coming up, and I would love to ditch our old pipeline and switch to gemBS for good.

Thanks again for looking into this,
Cheers,
Bekir
