Hi Simon,
I am not really an expert on this topic, but I will try to explain a bit more. We have been using BSMAP, which restricts the search space to the genome regions that start with the restriction pattern. As a result, it does not map reads outside these regions and reports reads that were not enriched as unmapped. This is not an ideal solution, because it does not report mappings outside the restriction sites and does not report whether reads are multimaps. However, it performs better because it searches only a fraction of the genome, and we can evaluate how well the enrichment worked from the number of reads aligned to the restriction sites.
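To make the restricted-search idea concrete, here is a minimal Python sketch of how candidate MspI fragment starts could be enumerated on the forward strand of a reference sequence. It is an illustration under stated assumptions, not BSMAP's or gemBS's implementation: `digest_fragment_starts` and `candidate_windows` are hypothetical helpers, only the + strand is scanned, and the cut position assumes MspI's C^CGG cleavage.

```python
from typing import List, Tuple


def digest_fragment_starts(reference: str, site: str = "CCGG", cut_offset: int = 1) -> List[int]:
    """Return 0-based + strand positions where digested fragments would begin.

    Hypothetical helper: MspI recognises CCGG and cuts after the first C
    (C^CGG), so a fragment sequenced from the + strand starts one base
    into the site, at the CGG.
    """
    ref = reference.upper()
    starts = []
    pos = ref.find(site)
    while pos != -1:
        starts.append(pos + cut_offset)
        pos = ref.find(site, pos + 1)  # continue scanning after this hit
    return starts


def candidate_windows(reference: str, read_len: int, **kwargs) -> List[Tuple[int, int]]:
    """Half-open windows a restricted search would consider for reads of read_len bp."""
    n = len(reference)
    return [(s, min(s + read_len, n)) for s in digest_fragment_starts(reference, **kwargs)]


if __name__ == "__main__":
    ref = "AACCGGTTTCCGGA"
    print(digest_fragment_starts(ref))  # [3, 10]
    print(candidate_windows(ref, 5))    # [(3, 8), (10, 14)]
```

Searching only windows like these is what keeps restricted mapping cheap, and it is also why reads that start anywhere else end up reported as unmapped.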
In my opinion a good solution should be able to:
1- Differentiate reads starting with the restriction patterns and map them accordingly,
2- Also map un-enriched (non-restricted) reads,
3- Correctly score/resolve multimaps,
4- Report enrichment rates based on the number of reads meeting the enrichment criteria (starting with the restriction pattern and also mapping onto the restriction sites of the genome).
We can treat reads that start with the restriction pattern (MspI: CGG/TGG, TaqI: CGA/TGA) as if they carried an extra base at the beginning (MspI: C-CGG/C-TGG, TaqI: T-CGA/T-TGA) and map them together with this imaginary base. These patterns should be given as program arguments. I believe adding this imaginary base should help prioritize the restriction sites as the correct mapping regions without disrupting the resolution of multimaps.
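The imaginary-base proposal could be sketched in Python as a preprocessing step applied to each read before it reaches the mapper. This is only a sketch of the idea, not an existing gemBS or gem3-mapper feature: `add_imaginary_base`, the `MSPI`/`TAQI` tables and the quality character used for the padded base (`I`) are all hypothetical choices.

```python
from typing import Dict, Tuple

# Hypothetical pattern tables: read-start pattern -> imaginary base preceding it.
# The TGG/TGA entries are the bisulfite-converted forms of CGG/CGA.
MSPI: Dict[str, str] = {"CGG": "C", "TGG": "C"}  # C-CGG / C-TGG
TAQI: Dict[str, str] = {"CGA": "T", "TGA": "T"}  # T-CGA / T-TGA


def add_imaginary_base(seq: str, qual: str, patterns: Dict[str, str],
                       pad_qual: str = "I") -> Tuple[str, str, bool]:
    """Prepend the imaginary base if the read starts with a restriction pattern.

    Returns (sequence, quality, matched). Reads that do not start with any
    pattern are returned unchanged so they can still be mapped genome-wide.
    """
    # Try longer patterns first so that overlapping patterns resolve predictably.
    for site in sorted(patterns, key=len, reverse=True):
        if seq.startswith(site):
            return patterns[site] + seq, pad_qual + qual, True
    return seq, qual, False


if __name__ == "__main__":
    print(add_imaginary_base("TGGATCA", "FFFFFFF", MSPI))  # ('CTGGATCA', 'IFFFFFFF', True)
    print(add_imaginary_base("ACGTACG", "FFFFFFF", MSPI))  # ('ACGTACG', 'FFFFFFF', False)
```

Because non-matching reads pass through untouched, un-enriched reads can still be mapped (point 2 above), and the `matched` flag could feed an enrichment count (point 4).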
In our facility we primarily use MspI, so most of the reads carry the CGG/TGG pattern at their start position. In a good sample, where the source DNA is not degraded, more than 95% of the reads carry this pattern, so we can assume that they come from digested DNA fragments. This is visible in the FastQC per-base sequence content plot.
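The read-start half of the enrichment criterion (the fraction of reads beginning with CGG/TGG) can be estimated directly from the FASTQ file. A minimal Python sketch, assuming an uncompressed FASTQ with four lines per record; `fastq_sequences` and `restriction_start_fraction` are hypothetical names, and the sketch does not cover the other half of the criterion (mapping onto genomic restriction sites), which needs the alignments.

```python
from typing import Iterable, Iterator, Sequence


def fastq_sequences(path: str) -> Iterator[str]:
    """Yield the sequence line of each record in an uncompressed 4-line FASTQ."""
    with open(path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:
                yield line.strip()


def restriction_start_fraction(reads: Iterable[str],
                               patterns: Sequence[str] = ("CGG", "TGG")) -> float:
    """Fraction of reads whose first bases match any restriction pattern."""
    prefixes = tuple(p.upper() for p in patterns)
    total = matching = 0
    for seq in reads:
        total += 1
        if seq.upper().startswith(prefixes):
            matching += 1
    return matching / total if total else 0.0


if __name__ == "__main__":
    demo = ["CGGATCA", "TGGTTAC", "TGGAATT", "ACGTTGA"]
    print(restriction_start_fraction(demo))  # 0.75
```

For a real library you would pass `fastq_sequences(...)` over the R1 file; going by the figure above, a good, non-degraded MspI sample should come out above 0.95.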
I hope this helps a little.
Best, Bekir
Hello Simon,
Thank you for looking into this; I think your solution will solve most of our problems.
BSMAP requires less CPU time and less memory when run in digestion/restriction mode than in whole-genome mode. In my experience, gem3-mapper was faster than BSMAP even when BSMAP was running in digestion mode. gem3-mapper requires more memory for searching the whole genome; however, this is not really a problem in our computing infrastructure. I would much rather have whole-genome mapping capability than a smaller memory footprint, so I am not really concerned about the performance of gemBS.
Best,
Bekir
Hello Simon,
Did you get the chance to implement this feature?
That's great! If you need more data for testing, I can always provide it.
Best,
Bekir
You can use the data here:
test_data
The paired-end reads are 75 bp and the single-end reads are 50 bp. The digestion enzyme is MspI, so the digestion pattern is C-CGG.
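If the digestion pattern were passed as a program argument in the `C-CGG` notation used here, the read-start patterns could be derived from it instead of being listed by hand. A Python sketch; the function name and the idea of this notation as an argument are assumptions, and it only models C→T conversion (reads like the CGG/TGG example), not the G→A case on the opposite strands.

```python
from itertools import product
from typing import List, Tuple


def parse_digestion_pattern(spec: str) -> Tuple[str, List[str]]:
    """Split a spec such as "C-CGG" into (imaginary_base, read_start_patterns).

    Each C in the site may appear as C (methylated) or T (unmethylated,
    bisulfite-converted) in the reads, so all C/T combinations are returned.
    Raises ValueError if the spec does not contain exactly one '-'.
    """
    base, site = spec.upper().split("-")
    options = [("C", "T") if b == "C" else (b,) for b in site]
    return base, ["".join(p) for p in product(*options)]


if __name__ == "__main__":
    print(parse_digestion_pattern("C-CGG"))  # ('C', ['CGG', 'TGG'])
    print(parse_digestion_pattern("T-CGA"))  # ('T', ['CGA', 'TGA'])
```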
Best, Bekir
Hi,
Is there any progress on this feature request? There are some large RRBS projects coming up, and I would love to ditch our old pipeline and switch to gemBS for good.
Thanks again for looking into this,
Cheers,
Bekir
Related Issues (20)
- mapped bam files: SM key without value
- Logging error on call
- how to set the control sequences?
- PBAT dataset failed in calling step
- Can't see colors for ENCODE bigBed on UCSC browser
- Scattering gemBS alignment step across a cluster
- Conversion rate is N/A in QC
- Latest version to Bioconda?
- Dealing with overlapping reads
- gemBS 3.5.1 issue at map stage of sample data
- ValueError: Error while executing the Bisulfite bisulphite-mapping
- Alignment Modes
- Error when running gemBS map
- HowTo enable GPU for read mapping?
- mapping merge fails if one of the file_id's is equal to the Barcode
- The .bed file does not match the statistics of the REPORT
- Difference between bin/bs_call and gemBSbinaries/bs_call?
- System.Error::Signal raised (no=11) using bs_call
- snp extract "Invalid format"
- bs_call finished with -6