Evaluating the performance of Remora for detection of DNA modifications.
- Separates canonical base calling from modified base calling
- Reduces overall time
- High accuracy
- Requires only a simple training dataset, so rare modifications such as 5hmC can be detected with high accuracy
5hmC (5-hydroxymethylcytosine) is an oxidation product of 5mC (5-methylcytosine). References: PMID:28769976, PMID:23634848
Figure represents the workflow for detection of 5mC and 5hmC modifications with the Guppy basecaller using the different Remora modes.
Project was divided into 2 parts:
- Understanding the 2 modes of Remora: mC mode (SINGLE mode) and mC+hmC mode (DUAL mode)
- Using Remora for detection of 5hmC modifications
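For the first part, one simple way to compare the two modes is to look at per-site 5mC fractions called by each mode at the sites they share. The sketch below is only an illustration under stated assumptions: the function names, the `(chromosome, position)` keys, and the toy values are hypothetical and are not taken from the project's actual benchmarking code or data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def compare_modes(single, dual):
    """Compare per-site 5mC fractions from SINGLE and DUAL mode calls.

    Both arguments are hypothetical dicts mapping (chrom, pos) to the
    fraction of reads called as 5mC at that site (0.0 to 1.0).
    Only sites present in both dicts are compared.
    """
    shared = sorted(single.keys() & dual.keys())
    xs = [single[site] for site in shared]
    ys = [dual[site] for site in shared]
    mean_abs_diff = sum(abs(x - y) for x, y in zip(xs, ys)) / len(shared)
    return {
        "n_sites": len(shared),
        "pearson_r": pearson(xs, ys),
        "mean_abs_diff": mean_abs_diff,
    }


# Toy values for illustration only (not real HG002 calls).
single_calls = {("chrX", 100): 0.9, ("chrX", 200): 0.1, ("chrX", 300): 0.5}
dual_calls = {("chrX", 100): 0.8, ("chrX", 200): 0.2, ("chrX", 300): 0.5}
print(compare_modes(single_calls, dual_calls))
```

A correlation near 1 together with a small mean absolute difference would be consistent with the two modes performing similarly on 5mC, although a real benchmark would also need a truth set.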
2 major results of my project:
- Performance is highly similar for the 2 modes: DUAL mode does no worse than SINGLE mode in detecting 5mC modifications (benchmarked using HG002).
- 5hmC modifications are clustered in the centromeres and telomeres, as observed in the T2T-CHM13 reference genome.
Future directions:
- Investigate how the biological function of 5hmC aligns with its clustering in centromeres and telomeres.
- See whether this trend is observed outside the X chromosome as well, since X has unique epigenetic characteristics.
- Use T2T-CHM13 to build an epigenetic map of 5hmC across the centromeres and telomeres in finer detail. This has already been done for 5mC (Gershman et al., 2022).
- Extend the analysis to study 5hmC patterns across all of the HPRC assemblies to see how 5hmC patterns differ among individuals.
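As a starting point for looking at where 5hmC calls cluster along a chromosome, the calls can be counted in fixed-size windows. The sketch below is a minimal illustration, not the project's method: the function name, the 0-based positions, and the toy coordinates are all assumptions made for the example.

```python
def window_counts(positions, chrom_length, window):
    """Count 5hmC calls in consecutive fixed-size windows.

    positions: hypothetical 0-based coordinates of sites called as 5hmC
    chrom_length: length of the chromosome in bases
    window: window size in bases
    Returns a list with one count per window, ordered along the chromosome.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    n_windows = (chrom_length + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        if 0 <= pos < chrom_length:  # ignore out-of-range coordinates
            counts[pos // window] += 1
    return counts


# Toy coordinates for illustration only (not real calls).
toy_positions = [5, 12, 18, 95, 99]
print(window_counts(toy_positions, chrom_length=100, window=10))
```

Windows with unusually high counts could then be compared against centromere and telomere annotations for the same assembly. Raw counts would need normalizing by the number of cytosines and the read coverage in each window before drawing conclusions.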