GithubHelp home page GithubHelp logo

find_markers for RRBS about wgbs_tools HOT 9 CLOSED

nloyfer avatar nloyfer commented on August 15, 2024
find_markers for RRBS

from wgbs_tools.

Comments (9)

AndriesDeKoker avatar AndriesDeKoker commented on August 15, 2024 1

Hi there,

Thank you for bringing this bug to my attention. I found that the error message was originating from the scipy.stats.ttest_ind function when one of the input arguments was not 2D. This occurs when the target/background has only one sample (n=1).

I have fixed the issue in find_markers.py, and it should now be working as expected. Thank you again.

Hi,

I indeed uninstalled scipy and it ran without ttest.
I pulled this fix and am rerunning with scipy atm

Thanks!

from wgbs_tools.

nloyfer avatar nloyfer commented on August 15, 2024 1

Right, I accidentally broke something in beta_to_table.py in 7df331a (when dealing with the fragmentation warning), and fixed it a few hours later in f4031b8.
Sorry about that. Please pull again.

from wgbs_tools.

nloyfer avatar nloyfer commented on August 15, 2024 1

About the RRBS atlas. We selected regions that showed high coverage across multiple samples in some large RRBS cohort (METABRIC). Perhaps this way of choosing "RRBS regions" is not the best way. If you have your own set of regions where you know you are getting high coverage, you can make your own "RRBS atlas".
Once you have a bed file with such regions, you can intersect it with the "whole genome blocks", then find markers with this block file as reference

bedtools intersect -a blocks.s205.5CpG.bed.gz -b RRBS_regions.bed.gz -wa -u > blocks.5CpG.RRBS.bed
wgbstools find_markers --config_file CONFIG_FILE -b blocks.5CpG.RRBS.bed

aha, well, I guess that getting RRBS regions from a lot of datasets would do the job as well. I did it this way:

  1. insilico digest with MspI of hg38 (chr - start - stop in bed format)
  2. wgbstools convert --bed_file RRBS_regions.bed > RRBS_block_file

This is not the way to do it?

That works too. Intersecting RRBS_regions.bed with the blocks file might be an improvement, because it's possible some of the RRBS regions are not homogeneous (methylation-wise), and it's better to break them into smaller blocks

from wgbs_tools.

AndriesDeKoker avatar AndriesDeKoker commented on August 15, 2024 1

Thank you very much already!

And maybe just a small bug (or I am doing something wrong)
but calling top:25 or top:250 in the config file does not seem to do anything?

from wgbs_tools.

nloyfer avatar nloyfer commented on August 15, 2024

Hi there,

Thank you for bringing this bug to my attention. I found that the error message was originating from the scipy.stats.ttest_ind function when one of the input arguments was not 2D. This occurs when the target/background has only one sample (n=1).

I have fixed the issue in find_markers.py, and it should now be working as expected. Thank you again.

from wgbs_tools.

nloyfer avatar nloyfer commented on August 15, 2024

About the RRBS atlas.
We selected regions that showed high coverage across multiple samples in some large RRBS cohort (METABRIC).
Perhaps this way of choosing "RRBS regions" is not the best way. If you have your own set of regions where you know you are getting high coverage, you can make your own "RRBS atlas".

Once you have a bed file with such regions, you can intersect it with the "whole genome blocks", then find markers with this block file as reference

bedtools intersect -a blocks.s205.5CpG.bed.gz -b RRBS_regions.bed.gz -wa -u > blocks.5CpG.RRBS.bed
wgbstools find_markers --config_file CONFIG_FILE -b blocks.5CpG.RRBS.bed

from wgbs_tools.

AndriesDeKoker avatar AndriesDeKoker commented on August 15, 2024

About the RRBS atlas. We selected regions that showed high coverage across multiple samples in some large RRBS cohort (METABRIC). Perhaps this way of choosing "RRBS regions" is not the best way. If you have your own set of regions where you know you are getting high coverage, you can make your own "RRBS atlas".

Once you have a bed file with such regions, you can intersect it with the "whole genome blocks", then find markers with this block file as reference

bedtools intersect -a blocks.s205.5CpG.bed.gz -b RRBS_regions.bed.gz -wa -u > blocks.5CpG.RRBS.bed
wgbstools find_markers --config_file CONFIG_FILE -b blocks.5CpG.RRBS.bed

aha, well, I guess that getting RRBS regions from a lot of datasets would do the job as well.
I did it this way:

  1. insilico digest with MspI of hg38 (chr - start - stop in bed format)
  2. wgbstools convert --bed_file RRBS_regions.bed > RRBS_block_file

This is not the way to do it?

EDIT: I think I see the reasoning doing it not this way as you don't know if the RRBS-blocks are comethylated at this point, right?

from wgbs_tools.

AndriesDeKoker avatar AndriesDeKoker commented on August 15, 2024

wgbstools find_markers is acting somewhat strange at the moment, giving lots of NA-markers?

#chr start end startCpG endCpG target region lenCpG bp tg_mean bg_mean delta_means delta_quants delta_maxmin ttest direction NA NA NA NA NA Adipocytes NA nanCpGs nanbp 0.161 0.842 0.681 0.0271 -0.162 1.03e-08 U chr1 1.84e+08 1.84e+08 1.77e+06 1.77e+06 Adipocytes chr1:183870545.0-183870631.0 5.0CpGs 86.0bp 0.142 0.809 0.667 0.0138 -0.0822 4.93e-07 U chr4 1.43e+08 1.43e+08 7.36e+06 7.36e+06 Adipocytes chr4:143282734.0-143282862.0 8.0CpGs 128.0bp 0.225 0.88 0.655 0.2 -0.15 1.25e-11 U chr9 3.34e+07 3.34e+07 1.4e+07 1.4e+07 Adipocytes chr9:33402120.0-33402450.0 7.0CpGs 330.0bp 0.142 0.795 0.653 0.0775 -0.03141.29e-07 U chr2 1.27e+08 1.27e+08 3.58e+06 3.58e+06 Adipocytes chr2:127121016.0-127121256.0 5.0CpGs 240.0bp 0.224 0.87 0.646 0.231 -0.149 1.28e-12 U NA NA NA NA NA Adipocytes NA nanCpGs nanbp 0.24 0.874 0.634 0.0661 -0.148 1.25e-09 U NA NA NA NA NA Adipocytes NA nanCpGs nanbp 0.185 0.813 0.628 0.0202 -0.0675 5.61e-09 U NA NA NA NA NA Adipocytes NA nanCpGs nanbp 0.273 0.898 0.625 0.18 0.0341 1.52e-16 U NA NA NA NA NA Adipocytes NA nanCpGs nanbp 0.087 0.696 0.609 0.0424 -0.0713 2.82e-05 U

from wgbs_tools.

nloyfer avatar nloyfer commented on August 15, 2024

Fixed. Thank you

from wgbs_tools.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.