tian-dechao / diffdomain

DiffDomain is a statistically sound method for detecting differential TADs between conditions

License: MIT License

Python 80.80% Makefile 0.07% R 19.14%
tads 3d-genome differential hi-c python r

diffdomain's Introduction

diffDomain

A short description

diffDomain is a new computational method for identifying reorganized TADs using chromatin contact maps from two biological conditions.

A long description

The workflow of diffDomain is illustrated in the figure below.

The goal is to test if a TAD identified in one biological condition has structural changes in another biological condition.

The core of diffDomain is to formulate the problem as a hypothesis test in which the null hypothesis is that the TAD does not undergo significant structural reorganization in the later condition. The inputs are the Hi-C contact matrices of the TAD region in the two biological conditions (A). These matrices are log-transformed to adjust for the exponential decay of Hi-C contact frequency as the distance between chromosome bins increases.

Their entry-wise difference is calculated (B).
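Steps (A) and (B) can be sketched in a few lines of NumPy. This is a minimal illustration, not diffDomain's own code: the pseudocount added before taking logs (so that empty bins do not produce log(0)) and the direction of the difference (condition 1 minus condition 0) are assumptions, and `log_difference` is a hypothetical name.

```python
import numpy as np


def log_difference(mat0, mat1, pseudocount=1.0):
    """Entry-wise difference of two log-transformed contact matrices.

    Assumption: a pseudocount is added so that empty bins do not give log(0);
    diffDomain's actual handling of zero counts may differ.
    """
    log0 = np.log(np.asarray(mat0, dtype=float) + pseudocount)
    log1 = np.log(np.asarray(mat1, dtype=float) + pseudocount)
    # Assumption: D is defined as condition 1 minus condition 0.
    return log1 - log0


if __name__ == "__main__":
    mat0 = np.array([[10.0, 4.0], [4.0, 12.0]])
    mat1 = np.array([[10.0, 9.0], [9.0, 12.0]])
    print(log_difference(mat0, mat1))
```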

The difference matrix D is normalized by iteratively standardizing its k-th off-diagonals, -N+2 <= k <= N-2, where N is the number of bins in the TAD. This adjusts for differences in absolute contact frequencies caused by different sequencing depths in the two biological conditions (C).
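A minimal sketch of step (C), assuming "standardizing" means subtracting the mean and dividing by the (population) standard deviation of each off-diagonal k in the stated range, in a single pass over k. The two corner entries (|k| = N-1), which the range excludes, are set to 0 here; that choice, the zero-variance guard, and the name `standardize_off_diagonals` are assumptions rather than diffDomain's documented behavior.

```python
import numpy as np


def standardize_off_diagonals(D):
    """Standardize each k-th off-diagonal of D for -N+2 <= k <= N-2."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    out = np.zeros_like(D)  # assumption: corner entries (|k| = N-1) stay 0
    rows, cols = np.indices(D.shape)
    offset = cols - rows  # k = 0 is the main diagonal, k > 0 above it
    for k in range(-(n - 2), n - 1):
        mask = offset == k
        vals = D[mask]
        sd = vals.std()
        # Assumption: a constant diagonal (sd == 0) is mapped to zeros
        out[mask] = (vals - vals.mean()) / sd if sd > 0 else 0.0
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((6, 6))
    print(np.round(standardize_off_diagonals(a + a.T), 2))
```

Because each offset gets its own mean and standard deviation, diagonals close to the main diagonal (which carry most contacts) and distant diagonals are put on the same scale.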

Note that standardization is TAD-specific: each TAD has its own parameters, estimated only from its contact matrices in the given pair of biological conditions.

Intuitively, if a TAD is not significantly reorganized, the normalized D should resemble a random matrix with white-noise entries, which lets us borrow theoretical results from random matrix theory. Indeed, the normalized D is a generalized Wigner matrix (D), a well-studied class of high-dimensional random matrices.

Under the null hypothesis, its largest singular value has been proven to fluctuate around 2. Armed with this fact, diffDomain reformulates the identification of reorganized TADs as a hypothesis test:

  1. H0: the largest singular value equals 2;
  2. H1: the largest singular value is greater than 2.
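The test can be illustrated with a short simulation. Two assumptions are made here: the largest singular value is divided by sqrt(N), which is the scaling under which a Wigner matrix with unit-variance entries has its largest singular value near 2, and the null distribution is approximated by Monte Carlo simulation of symmetric Gaussian matrices. diffDomain lists the TracyWidom package among its dependencies, which suggests it uses an analytical Tracy-Widom null instead; this sketch deliberately swaps that for a simulated null, and all function names are hypothetical.

```python
import numpy as np


def scaled_top_singular_value(M):
    """Largest singular value of M divided by sqrt(N) (assumed scaling)."""
    n = M.shape[0]
    # np.linalg.svd returns singular values in descending order
    return np.linalg.svd(M, compute_uv=False)[0] / np.sqrt(n)


def symmetric_noise(n, rng):
    """Symmetric Gaussian matrix with unit-variance off-diagonal entries."""
    g = rng.standard_normal((n, n))
    return (g + g.T) / np.sqrt(2.0)


def monte_carlo_pvalue(d_norm, n_sim=200, seed=0):
    """One-sided test of H0 (value ~ 2) vs H1 (value > 2) with a simulated null."""
    rng = np.random.default_rng(seed)
    n = d_norm.shape[0]
    observed = scaled_top_singular_value(d_norm)
    null = np.array([scaled_top_singular_value(symmetric_noise(n, rng))
                     for _ in range(n_sim)])
    # add-one correction keeps the Monte Carlo p-value strictly positive
    p_value = (1 + np.count_nonzero(null >= observed)) / (n_sim + 1)
    return observed, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 60
    noise = symmetric_noise(n, rng)  # stands in for an unchanged TAD
    v = np.ones(n) / np.sqrt(n)
    changed = noise + 4.0 * np.sqrt(n) * np.outer(v, v)  # planted rank-one change
    print(monte_carlo_pvalue(noise))
    print(monte_carlo_pvalue(changed))
```

For the pure-noise matrix the scaled value lands close to 2, while the planted low-rank change pushes it well above 2 and yields a small p-value.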

For a user-given set of TADs, P values are adjusted for multiple comparisons using the Benjamini-Hochberg (BH) method by default.
Once we identify the subset of reorganized TADs, we classify them into six subtypes to aid biological analysis and interpretation (F).
A few examples of reorganized TADs identified by diffDomain in two datasets are shown in (G).
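The BH adjustment mentioned above is the standard Benjamini-Hochberg step-up procedure. The sketch below is a plain NumPy version for illustration, not diffDomain's code; statsmodels, which is among diffDomain's dependencies, offers an equivalent through `statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh")`. The name `bh_adjust` is hypothetical.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # p_(i) * n / i for the sorted p-values
    ranked = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity: running minimum from the largest p-value down
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


if __name__ == "__main__":
    print(bh_adjust([0.01, 0.04, 0.03, 0.2]))
```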

[Figure: diffDomain workflow, panels A to G]

Installation instructions

diffDomain has been tested on macOS and Linux (CentOS).

Dependencies

diffDomain-py2 depends on

  • Python 2.7
  • hic-straw==0.0.6

diffDomain-py3 depends on

  • Python 3
  • hic-straw==1.3.1

Both versions also depend on

  • cooler
  • hicexplorer
  • TracyWidom
  • pandas
  • numpy
  • docopt
  • tqdm
  • matplotlib
  • statsmodels
  • h5py
  • seaborn
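Before running diffDomain, it can help to confirm that the dependencies listed above are importable in the active environment. This is a hypothetical helper, not part of diffDomain; the import names are assumptions (for example, hic-straw 1.x is imported as `hicstraw`, while the older 0.0.6 release used a different module name, so adjust the mapping for diffDomain-py2).

```python
import importlib.util

# Distribution name -> import name (assumed; verify against your installation)
REQUIRED = {
    "hic-straw": "hicstraw",
    "cooler": "cooler",
    "hicexplorer": "hicexplorer",
    "TracyWidom": "TracyWidom",
    "pandas": "pandas",
    "numpy": "numpy",
    "docopt": "docopt",
    "tqdm": "tqdm",
    "matplotlib": "matplotlib",
    "statsmodels": "statsmodels",
    "h5py": "h5py",
    "seaborn": "seaborn",
}


def missing_packages(required=REQUIRED):
    """Return the distribution names whose import name cannot be found."""
    return sorted(dist for dist, module in required.items()
                  if importlib.util.find_spec(module) is None)


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```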

Installation

First, we recommend using a package manager such as conda and creating a new, independent environment for diffDomain.

Method 1: install the conda environment

Step 1:

git clone https://github.com/Tian-Dechao/diffDomain
cd diffDomain

Step 2:

For Linux

conda env create --name diffdomain -f environment_linux.yml

For MacOS

conda env create --name diffdomain -f environment_macos.yml

Step 3:

conda activate diffdomain

This environment includes all the dependencies of diffDomain (Python 3 version).

Method 2: install the Python 3 version from PyPI

pip install diffDomain-py3

Note: if you encounter errors when installing hic-straw, which diffDomain relies on, you can install it with conda:

conda install -c bioconda hic-straw

Method 3: use the Docker image guming5/diffdomain-centos7:v1

docker pull guming5/diffdomain-centos7:v1
docker run -it guming5/diffdomain-centos7:v1 /bin/bash
# switch to the regular user named work
su work
cd ~
source activate diffdomain

This image contains a conda environment named diffdomain (/home/work/.conda/envs/diffdomain) that meets all requirements, in which you can use the Python 3 version of diffDomain directly.

Documentation

Please see the wiki for extensive documentation and example tutorials.

Contact information

For more information, please contact Dunming Hua at [email protected], Ming Gu at [email protected], or Dechao Tian at [email protected].

diffdomain's People

Contributors

dunminghua, luo-cpu, mingbao96, shinohara-xiao, tian-dechao

diffdomain's Issues

Multiprocessing error with .cool file

Hello,
I am trying to run the tool with the following command:

python diffdomains.py dvsd multiple /data/T1_10000.cool /data/T2_10000.cool /data/TADS_DM.bed --reso 10000

After appearing to run fine for a while, it throws the following error:

...
chr9:75210000:75430000
chr9:106940000:107100000
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "diffdomains.py", line 66, in <module>
    comp2domins_by_twtest_parallel(0)
  File "diffdomains.py", line 59, in comp2domins_by_twtest_parallel
    fhic0=opts[''], fhic1=opts[''],min_nbin=int(opts['--min_nbin']),f=opts['--f'])
  File "/diffdomain-py3/utils.py", line 338, in comp2domins_by_twtest
    mat0 = contact_matrix_from_hic(chrn, start, end, reso, fhic0, hicnorm)
  File "/diffDomain/diffdomain-py3/utils.py", line 181, in contact_matrix_from_hic
    c = cooler.Cooler(f'{hic_norm}')
  File "/home/micromamba/envs/diffDomain3/lib/python3.6/site-packages/cooler/api.py", line 85, in __init__
    self._refresh()
  File "/home/micromamba/envs/diffDomain3/lib/python3.6/site-packages/cooler/api.py", line 89, in _refresh
    with open_hdf5(self.store, **self.open_kws) as h5:
  File "/home/micromamba/envs/diffDomain3/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/home/micromamba/envs/diffDomain3/lib/python3.6/site-packages/cooler/util.py", line 525, in open_hdf5
    fh = h5py.File(fp, mode, *args, **kwargs)
  File "/home/micromamba/envs/diffDomain3/lib/python3.6/site-packages/h5py/_hl/files.py", line 427, in __init__
    swmr=swmr)
  File "/home/micromamba/envs/diffDomain3/lib/python3.6/site-packages/h5py/_hl/files.py", line 190, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 96, in h5py.h5f.open
OSError: Unable to open file (file signature not found)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/micromamba/envs/diffDomain3/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "diffdomains.py", line 59, in comp2domins_by_twtest_parallel
    fhic0=opts[''], fhic1=opts[''],min_nbin=int(opts['--min_nbin']),f=opts['--f'])
  File "/diffDomain/diffdomain-py3/utils.py", line 338, in comp2domins_by_twtest
    mat0 = contact_matrix_from_hic(chrn, start, end, reso, fhic0, hicnorm)
  File "/diffDomain/diffdomain-py3/utils.py", line 181, in contact_matrix_from_hic
    c = cooler.Cooler(f'{hic_norm}')
  File "/home/micromamba/envs/diffDomain3/lib/python3.6/site-packages/cooler/api.py", line 85, in __init__
    self._refresh()
  File "/home/micromamba/envs/diffDomain3/lib/python3.6/site-packages/cooler/api.py", line 89, in _refresh
    with open_hdf5(self.store, **self.open_kws) as h5:
  File "/home/micromamba/envs/diffDomain3/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/home/micromamba/envs/diffDomain3/lib/python3.6/site-packages/cooler/util.py", line 525, in open_hdf5
    fh = h5py.File(fp, mode, *args, **kwargs)
  File "/home/micromamba/envs/diffDomain3/lib/python3.6/site-packages/h5py/_hl/files.py", line 427, in __init__
    swmr=swmr)
  File "/home/micromamba/envs/diffDomain3/lib/python3.6/site-packages/h5py/_hl/files.py", line 190, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 96, in h5py.h5f.open
OSError: Unable to open file (file signature not found)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "diffdomains.py", line 76, in <module>
    result.append(i.get())
  File "/home/micromamba/envs/diffDomain3/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
OSError: Unable to open file (file signature not found)

Any help will be highly appreciated!
Thanks :)
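"Unable to open file (file signature not found)" is the error h5py raises when a file is not a valid HDF5 file, and .cool files are HDF5 files. One quick check, sketched below under the assumption that the inputs are meant to be plain .cool files, is whether each input actually carries the 8-byte HDF5 signature, which the HDF5 format places at offset 0 or, when a user block is present, at 512, 1024, 2048, and so on. A truncated download or a file of another format passed in place of a .cool file would fail this check. `looks_like_hdf5` is a hypothetical helper, not part of diffDomain or cooler.

```python
import os

# The HDF5 format signature: \x89 H D F \r \n \x1a \n
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the HDF5 signature appears at offset 0, 512, 1024, ..."""
    size = os.path.getsize(path)
    offset = 0
    with open(path, "rb") as fh:
        while offset + len(HDF5_SIGNATURE) <= size:
            fh.seek(offset)
            if fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            offset = 512 if offset == 0 else offset * 2
    return False


if __name__ == "__main__":
    import sys
    for p in sys.argv[1:]:
        print(p, "HDF5" if looks_like_hdf5(p) else "NOT HDF5")
```

Usage (hypothetical script name): `python check_hdf5.py /data/T1_10000.cool /data/T2_10000.cool`.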

1 not found in the file using 'dvsd multiple'

diffDomain works for "dvsd one", but I am unable to get "dvsd multiple" to work with two .hic files and a BED file.

Using the command below, I get "1 not found in the file." repeated once for every TAD in the BED file, and then the program hangs indefinitely.
If the chr column is populated with "chr1", I get the message below.
If the chr column is populated with "1", I get the message below.
If I explicitly add --chrn "1" to the command, I get the message below.
If I explicitly add --chrn "chr1" to the command, the program immediately finishes and the output file only has the # commented rows.

python /home/########/diffDomainENV/diffDomain/diffdomain-py3/diffdomains.py dvsd multiple Control.hic Treatment.hic Control_TADs_chr1.txt --ofile Ctrl.vs.Treatment.at.Ctrl_chr1.txt --hicnorm KR --ncore 1 --reso 10000 --min_nbin 5
diffdomains.py
1 not found in the file.
1 not found in the file.
1 not found in the file.
1 not found in the file.
etc
etc
etc.
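The report suggests that the chromosome names in the TAD file, the `--chrn` value, and the names stored in the .hic file do not agree ("chr1" versus "1"). Which style diffDomain expects is not settled in this thread, so the sketch below only makes it easy to rewrite the first column of a BED file into one consistent style and try both. The helpers `harmonize_chrom` and `rewrite_bed_lines` are hypothetical, and a tab separator is assumed.

```python
def harmonize_chrom(name, use_chr_prefix):
    """Return `name` with or without a leading 'chr' prefix."""
    bare = name[3:] if name.lower().startswith("chr") else name
    return "chr" + bare if use_chr_prefix else bare


def rewrite_bed_lines(lines, use_chr_prefix, sep="\t"):
    """Yield BED lines whose first column uses the requested naming style."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            yield line  # keep blank lines and comments unchanged
            continue
        fields = line.rstrip("\n").split(sep)
        fields[0] = harmonize_chrom(fields[0], use_chr_prefix)
        yield sep.join(fields) + "\n"


if __name__ == "__main__":
    import sys
    # usage (hypothetical): python harmonize.py in.bed out.bed [--chr]
    src, dst = sys.argv[1], sys.argv[2]
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(rewrite_bed_lines(fin, "--chr" in sys.argv[3:]))
```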

Error when running diffDomain with .cool or .hic

Hello,
Thanks for this new tool. I am attempting to run diffDomain using the command python diffdomain-py3/diffdomains.py dvsd multiple input/h9_merged_30_25kb_25000.cool input/smpc_merged_30_25kb_25000.cool input/h9_merged_30_25kb_normKR.bed --reso 25000 --ofile output/ --oprefix hPSC_vs_FetalSMPC --oprefixFig hPSC_vs_FetalSMPC --hicnorm KR
but keep getting this error:
`multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/data/anaconda3/envs/diffdomain/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "diffdomain-py3/diffdomains.py", line 59, in comp2domins_by_twtest_parallel
    fhic0=opts[''], fhic1=opts[''],min_nbin=int(opts['--min_nbin']),f=opts['--f'])
  File "/data/diffDomain/diffdomain-py3/utils.py", line 379, in comp2domins_by_twtest
    Diffmatnorm = normDiffbyMeanSD(D=Diffmat)
  File "/data/diffDomain/diffdomain-py3/utils.py", line 260, in normDiffbyMeanSD
    b[k] = np.max(val1)
  File "<__array_function__ internals>", line 6, in amax
  File "/data/anaconda3/envs/diffdomain/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 2755, in amax
    keepdims=keepdims, initial=initial, where=where)
  File "/data/anaconda3/envs/diffdomain/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "diffdomain-py3/diffdomains.py", line 76, in <module>
    result.append(i.get())
  File "/data/anaconda3/envs/diffdomain/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
ValueError: zero-size array to reduction operation maximum which has no identity`

I installed diffDomain using conda. I have tried .mcool and .cool files as well as .hic files, and none of them work. When using .hic files, I receive this error:

Traceback (most recent call last):
  File "diffdomain-py3/diffdomains.py", line 54, in <module>
    tadb = loadtads(opts['<bed>'], sep=opts['--sep'], chrnum=opts['--chrn'], min_nbin=int(opts['--min_nbin']), reso=int(opts['--reso']))
  File "/data/diffDomain/diffdomain-py3/utils.py", line 44, in loadtads
    tadb.iloc[:,1:3] = tadb.iloc[:,1:3].astype(int)
  File "/data/anaconda3/envs/diffdomain/lib/python3.7/site-packages/pandas/core/generic.py", line 5815, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
  File "/data/anaconda3/envs/diffdomain/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 418, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "/data/anaconda3/envs/diffdomain/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 327, in apply
    applied = getattr(b, f)(**kwargs)
  File "/data/anaconda3/envs/diffdomain/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 591, in astype
    new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
  File "/data/anaconda3/envs/diffdomain/lib/python3.7/site-packages/pandas/core/dtypes/cast.py", line 1309, in astype_array_safe
    new_values = astype_array(values, dtype, copy=copy)
  File "/data/anaconda3/envs/diffdomain/lib/python3.7/site-packages/pandas/core/dtypes/cast.py", line 1257, in astype_array
    values = astype_nansafe(values, dtype, copy=copy)
  File "/data/anaconda3/envs/diffdomain/lib/python3.7/site-packages/pandas/core/dtypes/cast.py", line 1095, in astype_nansafe
    result = astype_nansafe(flat, dtype, copy=copy, skipna=skipna)
  File "/data/anaconda3/envs/diffdomain/lib/python3.7/site-packages/pandas/core/dtypes/cast.py", line 1174, in astype_nansafe
    return lib.astype_intsafe(arr, dtype)
  File "pandas/_libs/lib.pyx", line 679, in pandas._libs.lib.astype_intsafe
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

Not quite sure what I am doing wrong, but any help would be great!
Thanks!
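The .hic traceback fails inside `loadtads` while casting columns 2 and 3 of the TAD file (`tadb.iloc[:,1:3]`) to int, so a row with a missing or non-numeric start/end field is one plausible cause, for example a header line or a separator that does not match `--sep`. The sketch below reports such rows before diffDomain is run. It is a hypothetical pre-check, not diffDomain's parser, and the tab default for `sep` is an assumption.

```python
def check_tad_bed(lines, sep="\t"):
    """Return (line_number, problem) pairs for rows that cannot be read as TADs."""
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(sep)
        if len(fields) < 3:
            problems.append((lineno, f"expected >= 3 fields, got {len(fields)}"))
            continue
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            problems.append((lineno, "start/end are not integers"))
            continue
        if start >= end:
            problems.append((lineno, "start must be smaller than end"))
    return problems


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as fh:
        for lineno, msg in check_tad_bed(fh):
            print(f"line {lineno}: {msg}")
```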

Gained TADs in condition 2

Hi,

Thanks for this wonderful tool for seeing TAD changes between two conditions!
How can I identify the gained loops in condition 2?

Thank you,
Pinpin Sui

Migrate to Python 3?

Hello,

Is there any plan to upgrade to Python 3.X, as some dependencies failed to install due to the lack of support for Python 2.7?
e.g. h5py==2.9,

What’s new in h5py 2.9

Support for old Python

Support for Python 3.3 has been dropped.
Support for Python 2.6 has been dropped.

How to use replicates?

Hi, I have replicates for the two conditions I want to test. How can I use them? I don't see an option for replicates.

File input for Diffdomain

Hi! Thanks for this tool!
I have a question about the format of input file.
Are only .hic files accepted, or can we use another format (such as .cool or .mcool) with a workaround?
If not, are you planning to support more formats, or to stay with .hic only?

Thank you in advance for your answers.

diffDomain installation

Hi,

We have had several failed attempts at installing diffDomain on our HPC cluster (from the container itself).
For approach one (a conda environment), we used our personal account on a standalone server with internet access, because the installer appears to access a mirror site that is not on our firewall's allow list. This is the error message from the command:
conda env create --name diffdomain -f environment_linux.yml

... CondaEnvException: Pip failed
Pip subprocess error:
ERROR: Ignored the following versions that require a different python version: 0.12.0 Requires-Python >=3.8; 0.12.0rc1 Requires-Python >=3.8; 0.12.1 Requires-Python >=3.8
ERROR: Could not find a version that satisfies the requirement diffdomain-py3==0.2.0 (from versions: 0.1.0, 0.1.1, 0.1.2, 0.1.3, 0.1.4,
0.1.6, 0.1.7, 0.1.8, 0.1.9, 0.2.1, 0.2.2)
ERROR: No matching distribution found for diffdomain-py3==0.2.0

For the container approach, I am able to pull an Apptainer image on a login node on seadragon2 using the command:
apptainer pull docker://guming5/diffdomain-centos7:v1
However, the instructions for using the container say: "In this image, there is a contact conda environment named diffdomain (/home/work/.conda/envs/diffdomain) meeting all requests, in which you can use the diffDomain Python3 version directly."
When I try to switch to user 'work', it asks for a password.

I was wondering if you could provide any suggestions to resolve the above errors.

Thank you!

Archit

DiffDomain v0.0.8 crashing with ImportError.

Hi,
I saw your updated DiffDomain for python3. https://pypi.org/project/DiffDomain/0.0.8/
I successfully installed it on Python 3.9.12, along with all of its dependencies.
However, when I run the command below, the program crashes immediately with the trace-back below.
I look forward to using this program as the paper is very promising!

`python /home/#########/software/diffDomain_v0.0.8/lib/python3.9/site-packages/diffdomain/diffdomains.py dvsd multiple Control.hic Treatment.hic Control.bed --hicnorm KR --reso 10000 --oprefix=Control.vs.Treatment

Traceback (most recent call last):
  File "/home/#########/software/diffDomain_v0.0.8/lib/python3.9/site-packages/diffdomain/diffdomains.py", line 31, in <module>
    from .utils import comp2domins_by_twtest, loadtads, visualization
ImportError: attempted relative import with no known parent package`
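This ImportError is standard Python behavior: a module that uses a relative import (`from .utils import ...`) fails when it is run directly as a file path, because Python then does not know which package it belongs to, but it works when run with `python -m package.module`. The sketch below reproduces both cases with a throwaway package; `toypkg` is a hypothetical stand-in for the installed `diffdomain` package, not diffDomain itself.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_both_ways():
    """Run a toy package module as a script and with -m; return both results."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "toypkg"  # hypothetical stand-in for the diffdomain package
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "utils.py").write_text("VALUE = 42\n")
        # main.py uses a relative import, as diffdomains.py does with .utils
        (pkg / "main.py").write_text("from .utils import VALUE\nprint(VALUE)\n")
        env = dict(os.environ, PYTHONPATH=tmp)  # make toypkg importable for -m
        as_script = subprocess.run(
            [sys.executable, str(pkg / "main.py")],
            capture_output=True, text=True, env=env,
        )
        as_module = subprocess.run(
            [sys.executable, "-m", "toypkg.main"],
            capture_output=True, text=True, env=env,
        )
    return as_script, as_module


if __name__ == "__main__":
    script, module = run_both_ways()
    print("as script, exit code:", script.returncode)
    print("with -m, output:", module.stdout.strip())
```

By analogy, `python -m diffdomain.diffdomains dvsd multiple ...` may avoid the error for the PyPI package, assuming the installed package is named `diffdomain` as the path in the traceback suggests; this has not been tested here.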

output results

Hi. I compared condition1 to condition2. How can I tell whether a differential TAD of the strength type is stronger in condition1 or in condition2?
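How diffDomain itself assigns the strength subtype is not described in this thread. As a user-side heuristic only, one could compare the mean log contact frequency inside the TAD between the two conditions, assuming both matrices are normalized comparably (same normalization, comparable sequencing depth). The function names and the pseudocount are hypothetical.

```python
import numpy as np


def mean_log_contact(mat, pseudocount=1.0):
    """Mean of log(contacts + pseudocount) over the upper triangle, diagonal included."""
    m = np.asarray(mat, dtype=float)
    iu = np.triu_indices_from(m)
    return float(np.mean(np.log(m[iu] + pseudocount)))


def stronger_condition(mat1, mat2):
    """Hypothetical heuristic: which condition has the higher mean log contact?"""
    diff = mean_log_contact(mat2) - mean_log_contact(mat1)
    if diff > 0:
        return "condition2"
    if diff < 0:
        return "condition1"
    return "tie"


if __name__ == "__main__":
    weak = np.full((4, 4), 5.0)
    strong = np.full((4, 4), 20.0)
    print(stronger_condition(weak, strong))
```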

Use of precalculated TAD_list

Hello again!
I have questions about using a TAD_list generated before running DiffDomain.
I don't really understand your documentation on that. For the main method, you wrote:

python diffdomains.py dvsd multiple <hic0> <hic1> <tadlist_of_hic0.bed> [options] 

Questions :
1- Can we also provide a <tadlist_of_hic1.bed>, or is it only for hic0?
2- Can you give an example of a usable <tadlist_of_hic0.bed>?
3- If we can't use <tadlist_of_hic1.bed>, how can we compare two Hi-C samples that each have their own TAD list?

Thank you in advance for your answers.

output file

Hi author:
diffDomain is a useful tool, but something seems strange about my results: none of the TADs have p-values. Is it possible that the input file should contain the original (raw) matrix rather than the ICE-normalized matrix?
thanks.

Question about dvsd multiple output

[Screenshot of the dvsd multiple output file, taken 2024-01-25]

I attached a screenshot of the "dvsd multiple" output file. The first 4 columns are the TAD information I provided as input, and the 6th column is the p-value, as mentioned in the wiki. Could you explain what the 5th and 7th columns mean?

Also, I provided 2127 TAD entries in the input file. Why does the output table only contain 1885 of them?

Thank you!
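The .hic traceback in an earlier issue shows that `loadtads` receives both `min_nbin` and `reso`, so one possible explanation for the missing rows is that TADs spanning fewer than `--min_nbin` bins are dropped before testing. The sketch below counts how many input TADs would fall below that threshold, assuming the bin count is `(end - start) // reso`; diffDomain may count bins differently, and `split_by_min_nbin` is a hypothetical helper.

```python
def split_by_min_nbin(tads, reso, min_nbin):
    """Split (chrom, start, end) TADs by whether they span at least `min_nbin` bins.

    Assumption: the number of bins is (end - start) // reso.
    """
    kept, dropped = [], []
    for chrom, start, end in tads:
        nbin = (end - start) // reso
        (kept if nbin >= min_nbin else dropped).append((chrom, start, end))
    return kept, dropped


if __name__ == "__main__":
    tads = [("chr1", 0, 40000), ("chr1", 40000, 100000)]
    kept, dropped = split_by_min_nbin(tads, reso=10000, min_nbin=5)
    print(len(kept), "kept,", len(dropped), "dropped")
```

If the number of dropped TADs matches the gap between input and output (here 2127 vs 1885), the filter is the likely explanation; otherwise other causes should be examined.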

Error when using a mcool file as input

Hello again!

I ran into an issue while testing dvsd one.

I ran it as follows:

python ../diffDomain/diffdomain-py3/diffdomains.py dvsd one chr2 10000000 15000000 --reso 40000 data1.mcool data2.mcool

And I obtain the error :

Traceback (most recent call last):
  File "../diffDomain/diffdomain-py3/diffdomains.py", line 45, in <module>
    result = comp2domins_by_twtest(chrn=opts['<chr>'], start=int(opts['<start>']), end=int(opts['<end>']), reso=int(opts['--reso']), hicnorm=opts['--hicnorm'], fhic0=opts['<hic0>'], fhic1=opts['<hic1>'], min_nbin=int(opts['--min_nbin']), f=opts['--f'])
  File "../diffdomain-py3/utils.py", line 398, in comp2domins_by_twtest
    Diffmatnorm = normDiffbyMeanSD(D=Diffmat)
  File "../diffDomain/diffdomain-py3/utils.py", line 291, in normDiffbyMeanSD
    b[k] = np.max(val1)
  File "<__array_function__ internals>", line 180, in amax
  File "/env/products/python/3.8.11/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 2791, in amax
    return _wrapreduction(a, np.maximum, 'max', axis, None, out,
  File "/env/products/python/3.8.11/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity

It seems that this issue does not occur when using a .hic file.

Thank you in advance for your answers.
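A .mcool file stores several resolutions, each as its own cooler under the group `/resolutions/<bin size>`, and the cooler library addresses a single resolution with the URI syntax `file.mcool::/resolutions/<bin size>`. Whether diffDomain passes such a URI through to cooler unchanged has not been confirmed here, so treat this as one thing worth ruling out rather than a known fix. `mcool_uri` is a hypothetical helper.

```python
from pathlib import Path


def mcool_uri(path, reso):
    """Build a cooler URI that selects one resolution inside a .mcool file."""
    if Path(path).suffix != ".mcool":
        raise ValueError(f"expected a .mcool file, got {path!r}")
    return f"{path}::/resolutions/{int(reso)}"


if __name__ == "__main__":
    print(mcool_uri("data1.mcool", 40000))
```

For the command in this report, the resulting inputs would be `data1.mcool::/resolutions/40000` and `data2.mcool::/resolutions/40000`, matching `--reso 40000`.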
