Comments (18)
@mr-c I saw a similar error, 'JI=0.27(<0.99)', before: https://github.com/macs3-project/MACS/actions/runs/7945645854/job/21692539127 We then patched the hmmlearn class in https://github.com/macs3-project/MACS/blob/master/MACS3/Signal/HMMR_HMM.pyx. I checked my GitHub Actions tests and found that I haven't tested the exact combination of numpy 1.24.2, scipy 1.11.4, and scikit-learn 1.4.1.post1. Let me try it...
from macs.
Here's a Dockerfile you can use to reproduce the build failure.
First, check out our copy of MACS3 with the debian/ directory added:
git clone https://salsa.debian.org/med-team/macs.git
FROM docker.io/debian:sid-slim
COPY . /build
WORKDIR /build
RUN apt-get update && \
apt-get -y build-dep .
RUN dpkg-buildpackage -uc -us -b
(edit: made Dockerfile simpler and more generic)
@mr-c I don't know the hmmlearn people, but I agree that we'd better ask them for a new release -- even just a minor version update. The more I look at my simple override code, the more I dislike it.
I have an AI assistant to help me debug code that I am not familiar with :)
Hi @tillea, we had a similar issue due to the NumPy version. The same MACS code generates slightly different results with NumPy < 1.25 and NumPy >= 1.25. The current standard output for testing (cmdlinetest) is from NumPy 1.25. Have you seen such an issue in other Python-based tools that depend on NumPy?
@tillea we tested using GitHub Actions: https://github.com/macs3-project/MACS/actions/runs/7732237695/job/21083012078 with Python 3.11 and numpy 1.24.2, and the test passed. The difference may relate to other dependencies as well. I think the only way to solve this is to relax the test -- we have relaxed the precision of tests before. But this time, the differences in the results are in the peak coordinates, usually a 1bp difference. It's trivial to go through each peak coordinate and allow a 1bp difference. Perhaps we can use other criteria to test that the results are 'similar' to the standard. Let me think...
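A minimal sketch of the per-coordinate tolerance idea mentioned above, assuming peaks are plain (chromosome, start, end) tuples. The function name `peaks_match` and the tuple layout are hypothetical and are not part of the MACS3 test scripts.

```python
from typing import List, Tuple

Peak = Tuple[str, int, int]  # (chromosome, start, end); layout is an assumption


def peaks_match(expected: List[Peak], observed: List[Peak], tolerance: int = 1) -> bool:
    """Return True if both lists hold the same peaks, allowing each
    coordinate to differ by at most `tolerance` bp."""
    if len(expected) != len(observed):
        return False
    # Pair peaks by sorted order; this assumes shifts are too small to reorder them.
    for (chrom_e, start_e, end_e), (chrom_o, start_o, end_o) in zip(
        sorted(expected), sorted(observed)
    ):
        if chrom_e != chrom_o:
            return False
        if abs(start_e - start_o) > tolerance or abs(end_e - end_o) > tolerance:
            return False
    return True


standard = [("chrI", 100, 200), ("chrII", 50, 80)]
new_run = [("chrII", 51, 80), ("chrI", 100, 199)]
print(peaks_match(standard, new_run))
```

A check like this is strict about the number of peaks: one split or missing peak fails the whole comparison, which is one reason an overlap-based criterion can be more forgiving.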
@tillea Here are some updates. We have made a script to calculate the Jaccard index of two sets of peaks, so our testing script can tolerate tiny differences between them -- see jaccard.py and an updated cmdlinetest. We also tested and decided to add scipy and scikit-learn to the list of dependencies, since the HMM module (hmmlearn) needs them. hmmlearn asks for a pretty old scikit-learn (a version before 1.0), which causes inconsistent hmmratac output from MACS3, so we explicitly require specific versions of scikit-learn and scipy in pyproject.toml. We should release a 3.0.1 version with these changes.
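To illustrate what a Jaccard index over two peak sets measures, here is a minimal sketch that works at base-pair resolution: the number of bases covered by both sets divided by the number covered by either. It is not the project's jaccard.py; the function names and the (chromosome, start, end) half-open tuple layout are assumptions made for this example.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open like BED


def merge_intervals(peaks: Iterable[Interval]) -> List[Interval]:
    """Sort peaks and merge overlapping or touching ones per chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(peaks):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def covered_bp(peaks: List[Interval]) -> int:
    """Total number of bases covered by already-merged intervals."""
    return sum(end - start for _, start, end in peaks)


def jaccard_index(a: Iterable[Interval], b: Iterable[Interval]) -> float:
    """Base-pair Jaccard index: |A and B| / |A or B|."""
    a_merged, b_merged = merge_intervals(a), merge_intervals(b)
    union = covered_bp(merge_intervals(a_merged + b_merged))
    if union == 0:
        return 1.0  # two empty sets are treated as identical
    # Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
    intersection = covered_bp(a_merged) + covered_bp(b_merged) - union
    return intersection / union


standard = [("chrI", 100, 200), ("chrI", 300, 400)]
new_run = [("chrI", 101, 200), ("chrI", 300, 401)]
print(jaccard_index(standard, new_run) > 0.99)
```

With a threshold such as 0.99, two peak files that differ by 1bp at a few boundaries still count as matching, while a run that calls very different regions does not.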
Also, we just figured out that the hmmratac output can be affected by the scipy and scikit-learn versions. Could you provide their versions in your testing system so that we can exactly reproduce the 'slightly different' output that you saw?
Thanks a lot for relaxing the tests.
We are using
scipy 1.11.4
sklearn 1.2.1
currently. I think when the error first showed up we were at scipy 1.10.
Usually you can see a full build log in our CI but this does not work for whatever reason for MACS. Sorry about this.
Thanks! We finally found the cause of the inconsistent results from hmmratac. Please see PR #620. In brief, hmmlearn needs to be patched (hmmlearn/hmmlearn#545) because, since the 1.3.0 update of sklearn, the results from KMeans are no longer consistent with those from older sklearn. The idea is to do the random seeding 10 times and pick the best one. We implemented the patch in MACS3/Signal/HMMR_HMM.pyx for now, and hopefully hmmlearn will include this change, hmmlearn/hmmlearn#545, in its next release since it's already merged into its main branch. As a result, the difference between sklearn 1.2 and 1.3 is small enough to be tolerated by our new jaccard.py tool:
... success! Results are different but Jaccard Index is 1.0 (>0.99)
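The "seed 10 times and pick the best one" idea can be sketched with a toy one-dimensional k-means in pure Python. This is not the hmmlearn patch or the code in HMMR_HMM.pyx; the function names are hypothetical, and "best" is assumed here to mean the run with the lowest inertia (sum of squared distances to the nearest center).

```python
import random
from typing import List, Sequence, Tuple


def kmeans_1d(
    data: Sequence[float], k: int, rng: random.Random, n_iter: int = 50
) -> Tuple[List[float], float]:
    """One k-means run from a random start; returns (centers, inertia)."""
    centers = rng.sample(list(data), k)
    for _ in range(n_iter):
        clusters: List[List[float]] = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda i: (x - centers[i]) ** 2)
            clusters[nearest].append(x)
        # Keep the old center when a cluster ends up empty.
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    inertia = sum(min((x - c) ** 2 for c in centers) for x in data)
    return centers, inertia


def best_of_n(
    data: Sequence[float], k: int, seed: int, n_init: int = 10
) -> Tuple[List[float], float]:
    """Run k-means n_init times from one seeded generator; keep the lowest inertia."""
    rng = random.Random(seed)
    runs = [kmeans_1d(data, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[1])


signal = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1, 10.2]
centers, inertia = best_of_n(signal, k=3, seed=42)
print(sorted(centers), inertia)
```

A single random start can settle in a poor local optimum, so its result depends heavily on how the library seeds it. Taking the best of several starts makes the outcome less sensitive to that detail, which is why small version-to-version differences shrink.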
We aim to release a new version of MACS3 next week; please let me know if it passes the tests on Debian Med.
@tillea MACS3 has been updated to v3.0.1 with changes to address this issue. Please let us know if it can pass the test on Debian.
@taoliu Thank you for the v3.0.1 release! I'm seeing the following with Python 3.12:
checking the files: ../temp/macs3.0.1-1-3.11_run_hmmratac/hmmratac_yeast500k_accessible_regions.gappedPeak vs standard_results_hmmratac/hmmratac_yeast500k_accessible_regions.gappedPeak
... failed! Results are different and Jaccard Index is 0.2775891758917589 (<0.99)
16.18.2 checking hmmratac hmmratac_yeast500k_bedpe_accessible_regions.gappedPeak ...
checking the files: ../temp/macs3.0.1-1-3.11_run_hmmratac/hmmratac_yeast500k_bedpe_accessible_regions.gappedPeak vs standard_results_hmmratac/hmmratac_yeast500k_bedpe_accessible_regions.gappedPeak
... failed! Results are different and Jaccard Index is 0.2775891758917589 (<0.99)
16.18.3 checking hmmratac hmmratac_yeast500k_load_hmm_model_accessible_regions.gappedPeak ...
checking the files: ../temp/macs3.0.1-1-3.11_run_hmmratac/hmmratac_yeast500k_load_hmm_model_accessible_regions.gappedPeak vs standard_results_hmmratac/hmmratac_yeast500k_load_hmm_model_accessible_regions.gappedPeak
... failed! Results are different and Jaccard Index is 0.2775891758917589 (<0.99)
16.18.4 checking hmmratac hmmratac_yeast500k_load_training_regions_accessible_regions.gappedPeak ...
checking the files: ../temp/macs3.0.1-1-3.11_run_hmmratac/hmmratac_yeast500k_load_training_regions_accessible_regions.gappedPeak vs standard_results_hmmratac/hmmratac_yeast500k_load_training_regions_accessible_regions.gappedPeak
... failed! Results are different and Jaccard Index is 0.2775891758917589 (<0.99)
The Python packages installed are
# python3.12 -m pip freeze
WARNING: Skipping /usr/lib/python3.12/dist-packages/numpy-1.24.2.egg-info due to invalid metadata entry 'name'
build==1.0.3
cykhash==2.0.0
Cython==3.0.8
decorator==5.1.1
hmmlearn==0.0.0
iniconfig==1.1.1
installer==0.7.0
joblib==1.3.2
numpy==1.24.2
packaging==23.2
pluggy==1.4.0
pyproject_hooks==1.0.0
pytest==7.4.4
scikit-learn==1.4.1.post1
SciPy==1.11.4
setuptools==68.1.2
threadpoolctl==3.1.0
toml==0.10.2
wheel==0.42.0
Here are the temp directory contents: https://people.debian.org/~crusoe/macs3.0.1-1-tests_temp.tgz
Hi @mr-c in our requirements we define hmmlearn>=0.3 so this hmmlearn==0.0.0 is strange to me, was hmmlearn not installed?
> Hi @mr-c in our requirements we define hmmlearn>=0.3 so this hmmlearn==0.0.0 is strange to me, was hmmlearn not installed?
Hmm.. maybe a pip issue. The installed version of hmmlearn is 0.3.0-4
/# dpkg -l python3-hmmlearn
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-================-============-============-===========================================================
ii python3-hmmlearn 0.3.0-4 amd64 unsupervised learning and inference of Hidden Markov Models
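One way to see which version the Python packaging metadata reports, independently of dpkg, is the standard library's importlib.metadata. The sketch below is only a diagnostic aid; the helper names are hypothetical, and treating 0.0.0 as a placeholder is an assumption based on the pip freeze output above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the version recorded in the distribution's metadata, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def looks_unversioned(version: Optional[str]) -> bool:
    """True when the metadata is missing or reports the placeholder 0.0.0."""
    return version is None or version == "0.0.0"


version = installed_version("hmmlearn")
if looks_unversioned(version):
    print("hmmlearn metadata reports no usable version:", version)
else:
    print("hmmlearn version from metadata:", version)
```

If this reports 0.0.0 while dpkg shows 0.3.0-4, the package is installed but its Python metadata carries a placeholder version, so a requirement such as hmmlearn>=0.3 cannot be verified from the metadata alone.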
Hi Michael @mr-c, could you let me know how you managed to install numpy 1.24 with Python 3.12, since 'distutils' has been removed in 3.12?
@taoliu It seems that doesn't cause a problem for the Debian 1.24.2-2 package of numpy
@mr-c OK. Now I can reproduce the error, and I found the problem. Since debian/python-hmmlearn has already incorporated some unreleased patches from upstream (https://salsa.debian.org/med-team/python-hmmlearn/-/commit/2fe0fa06f874641b2b9ac16c6d7f038ecc9bef97), my simple patch in HMMR_HMM.pyx (I override an initialization function in hmmlearn.hmm.GaussianHMM) has negative effects. Here is the solution: for packaging MACS3 in Debian, please patch the file MACS3/Signal/HMMR_HMM.pyx:
--- HMMR_HMM.old.pyx 2024-02-23 15:10:39
+++ HMMR_HMM.pyx 2024-02-23 15:11:09
@@ -90,7 +90,7 @@
# according to base documentation, if init_prob not stated, it is set to be equally likely for any state (1/ # of components)
# if we have other known parameters, we should set these (ie: means_weights, covariance_type etc.)
rs = np.random.RandomState(np.random.MT19937(np.random.SeedSequence(random_seed)))
- hmm_model = GaussianHMM_modified( n_components= n_states, covariance_type = covar, random_state = rs, verbose = False )
+ hmm_model = hmm.GaussianHMM( n_components= n_states, covariance_type = covar, random_state = rs, verbose = False )
hmm_model = hmm_model.fit( training_data, training_data_lengths )
assert hmm_model.n_features == 4
return hmm_model
@@ -121,7 +121,7 @@
cpdef list hmm_model_init( str model_file ):
with open( model_file ) as f:
m = json.load( f )
- hmm_model = GaussianHMM_modified( n_components=3, covariance_type=m["covariance_type"] )
+ hmm_model = hmm.GaussianHMM( n_components=3, covariance_type=m["covariance_type"] )
hmm_model.startprob_ = np.array(m["startprob"])
hmm_model.transmat_ = np.array(m["transmat"])
hmm_model.means_ = np.array(m["means"])
Now the tests pass and I can build the deb on my Linux machine using Docker:
dpkg-deb: building package 'macs' in '../macs_3.0.1-1_amd64.deb'.
dpkg-deb: building package 'macs-dbgsym' in '../macs-dbgsym_3.0.1-1_amd64.deb'.
Removing intermediate container 80933f5ee889
---> 23910831a682
Successfully built 23910831a682
Successfully tagged test_debian_macs3:latest
By the way, I needed to add this line to the Dockerfile before quilt push -a to set the path to the patches:
ENV QUILT_PATCHES=debian/patches
@taoliu Thank you for the patch and for having such good test coverage to catch these issues! I can confirm the fix; this issue can be closed.
Do you know the hmmlearn people? Seems like they need a new release..
Sorry for forgetting the QUILT_PATCHES trick. I'm impressed that you figured that out!