Comments (18)

taoliu avatar taoliu commented on August 24, 2024 1

@mr-c I saw a similar error, 'JI=0.27(<0.99)', before: https://github.com/macs3-project/MACS/actions/runs/7945645854/job/21692539127 We then patched the hmmlearn class in https://github.com/macs3-project/MACS/blob/master/MACS3/Signal/HMMR_HMM.pyx. I checked my GitHub Actions tests and found that I haven't tested the exact combination of numpy 1.24.2, scipy 1.11.4, and scikit-learn 1.4.1.post1. Let me try it...

from macs.

mr-c avatar mr-c commented on August 24, 2024 1

Here's a Dockerfile you can use to reproduce the build failure

First, check out our copy of MACS3 with the debian/ directory added:
git clone https://salsa.debian.org/med-team/macs.git

FROM docker.io/debian:sid-slim

COPY . /build
WORKDIR /build

RUN apt-get update && \
	apt-get -y build-dep .

RUN dpkg-buildpackage -uc -us -b

(edit: made Dockerfile simpler and more generic)

taoliu avatar taoliu commented on August 24, 2024 1

@mr-c I don't know the hmmlearn people, but I agree that we'd better ask them to make a new release -- even just a minor version update. The more I look at my simple override code, the more I dislike it.

I have an AI assistant to help me debug code that I am not familiar with :)

taoliu avatar taoliu commented on August 24, 2024

Hi @tillea, we had a similar issue due to the NumPy version. The same MACS code generates slightly different results with NumPy < 1.25 and NumPy >= 1.25. The current standard output for testing (cmdlinetest) is from NumPy 1.25. Have you seen such an issue in other Python-based tools that depend on NumPy?

tillea avatar tillea commented on August 24, 2024

taoliu avatar taoliu commented on August 24, 2024

@tillea we tested using GitHub Actions: https://github.com/macs3-project/MACS/actions/runs/7732237695/job/21083012078 for python3.11 and numpy 1.24.2, and the test passed. It may relate to other dependencies as well. I think the only way to solve this is to relax the test -- we relaxed the precision of the tests before. But this time, the differences in the result are at the peak coordinates -- usually a 1bp difference. It's trivial to go through each peak coordinate and allow a 1bp difference. Perhaps we can use other criteria to test that the results are 'similar' to the standard. Let me think...
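
The per-coordinate tolerance idea mentioned above could look roughly like the sketch below. It is only an illustration: the function name `peaks_match` and the `(chrom, start, end)` tuple layout are assumptions, not part of the MACS3 test suite.

```python
def peaks_match(peaks_a, peaks_b, tol=1):
    """Return True if both peak lists pair up one-to-one within `tol` bp.

    Each peak is a hypothetical (chrom, start, end) tuple.
    """
    if len(peaks_a) != len(peaks_b):
        return False
    # Sort both lists so peaks are compared in genomic order.
    for (chrom_a, start_a, end_a), (chrom_b, start_b, end_b) in zip(
        sorted(peaks_a), sorted(peaks_b)
    ):
        if chrom_a != chrom_b:
            return False
        if abs(start_a - start_b) > tol or abs(end_a - end_b) > tol:
            return False
    return True
```

A check like this breaks down as soon as one peak is split, merged, or missing in one of the two files, which is one reason to look for a different similarity criterion.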

tillea avatar tillea commented on August 24, 2024

taoliu avatar taoliu commented on August 24, 2024

@tillea Here are some updates. We have made a script to calculate the Jaccard index of two sets of peaks, so if there are some tiny differences between them, our testing script can tolerate them -- jaccard.py and an updated cmdlinetest. We also tested and decided to include scipy and scikit-learn in the list of dependencies since the HMM module (hmmlearn) needs them. hmmlearn asks for a pretty old scikit-learn (a version before 1.0), which causes inconsistent hmmratac output from MACS3, so we explicitly require specific versions of scikit-learn and scipy in pyproject.toml. We should release a 3.0.1 version with these changes.

Also, we just figured out that the hmmratac output can be affected by different scipy and scikit-learn versions. Could you provide the versions of them in your testing system so that we can exactly reproduce the 'slightly different' output that you saw?
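
For readers unfamiliar with the Jaccard index on peak sets, here is a minimal sketch. It assumes a base-pair-level definition (bases covered by both sets divided by bases covered by either set) and half-open `(chrom, start, end)` tuples; the actual jaccard.py in MACS3 may compute it differently, and all names below are hypothetical.

```python
from collections import defaultdict


def merge(intervals):
    """Merge overlapping or touching half-open intervals; return them sorted."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def total_length(intervals):
    """Sum of interval lengths in bp."""
    return sum(end - start for start, end in intervals)


def jaccard_index(peaks_a, peaks_b):
    """Base-pair Jaccard index of two peak lists of (chrom, start, end)."""
    by_chrom_a, by_chrom_b = defaultdict(list), defaultdict(list)
    for chrom, start, end in peaks_a:
        by_chrom_a[chrom].append((start, end))
    for chrom, start, end in peaks_b:
        by_chrom_b[chrom].append((start, end))

    len_a = len_b = len_union = 0
    for chrom in set(by_chrom_a) | set(by_chrom_b):
        a = merge(by_chrom_a[chrom])
        b = merge(by_chrom_b[chrom])
        len_a += total_length(a)
        len_b += total_length(b)
        len_union += total_length(merge(a + b))

    if len_union == 0:
        return 1.0  # two empty sets are treated as identical
    # |A ∩ B| = |A| + |B| - |A ∪ B|
    return (len_a + len_b - len_union) / len_union
```

With this definition, a single 100bp peak whose start shifts by 1bp gives an index of 99/100, so small coordinate shifts on realistic peak widths stay close to 1.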

tillea avatar tillea commented on August 24, 2024

Thanks a lot for relaxing the tests.
We are using

 scipy 1.11.4
 sklearn 1.2.1

currently. I think when the error first showed up we were at scipy 1.10.
Usually you can see a full build log in our CI, but for whatever reason this does not work for MACS. Sorry about this.

taoliu avatar taoliu commented on August 24, 2024

Thanks! We finally found the cause of the inconsistent results from hmmratac. Please see PR #620: in brief, hmmlearn needs to be patched (hmmlearn/hmmlearn#545) because of the 1.3.0 update of sklearn, after which the results from KMeans are no longer consistent with the results from older sklearn. The idea is to do the random seeding 10 times and pick the best one. We implemented the patch in MACS3/Signal/HMMR_HMM.pyx for now, and hopefully hmmlearn will include this change, hmmlearn/hmmlearn#545, in its next release since it's already merged into its main branch. As a result, the difference between sklearn 1.2 and 1.3 is small enough to be tolerated by our new jaccard.py tool:

... success! Results are different but Jaccard Index is 1.0 (>0.99)

We aim to release a new version of MACS3 next week; please let me know if it can pass the test on Debian Med.
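
The "seed 10 times and pick the best one" idea can be illustrated with a toy example. The sketch below is not the hmmlearn patch and does not use scikit-learn; it is a pure-Python 1-D k-means with hypothetical names (`kmeans_1d`, `best_of_n`), where "best" is assumed to mean lowest inertia (sum of squared distances to the nearest center). scikit-learn's KMeans exposes the same idea through its `n_init` parameter.

```python
import random


def kmeans_1d(values, k, seed, n_iter=20):
    """One k-means run on a list of floats; returns (centers, inertia)."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)  # random initial centers
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: (v - centers[i]) ** 2)
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    inertia = sum(min((v - c) ** 2 for c in centers) for v in values)
    return centers, inertia


def best_of_n(values, k, n_init=10, seed=0):
    """Run k-means n_init times with different seeds; keep the lowest inertia."""
    rng = random.Random(seed)
    runs = [kmeans_1d(values, k, rng.randrange(2**32)) for _ in range(n_init)]
    return min(runs, key=lambda run: run[1])
```

A single initialization can land in a poor local optimum, so the result depends heavily on the seed; keeping the best of several runs makes the outcome much less sensitive to how the initialization happens to be drawn.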

taoliu avatar taoliu commented on August 24, 2024

@tillea MACS3 has been updated to v3.0.1 with changes to address this issue. Please let us know if it can pass the test on Debian.

mr-c avatar mr-c commented on August 24, 2024

@taoliu Thank you for the v3.0.1 release! I'm seeing the following with Python 3.12:

  checking the files: ../temp/macs3.0.1-1-3.11_run_hmmratac/hmmratac_yeast500k_accessible_regions.gappedPeak vs standard_results_hmmratac/hmmratac_yeast500k_accessible_regions.gappedPeak
 ... failed! Results are different and Jaccard Index is 0.2775891758917589 (<0.99)
16.18.2 checking hmmratac hmmratac_yeast500k_bedpe_accessible_regions.gappedPeak ...
  checking the files: ../temp/macs3.0.1-1-3.11_run_hmmratac/hmmratac_yeast500k_bedpe_accessible_regions.gappedPeak vs standard_results_hmmratac/hmmratac_yeast500k_bedpe_accessible_regions.gappedPeak
 ... failed! Results are different and Jaccard Index is 0.2775891758917589 (<0.99)
16.18.3 checking hmmratac hmmratac_yeast500k_load_hmm_model_accessible_regions.gappedPeak ...
  checking the files: ../temp/macs3.0.1-1-3.11_run_hmmratac/hmmratac_yeast500k_load_hmm_model_accessible_regions.gappedPeak vs standard_results_hmmratac/hmmratac_yeast500k_load_hmm_model_accessible_regions.gappedPeak
 ... failed! Results are different and Jaccard Index is 0.2775891758917589 (<0.99)
16.18.4 checking hmmratac hmmratac_yeast500k_load_training_regions_accessible_regions.gappedPeak ...
  checking the files: ../temp/macs3.0.1-1-3.11_run_hmmratac/hmmratac_yeast500k_load_training_regions_accessible_regions.gappedPeak vs standard_results_hmmratac/hmmratac_yeast500k_load_training_regions_accessible_regions.gappedPeak
 ... failed! Results are different and Jaccard Index is 0.2775891758917589 (<0.99)

The Python packages installed are

# python3.12 -m pip freeze
WARNING: Skipping /usr/lib/python3.12/dist-packages/numpy-1.24.2.egg-info due to invalid metadata entry 'name'
build==1.0.3
cykhash==2.0.0
Cython==3.0.8
decorator==5.1.1
hmmlearn==0.0.0
iniconfig==1.1.1
installer==0.7.0
joblib==1.3.2
numpy==1.24.2
packaging==23.2
pluggy==1.4.0
pyproject_hooks==1.0.0
pytest==7.4.4
scikit-learn==1.4.1.post1
SciPy==1.11.4
setuptools==68.1.2
threadpoolctl==3.1.0
toml==0.10.2
wheel==0.42.0

Here are the temp directory contents: https://people.debian.org/~crusoe/macs3.0.1-1-tests_temp.tgz

philippadoherty avatar philippadoherty commented on August 24, 2024

Hi @mr-c in our requirements we define hmmlearn>=0.3 so this hmmlearn==0.0.0 is strange to me, was hmmlearn not installed?

mr-c avatar mr-c commented on August 24, 2024

> Hi @mr-c in our requirements we define hmmlearn>=0.3 so this hmmlearn==0.0.0 is strange to me, was hmmlearn not installed?

Hmm.. maybe a pip issue. The installed version of hmmlearn is 0.3.0-4

/# dpkg -l python3-hmmlearn
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name             Version      Architecture Description
+++-================-============-============-===========================================================
ii  python3-hmmlearn 0.3.0-4      amd64        unsupervised learning and inference of Hidden Markov Models

taoliu avatar taoliu commented on August 24, 2024

Hi Michael @mr-c, could you let me know how you managed to get numpy 1.24 installed with python3.12, since 'distutils' has been removed in 3.12?

mr-c avatar mr-c commented on August 24, 2024

@taoliu It seems that doesn't cause a problem for the Debian 1.24.2-2 package of numpy

https://salsa.debian.org/python-team/packages/numpy/-/blob/master/debian/changelog?ref_type=heads#L36

taoliu avatar taoliu commented on August 24, 2024

@mr-c OK. Now I can reproduce the error and have found the problem. debian/python-hmmlearn has already incorporated some unreleased patches from upstream (https://salsa.debian.org/med-team/python-hmmlearn/-/commit/2fe0fa06f874641b2b9ac16c6d7f038ecc9bef97), so my simple patch in HMMR_HMM.pyx (I override an initialization function in hmmlearn.hmm.GaussianHMM) has negative effects. Here is the solution: when packaging MACS3 in Debian, please patch the file MACS3/Signal/HMMR_HMM.pyx:

--- HMMR_HMM.old.pyx	2024-02-23 15:10:39
+++ HMMR_HMM.pyx	2024-02-23 15:11:09
@@ -90,7 +90,7 @@
     # according to base documentation, if init_prob not stated, it is set to be equally likely for any state (1/ # of components)
     # if we have other known parameters, we should set these (ie: means_weights, covariance_type etc.)
     rs = np.random.RandomState(np.random.MT19937(np.random.SeedSequence(random_seed)))
-    hmm_model = GaussianHMM_modified( n_components= n_states, covariance_type = covar, random_state = rs, verbose = False )
+    hmm_model = hmm.GaussianHMM( n_components= n_states, covariance_type = covar, random_state = rs, verbose = False )
     hmm_model = hmm_model.fit( training_data, training_data_lengths )
     assert hmm_model.n_features == 4
     return hmm_model
@@ -121,7 +121,7 @@
 cpdef list hmm_model_init( str model_file ):
     with open( model_file ) as f:
         m = json.load( f )
-        hmm_model = GaussianHMM_modified( n_components=3, covariance_type=m["covariance_type"] )
+        hmm_model = hmm.GaussianHMM( n_components=3, covariance_type=m["covariance_type"] )
         hmm_model.startprob_ = np.array(m["startprob"])
         hmm_model.transmat_ = np.array(m["transmat"])
         hmm_model.means_ = np.array(m["means"])

Now I can let the test pass and build the deb on my Linux machine using the docker:

dpkg-deb: building package 'macs' in '../macs_3.0.1-1_amd64.deb'.
dpkg-deb: building package 'macs-dbgsym' in '../macs-dbgsym_3.0.1-1_amd64.deb'.
Removing intermediate container 80933f5ee889
 ---> 23910831a682
Successfully built 23910831a682
Successfully tagged test_debian_macs3:latest

By the way, I needed to add this line to the Dockerfile before quilt push -a to set the path to the patches:

ENV QUILT_PATCHES=debian/patches

mr-c avatar mr-c commented on August 24, 2024

@taoliu Thank you for the patch and for having such good test coverage to catch these issues! I can confirm the fix, this issue can be closed.

Do you know the hmmlearn people? Seems like they need a new release..

Sorry for forgetting the QUILT_PATCHES trick, I'm impressed that you figured that out!
