ucl-pond / pysustain
Subtype and Stage Inference (SuStaIn) algorithm with an example using simulated data.
License: MIT License
I am applying the mixture_KDE version of SuStaIn to an external dataset that contains cross-sectional cognitive test scores for several thousand patients. I have been trying to modify the simrun.py script, but I'm running into a few conceptual and technical roadblocks. For one, we don't have control group data, which to my understanding is acceptable for the mixture_KDE model. Without controls, is the random assignment approach from simrun.py for generating ground_truth_sequences, ground_truth_subtypes, ground_truth_stages_control, and ground_truth_stages_other an appropriate first step? The pySuStaIn white paper says:
Within simrun.py, simulated subjects assigned earliest stages are used as controls and those in latest stages as cases.
I don't quite understand how this carries over to applying SuStaIn to real data.
In my script, after generating the random ground truth sequences etc., I comment out this line
data, data_denoised = generate_data_mixture_sustain(ground_truth_subtypes, ground_truth_stages, ground_truth_sequences, sustainType)
and use the NumPy array of my own data, which has exactly the same shape as what the above line of code would generate. However, I receive a "LinAlgError: Singular matrix" error when running this line:
mixtures = fit_all_kde_models(true_data, labels)
I think having a clearer example script of a mixture_KDE implementation with real data would be very useful in helping me answer some of my questions. Please let me know if there are any resources that you could share with me that might be helpful, or if you could address some of my issues directly.
I can also share my current working script if it would be of any help. Thanks!
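A singular-matrix error can arise when the data contain a constant column or columns that are exact linear combinations of each other. Whether that is the cause here is an assumption, since the dataset is not shown. The following minimal sketch, using only NumPy and toy data, shows one way to check for both conditions before fitting; the helper names are hypothetical and not part of pySuStaIn or kde_ebm.

```python
import numpy as np


def find_degenerate_columns(data, tol=1e-12):
    """Return indices of columns with (near-)zero variance, ignoring NaNs."""
    variances = np.nanvar(data, axis=0)
    return np.flatnonzero(variances < tol)


def covariance_rank(data):
    """Rank of the covariance matrix computed from fully observed rows."""
    complete_rows = data[~np.isnan(data).any(axis=1)]
    cov = np.atleast_2d(np.cov(complete_rows, rowvar=False))
    return np.linalg.matrix_rank(cov)


# Toy data (assumption): column 2 is constant, column 3 duplicates column 0.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
toy = np.column_stack([base, np.ones(200), base[:, 0]])

degenerate = find_degenerate_columns(toy)
rank = covariance_rank(toy)
print("zero-variance columns:", degenerate)
print("covariance rank:", rank, "of", toy.shape[1])
```

A rank lower than the number of columns indicates that the covariance matrix cannot be inverted, so dropping or merging the offending biomarkers would be a reasonable first thing to try.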
Hi,
Is there a notebook for reproducing the ordinal SuStaIn simulations/implementation (as mentioned in the ordinal SuStaIn article)?
The current notebooks don't support ordinal SuStaIn and I am not sure how to apply it to an external dataset.
Hello!
I'm trying to run pySuStaIn on my data (1129 observations and 38 biomarkers), but, perhaps because of the high number of biomarkers, the algorithm does not progress (even after 10 hours) beyond the message "Finding ML solution to 1 cluster problem". By inserting print statements into the code to debug it, I found that the "heavy" code is in _find_ml() in AbstractSustain.py, in particular these lines:
partial_iter = partial(self._find_ml_iteration, sustainData)
pool_output_list = self.pool.map(partial_iter, range(self.N_startpoints))
if ~isinstance(pool_output_list, list):
pool_output_list = list(pool_output_list)
I think that the map is very slow: the execution hangs on "list(pool_output_list)".
Do you have any idea how to resolve this problem? I also tried generating simulated data (with 1129 observations and 38 biomarkers), but nothing happened.
Thank you in advance.
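As a side note on the quoted snippet, `~` is Python's bitwise inversion operator rather than logical negation, so `~isinstance(...)` evaluates to -1 or -2 and is always truthy. This is unlikely to explain the slowness (it only means the `list(...)` conversion always runs), but the small self-contained sketch below shows the difference from `not`.

```python
values = [1, 2, 3]

# Bitwise inversion of a bool: ~True is -2 and ~False is -1, both truthy.
bitwise_result = ~isinstance(values, list)

# Logical negation gives the intended boolean.
logical_result = not isinstance(values, list)

print(bitwise_result, bool(bitwise_result))  # -2 True
print(logical_result)                        # False
```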
I get divide-by-zero errors relating to the model likelihood, which I tracked back to missing data causing problems with max(), min(), etc. I couldn't fix it with numpy.nanmax(), so we probably need to devise a robust method for handling missing data.
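One reason numpy.nanmax() alone may not be enough is that it still returns NaN (with a RuntimeWarning) for any column in which every value is missing. Whether that is what happened in this case is an assumption. The sketch below, on toy data, shows that behaviour and one possible workaround: masking out fully missing columns first.

```python
import warnings

import numpy as np

# Toy data (assumption): the middle biomarker is missing for every subject.
data = np.array([[1.0, np.nan, 3.0],
                 [2.0, np.nan, np.nan],
                 [0.5, np.nan, 4.0]])

# nanmax ignores NaNs, but an all-NaN column still yields NaN plus a warning.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=RuntimeWarning)
    col_max = np.nanmax(data, axis=0)

# Identify and drop columns with no observed values before taking the max.
all_missing = np.isnan(data).all(axis=0)
usable = data[:, ~all_missing]
usable_max = np.nanmax(usable, axis=0)

print(col_max)
print(usable_max)
```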
Hi SuStaIn team!
I am trying to use SuStaIn with a train/test-like approach, in which I have two datasets:
run_sustain_algorithm method, if I'm correct. So it seems to me that this makes sense from a methodological point of view (but I could be mistaken).
Now I can't seem to find exactly how I would perform this last step, given the output from the first step. I went back to the notebook from the workshop (which I followed some time ago) and it looks to me like the presented cross_validate_sustain_model mainly focuses on cross-validation metrics, rather than outputting the subtypes corresponding to the "test" data.
I am sorry if this is covered somewhere that I have missed. If the question is unclear, please say so; I'm happy to rephrase or go into more detail.
Cheers,
Nemo
Can we model SuStaIn in a probabilistic modelling library like pymc3? Also, can we use Variational Inference instead of MCMC?
Hi, pySuStaIn is great work, thanks for your effort!
I hope to use SuStaIn to subtype patients from our private Alzheimer's Disease structural MRI dataset.
However, I find it hard to implement the data preparation part (i.e. how to obtain z-scores from the raw MRI images).
Could you share your AD MRI data preparation code, or provide more detailed instructions for the pipeline?
(I checked the instructions provided in your Nat. Comm. paper, but found them too coarse to reproduce.)
Grateful for your help.
At present, there are some variables (e.g. ml_f_EM) that are not returned by run_sustain_algorithm but are saved in the pickle files. It doesn't make sense to me that running the model gives you different outputs from loading the pickle file that you get from running it. run_sustain_algorithm should just output what gets saved (and should probably output it as a dict to avoid having to refer to the order in which things are output).
Further to this, when trying to load previous results from a pickle file, a lot of the same setup is still required as when running the model in the first place, which can result in a lot of unnecessary boilerplate. There is some code that indicates there was once a desire for this. If anybody knows why this wasn't pursued or if there is some obstacle I haven't noticed please let me know.
A @classmethod to recreate the model instance from a pickle file, resulting in the same thing as if you were to run the method (with Z_vals etc. bundled in), would simplify the average workflow significantly, and would be a fairly simple change. The main issue with this is that it would probably break people's existing code, and would require the notebook(s) to be updated. It would, however, keep to the current concept (pickle the results/arrays).
We could also just pickle/unpickle the model instance itself, following a few changes to enable this. The process should also fix #27 and #41, on top of addressing the above (and simplifying things a lot). I reckon it should be doable without too much bother. Further details/considerations can be found here. This would also be best-served by turning the arrays that are usually pickled into attributes, and so would lead to a lot of small changes through the code.
If core maintainers agree, I'll go ahead with it, but if not then I will leave it.
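To make the proposal concrete, here is a minimal sketch of the pattern using a toy class. Everything in it (ToyModel, save, from_pickle, the placeholder results) is hypothetical and is not the pySuStaIn API; it only illustrates returning results as a dict and rebuilding an instance from a pickle file with the inputs bundled in.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np


class ToyModel:
    """Hypothetical stand-in for a SuStaIn model; not the pySuStaIn API."""

    def __init__(self, z_vals, results=None):
        self.z_vals = np.asarray(z_vals)
        self.results = results  # dict of named outputs, None before running

    def run(self):
        # Placeholder "fit": outputs are returned by name, not by position.
        self.results = {"ml_f_EM": np.array([0.6, 0.4]), "n_subtypes": 2}
        return self.results

    def save(self, path):
        # Bundle the setup (z_vals) with the results in one file.
        with open(path, "wb") as fh:
            pickle.dump({"z_vals": self.z_vals, "results": self.results}, fh)

    @classmethod
    def from_pickle(cls, path):
        # Rebuild an equivalent instance without repeating the setup code.
        with open(path, "rb") as fh:
            state = pickle.load(fh)
        return cls(state["z_vals"], results=state["results"])


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "toy_model.pickle"
    model = ToyModel(z_vals=[[1, 2, 3]])
    returned = model.run()
    model.save(path)
    restored = ToyModel.from_pickle(path)

print(sorted(restored.results))
```

With this layout, what run() returns and what is loaded from disk are the same dict, which is the consistency the issue asks for.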
Hi Neil, Leon, Others
I am trying to use Mixture SuStaIn with fixed controls in GMM (i.e. without optimizing for the controls Gaussian) and I would like to get your opinion if what I am doing is okay.
In mixture_model/utils/fit_all_gmm_models, instead of calling the fit function, I am trying to call the fit_constrained function in gmm.py.
Is this okay? I see that all the necessary functions are in place for making use of this functionality. Is there a reason why you do not provide access to this function through user-controllable options (e.g. an unfixed bug in the fit_constrained function)?
Do let me know.
Cheers,
Vikram
Hi pySustain team,
I am running into the following error when setting parallelization = True in ZscoreSustain: "TypeError: cannot pickle '_abc._abc_data' object". This error is picked up ~60 times, always tracing to the pickle.py or the _dill.py files within the pysustain package. This has happened using both Jupyter Notebook and Spyder, however I can run Sustain fine when parallelization is set to False.
File "[pysustain package location]/lib/python3.10/site-packages/spyder_kernels/py3compat.py", line 356, in compat_exec
exec(code, globals, locals)
File [notebook], line 231, in <module>
prob_subtype_stage = sustain_input.run_sustain_algorithm()
File "[pysustain package location]/lib/python3.10/site-packages/pySuStaIn/AbstractSustain.py", line 164, in run_sustain_algorithm
ml_likelihood_mat_EM = self._estimate_ml_sustain_model_nplus1_clusters(self.__sustainData, ml_sequence_prev_EM, ml_f_prev_EM) #self.__estimate_ml_sustain_model_nplus1_clusters(self.__data, ml_sequence_prev_EM, ml_f_prev_EM)
File "[pysustain package location]/lib/python3.10/site-packages/pySuStaIn/AbstractSustain.py", line 615, in _estimate_ml_sustain_model_nplus1_clusters
ml_likelihood_mat = self._find_ml(sustainData)
File "[pysustain package location]/lib/python3.10/site-packages/pySuStaIn/AbstractSustain.py", line 704, in _find_ml
pool_output_list = self.pool.map(partial_iter, seed_sequences.spawn(self.N_startpoints))
File "[pysustain package location]/lib/python3.10/site-packages/pathos/multiprocessing.py", line 135, in map
return _pool.map(star(f), zip(*args)) # chunksize
File "[pysustain package location]/lib/python3.10/site-packages/multiprocess/pool.py", line 367, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "[pysustain package location]/lib/python3.10/site-packages/multiprocess/pool.py", line 774, in get
raise self._value
File "[pysustain package location]/lib/python3.10/site-packages/multiprocess/pool.py", line 540, in _handle_tasks
put(task)
File "[pysustain package location]/lib/python3.10/site-packages/multiprocess/connection.py", line 214, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "[pysustain package location]/lib/python3.10/site-packages/multiprocess/reduction.py", line 54, in dumps
cls(buf, protocol, *args, **kwds).dump(obj)
File "[pysustain package location]/lib/python3.10/site-packages/dill/_dill.py", line 394, in dump
StockPickler.dump(self, obj)
File "[pysustain package location]/lib/python3.10/pickle.py", line 487, in dump
self.save(obj)
File "[pysustain package location]/lib/python3.10/site-packages/dill/_dill.py", line 388, in save
StockPickler.save(self, obj, save_persistent_id)
File "[pysustain package location]/lib/python3.10/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "[pysustain package location]/lib/python3.10/pickle.py", line 902, in save_tuple
save(element)
File "[pysustain package location]/lib/python3.10/site-packages/dill/_dill.py", line 388, in save
StockPickler.save(self, obj, save_persistent_id)
File "[pysustain package location]/lib/python3.10/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "[pysustain package location]/lib/python3.10/pickle.py", line 887, in save_tuple
save(element)
File "[pysustain package location]/lib/python3.10/site-packages/dill/_dill.py", line 388, in save
StockPickler.save(self, obj, save_persistent_id)
File "[pysustain package location]/lib/python3.10/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "[pysustain package location]/lib/python3.10/pickle.py", line 887, in save_tuple
save(element)
File "[pysustain package location]/lib/python3.10/site-packages/dill/_dill.py", line 388, in save
StockPickler.save(self, obj, save_persistent_id)
File "[pysustain package location]/lib/python3.10/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "[pysustain package location]/lib/python3.10/site-packages/dill/_dill.py", line 1824, in save_function
_save_with_postproc(pickler, (_create_function, (
File "[pysustain package location]/lib/python3.10/site-packages/dill/_dill.py", line 1089, in _save_with_postproc
pickler.save_reduce(*reduction)
File "[pysustain package location]/lib/python3.10/pickle.py", line 692, in save_reduce
save(args)
File "[pysustain package location]/lib/python3.10/site-packages/dill/_dill.py", line 388, in save
StockPickler.save(self, obj, save_persistent_id)
File "[pysustain package location]/lib/python3.10/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "[pysustain package location]/lib/python3.10/pickle.py", line 887, in save_tuple
save(element)
File "[pysustain package location]/lib/python3.10/site-packages/dill/_dill.py", line 388, in save
StockPickler.save(self, obj, save_persistent_id)
File "[pysustain package location]/lib/python3.10/pickle.py", line 603, in save
self.save_reduce(obj=obj, *rv)
File "[pysustain package location]/lib/python3.10/pickle.py", line 692, in save_reduce
save(args)
File "[pysustain package location]/lib/python3.10/site-packages/dill/_dill.py", line 388, in save
StockPickler.save(self, obj, save_persistent_id)
File "[pysustain package location]/lib/python3.10/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "[pysustain package location]/lib/python3.10/pickle.py", line 887, in save_tuple
save(element)
File "[pysustain package location]/lib/python3.10/site-packages/dill/_dill.py", line 388, in save
StockPickler.save(self, obj, save_persistent_id)
File "[pysustain package location]/lib/python3.10/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "[pysustain package location]/lib/python3.10/site-packages/dill/_dill.py", line 1427, in save_instancemethod0
pickler.save_reduce(MethodType, (obj.__func__, obj.__self__), obj=obj)
File "[pysustain package location]/lib/python3.10/pickle.py", line 692, in save_reduce
save(args)
File "[pysustain package location]/lib/python3.10/site-packages/dill/_dill.py", line 388, in save
StockPickler.save(self, obj, save_persistent_id)
File "[pysustain package location]/lib/python3.10/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "[pysustain package location]/lib/python3.10/pickle.py", line 887, in save_tuple
save(element)
File "[pysustain package location]/lib/python3.10/site-packages/dill/_dill.py", line 388, in save
StockPickler.save(self, obj, save_persistent_id)
File "[pysustain package location]/lib/python3.10/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "[pysustain package location]/lib/python3.10/site-packages/dill/_dill.py", line 1824, in save_function
_save_with_postproc(pickler, (_create_function, (
File "[pysustain package location]/lib/python3.10/site-packages/dill/_dill.py", line 1084, in _save_with_postproc
pickler._batch_setitems(iter(source.items()))
File "[pysustain package location]/lib/python3.10/pickle.py", line 998, in _batch_setitems
save(v)
File "[pysustain package location]/lib/python3.10/site-packages/dill/_dill.py", line 388, in save
StockPickler.save(self, obj, save_persistent_id)
File "[pysustain package location]/lib/python3.10/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "[pysustain package location]/lib/python3.10/site-packages/dill/_dill.py", line 1698, in save_type
_save_with_postproc(pickler, (_create_type, (
File "[pysustain package location]/lib/python3.10/site-packages/dill/_dill.py", line 1070, in _save_with_postproc
pickler.save_reduce(*reduction, obj=obj)
File "[pysustain package location]/lib/python3.10/pickle.py", line 692, in save_reduce
save(args)
File "[pysustain package location]/lib/python3.10/site-packages/dill/_dill.py", line 388, in save
StockPickler.save(self, obj, save_persistent_id)
File "[pysustain package location]/lib/python3.10/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "[pysustain package location]/lib/python3.10/pickle.py", line 902, in save_tuple
save(element)
File "[pysustain package location]/lib/python3.10/site-packages/dill/_dill.py", line 388, in save
StockPickler.save(self, obj, save_persistent_id)
File "[pysustain package location]/lib/python3.10/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "[pysustain package location]/lib/python3.10/site-packages/dill/_dill.py", line 1186, in save_module_dict
StockPickler.save_dict(pickler, obj)
File "[pysustain package location]/lib/python3.10/pickle.py", line 972, in save_dict
self._batch_setitems(obj.items())
File "[pysustain package location]/lib/python3.10/pickle.py", line 998, in _batch_setitems
save(v)
File "[pysustain package location]/lib/python3.10/site-packages/dill/_dill.py", line 388, in save
StockPickler.save(self, obj, save_persistent_id)
File "[pysustain package location]/lib/python3.10/pickle.py", line 578, in save
rv = reduce(self.proto)
TypeError: cannot pickle '_abc._abc_data' object
Packages:
alabaster @ file:///home/ktietz/src/ci/alabaster_1611921544520/work
anyio==3.6.2
applaunchservices @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_96v71vcny2/croots/recipe/applaunchservices_1661854626389/work
appnope==0.1.3
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
arrow @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_bc9ine8jfo/croot/arrow_1666726871970/work
astroid @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_9fsa1cbbec/croots/recipe/astroid_1659023133872/work
asttokens==2.0.8
atomicwrites==1.4.0
attrs==22.1.0
autopep8 @ file:///opt/conda/conda-bld/autopep8_1650463822033/work
awkde @ git+https://github.com/noxtoby/awkde.git@1c31e55fe54c0cad80ab423a9605fc9ddfb2614c
Babel==2.10.3
backcall @ file:///home/ktietz/src/ci/backcall_1611930011877/work
beautifulsoup4 @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_croot-cdiouih5/beautifulsoup4_1650462164803/work
binaryornot @ file:///tmp/build/80754af9/binaryornot_1617751525010/work
black @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_d0nhgmyc3l/croots/recipe/black_1660237813406/work
bleach==5.0.1
brotlipy==0.7.0
certifi @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_0ek9yztvu3/croot/certifi_1665076692562/work/certifi
cffi @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_62rp5d8fd4/croots/recipe/cffi_1659598655556/work
chardet @ file:///Users/builder/ci_310/chardet_1642531418028/work
charset-normalizer==2.1.1
click @ file:///opt/concourse/worker/volumes/live/2d66025a-4d79-47c4-43be-6220928b6c82/volume/click_1646056610594/work
cloudpickle @ file:///tmp/build/80754af9/cloudpickle_1632508026186/work
colorama @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_b8ecd5af-5e60-48b8-80ac-92164ecb9b9bxkf0tkfp/croots/recipe/colorama_1657009097162/work
contourpy==1.0.5
cookiecutter @ file:///opt/conda/conda-bld/cookiecutter_1649151442564/work
cryptography @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_3evwafgyg8/croot/cryptography_1665612651044/work
cycler==0.11.0
debugpy==1.6.3
decorator @ file:///opt/conda/conda-bld/decorator_1643638310831/work
defusedxml @ file:///tmp/build/80754af9/defusedxml_1615228127516/work
diff-match-patch @ file:///Users/ktietz/demo/mc3/conda-bld/diff-match-patch_1630511840874/work
dill @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_88dxe9g1aq/croot/dill_1667919544494/work
docutils @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_10cfb287-0327-45ef-a38e-53dffd30cef1nwpvy20e/croots/recipe/docutils_1657175439973/work
entrypoints @ file:///opt/concourse/worker/volumes/live/5eb4850e-dcbc-41ad-5f22-922bac778f70/volume/entrypoints_1649926457041/work
executing==1.1.1
fastjsonschema @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_b5c1gee32t/croots/recipe/python-fastjsonschema_1661368622875/work
flake8 @ file:///opt/conda/conda-bld/flake8_1648129545443/work
fonttools==4.38.0
future==0.18.2
idna @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_00jf0h4zbt/croot/idna_1666125573348/work
imagesize @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_4a6ed1be-fe30-4d6a-91d4-f867600caa0be5_dxzvt/croots/recipe/imagesize_1657179500955/work
importlib-metadata @ file:///opt/concourse/worker/volumes/live/a8740f82-0523-4b08-5bb5-afa0c929f5e0/volume/importlib-metadata_1648562424930/work
inflection==0.5.1
intervaltree @ file:///Users/ktietz/demo/mc3/conda-bld/intervaltree_1630511889664/work
ipykernel==6.16.2
ipython==8.5.0
ipython-genutils @ file:///tmp/build/80754af9/ipython_genutils_1606773439826/work
isort @ file:///tmp/build/80754af9/isort_1628603791788/work
jedi @ file:///opt/concourse/worker/volumes/live/18b71546-5bde-4add-72d1-7d16b76f0f7a/volume/jedi_1644315243726/work
jellyfish @ file:///opt/concourse/worker/volumes/live/d045b25f-e3af-4008-4edc-a00aeffb8b33/volume/jellyfish_1647962558521/work
Jinja2 @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_6adj7x0ejx/croot/jinja2_1666908137966/work
jinja2-time @ file:///opt/conda/conda-bld/jinja2-time_1649251842261/work
joblib==1.2.0
json5==0.9.10
jsonschema @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_d832da7jx3/croots/recipe/jsonschema_1663375475386/work
jupyter-server==1.21.0
jupyter_client==7.4.4
jupyter_core @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_fc_0us_ta7/croot/jupyter_core_1668084443574/work
jupyterlab==3.5.0
jupyterlab-pygments==0.2.2
jupyterlab_server==2.16.1
jupyterthemes==0.20.0
kde-ebm @ git+https://github.com/ucl-pond/kde_ebm.git@26ee48f7f723a82e4ff740e59b9745aa7def3daa
keyring @ file:///Users/builder/ci_310/keyring_1642616528347/work
kiwisolver==1.4.4
lazy-object-proxy @ file:///Users/builder/ci_310/lazy-object-proxy_1642533824465/work
lesscpy==0.15.1
lxml @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_1902c961-4bd2-4871-a3c5-70b7317a6521kpj7nz2o/croots/recipe/lxml_1657545138937/work
MarkupSafe @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_d4a9444f-bd4c-4043-b47d-cede33979b0fve7bm42r/croots/recipe/markupsafe_1654597878200/work
matplotlib==3.6.0
matplotlib-inline @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_9ddl71oqte/croots/recipe/matplotlib-inline_1662014471815/work
mccabe @ file:///opt/conda/conda-bld/mccabe_1644221741721/work
mistune==2.0.4
multiprocess==0.70.14
mypy-extensions==0.4.3
nbclassic==0.4.5
nbclient==0.7.0
nbconvert==7.2.5
nbformat==5.7.0
nest-asyncio==1.5.6
notebook==6.5.1
notebook_shim==0.2.0
numpy==1.23.4
numpydoc @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_adnyzxppoz/croot/numpydoc_1668085907252/work
p2j==1.3.2
packaging @ file:///tmp/build/80754af9/packaging_1637314298585/work
pandas==1.5.1
pandocfilters @ file:///opt/conda/conda-bld/pandocfilters_1643405455980/work
parso @ file:///opt/conda/conda-bld/parso_1641458642106/work
pathos==0.3.0
pathspec @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_e2t1r2kdq7/croots/recipe/pathspec_1659627124303/work
pexpect @ file:///tmp/build/80754af9/pexpect_1605563209008/work
pickleshare @ file:///tmp/build/80754af9/pickleshare_1606932040724/work
Pillow==9.2.0
platformdirs @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_7fs8_2xgrm/croots/recipe/platformdirs_1662711383474/work
pluggy @ file:///opt/concourse/worker/volumes/live/8277900c-164a-49c8-6f2a-f55c3c0154be/volume/pluggy_1648042581708/work
ply==3.11
pox==0.3.2
poyo @ file:///tmp/build/80754af9/poyo_1617751526755/work
ppft==1.7.6.6
prometheus-client==0.15.0
prompt-toolkit==3.0.31
psutil==5.9.3
ptyprocess @ file:///tmp/build/80754af9/ptyprocess_1609355006118/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl
pure-eval==0.2.2
pybind11==2.10.0
pycodestyle @ file:///tmp/build/80754af9/pycodestyle_1636635402688/work
pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
pydocstyle @ file:///tmp/build/80754af9/pydocstyle_1621600989141/work
pyflakes @ file:///tmp/build/80754af9/pyflakes_1636644436481/work
Pygments==2.13.0
pylint @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_e75_4ydew9/croots/recipe/pylint_1659110352634/work
pyls-spyder==0.4.0
pyobjc-core @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_e7giy3a869/croots/recipe/pyobjc-core_1661848172499/work
pyobjc-framework-Cocoa @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_4c2umern3y/croots/recipe/pyobjc-framework-cocoa_1661850714385/work
pyobjc-framework-CoreServices @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_4717m_ngol/croots/recipe/pyobjc-framework-coreservices_1661853392396/work
pyobjc-framework-FSEvents @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_5atkr691rb/croots/recipe/pyobjc-framework-fsevents_1661852390555/work
pyOpenSSL @ file:///opt/conda/conda-bld/pyopenssl_1643788558760/work
pyparsing @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_3a17y2delq/croots/recipe/pyparsing_1661452538853/work
PyQt5-sip==12.11.0
pyrsistent==0.18.1
PySocks @ file:///Users/builder/ci_310/pysocks_1642536366386/work
pySuStaIn @ git+https://github.com/ucl-pond/pySuStaIn@564f07617a2a11477a18aec0b24d5d80825b0371
python-dateutil @ file:///tmp/build/80754af9/python-dateutil_1626374649649/work
python-lsp-black @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_14xl6hg757/croots/recipe/python-lsp-black_1661852036282/work
python-lsp-jsonrpc==1.0.0
python-lsp-server @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_6cu9im5n5w/croots/recipe/python-lsp-server_1661813818984/work
python-slugify @ file:///tmp/build/80754af9/python-slugify_1620405669636/work
pytz==2022.5
PyYAML==6.0
pyzmq==24.0.1
QDarkStyle @ file:///tmp/build/80754af9/qdarkstyle_1617386714626/work
qstylizer @ file:///tmp/build/80754af9/qstylizer_1617713584600/work/dist/qstylizer-0.1.10-py2.py3-none-any.whl
QtAwesome @ file:///tmp/build/80754af9/qtawesome_1637160816833/work
qtconsole @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_552cqm7spz/croots/recipe/qtconsole_1662018258355/work
QtPy @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_4e5ppuhz0f/croots/recipe/qtpy_1662014536017/work
requests @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_516b78ce-034d-4395-b9b5-1d78c2847384qtnol99l/croots/recipe/requests_1657734628886/work
rope @ file:///opt/conda/conda-bld/rope_1643788605236/work
Rtree @ file:///Users/builder/ci_310/rtree_1642537064369/work
scikit-learn==1.1.3
scipy==1.9.3
seaborn==0.12.1
Send2Trash==1.8.0
sip @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_88z1zrsfrf/croots/recipe/sip_1659012373083/work
six @ file:///tmp/build/80754af9/six_1644875935023/work
sklearn==0.0
sniffio==1.3.0
snowballstemmer @ file:///tmp/build/80754af9/snowballstemmer_1637937080595/work
sortedcontainers @ file:///tmp/build/80754af9/sortedcontainers_1623949099177/work
soupsieve @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_14fb2zs6e3/croot/soupsieve_1666296397588/work
Sphinx @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_5d9f8d69-b80c-4ca1-8876-1698c70b1faeqe461tx8/croots/recipe/sphinx_1657784127805/work
sphinxcontrib-applehelp @ file:///home/ktietz/src/ci/sphinxcontrib-applehelp_1611920841464/work
sphinxcontrib-devhelp @ file:///home/ktietz/src/ci/sphinxcontrib-devhelp_1611920923094/work
sphinxcontrib-htmlhelp @ file:///tmp/build/80754af9/sphinxcontrib-htmlhelp_1623945626792/work
sphinxcontrib-jsmath @ file:///home/ktietz/src/ci/sphinxcontrib-jsmath_1611920942228/work
sphinxcontrib-qthelp @ file:///home/ktietz/src/ci/sphinxcontrib-qthelp_1611921055322/work
sphinxcontrib-serializinghtml @ file:///tmp/build/80754af9/sphinxcontrib-serializinghtml_1624451540180/work
spyder @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_97gv8v17po/croots/recipe/spyder_1663056808858/work
spyder-kernels @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_1c7pyd81si/croots/recipe/spyder-kernels_1662457889999/work
stack-data==0.5.1
tdt==0.5.4
terminado==0.17.0
text-unidecode @ file:///Users/ktietz/demo/mc3/conda-bld/text-unidecode_1629401354553/work
textdistance @ file:///tmp/build/80754af9/textdistance_1612461398012/work
threadpoolctl==3.1.0
three-merge @ file:///tmp/build/80754af9/three-merge_1607553261110/work
tinycss @ file:///tmp/build/80754af9/tinycss_1617713798712/work
tinycss2 @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_56dshjmms6/croot/tinycss2_1668168824483/work
toml @ file:///tmp/build/80754af9/toml_1616166611790/work
tomli @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_90762ba4-f339-47e8-bd29-416854a59b233d27hku_/croots/recipe/tomli_1657175507767/work
tomlkit @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_5fgtm9if1m/croots/recipe/tomlkit_1658946891645/work
tornado @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_1fimz6o0gc/croots/recipe/tornado_1662061695695/work
tqdm==4.64.1
traitlets==5.5.0
typing_extensions @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_ff5_5nqr6l/croots/recipe/typing_extensions_1659638832447/work
ujson @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_cf44fbd5-5db0-48cf-86c4-c8d4e74d1cbbwhgckc99/croots/recipe/ujson_1657544919410/work
Unidecode @ file:///tmp/build/80754af9/unidecode_1614712377438/work
urllib3 @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_7f7kb5tudl/croot/urllib3_1666298941688/work
watchdog @ file:///Users/builder/ci_310/watchdog_1642516765439/work
wcwidth @ file:///Users/ktietz/demo/mc3/conda-bld/wcwidth_1629357192024/work
webencodings==0.5.1
websocket-client==1.4.1
whatthepatch @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_0aabmq0ph3/croots/recipe/whatthepatch_1661795995892/work
wrapt @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_1ade1f68-8354-4db8-830b-ff3072015779vd_2hm7k/croots/recipe/wrapt_1657814407132/work
wurlitzer @ file:///Users/builder/ci_310/wurlitzer_1642539193810/work
yapf @ file:///tmp/build/80754af9/yapf_1615749224965/work
zipp @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_b279673d-f037-44c7-8773-c5a6b6f51037d3wfr9cq/croots/recipe/zipp_1652341773612/work
Thank you for your help!
Katrina
Hello and thank you very much for sharing your work!
I am sorry for the trouble, but I have not been able to find out how to run your model for multiple sclerosis outcome prediction using the pretrained pkl you provide.
Would it be possible for you to provide some explanation of how to do so, from the required input through the inference process to the expected output, or to point me to instructions you have already provided that I have missed?
I thank you enormously in advance.
Lucia
Hey team,
Just a tiny one here. I did a fresh install of pySuStaIn recently and ran into a minor issue. The requirements.txt file lists "sklearn" but it should be "scikit-learn". This caused some issues with my install, which were resolved by making that small change to the requirements.txt file.
--Jake
Hello,
Is it possible to apply the optimal model (after CV) to external independent data? If so, can you please add functionality to this tutorial: https://github.com/ucl-pond/pySuStaIn/blob/master/notebooks/SuStaInWorkshop.ipynb
I assume we only need to z-score the new data relative to the CN data used for the training?
Thank you very much!
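For the z-scoring step mentioned above, a minimal sketch could look like the following. It assumes that the mean and standard deviation are estimated once from the training controls and then reused unchanged for the external data; the helper names and the simulated arrays are hypothetical, and this does not cover applying the fitted SuStaIn model itself.

```python
import numpy as np


def fit_control_stats(train_controls):
    """Per-biomarker mean and sample standard deviation of the controls."""
    mean = train_controls.mean(axis=0)
    std = train_controls.std(axis=0, ddof=1)
    return mean, std


def zscore_with(data, mean, std):
    """Z-score data using statistics estimated elsewhere."""
    return (data - mean) / std


# Simulated stand-ins (assumption): 100 training controls, 20 new subjects.
rng = np.random.default_rng(42)
train_controls = rng.normal(loc=10.0, scale=2.0, size=(100, 3))
new_data = rng.normal(loc=12.0, scale=2.0, size=(20, 3))

mean, std = fit_control_stats(train_controls)
z_train_controls = zscore_with(train_controls, mean, std)
z_new = zscore_with(new_data, mean, std)  # same mean/std, no refitting

print(z_new.mean(axis=0))
```

Reusing the training statistics keeps the new data on the same scale as the data the model was fitted to.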
I have noticed an edge case where subtypes are labelled differently by the positional variance diagrams and the model output.
I'm not 100% certain, but I think this is due to the subtype numbering in the PVDs being assigned according to maximum likelihood of the positional variance, whilst I think the ml_subtype number is assigned according to number of individuals per subtype. In rare cases a smaller subtype can have a higher maximum likelihood PVD.
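The suspected mechanism can be illustrated with a toy example. The numbers below are invented and the variable names are hypothetical; this is not how pySuStaIn computes either ordering, only a demonstration that sorting the same subtypes by two different criteria can produce different labels.

```python
import numpy as np

# Invented values for two subtypes (assumption, for illustration only).
fractions = np.array([0.55, 0.45])   # share of individuals per subtype
pvd_score = np.array([0.70, 0.90])   # hypothetical per-subtype PVD likelihood

# Largest first under each criterion.
order_by_fraction = np.argsort(fractions)[::-1]
order_by_score = np.argsort(pvd_score)[::-1]

labels_agree = np.array_equal(order_by_fraction, order_by_score)
print(order_by_fraction, order_by_score, labels_agree)
```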
SuStaIn is a cool idea for extracting models from cross-sectional data. One idea I have: if my data is longitudinal, could I use it to constrain the set of possible progression models?
Can such a feature go into SuStaIn? Does it conceptually make sense?
We have lots of animal longitudinal developmental and interventional data which could be used to test this.
We should code up a script that implements 10-fold CV for selecting the optimal number of subtypes (see the SuStaIn paper).
For each fold:
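The per-fold steps are not listed in this post. As a starting point only, the fold split itself could be sketched as below. This is a plain shuffled, unstratified split written with NumPy; the helper make_folds is hypothetical and not part of pySuStaIn, and a stratified split may be preferable when subject groups are imbalanced.

```python
import numpy as np

def make_folds(n_subjects, n_folds=10, seed=0):
    """Return a list of n_folds arrays of test indices covering all subjects once."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_subjects)
    return [np.sort(fold) for fold in np.array_split(shuffled, n_folds)]

n_subjects = 103
test_idxs = make_folds(n_subjects, n_folds=10)

train_sizes = []
for fold, test_idx in enumerate(test_idxs):
    # Everything not held out in this fold is used for fitting
    train_idx = np.setdiff1d(np.arange(n_subjects), test_idx)
    train_sizes.append(len(train_idx))
    # ... fit on train_idx, then evaluate the out-of-sample likelihood on test_idx

print(len(test_idxs), train_sizes[0])
```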
Hi, Thanks for your great work!
I am working on your SuStaInWorkshop notebook tutorial. It plots positional variance diagrams to interpret the subtype progressions.
However, I am a little confused about how to read these diagrams.
Grateful for your help again.
Hi, Thanks for releasing pySuStaIn !
I am interested in AD pathology and am currently trying to reproduce your subtyping results on the ADNI dataset (i.e. the results reported in your Nat. Comm. 2018 paper).
I am confused about two things:
Grateful for your help.
I was running cross-validation in parallel on a cluster using cross_validate_sustain_model() with the argument select_fold set to the CV fold desired for each compute job.
I noticed that all 10 folds were returning results for only fold0.
The culprit is line 276, where the loop is through range(Nfolds) (where Nfolds=len(select_fold)) rather than explicitly through the select_fold array itself.
Will send a PR to fix shortly, but wanted to raise this in case others have the same problem
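The difference between the two loops can be shown in isolation. This is an illustrative sketch, not the library's code; the function names folds_visited_buggy and folds_visited_fixed are hypothetical.

```python
def folds_visited_buggy(select_fold):
    # Loops over positions 0..len(select_fold)-1, ignoring the requested fold ids
    n_folds = len(select_fold)
    return list(range(n_folds))

def folds_visited_fixed(select_fold):
    # Loops over the requested fold ids themselves
    return list(select_fold)

print(folds_visited_buggy([3]))  # [0]
print(folds_visited_fixed([3]))  # [3]
```

With a single requested fold, the first version always visits fold 0, which matches the symptom described above.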
Hi Leon, Peter, others,
I tried running Sustain on a sporadic AD dataset and it ran well for around 8 hours or so until it crashed with the following error. It seems like some corner case scenario which doesn't occur quite often. Would really appreciate your help in addressing this issue.
Splitting cluster 1 of 3
+ Resolving 2 cluster problem
+ Finding ML solution from hierarchical initialisation
- ML likelihood is [-4233.34529467]
Splitting cluster 2 of 3
+ Resolving 2 cluster problem
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1-bce0b47cba4f> in <module>
65 dataset_name,False)
66
---> 67 sustain_input.run_sustain_algorithm()
~/anaconda3/lib/python3.8/site-packages/pySuStaIn/AbstractSustain.py in run_sustain_algorithm(self)
142 ml_sequence_mat_EM, \
143 ml_f_mat_EM, \
--> 144 ml_likelihood_mat_EM = self._estimate_ml_sustain_model_nplus1_clusters(self.__sustainData, ml_sequence_prev_EM, ml_f_prev_EM) #self.__estimate_ml_sustain_model_nplus1_clusters(self.__data, ml_sequence_prev_EM, ml_f_prev_EM)
145
146 seq_init = ml_sequence_EM
~/anaconda3/lib/python3.8/site-packages/pySuStaIn/AbstractSustain.py in _estimate_ml_sustain_model_nplus1_clusters(self, sustainData, ml_sequence_prev, ml_f_prev)
584
585 print(' + Resolving 2 cluster problem')
--> 586 this_ml_sequence_split, _, _, _, _, _ = self._find_ml_split(sustainData_i)
587
588 # Use the two subtype model combined with the other subtypes to
~/anaconda3/lib/python3.8/site-packages/pySuStaIn/AbstractSustain.py in _find_ml_split(self, sustainData)
695
696 if ~isinstance(pool_output_list, list):
--> 697 pool_output_list = list(pool_output_list)
698
699 ml_sequence_mat = np.zeros((N_S, sustainData.getNumStages(), self.N_startpoints))
~/anaconda3/lib/python3.8/site-packages/pySuStaIn/AbstractSustain.py in _find_ml_split_iteration(self, sustainData, seed_num)
740
741 temp_seq_init = self._initialise_sequence(sustainData)
--> 742 seq_init[s, :], _, _, _, _, _ = self._perform_em(temp_sustainData, temp_seq_init, [1])
743
744 f_init = np.array([1.] * N_S) / float(N_S)
~/anaconda3/lib/python3.8/site-packages/pySuStaIn/AbstractSustain.py in _perform_em(self, sustainData, current_sequence, current_f)
826 candidate_sequence, \
827 candidate_f, \
--> 828 candidate_likelihood = self._optimise_parameters(sustainData, current_sequence, current_f)
829
830 HAS_converged = np.fabs((candidate_likelihood - current_likelihood) / max(candidate_likelihood, current_likelihood)) < 1e-6
~/anaconda3/lib/python3.8/site-packages/pySuStaIn/ZscoreSustain.py in _optimise_parameters(self, sustainData, S_init, f_init)
237 p_perm_k_weighted = p_perm_k * f_val_mat
238 p_perm_k_norm = p_perm_k_weighted / np.sum(p_perm_k_weighted, axis=(1,2), keepdims=True)
--> 239 f_opt = (np.squeeze(sum(sum(p_perm_k_norm))) / sum(sum(sum(p_perm_k_norm)))).reshape(N_S, 1, 1)
240 f_val_mat = np.tile(f_opt, (1, N + 1, M))
241 f_val_mat = np.transpose(f_val_mat, (2, 1, 0))
TypeError: 'int' object is not iterable
Thanks in advance.
Vikram
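One way this exact TypeError can arise, offered as a possible explanation rather than a confirmed diagnosis, is when the array passed to the nested built-in sum() calls has zero subjects along its first axis (for example, if a cluster being split ends up empty). The built-in sum() of an empty iterable returns the integer 0, and the next sum() then fails. The array shape below is made up for illustration.

```python
import numpy as np

# (subjects, stages, subtypes) with zero subjects
empty = np.zeros((0, 4, 2))

inner = sum(empty)  # built-in sum over an empty iterable returns the int 0
try:
    sum(inner)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # 'int' object is not iterable

# np.sum with explicit axes keeps the subtype axis even when there are no subjects
per_subtype = np.sum(empty, axis=(0, 1))
print(per_subtype.shape)  # (2,)
```

Using np.sum with explicit axes avoids the crash, although an empty cluster would still need handling upstream because the subsequent division would be 0/0.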
Hi all,
I just wanted to share some code here in case it may help anyone else: I've added a legend to the PVDs which shows the Z_val associated with each colour. It assumes that the Z_vals are the same for all biomarkers when creating the legend labels. I'm aware that there's an open issue on PVD colourbars and am not offering a solution to that problem, but simply a fix in the meantime to show clearly which value maps to which colour (I myself found this quite confusing to parse out when using more than 3 Z_vals as there was no documentation showing the mappings). It would be well paired with explanations I've seen used in papers about what the intensity, etc. of each colour indicates.
This requires an extra import from Matplotlib in ZscoreSustain: import matplotlib.patches as mpatches.
To implement it, add the following to plot_positional_var in ZscoreSustain between lines 660 (ax.set_title(title_i, fontsize=title_font_size)) and 661 (# Tighten up the figure). Further customization (e.g. legend size, location, etc.) is possible through the ax.legend arguments.
ax.set_xlabel(stage_label, fontsize=stage_font_size+2)
ax.set_title(title_i, fontsize=title_font_size)
# Add a legend
# adding an extra dim to colour mat for RGB reasons
legend_colour_mat = np.array([[[1, 0, 0], [1, 0, 1], [0, 0, 1], [0.5, 0, 1], [0, 1, 1], [0, 1, 0.5]]])[:N_z]
patches = [ mpatches.Patch(color=legend_colour_mat[0][i], label=f"Z_val = {zvalues[i]}") for i in range(len(zvalues)) ]
# put those patches as legend-handles into the legend
ax.legend(handles=patches, loc = "best" )
# Tighten up the figure
#plt.tight_layout()
fig.tight_layout()
which I generated using:
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import numpy as np
legend_colour_mat = np.array([[[1, 0, 0], [1, 0, 1], [0, 0, 1], [0.5, 0, 1], [0, 1, 1], [0, 1, 0.5]]])
ax = plt.subplot()
ax.imshow(legend_colour_mat)
ax.set_yticks([])
ax.set_xticks(range(6), ["1st Z_val", "2nd Z_val", "3rd Z_val", "4th Z_val", "5th Z_val", "6th Z_val"])
plt.show()
Before running SuStaIn using my own data, I wanted to run SuStaIn using the workshop file. Everything went smoothly except for when I want to plot the positional variance diagrams (under "Evaluate subtypes" and "choosing the optimal number of subtypes"). Python then throws the following error at me:
IndexError Traceback (most recent call last)
in
1 N_S_selected = 2
2
----> 3 pySuStaIn.ZscoreSustain._plot_sustain_model(sustain_input,samples_sequence,samples_f,M,subtype_order=(0,1))
4 _ = plt.suptitle('SuStaIn output')
5
~\xxx\pySuStaIn\ZscoreSustain.py in _plot_sustain_model(self, *args, **kwargs)
450
451 def _plot_sustain_model(self, *args, **kwargs):
--> 452 return ZscoreSustain.plot_positional_var(*args, Z_vals=self.Z_vals, **kwargs)
453
454 def subtype_and_stage_individuals_newData(self, data_new, samples_sequence, samples_f, N_samples):
~\xxx\pySuStaIn\notebooks\pySuStaIn\ZscoreSustain.py in plot_positional_var(samples_sequence, samples_f, n_samples, Z_vals, biomarker_labels, ml_f_EM, cval, subtype_order, biomarker_order, title_font_size, stage_font_size, stage_label, stage_rot, stage_interval, label_font_size, label_rot, cmap, biomarker_colours, figsize, separate_subtypes, save_path, save_kwargs)
622 # Shuffle vals according to subtype_order
623 # This defaults to previous method if custom order not given
--> 624 vals = temp_mean_f[subtype_order]
625
626 if n_samples != np.inf:
IndexError: too many indices for array
Could it be that there is something wrong with the dimensions of the array? When I get rid of ",subtype_order=(0,1))" (line 3), I get at least part of the output. Same applies to when I want to plot the positional variance diagrams before crossvalidation.
Any hint would be very welcome.
Kind regards
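A plausible reading of the traceback, not verified against the library, is that temp_mean_f is a 1-D array and the tuple (0,1) is interpreted by NumPy as a two-dimensional index rather than as a list of positions. The sketch below reproduces that behaviour with a made-up array; whether plot_positional_var accepts a list or array for subtype_order in your installed version is an assumption to check.

```python
import numpy as np

temp_mean_f = np.array([0.6, 0.4])  # hypothetical mean subtype fractions

# A tuple is treated as one index per dimension: temp_mean_f[0, 1]
try:
    temp_mean_f[(0, 1)]
    tuple_error = None
except IndexError as exc:
    tuple_error = str(exc)
print(tuple_error)

# A list (or integer array) selects and reorders elements along the first axis
vals = temp_mean_f[[1, 0]]
print(vals)
```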
Hiya Leon et al.,
I ran into an interesting issue when the sys admin of my cluster reached out about some problematic processes that were instantiated when running a parallelized version of the SuStaIn cross-validation. The wrapper script looked something like this:
from pySuStaIn import ZscoreSustain
import multiprocessing as mp

sustain_input = ZscoreSustain(args)
test_idxs = <a list of lists>
jobs = []
NFolds = 10
for fold in range(NFolds):
    # start one process per cross-validation fold
    p = mp.Process(target=sustain_input.cross_validate_sustain_model,
                   args=(test_idxs, fold))
    jobs.append(p)
    p.start()
This script was then submitted to the cluster with an .sh script specifying some parameters, such as the number of nodes and cores (in this case I asked for 1 node and 32 cores). However, it seems that individual jobs were themselves starting several other threads/processes. In this sense, they were overriding the specifications on my .sh script. The result was me asking for 32 cores, but having 32^2 threads running on the node. This results in many context switches and inefficient use of the processors on the node.
I admit this is kind of a niche issue and maybe folks don't care so much about how efficient the code is. But I think this issue might be surmounted quite easily by allowing an argument where the user can control the internal parallelization to some degree, a la the n_jobs framework in sklearn. As is, the parallel qualities do not seem to be controllable by the user.
Forgive me if this isn't clear. Would be happy to provide greater detail!
As always, thanks for making such an amazing library!
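Until something like n_jobs exists, one partial workaround is to cap the thread pools of the numerical backends through environment variables before NumPy is imported. This is a sketch under assumptions: these variables limit OpenMP/BLAS threads only, and whether they also limit any worker pool that pySuStaIn starts internally is not confirmed here. The helper max_workers is hypothetical.

```python
import os

THREADS_PER_WORKER = 1  # assumption: each CV fold process gets one compute thread

# Must be set before NumPy (or anything importing it) is loaded in the process
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ[var] = str(THREADS_PER_WORKER)

def max_workers(cores_allocated, threads_per_worker=THREADS_PER_WORKER):
    """Number of fold processes that fit in the allocation without oversubscription."""
    return max(1, cores_allocated // threads_per_worker)

print(max_workers(32))     # 32
print(max_workers(32, 4))  # 8
```

The arithmetic mirrors the problem described above: 32 processes that each start 32 threads give 32 * 32 = 1024 threads on a 32-core allocation.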
Hi all,
Thanks for your help with my past issue!
I'm now encountering a new error within the AbstractSuStaIn package that seems to relate to the staging portion of the algorithm:
MCMC Iteration: 100%|██████████| 10000/10000 [00:24<00:00, 414.37it/s]
MCMC Iteration: 100%|██████████| 10000/10000 [00:23<00:00, 422.43it/s]
MCMC Iteration: 100%|██████████| 10000/10000 [00:30<00:00, 325.00it/s]
MCMC Iteration: 100%|██████████| 1000/1000 [00:02<00:00, 462.84it/s]
[pysustain package location]/lib/python3.10/site-packages/pySuStaIn/AbstractSustain.py:556: RuntimeWarning: invalid value encountered in divide
total_prob_subtype_norm = total_prob_subtype /
np.tile(np.sum(total_prob_subtype, 1).reshape(len(total_prob_subtype), 1), (1, N_S))
[pysustain package location]/lib/python3.10/site-packages/pySuStaIn/AbstractSustain.py:557: RuntimeWarning: invalid value encountered in divide
total_prob_stage_norm = total_prob_stage / np.tile(np.sum(total_prob_stage, 1).reshape(len(total_prob_stage), 1), (1, nStages + 1)) #removed total_prob_subtype
[pysustain package location]/lib/python3.10/site-packages/pySuStaIn/AbstractSustain.py:560: RuntimeWarning: invalid value encountered in divide
total_prob_subtype_stage_norm = total_prob_subtype_stage /
np.tile(np.sum(np.sum(total_prob_subtype_stage, 1, keepdims=True), 2).reshape(nSamples, 1, 1),(1, nStages + 1, N_S))
Traceback (most recent call last):
File "[notebook].py", line 475, in <module>
prob_subtype_stage = sustain_input.run_sustain_algorithm()
File "[pysustain package location]/lib/python3.10/site-packages/pySuStaIn/AbstractSustain.py", line 186, in run_sustain_algorithm
prob_subtype_stage = self.subtype_and_stage_individuals(self.__sustainData, samples_sequence, samples_f, N_samples) #self.subtype_and_stage_individuals(self.__data, samples_sequence, samples_f, N_samples)
File "[pysustain package location]/lib/python3.10/site-packages/pySuStaIn/AbstractSustain.py", line 590, in subtype_and_stage_individuals
this_prob_stage = np.squeeze(prob_subtype_stage[i, :, int(ml_subtype[i])])
ValueError: cannot convert float NaN to integer
This error is happening both locally and on a remote computing cluster. I've already added in an assert that none of the data going into SuStaIn contains NaNs, and ensured that my Z_vals are all integers (I am using Zscore SuStaIn). Do you have any ideas of what may be causing this issue or how to solve it?
As a related question, my research group and I are wondering why negative z-scores are not allowed in SuStaIn, and what the best practices are for handling them. We are currently shifting the z-score distribution to the right to ensure all values are > 0, but this means that we are losing the interpretability of z = 0, etc. Do you have any advice?
Thank you!
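On the negative z-score question, one convention that is sometimes used, assumed here and not confirmed in this thread, is to flip the sign of biomarkers that decrease with disease so that larger z-scores always mean more abnormal, instead of shifting the whole distribution. That keeps z = 0 at the control mean. The function name and the toy numbers below are hypothetical.

```python
import numpy as np

def zscore_abnormality(data, control_mean, control_sd, decreases_with_disease):
    """Z-score each biomarker, flipping sign where lower raw values are more abnormal."""
    z = (data - control_mean) / control_sd
    sign = np.where(decreases_with_disease, -1.0, 1.0)
    return z * sign

data = np.array([[8.0, 12.0]])             # one subject, two biomarkers
control_mean = np.array([10.0, 10.0])
control_sd = np.array([2.0, 2.0])
decreases = np.array([True, False])        # first biomarker falls with disease

z_abn = zscore_abnormality(data, control_mean, control_sd, decreases)
print(z_abn)  # [[1. 1.]]
```

Subjects who are "better than controls" on a biomarker will still have negative values after this step, so it does not by itself remove all negative z-scores.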
Hi all,
After I initialize the ZscoreSustain model, I run into the following "index out of bounds" exception:
Finding ML solution to 1 cluster problem
/camhpc/mspaths/data/dev/pySuStaIn/pySuStaIn/ZscoreSustain.py:238: RuntimeWarning: invalid value encountered in true_divide
p_perm_k_norm = p_perm_k_weighted / np.sum(p_perm_k_weighted, axis=(1,2), keepdims=True)
Traceback (most recent call last):
File "try0_mspaths_sustain.py", line 151, in <module>
sustain_input.run_sustain_algorithm()
File "/camhpc/mspaths/data/dev/pySuStaIn/pySuStaIn/AbstractSustain.py", line 144, in run_sustain_algorithm
ml_likelihood_mat_EM = self._estimate_ml_sustain_model_nplus1_clusters(self.__sustainData, ml_sequence_prev_EM, ml_f_prev_EM) #self.__estimate_ml_sustain_model_nplus1_clusters(self.__data, ml_sequence_prev_EM, ml_f_prev_EM)
File "/camhpc/mspaths/data/dev/pySuStaIn/pySuStaIn/AbstractSustain.py", line 552, in _estimate_ml_sustain_model_nplus1_clusters
ml_likelihood_mat = self._find_ml(sustainData)
File "/camhpc/mspaths/data/dev/pySuStaIn/pySuStaIn/AbstractSustain.py", line 643, in _find_ml
pool_output_list = list(pool_output_list)
File "/camhpc/mspaths/data/dev/pySuStaIn/pySuStaIn/AbstractSustain.py", line 676, in _find_ml_iteration
_ = self._perform_em(sustainData, seq_init, f_init)
File "/camhpc/mspaths/data/dev/pySuStaIn/pySuStaIn/AbstractSustain.py", line 828, in _perform_em
candidate_likelihood = self._optimise_parameters(sustainData, current_sequence, current_f)
File "/camhpc/mspaths/data/dev/pySuStaIn/pySuStaIn/ZscoreSustain.py", line 304, in _optimise_parameters
this_S = this_S[0, :]
IndexError: index 0 is out of bounds for axis 0 with size 0
My current ZscoreSustain class initialization is as follows:
# Start pySuStaIn
N = 4 # number of biomarkers
SuStaInLabels = ['Bio1', 'Bio2', 'Bio3', 'Bio4'] # biomarker labels
unt_data = np.vstack((biom_1, biom_2, biom_3, biom_4)) # each biomarker z-scored
data = np.transpose(unt_data) # data.shape returns (5123, 4)
Z_vals = np.array([[1,2,3]]*N) # Z-score stage thresholds; Z_vals.shape returns (4, 3)
Z_max = np.array([np.max(biom_bpf), np.max(biom_t2les),
                  np.max(biom_cgmf), np.max(biom_dgmf)]) # Z_max.shape returns (4,)
# Snippet of my prepared data for SuStaIn:
N_S_gt = 3 # Number of ground truth subtypes
N_startpoints = 10
N_S_max = N_S_gt+1
N_iterations_MCMC = int(1e4)
output_folder = os.path.join(os.path.dirname(__file__), 'rp_mspaths_sim')
if not os.path.isdir(output_folder):
    os.mkdir(output_folder)
dataset_name = 'sim'
sustain_input = ZscoreSustain(data,
Z_vals,
Z_max,
SuStaInLabels,
N_startpoints,
N_S_max,
N_iterations_MCMC,
output_folder,
dataset_name,
False)
sustain_input.run_sustain_algorithm() ##
After spending some time attempting to debug, I believe the issue is related to the following line in _optimise_parameters producing an array of NaNs:
f_opt = (np.squeeze(sum(sum(p_perm_k_norm))) / sum(sum(sum(p_perm_k_norm)))).reshape(N_S, 1, 1)
def _optimise_parameters(self, sustainData, S_init, f_init):
    # Optimise the parameters of the SuStaIn model
    M = sustainData.getNumSamples() #data_local.shape[0]
    N_S = S_init.shape[0]
    N = self.stage_zscore.shape[1]
    S_opt = S_init.copy() # have to copy or changes will be passed to S_init
    f_opt = np.array(f_init).reshape(N_S, 1, 1)
    f_val_mat = np.tile(f_opt, (1, N + 1, M))
    f_val_mat = np.transpose(f_val_mat, (2, 1, 0))
    p_perm_k = np.zeros((M, N + 1, N_S))
    for s in range(N_S):
        p_perm_k[:, :, s] = self._calculate_likelihood_stage(sustainData, S_opt[s])
    p_perm_k_weighted = p_perm_k * f_val_mat
    p_perm_k_norm = p_perm_k_weighted / np.sum(p_perm_k_weighted, axis=(1,2), keepdims=True)
    f_opt = (np.squeeze(sum(sum(p_perm_k_norm))) / sum(sum(sum(p_perm_k_norm)))).reshape(N_S, 1, 1)
    f_val_mat = np.tile(f_opt, (1, N + 1, M))
    f_val_mat = np.transpose(f_val_mat, (2, 1, 0))
    order_seq = np.random.permutation(N_S) # this will produce different random numbers to Matlab
.
.
.
Now, this issue occurs randomly on the nth iteration, as replicated when running my script multiple times.
Any insights on how to fix/troubleshoot this problem?
Thanks in advance
Hello!
Parallelizing the cross-validation requires use of the "select_fold" argument. One might, for example, launch 10 instances of cross-validation, one for each of (say) 10 folds, in which case the given fold (say 3) would be passed to the select_fold argument. However, I've run into a few issues with this function.
First, line 218
if select_fold:
Many users will pass 0 to get the first fold. However, 0 will not fulfill the conditional statement if select_fold. There are many ways to fix this, but since the default setting for select_fold is [], my janky fix was just:
if select_fold != []:
The next issue comes in the following lines, 218-220
if select_fold:
    test_idxs = test_idxs[select_fold]
Nfolds = len(test_idxs)
I'm not sure what the intention was here, but the result is that Nfolds actually becomes the number of subjects in the test set. So, instead of having the desired 1 fold, you end up with N folds, where N is the number of subjects in the test set.
My solution here requires a few changes. First, lines 218-220 are changed to this:
if select_fold != []:
    Nfolds = 1
else:
    Nfolds = len(test_idxs)
Then, in order to disrupt the code as little as possible, I added the following lines under line 226. I include 226 below for reference:
for fold in range(Nfolds):
    if select_fold != []: # or whatever you change line 218 to
        fold = select_fold
Adding these three small changes resulted in the script working without issue for me, though maybe there are more elegant solutions. Thanks for bringing SuStaIn to Python!!
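As one possible alternative to comparing against [], the truthiness problem with fold 0 can be avoided by using None as the "not set" sentinel. The sketch below is illustrative only; resolve_folds is a hypothetical helper, not a pySuStaIn function.

```python
import numbers

def resolve_folds(n_total_folds, select_fold=None):
    """Return the list of fold ids to run, treating 0 as a valid fold."""
    if select_fold is None:
        return list(range(n_total_folds))     # nothing selected: run every fold
    if isinstance(select_fold, numbers.Integral):
        return [int(select_fold)]             # a single fold id, including 0
    return [int(f) for f in select_fold]      # an explicit list of fold ids

print(bool(0))                      # False, which is why "if select_fold:" skips fold 0
print(resolve_folds(10, 0))         # [0]
print(resolve_folds(10, [2, 5]))    # [2, 5]
print(len(resolve_folds(10)))       # 10
```

Returning a list in every case also lets the caller loop directly over the fold ids instead of over range(Nfolds).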
Dear SuStaIn friends,
I have encountered an error on a number of occasions when using SuStaIn on different datasets. The error itself looked something like this:
Traceback (most recent call last):
File "/Users/jacobv/SuStaIn_workshop/lib/python3.7/site-packages/multiprocess/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/Users/jacobv/SuStaIn_workshop/lib/python3.7/site-packages/multiprocess/pool.py", line 44, in mapstar
return list(map(*args))
File "/Users/jacobv/SuStaIn_workshop/lib/python3.7/site-packages/pathos/helpers/mp_helper.py", line 15, in <lambda>
func = lambda args: f(*args)
File "/Users/jacobv/SuStaIn_workshop/lib/python3.7/site-packages/pySuStaIn/AbstractSustain.py", line 709, in _find_ml_iteration
_ = self._perform_em(sustainData, seq_init, f_init, rng)
File "/Users/jacobv/SuStaIn_workshop/lib/python3.7/site-packages/pySuStaIn/AbstractSustain.py", line 864, in _perform_em
candidate_likelihood = self._optimise_parameters(sustainData, current_sequence, current_f, rng)
File "/Users/jacobv/SuStaIn_workshop/lib/python3.7/site-packages/pySuStaIn/ZscoreSustain.py", line 324, in _optimise_parameters
this_S = this_S[0, :]
IndexError: index 0 is out of bounds for axis 0 with size 0.
As it turns out, this is caused by a divide by zero problem during the normalization of p_perm_k. This NaN then propagates forward a bit and doesn't turn up as an error until line 324, as shown. This is not itself necessarily caused by any outlying "bad values" (e.g. NaNs) in the original dataset, so it's quite hard (impossible?) to detect before running SuStaIn and getting the error.
This is apparently a known issue, as the following comment exists on line 333 of ZScoreSustain:
#adding 1e-250 fixes divide by zero problem that happens rarely
A few lines later at 335, the "corrected" line occurs:
p_perm_k_norm = p_perm_k_weighted / np.sum(p_perm_k_weighted + 1e-250, axis=(1, 2), keepdims=True)
However, at least in my case, the offending divide by zero problem occurred earlier. Note that the fix (ln 335) occurs before the error in my traceback (ln 324). Instead, the divide by zero problem occurs for me at line 238, which is incidentally the same calculation:
p_perm_k_norm = p_perm_k_weighted / np.sum(p_perm_k_weighted, axis=(1,2), keepdims=True)
By once again adding the "corrected" line, the problem is surmounted and I no longer get the error. I'm not sure how rare this issue really is, because this is maybe the third time I've encountered it (on different datasets). Requesting a patch to fix it, pretty please!
Thanks as always for this incredible software!! <3 <3 <3
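The effect of the 1e-250 term can be reproduced on a toy array. This is a sketch with made-up values, not the library's code: subject 1 has a likelihood that has underflowed to exactly zero everywhere, so the plain normalisation computes 0/0.

```python
import numpy as np

# shape (subjects=2, stages=2, subtypes=2)
p_perm_k_weighted = np.array([
    [[0.2, 0.1], [0.3, 0.4]],   # subject 0: ordinary likelihoods
    [[0.0, 0.0], [0.0, 0.0]],   # subject 1: all likelihoods underflowed to zero
])

# Plain normalisation: subject 1 becomes 0/0 = NaN (warning suppressed here)
with np.errstate(invalid="ignore"):
    naive = p_perm_k_weighted / np.sum(p_perm_k_weighted, axis=(1, 2), keepdims=True)

# Guarded normalisation, as in the "corrected" line quoted above
guarded = p_perm_k_weighted / np.sum(p_perm_k_weighted + 1e-250, axis=(1, 2), keepdims=True)

print(np.isnan(naive[1]).all())   # True
print(np.isnan(guarded).any())    # False
```

With the guard, the affected subject contributes zeros instead of NaNs, so the NaN no longer propagates into later steps; it does not explain why the likelihood was zero in the first place.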
This is minor in the grand scheme of things, but it might be important when creating clear figures for e.g. publication. I've toyed with a few ideas so thought I'd raise an issue on this before unilaterally merging one option for others to use and save them some time.
Adding a colourbar to the PVD for the mixture version is straightforward, as colour intensity equates to the certainty of that position. For the z-score version, however, it has two dimensions. While certainty of the colour for a single z-score event equates to certainty (e.g. from pure white to pure red), the colours also mix when different z-score events overlap (and they mix proportionally to their certainty). For example, if a single stage (for a single biomarker) has 50% certainty for z=1 and z=2, this square will be a 50:50 mix of red and magenta (both of which are themselves at 50% intensity). A single colourbar cannot (to me, at least) capture this.
The question is, if adding a colourbar is to be useful, which information is it best that it captures.
Here are a few variants I made for this. Other suggestions are welcome.
Gets the point across, but doesn't integrate intensity/certainty or z-score mixing.
Highlights the difference in intensity, but not z-score mixing.
Highlights z-score mixing, but not intensity.
The point of this is to add something so others don't need to do this themselves. If no-one feels strongly, I'll just pick one after a week or so to integrate.