History-augmented Markov analysis of weighted ensemble trajectories.
Home Page: https://msm-we.readthedocs.io
License: MIT License
Currently, optimization happens at fixed intervals of `max_iters`, up to a total of `max_total_iterations` iterations.
However, it's unintuitive that WESTPA's `max_iters` is actually the interval, and the optimization keeps restarting it until the total number of iterations in the plugin config is reached.
A better system would be for WESTPA's `max_iters` to be the true maximum number of iterations, with the optimization providing specific iterations (or intervals) to restart at.
Clustering in particular is a stateful operation, which currently relies on serializing/deserializing modelWE objects from the Ray object store.
Ideally, there isn't much overhead associated with this, but I think it becomes noticeable on systems without shared memory between workers.
Instead of doing the parallelism via Ray processes, we can initialize a set of Actors to do work. Actors are stateful, so we can just initialize them with the current model state (with unnecessary stuff stripped out).
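A minimal sketch of that approach, assuming a stripped-down `model_state` payload and a hypothetical `discretize` task (the names here are placeholders, not the real `msm_we` API):

```python
import ray

@ray.remote
class ClusteringActor:
    """Stateful worker: the model state is deserialized once, at actor
    creation, instead of once per remote task."""

    def __init__(self, model_state):
        self.model_state = model_state

    def discretize(self, iteration):
        # Placeholder for per-iteration clustering work that reads
        # self.model_state rather than pulling it from the object store.
        return iteration, len(self.model_state)

ray.init()
model_state = {"cluster_centers": [[0.0], [1.0]]}  # stand-in for real state
actors = [ClusteringActor.remote(model_state) for _ in range(4)]
print(ray.get([a.discretize.remote(i) for i, a in enumerate(actors)]))
```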
For aggregate clustering functionality, just silently create a single rectilinear bin from -inf to inf, and then use the stratified methods.
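For example (a sketch, assuming WESTPA's `RectilinearBinMapper` is the bin mapper in play):

```python
import numpy as np
from westpa.core.binning import RectilinearBinMapper

# A single bin spanning (-inf, inf): the stratified code paths then see
# exactly one stratum, which reproduces the aggregate behavior.
aggregate_mapper = RectilinearBinMapper([[-np.inf, np.inf]])
```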
The Rich live table that `build_analyze_model` displays is incompatible with the various progress bars `msm_we` produces. The live table is always printed at the bottom of the page, causing the progress bars to spam new lines as they're updated.
Find some way to combine them, or create a Rich Console that both can connect to and work under.
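One possibility: Rich can render several components through a single `Live` by wrapping them in a `Group`. A minimal sketch (the table contents and task names here are stand-ins):

```python
import time

from rich.console import Group
from rich.live import Live
from rich.progress import Progress
from rich.table import Table

table = Table(title="build_analyze_model status")  # stand-in for the real table
progress = Progress()
task = progress.add_task("clustering", total=100)

# Rendering both through one Live keeps them updating in place together,
# instead of the table forcing the bars to reprint on new lines.
with Live(Group(table, progress), refresh_per_second=10):
    for _ in range(100):
        progress.advance(task)
        time.sleep(0.02)
```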
Ray parallelization has some ugly, duplicated code. This should be cleaned up (particularly for rediscretization, where it's copied in `organize_fluxMatrix` and `clusterCoordinates`).
Right now the dependencies for this package aren't very cleanly managed. This is because `pyemma` is not robust to pip installs (according to their documentation). So I can make a conda environment file with all the appropriate dependencies, but then you still need to pip install the package... Or I can leave it as-is, and you install everything with pip, but have to manually install conda. Neither option seems great.
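One middle ground might be a conda environment file with a `pip:` section, so the conda-managed dependencies and the pip install of the package itself happen in a single `conda env create`. A sketch (the dependency list is illustrative, not the package's actual requirements):

```yaml
# environment.yml (illustrative)
name: msm_we
channels:
  - conda-forge
dependencies:
  - python=3.9
  - pyemma          # installed via conda, since pip installs aren't robust
  - pip
  - pip:
      - .           # pip-installs this package from the repo root
```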
The optimization plugin must extend the progress coordinate using the dimensionality reduction produced by the haMSM.
Currently, this is only supported for SynD, because it's easy to recompute a value for every pcoord.
However, a more generic implementation could look something like wrapping `get_pcoord` with a call to `model.processCoordinates`, and returning the combination of the original pcoord and the new dimensionality-reduced coordinates.
Recall, though, that `get_pcoord` isn't necessarily called during propagation.
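A rough sketch of what that wrapper might look like (assuming `processCoordinates` accepts an array of full coordinates, and that the state object exposes them as `state.coords`; both are assumptions, not the actual interfaces):

```python
import numpy as np

def extend_get_pcoord(original_get_pcoord, model):
    """Wrap a propagator's get_pcoord so it returns the original pcoord
    concatenated with the haMSM's dimensionality-reduced coordinates."""

    def get_pcoord(state):
        pcoord = np.atleast_1d(original_get_pcoord(state))
        # Assumption: processCoordinates maps full coordinates into the
        # model's reduced space.
        reduced = model.processCoordinates(np.atleast_2d(state.coords))
        return np.concatenate([pcoord, np.ravel(reduced)])

    return get_pcoord
```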
If I pass an extra `n_iter` argument to my `custom_optimization` functions that the optimization plugin uses, I can allow using different optimization strategies at different iterations.
I.e., instead of

```python
def optimize_allocation(hamsm):
    original_allocation = westpa.rc.get_we_driver().bin_target_counts
    return original_allocation
```

we could have

```python
def optimize_allocation(hamsm, n_iter):
    if n_iter < 1000:
        allocation = westpa.rc.get_we_driver().bin_target_counts
    elif n_iter < 7000:
        allocation = do_something_else()
    else:
        # Fall back to the current allocation for later iterations,
        # so allocation is always defined.
        allocation = westpa.rc.get_we_driver().bin_target_counts
    return allocation
```
Ray provides a `register_ray()` function that allows you to use it as a backend for sklearn.
I think this could provide some nice performance boosts for dimensionality reduction and clustering.
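For reference, this goes through joblib: Ray registers itself as a joblib backend, and anything sklearn parallelizes via joblib then runs on the Ray cluster. A minimal sketch (the estimator and data are placeholders):

```python
import joblib
import numpy as np
from ray.util.joblib import register_ray
from sklearn.cluster import DBSCAN

register_ray()  # registers "ray" as a joblib backend

coords = np.random.rand(5000, 3)  # placeholder data

# Estimators that parallelize through joblib (those exposing n_jobs)
# distribute their work over the Ray cluster inside this context.
with joblib.parallel_backend("ray"):
    labels = DBSCAN(eps=0.1, n_jobs=-1).fit_predict(coords)
```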
The haMSM plugin must construct an haMSM from scratch every time it runs. If you're re-running from, for example, a run with a failed optimization, then you might want to save/restore the built haMSM to save time.
Add an optional flag that will save the constructed haMSM, and attempt to reload from an existing one if it exists.
Now that augmentation has been refactored into a separate plugin, this should be much easier.
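A sketch of what that could look like, assuming the model pickles cleanly (the flag and file names are hypothetical):

```python
import pickle
from pathlib import Path

def build_or_restore_hamsm(build_fn, cache_path="hamsm.pkl", use_cache=True):
    """Reload a previously built haMSM if one was cached, otherwise
    build it and cache the result for future runs."""
    cache = Path(cache_path)
    if use_cache and cache.exists():
        with cache.open("rb") as f:
            return pickle.load(f)
    model = build_fn()
    with cache.open("wb") as f:
        pickle.dump(model, f)
    return model
```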
Write versions of tests using Ray
Currently, some parallelized sections of `msm_we` still assume they'll have direct access to the `west.h5` files being analyzed.
Take the discretization, for example: in `do_stratified_ray_discretization`, I pass an iteration to the remote worker, and the remote worker is responsible for retrieving the data.
This was an intentional decision, to avoid having to send lots of data. However, this means we assume direct access to the `west.h5` files... This prevents you from using a remote cluster for Ray work.
To me, it seems like one option is to ship the `west.h5` files to the remote workers, maybe in the restart plugin? Ray supports this in `init` via the `runtime_env` argument, but it has pretty conservative limits on file size.
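For reference, that would look roughly like the following (the directory name is a placeholder; note Ray enforces a size cap on uploaded `working_dir` contents):

```python
import ray

# Ship the local simulation directory (containing west.h5) to every
# worker in the cluster; Ray uploads and caches it per job.
ray.init(
    address="auto",
    runtime_env={"working_dir": "./we_simulation"},
)
```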
MDAnalysis appears to use an old-style method of importing openmm, which raises:

```
(do_stratified_ray_discretization pid=559390) Warning: importing 'simtk.openmm' is deprecated. Import 'openmm' instead.
```

Suppress this in, for example, my sample `processCoordinates` scripts (which use `mdanalysis`).
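A possible suppression, assuming the message comes through Python's `warnings` machinery (if it's a bare print from the shim, this won't catch it):

```python
import warnings

# Silence the simtk.openmm deprecation warning MDAnalysis triggers
# when it imports the old shim module.
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", message=".*simtk.openmm.*")
    import MDAnalysis as mda
```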
The haMSM building + optimization plugins can be fragile to failures during model-building or optimization.
When attempting to restart after a failure, issues arise such as `westpa.rc.we_driver()` not being correctly set, which means the optimization can't get the bin allocation and fails. Need to see if there's a good way to restore WESTPA state when launching directly into plugin execution.
In `prepare_extension_run()`, the `restart_driver` assumes the west config file is named `west.cfg`. This is not necessarily true if WESTPA was run with the `-r` argument pointing to a differently-named file. This can probably be obtained from `westpa.rc` somewhere?
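If `westpa.rc` exposes the resolved rc filename (it appears to carry one, though the exact attribute name is an assumption here), the lookup could be as simple as:

```python
import westpa

# Assumption: westpa.rc stores the resolved config filename (set by -r);
# fall back to the conventional default if the attribute isn't there.
config_path = getattr(westpa.rc, "rcfile", "west.cfg")
```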
iPCA/PCA occasionally hangs indefinitely during model-building
Add an option to skip building the validation models. This is maybe a little risky, and allows a bad practice... But the validation models can take a lot of time when you're trying to rapidly test things.