jdrusso / msm_we

7 stars · 3 watchers · 7 forks · 447.59 MB

History-augmented Markov analysis of weighted ensemble trajectories.

Home Page: https://msm-we.readthedocs.io

License: MIT License

Python 100.00%
markov-model molecular-dynamics weighted-ensemble

msm_we's Introduction

Hi there 👋

Current homelab setup

CPU: 4x Xeon 4870
GPU: Nvidia K10
Editor: PyCharm
OS: Ubuntu 20.04

msm_we's People

Contributors

jdrusso, jeremyleung521, jpthompson17, junchaoxia, shz66


msm_we's Issues

Allow optimization at custom intervals

Currently, optimization happens at fixed intervals of max_iters, up to a total of max_total_iterations iterations.

However, it's unintuitive that WESTPA's max_iters is actually the interval, and that the optimization keeps restarting the run until the total number of iterations given in the plugin config is reached.

A better system would be for WESTPA's max_iters to be the true maximum number of iterations, and for the optimization to provide the specific iterations (or intervals) to restart at.
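
A rough sketch of the proposed behavior (not the plugin's current code), where max_iters is treated as the true end point and the optimization hands back the iterations at which to restart; the function and argument names are placeholders:

def plan_restart_iterations(max_iters, optimization_interval):
    # Restart (and re-optimize) every `optimization_interval` iterations,
    # stopping at the real maximum instead of treating max_iters as the interval.
    return list(range(optimization_interval, max_iters + 1, optimization_interval))

# e.g. plan_restart_iterations(3000, 1000) -> [1000, 2000, 3000]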

Rewrite clustering using Ray Actors

Clustering in particular is a stateful operation, which currently relies on serializing/deserializing modelWE objects from the Ray object store.

Ideally, there isn't much overhead associated with this, but I think it becomes noticeable on systems without shared memory between workers.

Instead of doing the parallelism via stateless Ray tasks, we can initialize a set of Actors to do the work. Actors are stateful, so we can just initialize them with the current model state (with unnecessary stuff stripped out).
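
A minimal sketch of the Actor-based approach; the ClusteringActor class, the model_state dict, and the placeholder cluster body are illustrative, not msm_we's actual API:

import numpy as np
import ray

@ray.remote
class ClusteringActor:
    def __init__(self, model_state):
        # The actor holds the (stripped-down) model state once, instead of
        # deserializing it from the object store for every task.
        self.model_state = model_state

    def cluster(self, coords):
        # Placeholder for the real stratified clustering step.
        return np.zeros(len(coords), dtype=int)

ray.init(ignore_reinit_error=True)
model_state = {"n_clusters": 10}                      # hypothetical stripped-down state
chunks = [np.random.rand(100, 3) for _ in range(4)]  # hypothetical coordinate chunks

actors = [ClusteringActor.remote(model_state) for _ in chunks]
assignments = ray.get([a.cluster.remote(c) for a, c in zip(actors, chunks)])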

Remove all `aggregate` methods

For aggregate clustering functionality, just silently create a single rectilinear bin from -inf to inf, and then use the stratified methods.
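
As a sketch of the idea (not msm_we's code), the "aggregate" behavior falls out of the stratified path once the bin boundaries are just (-inf, inf), so every walker lands in the same single stratum:

import numpy as np

bin_boundaries = [-np.inf, np.inf]                 # one rectilinear bin spanning everything
pcoords = np.array([0.3, 1.7, -2.4, 5.0])          # example progress coordinate values
bin_assignments = np.digitize(pcoords, bin_boundaries) - 1
assert (bin_assignments == 0).all()                # every walker is in bin 0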

Blend progress bars into rich live display

The Rich live table that build_analyze_model displays is incompatible with the various progress bars msm_we produces. The Live table is always printed at the bottom of the screen, causing the progress bars to spam new lines as they're updated.

Find some way to combine them, or create a Rich Console that both can connect to and work under.
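
One possible approach (a sketch, not the current code): render the table and the progress bars from a single Live via a Group, sharing one Console, so the bars stop racing the table for the bottom of the screen. The table contents here are made up for illustration.

from rich.console import Console, Group
from rich.live import Live
from rich.progress import Progress
from rich.table import Table

console = Console()

table = Table(title="build_analyze_model status")
table.add_column("Stage")
table.add_column("State")
table.add_row("Clustering", "running")

progress = Progress(console=console)
task = progress.add_task("Discretizing", total=100)

# Both renderables update together under one Live display.
with Live(Group(table, progress), console=console, refresh_per_second=4):
    for _ in range(100):
        progress.advance(task)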

Remove egregiously duplicated Ray code

Ray parallelization has some ugly, duplicated code. This should be cleaned up (particularly the rediscretization code, which is duplicated in organize_fluxMatrix and clusterCoordinates).

Sort out dependencies

Right now the dependencies for this package aren't very cleanly managed. This is because:

  • Conda doesn't allow you to install a package from local source, like pip does
  • But I don't want to install all the dependencies from pip, because pyemma is not robust to pip installs (according to their documentation)

So I can make a conda environment file with all the appropriate dependencies, but then you still need to pip install the package itself... or I can leave it as-is, and you install everything with pip but have to manually install the conda-only dependencies (like pyemma). Neither option seems great.

Make optimization pcoord extension generic, not SynD specific

The optimization plugin must extend the progress coordinate using the dimensionality reduction produced by the haMSM.

Currently, this is only supported for SynD, because it's easy to recompute a value for every pcoord.

However, a more generic implementation could look something like wrapping get_pcoord with a call to model.processCoordinates, and returning the combination of the original pcoord and the new dimensionality-reduced coordinates.

Recall, though, that get_pcoord isn't necessarily called during propagation.
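
A rough sketch of that generic approach; base_get_pcoord, state, and state.coords are placeholders for whatever the propagator already provides, and only model.processCoordinates comes from msm_we itself:

import numpy as np

def extended_get_pcoord(state, model, base_get_pcoord):
    # Original progress coordinate from the existing get_pcoord machinery.
    pcoord = np.atleast_2d(base_get_pcoord(state))
    # Dimensionality-reduced coordinates from the haMSM's featurization.
    reduced = np.atleast_2d(model.processCoordinates(state.coords))
    # Extended pcoord: original dimensions followed by the reduced ones.
    return np.concatenate([pcoord, reduced], axis=1)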

Allow different optimizations at different iterations

If I pass an extra n_iter argument to my custom_optimization functions that the optimization plugin uses, then different optimization strategies can be used at different iterations.

I.e., instead of

def optimize_allocation(hamsm):

    original_allocation = westpa.rc.get_we_driver().bin_target_counts

    return original_allocation

we could have

def optimize_allocation(hamsm, n_iter):

    if n_iter < 1000:
        allocation = westpa.rc.get_we_driver().bin_target_counts

    elif n_iter < 7000:
        allocation = do_something_else()

    else:
        # Fall back to the original allocation so `allocation` is always defined.
        allocation = westpa.rc.get_we_driver().bin_target_counts

    return allocation

Allow haMSM plugin to use a pre-saved haMSM

The haMSM plugin must construct an haMSM from scratch every time it runs. If you're re-running from, for example, a run with a failed optimization, then you might want to save/restore the built haMSM to save time.

Add an optional flag that will save the constructed haMSM, and attempt to reload an existing one if it's present.
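
A minimal sketch of that flag, assuming the built model pickles cleanly; hamsm_path, build_hamsm, and reuse_existing are placeholder names, not the plugin's actual configuration options:

import pickle
from pathlib import Path

def get_or_build_hamsm(hamsm_path, build_hamsm, reuse_existing=True):
    path = Path(hamsm_path)
    if reuse_existing and path.exists():
        with path.open("rb") as f:
            return pickle.load(f)          # restore a previously built haMSM
    model = build_hamsm()                  # otherwise build from scratch
    with path.open("wb") as f:
        pickle.dump(model, f)              # save for future restarts
    return model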

Parallelism assumes direct `west.h5` file access

Currently, some parallelized sections of msm_we still assume they'll have direct access to the west.h5 files being analyzed.

For example, taking the discretization: in do_stratified_ray_discretization, I pass an iteration to the remote worker, and the remote worker is responsible for retrieving the data.

This was an intentional decision, to avoid having to send lots of data. However, this means we assume direct access to the west.h5 files... This prevents you from using a remote cluster for Ray work.

To me, it seems like the options are:

  • Copy all the west.h5 files to the remote worker, maybe in the restart plugin? Ray supports this in init via runtime_env argument, but it has pretty conservative limits on file size.
    This has the drawback of requiring a large file copy.
  • Send iteration data directly, rather than having each worker read it (see the sketch after this list).
    This, on the other hand, has the drawback of requiring lots of data to be sent over the network.
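
A rough sketch of the second option, using placeholder names rather than msm_we's actual functions: the driver reads each iteration's data from west.h5 and puts it in the Ray object store, so remote workers never need the file on their own filesystem.

import h5py
import numpy as np
import ray

@ray.remote
def discretize_iteration(iter_data):
    coords = iter_data["coords"]
    return np.zeros(len(coords), dtype=int)   # placeholder for the real discretization

ray.init(ignore_reinit_error=True)
refs = []
with h5py.File("west.h5", "r") as h5:
    for n_iter in range(1, 11):
        iter_group = h5[f"iterations/iter_{n_iter:08d}"]
        iter_data = {"coords": iter_group["pcoord"][...]}
        # ray.put stores the data once; only a small object reference crosses the network.
        refs.append(discretize_iteration.remote(ray.put(iter_data)))

assignments = ray.get(refs)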

Suppress `openmm` import warnings in code using MDAnalysis

MDAnalysis appears to use an old-style method of importing openmm, which raises

(do_stratified_ray_discretization pid=559390) Warning: importing 'simtk.openmm' is deprecated. Import 'openmm' instead.

Suppress this in, for example, the sample processCoordinates scripts (which use MDAnalysis).
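
A minimal sketch of one way to silence it in those scripts, covering both the case where the message is raised as a warning and the case where the simtk.openmm shim simply prints it (the exact point at which MDAnalysis triggers the import may vary):

import contextlib
import io
import warnings

with warnings.catch_warnings(), contextlib.redirect_stdout(io.StringIO()):
    warnings.filterwarnings("ignore", message=".*simtk.openmm.*")
    import MDAnalysis as mda   # the deprecated simtk.openmm import happens here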

Handling failed model building

The haMSM building + optimization plugins can be fragile to failures during model-building or optimization.

When attempting to restart after a failure, issues arise such as:

  • westpa.rc.we_driver() not being correctly set, which means the optimization can't get the bin allocation and fails

Need to see if there's a good way to restore WESTPA state when launching directly into plugin execution.

Don't assume west.cfg filename

In prepare_extension_run(), the restart_driver assumes the WESTPA config file is named west.cfg. This is not necessarily true if WESTPA was run with the -r argument pointing to a differently-named file. This can probably be obtained from westpa.rc somewhere?
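
An untested sketch of that idea, assuming westpa.rc keeps a reference to the rc file it was started with (the rcfile attribute name here is a guess, not a confirmed API):

import westpa

# Fall back to the default name if the attribute isn't there.
rc_path = getattr(westpa.rc, "rcfile", "west.cfg")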

iPCA hangs

  • msm_we version: 0.27
  • Python version: 3.9.13
  • Operating System: Ubuntu 20.04

iPCA/PCA occasionally hangs indefinitely during model-building
