
mdanalysis / mdacli


Command line interface for MDAnalysis

Home Page: https://mdacli.readthedocs.io/

License: GNU General Public License v3.0

Python 100.00%
command-line-tool mdanalysis science python molecular-dynamics cli computational-chemistry molecular-dynamics-simulation trajectory-analysis command-line

mdacli's Introduction

MDAnalysis command line interface

mdacli is a simple command line interface (CLI) to the analysis classes of MDAnalysis using argparse. This project is at an early stage of development and is a work in progress. Contributions are welcome!

To install mdacli refer to the INSTALL file.

For help and an overview of the supported modules, run `mdacli` with:

mda -h

A help message for each module is available using:

mda <module> -h

Available modules

Currently, the following analysis modules are available:

Module Name              Description
AlignTraj                RMS-align trajectory to a reference structure using a selection.
AverageStructure         RMS-align trajectory to a reference structure using a selection, and calculate the average coordinates of the trajectory.
Contacts                 Calculate contacts based observables.
DensityAnalysis          Volumetric density analysis.
DistanceMatrix           Calculate the pairwise distance between each frame in a trajectory
Dihedral                 Calculate dihedral angles for specified atomgroups.
Janin                    Calculate χ_1 and χ_2 dihedral angles of selected group
Ramachandran             Calculate ϕ and ψ dihedral angles of selected group
DielectricConstant       Computes the average dipole moment.
GNMAnalysis              Basic tool for GNM analysis.
closeContactGNMAnalysis  GNMAnalysis only using close contacts.
HELANAL                  Perform HELANAL helix analysis on your trajectory.
HoleAnalysis             Run hole program on a trajectory.
LinearDensity            Linear density profile
EinsteinMSD              Class to calculate Mean Squared Displacement by the Einstein relation.
PCA                      Principal component analysis on an MD trajectory.
InterRDF                 Intermolecular pair distribution function
RMSD                     Class to perform RMSD analysis on a trajectory.
RMSF                     Calculate RMSF of given atoms across a trajectory.

More information about each module is available through the help page or at the MDAnalysis documentation.

mdacli's People

Contributors

hejamu, joaomcteixeira, lilyminium, orbeckst, picocentauri

mdacli's Issues

decide on executable name

@PicoCentauri started a poll on Discord: How would you like to call MDAnalysis from the command line?

  1. MDAnalysis --help
  2. mdanalysis --help
  3. mda --help
  4. mdacli --help

Handle kwargs named argument in Analysis classes

Some classes have a named argument called kwargs that does not serve the typical kwargs functionality. Instead, it is a dictionary passed to the execution method. An example of this is the Contacts class:

https://github.com/MDAnalysis/mdanalysis/blob/3500131284b2d91846fda9696f569bc1f6bfe741/package/MDAnalysis/analysis/contacts.py#L410

I believe we need to write an argparse action that reads dictionary-like input from the command line when a kwargs argument exists in the analysis class. The analysis class will know how to handle such a dictionary.
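A minimal sketch of such an action, assuming the dictionary is written as a JSON object on the command line. The class name DictAction and the -kwargs flag are hypothetical and chosen only for illustration:

```python
import argparse
import json


class DictAction(argparse.Action):
    """Parse a JSON object given on the command line into a dict (hypothetical)."""

    def __call__(self, parser, namespace, values, option_string=None):
        try:
            parsed = json.loads(values)
        except json.JSONDecodeError as err:
            parser.error(f"{option_string}: invalid JSON ({err})")
        if not isinstance(parsed, dict):
            parser.error(f"{option_string}: expected a JSON object")
        setattr(namespace, self.dest, parsed)


parser = argparse.ArgumentParser()
# "-kwargs" is an assumed flag name; the dest becomes "kwargs".
parser.add_argument("-kwargs", action=DictAction, default=None)

args = parser.parse_args(["-kwargs", '{"radius": 4.5}'])
print(args.kwargs)
```

The parsed dictionary could then be handed to the analysis class unchanged. On a real shell the JSON string has to be quoted so that braces and double quotes survive.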

Docs parser fails

When running mdacli with the current development version of MDAnalysis, parse_docs is not able to parse the docs. I pinned down the problematic entry to the universe attribute of DistanceMatrix in diffusionmap.py. This can easily be seen by adding a print

print(klass, par_name, others_)

statement before line L288

https://github.com/PicoCentauri/mda_cli/blob/ad619df0f7ab3473ab4c2e8789954c5c78ff0b9e/src/mdacli/cli.py#L281-L288

Most likely our regex is not able to parse the `... @joaomcteixeira do you have an idea?

Additionally, it seems that MDAnalysis also has docstring entries using : and : to separate the name and the type. I think we are not able to parse them...

AnalysisBase is appearing

When running the code from #3 commit 76a7093, AnalysisBase appears in the CLI selection menu. Currently, I don't understand where it comes from, because it is removed here:

skip_mods = ('base', 'rdf_s', 'hydrogenbonds', 'hbonds')                        
relevant_modules = (_mod for _mod in __all__ if _mod not in skip_mods) 

To investigate... 🔬

$ python cli_main.py -h
Warning: This module is deprecated as of MDAnalysis version 1.0. It will be removed in MDAnalysis version 2.0.Please use MDAnalysis.analysis.helix_analysis instead.
Warning: This module is deprecated as of MDAnalysis version 1.0.It will be removed in MDAnalysis version 2.0Please use MDAnalysis.analysis.hydrogenbonds.hbond_analysis instead.
usage: cli_main.py [-h]
                   {AlignTraj,AnalysisBase,AverageStructure,Contacts,DensityAnalysis,HoleAnalysis,EinsteinMSD,PersistenceLength,InterRDF,InterRDF_s,RMSD,RMSF}
                   ...

optional arguments:
  -h, --help            show this help message and exit

MDAnalysis Analysis CLI:
  {AlignTraj,AnalysisBase,AverageStructure,Contacts,DensityAnalysis,HoleAnalysis,EinsteinMSD,PersistenceLength,InterRDF,InterRDF_s,RMSD,RMSF}
    AlignTraj           RMS-align trajectory to a reference structure using a
                        selection.
    AnalysisBase        Base class for defining multi frame analysis
    AverageStructure    RMS-align trajectory to a reference structure using a
                        selection,
    Contacts            Calculate contacts based observables.
    DensityAnalysis     Volumetric density analysis.
    HoleAnalysis        Run :program:`hole` on a trajectory.
    EinsteinMSD         Class to calculate Mean Squared Displacement by the
                        Einstein relation.
    PersistenceLength   Calculate the persistence length for polymer chains
    InterRDF            Intermolecular pair distribution function
    InterRDF_s          Site-specific intermolecular pair distribution
                        function
    RMSD                Class to perform RMSD analysis on a trajectory.
    RMSF                Calculate RMSF of given atoms across a trajectory.
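One possible explanation, offered as a guess rather than a diagnosis: skip_mods filters module names, but a class imported into another module still shows up when that module's classes are listed. The sketch below reproduces the effect with two throwaway modules (the fake_analysis names are made up) and shows a filter on `__module__`:

```python
import inspect
import types

# Build two throwaway modules so the example runs without MDAnalysis.
base = types.ModuleType("fake_analysis.base")
exec("class AnalysisBase:\n    pass", base.__dict__)

rms = types.ModuleType("fake_analysis.rms")
rms.AnalysisBase = base.AnalysisBase  # mimics "from .base import AnalysisBase"
exec("class RMSD(AnalysisBase):\n    pass", rms.__dict__)

# Listing every class in the module also returns the imported base class.
all_classes = [name for name, _ in inspect.getmembers(rms, inspect.isclass)]

# Keeping only classes defined in the module itself drops it again.
own_classes = [
    name
    for name, cls in inspect.getmembers(rms, inspect.isclass)
    if cls.__module__ == rms.__name__
]

print(all_classes)
print(own_classes)
```

If this is indeed the cause, comparing `cls.__module__` with the module name (or skipping the class by name) would be one way to keep AnalysisBase out of the menu.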

Alter `MDAnalysis.analysis.base.AnalysisBase`

I know that @joaomcteixeira is not a fan of OOPing things but I will post this here anyway ;).
I thought about altering the base class with two new methods/attributes. A self.doc_dict attribute: a dictionary combining the docstring information and the callable signature. In principle, it is our

def parse_callable_signature(callable_obj, storage_dict):

as a class method.

A self.save_results() method based on our general approach could also be added if no such function exists.

If we dynamically alter the analysis class we are parsing this could help to improve our workflow and will help users/developers that try to use our python implementation.
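A rough sketch of what such a dynamic alteration could look like. The helper attach_cli_helpers and the toy class are hypothetical; only the signature half of doc_dict is covered here, and the docstring information would still have to be merged in:

```python
import inspect


def attach_cli_helpers(cls):
    """Add ``doc_dict`` and, if missing, ``save_results`` to ``cls`` in place."""
    signature = inspect.signature(cls.__init__)
    cls.doc_dict = {
        name: {
            "default": None if p.default is p.empty else p.default,
            "required": p.default is p.empty,
        }
        for name, p in signature.parameters.items()
        if name != "self" and p.kind not in (p.VAR_POSITIONAL, p.VAR_KEYWORD)
    }

    if not hasattr(cls, "save_results"):
        def save_results(self, path):
            # Generic fallback: dump the ``results`` attribute as text.
            with open(path, "w") as handle:
                handle.write(repr(getattr(self, "results", None)))

        cls.save_results = save_results
    return cls


class ToyAnalysis:
    """Stand-in for an analysis class; not part of MDAnalysis."""

    def __init__(self, atomgroup, cutoff=5.0, **kwargs):
        self.results = {"cutoff": cutoff}


attach_cli_helpers(ToyAnalysis)
print(ToyAnalysis.doc_dict)
```

Because the class is modified in place, an existing save_results is left untouched and only classes without one receive the generic fallback.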

lib not found after fresh install

with MDAnalysis 1.1.1

(mdacli) joao@vantito:~/github/mda_cli
· mdacli 
Traceback (most recent call last):
  File "/home/joao/anaconda3/envs/mdacli/bin/mdacli", line 33, in <module>
    sys.exit(load_entry_point('mdacli', 'console_scripts', 'mdacli')())
  File "/home/joao/github/mda_cli/src/mdacli/cli.py", line 591, in main
    maincli(setup_clients())
  File "/home/joao/github/mda_cli/src/mdacli/cli.py", line 575, in setup_clients
    module = importlib.import_module('MDAnalysis.analysis.' + module)
  File "/home/joao/anaconda3/envs/mdacli/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'MDAnalysis.analysis.rdf_s'
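One defensive option, sketched under the assumption that a missing submodule should be skipped with a warning rather than abort the CLI. The helper import_available is hypothetical and is demonstrated with the standard library so that it runs without MDAnalysis:

```python
import importlib


def import_available(package, module_names):
    """Import ``package.<name>`` for each name, skipping names that do not exist."""
    loaded = {}
    for name in module_names:
        full_name = f"{package}.{name}"
        try:
            loaded[name] = importlib.import_module(full_name)
        except ModuleNotFoundError as err:
            # Only swallow the error if the requested module itself is missing,
            # not if that module fails on one of its own imports.
            if err.name != full_name:
                raise
            print(f"Warning: {full_name} not found, skipping.")
    return loaded


# "json.decoder" exists, "json.rdf_s" does not.
modules = import_available("json", ["decoder", "rdf_s"])
print(sorted(modules))
```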

Testing and CI

On top of the #8 and #9 discussions, we need to agree on a strategy for testing and CI. Although MDA_CLI is part of MDA and uses MDA, it is independent of the main project, so I believe it should have its own testing and CI infrastructure. What do you think @orbeckst @PicoCentauri ?

On Travis-CI for taurenmd I use tox-conda to install MDAnalysis and perform taurenmd tests. Reproducing that strategy here is an option. We can also try GitHub Actions.

Integration tests

#109 shows that mdacli is missing some integration tests with MDAnalysis, i.e., running a minimal working example of an analysis to actually call the run() function once.

Clean kwargs before parsing atom selection

In this line our code will break since the analysis_kwargs dictionary contains the keys begin, end, dt, verbose and func. I fixed this, but we should maybe clean up the analysis_kwargs dict. It should only contain the entries for the analysis class... What do you think?

Originally posted by @PicoCentauri in #3 (comment)
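One way to do this clean-up, sketched with a hypothetical helper split_kwargs and a toy class: keep only the entries whose names appear in the signature of the analysis class and set the rest aside.

```python
import inspect


def split_kwargs(func, kwargs):
    """Return (accepted, rest): entries ``func`` accepts by name, and the others."""
    params = inspect.signature(func).parameters
    accepted = {k: v for k, v in kwargs.items() if k in params}
    rest = {k: v for k, v in kwargs.items() if k not in params}
    return accepted, rest


class ToyAnalysis:
    """Stand-in for an analysis class; not part of MDAnalysis."""

    def __init__(self, atomgroup, cutoff=5.0):
        self.atomgroup = atomgroup
        self.cutoff = cutoff


cli_kwargs = {"atomgroup": "name CA", "cutoff": 3.0,
              "begin": 0, "end": -1, "dt": 1, "verbose": True}
analysis_kwargs, other_kwargs = split_kwargs(ToyAnalysis, cli_kwargs)
print(analysis_kwargs)
print(other_kwargs)
```

A class that accepts `**kwargs` would need extra care, since names swallowed by `**kwargs` do not appear individually in the signature.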

Limit number of threads used by analysis

When running analysis code on shared machines, it can be useful to limit the number of threads. By default, numpy and, to my knowledge, also MDAnalysis take all available resources. threadpoolctl offers a convenient way to limit the number of threads. I am using it with great success in MAICoS.

I'm just copying the essential lines. They just have to be inserted in the right positions in mdacli.

First, add a new parameter

parser.add_argument("-nt",
                    dest="num_threads",
                    type=int,
                    default=0,
                    help="Total number of threads to start (0 is guess)")

after

mdacli/src/mdacli/libcli.py

Lines 198 to 204 in 3aa2a66

run_group.add_argument(
    "-dt",
    dest="step",
    type=str,
    default="1",
    help="step or time step for evaluation (default: %(default)s)"
    )

and add a with statement

from threadpoolctl import threadpool_limits

with threadpool_limits(limits=args.num_threads):
    ...

around

mdacli/src/mdacli/cli.py

Lines 131 to 136 in 3aa2a66

run_analsis(analysis_callable,
            arg_grouped_dict["Mandatory Parameters"],
            arg_grouped_dict["Optional Parameters"],
            arg_grouped_dict["Reference Universe Parameters"],
            arg_grouped_dict["Analysis Run Parameters"],
            arg_grouped_dict["Output Parameters"])

This will of course add a new dependency.

More tolerant docstring parser

Our current docstring parser located at

def parse_docs(klass):

returns a dictionary of the docstring. It works, but it is not as flexible and tolerant as the sphinx/napoleon implementation. In particular, we have problems with the separator between a parameter name and its type, usually denoted by name : type. A different notation cannot be parsed since we use a hardcoded split

mdacli/src/mdacli/utils.py

Lines 230 to 232 in 78fa3f2

for line in doc_lines[par_i: end_param_line][::-1]:
    if ' : ' in line:
        par_name, others_ = line.split(' : ')

Attempts to improve this with a regex did not succeed either. If possible, we should incorporate the sphinx parser or at least get some ideas from their implementation.
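For illustration only, a more tolerant split could accept any amount of whitespace around the colon. The pattern and the helper split_param_line below are assumptions and not the sphinx/napoleon logic; description lines that happen to start with "word:" would still be misread:

```python
import re

# One or more comma-separated names, a colon with optional whitespace, a type.
PARAM_LINE = re.compile(
    r"^\s*(?P<name>[A-Za-z_*][\w*]*(?:\s*,\s*[A-Za-z_*][\w*]*)*)"
    r"\s*:\s*(?P<type>\S.*)$"
)


def split_param_line(line):
    """Return (name, type) for a parameter line, or None for any other line."""
    match = PARAM_LINE.match(line)
    if match is None:
        return None
    return match.group("name"), match.group("type").strip()


print(split_param_line("start : int, optional"))
print(split_param_line("atomgroup: AtomGroup"))
print(split_param_line("x, y : float"))
print(split_param_line("    RMS-align the trajectory"))
```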

Incorrect error message when reading data/topol.tpr

After #28 (at least when I noticed it), the data/topology file can't be read:

$ mdacli RMSF -s data/topol.tpr -f data/traj.trr -atomgroup all
Warning: No coordinate reader found for data/topol.tpr. Skipping this file.

mdacli doesn't loop over single frames

If one is using an AnalysisClass on a trajectory with only one frame, the AnalysisBaseClass from MDA will happily do exactly that. This is because at initialization (
https://github.com/MDAnalysis/mdanalysis/blob/3769ee29e5907221527ff0ec88a8c5acf9f86dee/package/MDAnalysis/analysis/base.py#L247
), if start, stop and step are None, and the trajectory is ONE frame long, it sets them to 0, 1 and 1 respectively.

mdacli, however, sets the default values to start = 0, stop = -1 and step = 1. This does not result in the same behaviour, since for an array of length 1:

a[0:-1:1] != a[0:1:1]

This difference in behaviour finally matters at this point: https://github.com/MDAnalysis/mdanalysis/blob/3769ee29e5907221527ff0ec88a8c5acf9f86dee/package/MDAnalysis/analysis/base.py#L296

Maybe setting the defaults to None would be an option?
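The difference can be reproduced with a plain Python list standing in for a one-frame trajectory (the list and its content are only an illustration):

```python
frames = ["frame0"]  # stand-in for a trajectory with a single frame

with_minus_one = frames[0:-1:1]                 # stop=-1 excludes the last element
with_one = frames[0:1:1]                        # stop=1 includes the only element
with_none = frames[slice(None, None, None)]     # None lets Python pick the bounds

print(with_minus_one)
print(with_one)
print(with_none)
```

With stop = -1 the slice is empty, while None as the default leaves the choice of bounds to the slicing machinery and keeps the single frame.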

Simplify `analyze_data` method

The analyze_data method

def analyze_data(

is our main run method: it receives the attributes parsed from the CLI, creates the universe, runs the analysis and saves the results. The function takes all parameters as keyword arguments. However, if one argument is not given, it fails even if there are default arguments. This should not happen.
Maybe the universe creation, the analysis instance creation, the run and the saving should be split into different functions...

MDAnalysis docstrings not parsed properly

The current implementation of numpydoc does not work with the MDAnalysis docstrings. See the following code example:

from numpydoc.docscrape import NumpyDocString

class TestClass():
    r"""A TestClass docstring.

    Some long description

    Parameters
    ----------
    atomgroup : AtomGroup or UpdatingAtomGroup
            foo
    atomgroup2 : AtomGroup
            fancy second atomgroup
    """
    
doc = NumpyDocString(TestClass.__doc__)
print(len(doc["Parameters"]))

The output is 1; however, there are 2 parameters in TestClass. When an extra line is added at the beginning, the docstring is parsed properly:

class TestClass():
    r"""
    A TestClass docstring.

    ...

Unfortunately, I was not able to implement something like doc = NumpyDocString("\n" + TestClass.__doc__) quickly. Does anybody have an idea?
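A possibly related observation, using only the standard library: a docstring whose first line starts right after the opening quotes has no common indentation, so textwrap.dedent leaves it unchanged, whereas inspect.cleandoc treats the first line separately. The string below is laid out like such a raw docstring; whether passing the cleaned text to NumpyDocString resolves the problem is an assumption that is not tested here.

```python
import inspect
import textwrap

# Laid out like a raw docstring whose summary follows the opening quotes.
raw = (
    "A TestClass docstring.\n"
    "\n"
    "    Parameters\n"
    "    ----------\n"
    "    atomgroup : AtomGroup\n"
    "        foo\n"
    "    "
)

dedented = textwrap.dedent(raw).splitlines()
cleaned = inspect.cleandoc(raw).splitlines()

print(repr(dedented[2]))  # indentation of the section header is kept
print(repr(cleaned[2]))   # indentation of the section header is removed
```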

Allow time and frame number in `-b` `-e`, `-dt` option

In our current implementation the time is fixed to picoseconds. This is handy, but users might also want to use a different time unit or even frame numbers. There are two options for handling this:

Option 1

Add a new parameter like -u for giving the unit for skipping etc. Allowed values could be None for frames, "ps" for picoseconds or even "ns" for nanoseconds.

Option 2

Change the current parameters and parse the unit from the user input. I could think of something like

mda_cli xxx -b 10 -e 20 -dt 5 # frames
mda_cli xxx -b 10ps -e 20ps -dt 5ps # time
mda_cli xxx -b 10ps -e 20ns -dt 5ps # even different time units could be allowed

I prefer option 2 but am open to other opinions on this.
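A small sketch of how option 2 could be parsed. The function name parse_step, the supported units and the conversion to picoseconds are assumptions made for this example:

```python
import re

# Conversion factors to picoseconds; the set of supported units is an assumption.
TO_PS = {"ps": 1.0, "ns": 1000.0}


def parse_step(text):
    """Return (value, kind) where kind is "frame" or "ps"."""
    match = re.fullmatch(r"\s*(-?\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*", text)
    if match is None:
        raise ValueError(f"cannot parse {text!r}")
    number, unit = match.groups()
    if unit == "":
        # No unit given: interpret the value as a frame number.
        if "." in number:
            raise ValueError(f"frame numbers must be integers, got {text!r}")
        return int(number), "frame"
    if unit not in TO_PS:
        raise ValueError(f"unknown time unit {unit!r}")
    return float(number) * TO_PS[unit], "ps"


print(parse_step("10"))
print(parse_step("10ps"))
print(parse_step("20ns"))
```

Returning the kind alongside the value lets the caller decide whether to index frames directly or to convert a time into a frame index.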

Tests on Windows are failing

When running the tests on a Windows machine, the following error is produced

OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: '{"key1":1}'

when running the KwargsDict test.

@pytest.mark.parametrize(
    'cmd,expected',
    [
        ('-d {"key1":1}', {"key1": 1}),
    ]
)
def test_KwargsDict(cmd, expected):
    """Test dict reading action."""
    ap = argparse.ArgumentParser()
    ap.add_argument(
        '-d',
        action=libcli.KwargsDict,
        default=None,
    )
    args = ap.parse_args(cmd.split())
    assert args.d == expected

See for example the tests on the main branch.

use MDA style for docs

It would look nice if the mdacli docs followed the style of the other docs to show that it's part of the MDA ecosystem and directly supported by the MDAnalysis org.

And a logo...

build job fails for every version bump commit in main branch

For every new version, the main branch runs our test pipeline. However, the build task always fails. The reason is that there is no new_version statement in CHANGELOG.rst, because the version bump commit is exactly the one that replaces it with the 'new' version (see for example 33eb1df). It is probably useful to disable the build test for pushes to the main branch, for example by removing the two lines

push:
  branches: [main]

What do you think @joaomcteixeira ?

Visit all Analysis DOCSTRINGS

As discussed with @PicoCentauri during #2 and #3:

Visit all Analysis DOCSTRINGS and ensure all are working properly. If they are not, inspect if the issue originates in a parsing failure of cli_mda functions or is a format break upstream in the MDAnalysis package itself. If the latter is true, Pull Request upstream to correct DOCSTRING format.

Prepare for downstream usage

With v.0.1 close to shipping, we should think about preparing the library for downstream usage. This is especially interesting for me, as I would like to use our efforts in maicos. I therefore suggest moving all generic functions from cli.py into libcli.py. Afterwards, cli.py only contains function calls specific to MDA. Downstream developers can copy cli.py and use it inside their own library. We should of course also give help and an example in the docs.

docs on RTD contain Google Analytics

Looking at the page source of the docs hosted on RTD I see "global_analytics_code": "UA-17997319-1",

in

<script type="application/json" id="READTHEDOCS_DATA">{"ad_free": false, "api_host": "https://readthedocs.org", "build_date": "2021-11-24T19:10:36Z", "builder": "sphinx", "canonical_url": null, "commit": "b989042f", "docroot": "/docs/rst/", "features": {"docsearch_disabled": false}, "global_analytics_code": "UA-17997319-1", "language": "en", "page": "index", "programming_language": "py", "project": "mdacli", "proxied_api_host": "/_", "source_suffix": ".rst", "subprojects": {}, "theme": "sphinx_rtd_theme", "user_analytics_code": "", "version": "latest"}</script>

and

<script type="text/javascript" src="https://assets.readthedocs.org/static/javascript/readthedocs-analytics.js" async="async"></script>

Looking at https://assets.readthedocs.org/static/javascript/readthedocs-analytics.js this sets cookies and does send information to Google.

None of this is covered in our Privacy Policy so the docs cannot be served from mdacli.mdanalysis.org as they are. Anything under mdanalysis.org must comply with our privacy policy.

My recommendation is to switch to GH pages and serve as www.mdanalysis.org/mdacli.

mdacli broken since 0.1.20

Trying to run mdacli>=0.1.20 results in

(base) hjaeger@argali:/work/hjaeger/spce_water$ maicos densityplanar -s run.tpr -f run.xtc -atomgroups 'resname SOL'
Logging to file is disabled.
Gromacs version   : b'VERSION 2022-beta1-dev-20211122-4f4b9e4b19-unknown'
tpx version       : 127
tpx generation    : 28
tpx precision     : 4
tpx file_tag      : b'release'
tpx natoms        : 7500
tpx ngtc          : 1
tpx fep_state     : 0
tpx lambda        : 0.0
Error: run() got an unexpected keyword argument 'num_threads'

since the newly introduced -nt argument gets passed to the run function of the analysis class.

documentation

To give users and developers an idea how to work with mda_cli there should be at least minimal docs. I recommend you set up sphinx docs and use the MDA templates as it's easier for users to have same navigation and markup conventions.

Topics that would be important

  • motivation
  • philosophy (what is the underlying key idea/approach?)
  • high-level view (examples of what can be done)
    • user side
    • developer side
  • specific examples (for developers: how to use the tool box)

Generalize argparsing

Our approach of parsing the analysis classes and building an argparse object based on the docstring works well for arbitrary parameters. However, it is currently deeply buried in and tightly bound to this library.

I often have the same problem that I have a certain function and I want to build a simple command line interface from it. Currently, this involves a lot of copy and pasting. Here, we have everything in place and one just has to generalize everything a bit. There already exists a nice reference implementation which is not as general as what we did here.

If anybody wants to tackle this I already started a little branch which is not working at all but maybe allows for a little headstart.
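To make the idea concrete, here is a minimal sketch that builds an argparse parser from a function signature alone. The helper parser_from_callable and the example function scale are hypothetical; help texts from the docstring parameters, which mdacli extracts, are left out:

```python
import argparse
import inspect


def parser_from_callable(func):
    """Build an ArgumentParser with one option per parameter of ``func``."""
    doc = inspect.getdoc(func) or ""
    parser = argparse.ArgumentParser(
        prog=func.__name__,
        description=doc.splitlines()[0] if doc else None,
    )
    for name, param in inspect.signature(func).parameters.items():
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue  # *args / **kwargs are out of scope for this sketch
        if param.default is param.empty:
            parser.add_argument(f"--{name}", required=True)
        elif isinstance(param.default, bool):
            # Requires Python 3.9+; gives --name / --no-name switches.
            parser.add_argument(f"--{name}", default=param.default,
                                action=argparse.BooleanOptionalAction)
        else:
            # Infer the type from the default; None defaults stay strings.
            arg_type = type(param.default) if param.default is not None else str
            parser.add_argument(f"--{name}", type=arg_type,
                                default=param.default)
    return parser


def scale(value, factor=2.0):
    """Multiply a value by a factor."""
    return float(value) * factor


parser = parser_from_callable(scale)
args = parser.parse_args(["--value", "3", "--factor", "1.5"])
result = scale(**vars(args))
print(result)
```

Inferring types from defaults is a deliberate shortcut; a general solution would more likely read the types from the docstring or from annotations.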

Visit all Analysis run/result API

As @PicoCentauri and I have been discussing in e-mails:

Verify that all Analysis classes share a common API to run the calculation and retrieve the results. We can also accommodate a small set of APIs through a try/except procedure. A prototype for this code is already in the master branch.

We expect results to be savable to a text file on disk.

In the future, we can decide whether to plot such data or pipe it somewhere else.

Ping to @orbeckst, we might need to discuss with you the best strategy here because this part needs a finer synchronization with the MDAnalysis package itself.

Cheers 😄

More structured command line module help

Our argument parser

def create_CLI(cli_parser, interface_name, parameters):

creates the groups Common Analysis Parameters, Mandatory Parameters and Optional Parameters when the help page is accessed.
We should split Common Analysis Parameters into something like Universe Parameters, Run Parameters and Storing Results Parameters to be more structured. Additionally, we should give information on how we store the results after the analysis at the beginning of the Storing Results Parameters group.
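A sketch of how the proposed grouping could look with plain argparse argument groups. The group titles follow the proposal above, while the -o option, the help texts and the description text are placeholders:

```python
import argparse

parser = argparse.ArgumentParser(prog="mda RMSF")

universe = parser.add_argument_group("Universe Parameters")
universe.add_argument("-s", dest="topology", help="topology file")
universe.add_argument("-f", dest="coordinates", nargs="+",
                      help="trajectory file(s)")

run = parser.add_argument_group("Run Parameters")
run.add_argument("-b", dest="begin", default="0",
                 help="start of the evaluation (default: %(default)s)")
run.add_argument("-e", dest="end", default="-1",
                 help="end of the evaluation (default: %(default)s)")
run.add_argument("-dt", dest="step", default="1",
                 help="step for the evaluation (default: %(default)s)")

# The description is printed at the top of the group in the help page.
out = parser.add_argument_group(
    "Storing Results Parameters",
    description="Placeholder: explain here how the results are written.",
)
out.add_argument("-o", dest="output", default="out",
                 help="hypothetical output prefix (default: %(default)s)")

help_text = parser.format_help()
print(help_text)
```

The description argument of add_argument_group is the natural place for the note on how results are stored, since it is rendered directly below the group title.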

Keep logical order when saving results

If you want to plot data from a file stored in columns, the convention is that x values have a lower column index than the y values. Currently this is not guaranteed. The current results handling proposed in #17 and #29 handles every possible type appearing in an MDAnalysis AnalysisClass. In our approach, 1D arrays are stacked into 2D arrays and saved as CSV files. The logical structure is usually encoded inside the class docstring. For example, in lineardensity.py the docstring has the following structure

"""
    results.x.pos : numpy.ndarray
           mass density in [xyz] direction
    results.x.pos_std : numpy.ndarray
           standard deviation of the mass density in [xyz] direction
    results.x.char : numpy.ndarray
           charge density in [xyz] direction
    results.x.char_std : numpy.ndarray
           standard deviation of the charge density in [xyz] direction
"""

Saving these 1D arrays stacked into a 2D array should keep this order.
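As an illustration, columns can be written in a fixed order by keeping them in a dictionary, which preserves insertion order in Python 3.7+. The numbers below are placeholders and not real analysis output:

```python
import csv
import io

# Columns in the order given by the docstring; the values are placeholders.
results = {
    "pos": [0.1, 0.2, 0.3],
    "pos_std": [0.01, 0.02, 0.03],
    "char": [-0.5, 0.0, 0.5],
    "char_std": [0.05, 0.05, 0.05],
}

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(results.keys())            # header in insertion order
writer.writerows(zip(*results.values()))   # one row per bin

text = buffer.getvalue()
print(text)
```

The same idea carries over to stacked arrays: as long as the columns are collected in docstring order before stacking, the file layout follows that order.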

show docs under www.mdanalysis.org

For all official MDAnalysis projects we have docs under our www.mdanalysis.org URL. The authoritative mdacli docs should appear as https://www.mdanalysis.org/mdacli.

This can be achieved with a simple GitHub Pages deployment action; the GH pages branch will automatically appear in the right place.

The automatic sitemap generation in PR #70 currently assumes the canonical URL above. Sitemaps are important for integrating the docs in the search on the website and making our site globally easy to index.

Do not hardcode analysis modules

Currently the relevant_modules are hardcoded as a global variable in cli.py. This is not only bad coding practice but also prevents third-party libraries from using the CLI for their Analysis modules. The lines

mdacli/src/mdacli/cli.py

Lines 29 to 34 in 78fa3f2

# modules in MDAnalysis.analysis packages that are ignored by mdacli
# relevant modules used in this CLI factory
# hydro* are removed here because they have a different folder/file structure
# and need to be investigated separately
skip_mods = ('base', 'hydrogenbonds', 'hbonds')
relevant_modules = (_mod for _mod in __all__ if _mod not in skip_mods)

should be combined with

module = importlib.import_module('MDAnalysis.analysis.' + module)

and moved into the main function. setup_cli should then take the list of modules as required argument.

tab-completion

People (including me) would like to have tab-completion. I played around with it in MAICoS and created a static completion file. However, for this project I would go for a dynamic approach.

A command like mdacli create_completion creates the completion file depending on the current shell and the available analysis modules. I would not run this command by default on installation since maybe the available modules are different depending on the version of mdacli and MDAnalysis. The command returns the location of the completion file and a statement on how to add this to the .bashrc etc.
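A very small sketch of the dynamic approach for bash, assuming a flat list of module names is available at run time. The function names and the file name are hypothetical, and only word completion via the bash builtin complete -W is generated:

```python
import tempfile
from pathlib import Path


def bash_completion_script(prog, modules):
    """Return a bash snippet that offers the module names as completions."""
    words = " ".join(sorted(modules))
    return f'complete -W "{words} -h --help" {prog}\n'


def create_completion(prog, modules, directory):
    """Write the completion file into ``directory`` and return its path."""
    path = Path(directory) / f"{prog}-completion.bash"
    path.write_text(bash_completion_script(prog, modules))
    return path


with tempfile.TemporaryDirectory() as tmp:
    path = create_completion("mda", ["RMSD", "AlignTraj", "RMSF"], tmp)
    script = path.read_text()

print(script)
```

A real command would write to a persistent location and tell the user which line to add to their shell start-up file; other shells would need their own generators.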
