mdanalysis / mdacli
Command line interface for MDAnalysis
Home Page: https://mdacli.readthedocs.io/
License: GNU General Public License v3.0
The implementations here look interesting.
From a quick look I can't tell why the tests were cancelled on macOS and Windows after py38 succeeded. This needs to be investigated in another PR.
Originally posted by @joaomcteixeira in #28 (comment)
Can we add ReadTheDocs to the checks in a PR? This way, the docs are built immediately and one can check them directly.
See https://docs.readthedocs.io/en/stable/pull-requests.html
When running the code from #3 commit 76a7093, AnalysisBase appears in the CLI selection menu. Currently, I don't understand where it comes from, because it is removed here:
skip_mods = ('base', 'rdf_s', 'hydrogenbonds', 'hbonds')
relevant_modules = (_mod for _mod in __all__ if _mod not in skip_mods)
To investigate...
$ python cli_main.py -h
Warning: This module is deprecated as of MDAnalysis version 1.0. It will be removed in MDAnalysis version 2.0.Please use MDAnalysis.analysis.helix_analysis instead.
Warning: This module is deprecated as of MDAnalysis version 1.0.It will be removed in MDAnalysis version 2.0Please use MDAnalysis.analysis.hydrogenbonds.hbond_analysis instead.
usage: cli_main.py [-h]
{AlignTraj,AnalysisBase,AverageStructure,Contacts,DensityAnalysis,HoleAnalysis,EinsteinMSD,PersistenceLength,InterRDF,InterRDF_s,RMSD,RMSF}
...
optional arguments:
-h, --help show this help message and exit
MDAnalysis Analysis CLI:
{AlignTraj,AnalysisBase,AverageStructure,Contacts,DensityAnalysis,HoleAnalysis,EinsteinMSD,PersistenceLength,InterRDF,InterRDF_s,RMSD,RMSF}
AlignTraj RMS-align trajectory to a reference structure using a
selection.
AnalysisBase Base class for defining multi frame analysis
AverageStructure RMS-align trajectory to a reference structure using a
selection,
Contacts Calculate contacts based observables.
DensityAnalysis Volumetric density analysis.
HoleAnalysis Run :program:`hole` on a trajectory.
EinsteinMSD Class to calculate Mean Squared Displacement by the
Einstein relation.
PersistenceLength Calculate the persistence length for polymer chains
InterRDF Intermolecular pair distribution function
InterRDF_s Site-specific intermolecular pair distribution
function
RMSD Class to perform RMSD analysis on a trajectory.
RMSF Calculate RMSF of given atoms across a trajectory.
I think #101 and the deletion of the gh-pages branch mean that mdacli.mdanalysis.org is now showing some random non-MDA website. The solution here is probably to point mdacli.mdanalysis.org to https://mdacli.readthedocs.io/ using the ReadTheDocs subdomain. @PicoCentauri @joaomcteixeira I think only you two have maintainer access to RTD, so you might have to add someone or do it yourselves :)
As discussed with @PicoCentauri during #2 and #3:
Visit all Analysis DOCSTRINGS and ensure all are working properly. If they are not, check whether the issue originates in a parsing failure of the cli_mda functions or is a format break upstream in the MDAnalysis package itself. If the latter is true, open a Pull Request upstream to correct the DOCSTRING format.
When running analysis code on shared machines it can be useful to limit the number of threads. By default numpy, and to my knowledge also MDAnalysis, takes all available resources. threadpoolctl makes it easy to limit the number of threads. I am using it with great success in MAICoS.
I'm just copying the essential lines. They just have to be inserted in the right positions in mdacli.
First add the new parameter
parser.add_argument("-nt",
                    dest="num_threads",
                    type=int,
                    default=0,
                    help="Total number of threads to start (0 is guess)")
after Lines 198 to 204 in 3aa2a66, and add a with statement
with threadpool_limits(limits=args.num_threads):
...
around Lines 131 to 136 in 3aa2a66.
This will of course add a new dependency.
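The two snippets above could fit together roughly as follows. This is only a sketch under assumptions: `thread_limit` and `run_cli` are hypothetical helpers, 0 is taken to mean "do not limit", and the code falls back to a no-op context manager when threadpoolctl is not installed.

```python
import argparse
from contextlib import nullcontext

try:
    # threadpoolctl is the third-party package that would become a dependency.
    from threadpoolctl import threadpool_limits
except ImportError:
    # Fall back to a no-op so that the sketch still runs without it.
    threadpool_limits = None


def thread_limit(num_threads):
    """Return a context manager that limits native thread pools.

    Assumption: 0 (the default) means "do not limit".
    """
    if threadpool_limits is None or num_threads <= 0:
        return nullcontext()
    return threadpool_limits(limits=num_threads)


def run_cli(argv, analysis):
    """Parse -nt and call ``analysis`` (a hypothetical callable) under the limit."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-nt", dest="num_threads", type=int, default=0,
                        help="Total number of threads to start (0 is guess)")
    args = parser.parse_args(argv)
    with thread_limit(args.num_threads):
        # num_threads is consumed here and is not forwarded to the analysis.
        return analysis()
```

Keeping `num_threads` inside the CLI layer means the analysis callable never sees an argument it does not know about.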
For another issue, we need to define a general logging library to configure debugging mode etc. more nicely. But for now it is fine.
Originally posted by @joaomcteixeira in #45 (comment)
It would look nice if the mdacli docs followed the style of the other docs to show that it's part of the MDA ecosystem and directly supported by the MDAnalysis org.
And a logo...
In this line our code will break, since the analysis_kwargs dictionary contains the keys begin, end, dt, verbose and func. I fixed this, but we should maybe clean up the analysis_kwargs dict. It should only contain the entries for the analysis class... What do you think?
Originally posted by @PicoCentauri in #3 (comment)
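One possible way to keep only the entries meant for the analysis class is to compare the dictionary against the class signature. The sketch below is not the mdacli implementation; `filter_kwargs` and `DummyAnalysis` are hypothetical names used for illustration.

```python
import inspect


def filter_kwargs(func, kwargs):
    """Keep only the entries of ``kwargs`` that ``func`` accepts by name."""
    params = inspect.signature(func).parameters
    # If func takes **kwargs itself, everything may be forwarded.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {key: value for key, value in kwargs.items() if key in params}


class DummyAnalysis:
    """Hypothetical stand-in for an analysis class."""

    def __init__(self, atomgroup, cutoff=1.0):
        self.atomgroup = atomgroup
        self.cutoff = cutoff


# Mixed dictionary: analysis parameters plus CLI-only keys.
analysis_kwargs = {"atomgroup": "all", "cutoff": 2.5,
                   "begin": 0, "end": -1, "dt": 1, "verbose": True, "func": None}
init_kwargs = filter_kwargs(DummyAnalysis, analysis_kwargs)
analysis = DummyAnalysis(**init_kwargs)
```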
When running mdacli with the current development version of MDAnalysis, parse_docs is not able to parse the docs. I pinned down the problematic entry to the universe attribute of DistanceMatrix in diffusionmap.py. It can easily be seen by adding a print statement
print(klass, par_name, others_)
before line L288.
Most likely our regex is not able to parse the `... @joaomcteixeira do you have an idea?
Additionally, it seems that MDAnalysis also has docstring entries using : and : to separate the name and the type. I think we are not able to parse them...
@PicoCentauri and I have discussed on discord how to accommodate clients using several universes or atom groups or different numerical combinations of both.
To give users and developers an idea how to work with mda_cli there should be at least minimal docs. I recommend you set up sphinx docs and use the MDA templates as it's easier for users to have same navigation and markup conventions.
Topics that would be important
Looking at https://mdacli.readthedocs.io/en/latest/api.html , the following sections are empty
I would expect to see something there.
If one is using an AnalysisClass on a trajectory with only one frame, the AnalysisBaseClass from MDA will happily do exactly that. This is because at initialization (https://github.com/MDAnalysis/mdanalysis/blob/3769ee29e5907221527ff0ec88a8c5acf9f86dee/package/MDAnalysis/analysis/base.py#L247), if start, stop and step are None and the trajectory is ONE frame long, it sets them to 0, 1 and 1 respectively.
mdacli however sets the default values to start = 0, stop = -1 and step = 1. This does not result in the same behaviour, since for an array of length 1:
a[0:-1:1] != a[0:1:1]
This difference in behaviour finally matters at this point: https://github.com/MDAnalysis/mdanalysis/blob/3769ee29e5907221527ff0ec88a8c5acf9f86dee/package/MDAnalysis/analysis/base.py#L296
Maybe setting the defaults to None would be an option?
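The slicing difference can be reproduced with a plain list. In this small illustration `frames` is just a stand-in for a one-frame trajectory, not an MDAnalysis object.

```python
frames = ["frame0"]  # stand-in for a trajectory with a single frame

# mdacli defaults (start=0, stop=-1, step=1): the slice is empty.
cli_defaults = frames[0:-1:1]

# Values the issue reports for the one-frame case in MDAnalysis (0, 1, 1).
mda_values = frames[0:1:1]

# With None for all three, the slice covers the full range.
none_defaults = frames[slice(None, None, None)]

print(cli_defaults, mda_values, none_defaults)
```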
Looking at the page source of the docs hosted on RTD I see "global_analytics_code": "UA-17997319-1",
in
<script type="application/json" id="READTHEDOCS_DATA">{"ad_free": false, "api_host": "https://readthedocs.org", "build_date": "2021-11-24T19:10:36Z", "builder": "sphinx", "canonical_url": null, "commit": "b989042f", "docroot": "/docs/rst/", "features": {"docsearch_disabled": false}, "global_analytics_code": "UA-17997319-1", "language": "en", "page": "index", "programming_language": "py", "project": "mdacli", "proxied_api_host": "/_", "source_suffix": ".rst", "subprojects": {}, "theme": "sphinx_rtd_theme", "user_analytics_code": "", "version": "latest"}</script>
and
<script type="text/javascript" src="https://assets.readthedocs.org/static/javascript/readthedocs-analytics.js" async="async"></script>
Looking at https://assets.readthedocs.org/static/javascript/readthedocs-analytics.js this sets cookies and does send information to Google.
None of this is covered in our Privacy Policy so the docs cannot be served from mdacli.mdanalysis.org as they are. Anything under mdanalysis.org must comply with our privacy policy.
My recommendation is to switch to GH pages and serve as www.mdanalysis.org/mdacli.
The analyze_data method (Line 234 in 78fa3f2) is our main run method: it receives the attributes parsed from the CLI, creates the universe, runs the analysis and saves the results. The function takes all parameters as keyword arguments. However, if one argument is not given it fails, even if there are default arguments. This should not happen.
Maybe the universe creation, the analysis instance creation, the run and the saving should be split into different functions...
Our approach of parsing the analysis classes and building an argparse object based on the docstring works well for arbitrary parameters. However, currently it is deeply buried and tightly bound to this library.
I often have the same problem that I have a certain function and I want to build a simple command line interface from it. Currently, this involves a lot of copy and pasting. Here, we have everything in place and one just has to generalize everything a bit. There already exists a nice reference implementation which is not as general as what we did here.
If anybody wants to tackle this I already started a little branch which is not working at all but maybe allows for a little headstart.
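To give an idea of what a generalized version might look like, here is a minimal sketch that builds an argparse parser from a plain function signature. It is not the code from the branch mentioned above; `build_parser` and `scale` are hypothetical, and using the type of the default value as the converter is a simplifying assumption (it does not work for booleans, for instance).

```python
import argparse
import inspect


def build_parser(func):
    """Build an ArgumentParser from a function signature (sketch)."""
    doc = inspect.getdoc(func) or ""
    parser = argparse.ArgumentParser(
        prog=func.__name__,
        description=doc.splitlines()[0] if doc else None,
    )
    for name, param in inspect.signature(func).parameters.items():
        if param.default is inspect.Parameter.empty:
            # No default: the argument is mandatory and stays a string.
            parser.add_argument(f"--{name}", required=True)
        else:
            # Assumption: the type of the default is a usable converter.
            converter = str if param.default is None else type(param.default)
            parser.add_argument(f"--{name}", default=param.default,
                                type=converter)
    return parser


def scale(value, factor=2.0):
    """Multiply a value by a factor."""
    return float(value) * factor


args = build_parser(scale).parse_args(["--value", "3", "--factor", "1.5"])
result = scale(**vars(args))
```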
There is a severe coupling here. It is okay for now. Maybe we leave it for some exercise.
Originally posted by @joaomcteixeira in #41 (comment)
@PicoCentauri started a poll on Discord: How would you like to call MDAnalysis from the command line?
MDAnalysis --help
mdanalysis --help
mda --help
mdacli --help
With v.0.1 close to shipping we should think about preparing the library for downstream usage. This is especially interesting for me, to reuse our efforts in maicos. I therefore suggest moving all generic functions from cli.py into libcli.py. Afterwards cli.py only contains function calls specific to MDA. Downstream developers can copy cli.py and use it inside their own library. We should of course also give help and an example in the docs.
As @PicoCentauri and I have been discussing in e-mails:
Verify that all Analysis classes share a common API to run the calculation and retrieve the results. We can also accommodate a small set of APIs through try/catch procedure. A prototype for this code is already in the master branch.
We expect results to be savable to a text file on disk.
In the future, we can decide whether to plot such data or pipe it somewhere else.
Ping to @orbeckst, we might need to discuss with you the best strategy here because this part needs a finer synchronization with the MDAnalysis package itself.
Cheers
Our argument parser (Line 60 in 78fa3f2) creates the groups Common Analysis Parameters, Mandatory Parameters and Optional Parameters when the help page is accessed.
We should split Common Analysis Parameters into something like Universe Parameters, Run Parameters and Storing Results Parameters to be more structured. Additionally, we should give information on how we store the results after the analysis at the beginning of the Storing Results Parameters group.
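With argparse, the proposed split could be expressed with `add_argument_group`, whose `description` is printed at the top of the group in the help page. The following is a sketch only: the `-o` flag, the `dest` names and the description text are assumptions for illustration.

```python
import argparse

parser = argparse.ArgumentParser(prog="mdacli-sketch")

universe = parser.add_argument_group("Universe Parameters")
universe.add_argument("-s", dest="topology", help="topology file")
universe.add_argument("-f", dest="coordinates", help="trajectory file")

run = parser.add_argument_group("Run Parameters")
run.add_argument("-b", dest="begin", type=float, default=0, help="start of the analysis")

storing = parser.add_argument_group(
    "Storing Results Parameters",
    # The description appears at the beginning of the group in --help.
    description="Hypothetical text explaining how results are written.",
)
storing.add_argument("-o", dest="output", default="out", help="output prefix")

help_text = parser.format_help()
```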
The current implementation of numpydoc does not work with the MDAnalysis docstrings. See the following code example:
from numpydoc.docscrape import NumpyDocString

class TestClass():
    r"""A TestClass docstring.

    Some long description

    Parameters
    ----------
    atomgroup : AtomGroup or UpdatingAtomGroup
        foo
    atomgroup2 : AtomGroup
        fancy second atomgroup
    """

doc = NumpyDocString(TestClass.__doc__)
print(len(doc["Parameters"]))
The output is 1, although there are 2 parameters in TestClass. When adding an extra line at the beginning, the docstring is parsed properly:
class TestClass():
    r"""
    A TestClass docstring.
    ...
Unfortunately I was not able to implement something like doc = NumpyDocString("\n" + TestClass.__doc__) quickly. Does anybody have an idea?
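One thing that might be worth trying is normalizing the indentation with the standard library's `inspect.cleandoc` before handing the text to the parser. Whether this resolves the numpydoc behaviour described above is an assumption that has not been checked here; the sketch only shows what `cleandoc` does to such a docstring.

```python
import inspect


class TestClass:
    """A TestClass docstring.

    Some long description

    Parameters
    ----------
    atomgroup : AtomGroup or UpdatingAtomGroup
        foo
    atomgroup2 : AtomGroup
        fancy second atomgroup
    """


# cleandoc strips the indentation of the first line and the common
# indentation of all following lines. (On Python 3.13 and later the compiler
# already strips the common indentation, so the call changes little there.)
clean = inspect.cleandoc(TestClass.__doc__)
lines = clean.splitlines()
param_line = [line for line in lines if "atomgroup2" in line][0]
```

The cleaned string could then be passed on, for example as NumpyDocString(clean).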
Our current docstring parser, located at Line 163 in 78fa3f2, returns a dictionary of the docstring. It works, but it is not as flexible and tolerant as the sphinx/napoleon implementation. In particular, we have problems with the separator between a parameter name and its type, usually denoted by name : type. A different notation cannot be parsed since we use a hardcoded split (Lines 230 to 232 in 78fa3f2).
Improvements using a regex did not succeed either. If possible we should incorporate the sphinx parser or at least get some ideas from their implementation.
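As a rough illustration of a more tolerant split, one could partition on the first colon and strip whitespace on both sides. This is neither the sphinx/napoleon approach nor the current mdacli code; `split_name_type` is a hypothetical helper, and it assumes the parameter name itself never contains a colon.

```python
def split_name_type(line):
    """Split a parameter header into (name, type); type is None if missing."""
    name, sep, type_ = line.partition(":")
    # Only the first colon separates name and type, so colons inside the
    # type (e.g. Sphinx roles) are left untouched.
    return name.strip(), (type_.strip() if sep else None)


examples = [
    "atomgroup : AtomGroup",
    "atomgroup: AtomGroup",
    "atomgroup:AtomGroup",
    "ref : :class:`AtomGroup`",
    "verbose",
]
parsed = [split_name_type(line) for line in examples]
```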
For all official MDAnalysis projects we have docs under our www.mdanalysis.org URL. The authoritative mdacli docs should appear as https://www.mdanalysis.org/mdacli.
This can be achieved with a simple GitHub Pages deployment action; the GH pages branch will automatically appear in the right place.
The automatic sitemap generation in PR #70 currently assumes the canonical URL above. Sitemaps are important for integrating the docs in the search on the website and making our site globally easy to index.
If running the tests on a Windows machine the following error is produced
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: '{"key1":1}'
when running the KwargsDict.
Lines 24 to 39 in 3c45d8d
See for example the tests on the main branch.
with MDAnalysis 1.1.1
(mdacli) joao@vantito:~/github/mda_cli
· mdacli
Traceback (most recent call last):
File "/home/joao/anaconda3/envs/mdacli/bin/mdacli", line 33, in <module>
sys.exit(load_entry_point('mdacli', 'console_scripts', 'mdacli')())
File "/home/joao/github/mda_cli/src/mdacli/cli.py", line 591, in main
maincli(setup_clients())
File "/home/joao/github/mda_cli/src/mdacli/cli.py", line 575, in setup_clients
module = importlib.import_module('MDAnalysis.analysis.' + module)
File "/home/joao/anaconda3/envs/mdacli/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'MDAnalysis.analysis.rdf_s'
I know that @joaomcteixeira is not a fan of OOPing things but I will post this here anyway ;).
I thought about altering the base class with two new methods/attributes. A self.doc_dict attribute: a dictionary combining the docstring information and the callable signature. In principle, it is our code at Line 62 in 78fa3f2 as a class method.
And a self.save_results() method based on our general approach could be added if no such function exists.
If we dynamically alter the analysis class we are parsing this could help to improve our workflow and will help users/developers that try to use our python implementation.
Trying to run mdacli>=0.1.20 results in
(base) hjaeger@argali:/work/hjaeger/spce_water$ maicos densityplanar -s run.tpr -f run.xtc -atomgroups 'resname SOL'
Logging to file is disabled.
Gromacs version : b'VERSION 2022-beta1-dev-20211122-4f4b9e4b19-unknown'
tpx version : 127
tpx generation : 28
tpx precision : 4
tpx file_tag : b'release'
tpx natoms : 7500
tpx ngtc : 1
tpx fep_state : 0
tpx lambda : 0.0
Error: run() got an unexpected keyword argument 'num_threads'
since the newly introduced -nt argument gets passed to the run function of the analysis class.
People (including me) would like to have tab-completion. I played around with it in MAICoS and created a static completion file. However, for this project I would go for a dynamic approach.
A command like mdacli create_completion creates the completion file depending on the current shell and the available analysis modules. I would not run this command by default on installation, since the available modules may differ depending on the version of mdacli and MDAnalysis. The command returns the location of the completion file and a statement on how to add this to the .bashrc etc.
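A very small version of the dynamic approach could generate a bash snippet from the list of available modules. The sketch below covers bash only, and `bash_completion` is a hypothetical helper; the generated `complete -W` line offers the same word list for every argument, so it is cruder than a real completion script.

```python
import shlex


def bash_completion(prog, module_names):
    """Return a minimal bash completion line for the given sub-commands."""
    words = " ".join(sorted(module_names))
    # "complete -W" registers a fixed word list for the program in bash.
    return f"complete -W {shlex.quote(words)} {shlex.quote(prog)}\n"


snippet = bash_completion("mdacli", ["RMSD", "RMSF", "AlignTraj"])
print(snippet, end="")
```

The returned text could be written to a file that the user then sources from their .bashrc.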
The site_url
at
Line 43 in b989042
The site_url
Line 44 in dec4abd
mdacli rmsd should be possible and not only mdacli RMSD.
See #103 (comment)
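One way this could be done with argparse is to register a lowercase alias for every sub-command. This is a sketch, not the mdacli implementation; the `analysis_name` default is a hypothetical addition that keeps the canonical class name available regardless of the spelling the user typed.

```python
import argparse

parser = argparse.ArgumentParser(prog="mdacli-sketch")
subparsers = parser.add_subparsers(dest="command")

for name in ("RMSD", "RMSF", "AlignTraj"):
    # Assumes no class name is already all lowercase.
    sub = subparsers.add_parser(name, aliases=[name.lower()])
    # Store the canonical class name independently of the spelling used.
    sub.set_defaults(analysis_name=name)

args = parser.parse_args(["rmsd"])
print(args.analysis_name)
```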
Currently the relevant_modules are hardcoded as a global variable in cli.py. This is not only bad coding practice but also prevents third-party libraries from using the CLI for their Analysis modules. Lines 29 to 34 in 78fa3f2 should be combined with Line 337 in 78fa3f2 and moved into the main function. setup_cli should then take the list of modules as a required argument.
On top of the #8 and #9 discussions, we need to agree on a strategy for testing and CI. Although MDA_CLI is part of MDA and uses MDA, it is independent of the main project, so I believe it should have its own testing and CI infrastructure. What do you think @orbeckst @PicoCentauri?
On Travis-CI for taurenmd I use tox-conda to install MDAnalysis and perform the taurenmd tests. Reproducing that strategy here is one option. We can also try GitHub Actions.
For every new version the main branch runs our test pipeline. However, the build task fails every time. The reason is that there is no new_version statement in the CHANGELOG.rst, since that is exactly when we change the version to the 'new' one (see for example 33eb1df). It is probably useful to disable the build test for pushes to the main branch, for example by removing the two lines at mdacli/.github/workflows/build.yml, Lines 4 to 5 in 33eb1df.
What do you think @joaomcteixeira?
If you want to plot data from a file stored in columns, it is convention that x values have a lower column index than the y values. Currently this is not guaranteed. The current results handling proposed in #17 and #29 handles every possible type appearing in an MDAnalysis AnalysisClass. In our approach 1D arrays are stacked into 2D arrays and saved as CSV files. The logical structure is usually encoded inside the class docstring. For example, in lineardensity.py the docstring has the following structure
"""
results.x.pos : numpy.ndarray
mass density in [xyz] direction
results.x.pos_std : numpy.ndarray
standard deviation of the mass density in [xyz] direction
results.x.char : numpy.ndarray
charge density in [xyz] direction
results.x.char_std : numpy.ndarray
standard deviation of the charge density in [xyz] direction
"""
Saving these 1D arrays stacked into a 2D array keeps this order.
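The idea of keeping the docstring order when writing columns can be illustrated with the standard library alone. In this sketch plain lists stand in for the numpy arrays, and the values and the choice of three columns are made up for illustration.

```python
import csv
import io

# Hypothetical results, inserted in the order given by the docstring.
results = {
    "pos": [0.1, 0.2, 0.3],
    "pos_std": [0.01, 0.02, 0.03],
    "char": [1.0, 0.0, -1.0],
}

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(results.keys())            # header keeps the insertion order
writer.writerows(zip(*results.values()))   # one row per bin, one column per array
text = buffer.getvalue()
```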
After #28 (at least when I noticed it), the data/topology file can't be read:
$ mdacli RMSF -s data/topol.tpr -f data/traj.trr -atomgroup all
Warning: No coordinate reader found for data/topol.tpr. Skipping this file.
I know this is not part of this PR, but we should start handling paths with pathlib instead of os and strings. Let's raise an issue for that.
Originally posted by @joaomcteixeira in #45 (comment)
Also we have to fix the readme on pypi...
Originally posted by @PicoCentauri in #103 (comment)
This likely happened in #96 when dropping the function to set up the readme in setup.py.
#109 shows that mdacli is missing some integration tests with MDAnalysis, i.e., running a minimal working example of an analysis to actually call the run() function once.
@PicoCentauri @orbeckst
We need to drop py37 here, right?
Some classes have an explicitly named argument kwargs that does not serve the typical kwargs functionality. Instead it is a dictionary passed to the execution method. An example of this is the Contacts class.
I believe we need to write an Argparse action to read a dictionary-like input from the command line when a kwargs argument exists in the analysis class. The analysis class will know how to handle such a dictionary.
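A lightweight alternative to a full custom Action is an argparse `type` converter that parses a JSON object. The sketch below takes that route; `json_dict` is a hypothetical helper, and using JSON as the command-line syntax is an assumption.

```python
import argparse
import json


def json_dict(text):
    """Convert a JSON object given on the command line into a dict."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError as err:
        raise argparse.ArgumentTypeError(f"invalid JSON: {err}") from err
    if not isinstance(value, dict):
        raise argparse.ArgumentTypeError("expected a JSON object")
    return value


parser = argparse.ArgumentParser(prog="mdacli-sketch")
parser.add_argument("-kwargs", type=json_dict, default=None,
                    help="dictionary passed on to the analysis class")
args = parser.parse_args(["-kwargs", '{"key1": 1}'])
print(args.kwargs)
```

argparse turns an ArgumentTypeError raised by the converter into a regular usage error, so malformed input is reported without a traceback.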
It would be nice to have a little dev script printing the arguments of analysis classes. @joaomcteixeira you already built one once. Maybe you can provide the code as a snippet in devtools.
Do you want to move the repo under https://github.com/MDAnalysis? You'll have full admin permissions, we can pin it to the top of the repo list, we can easily make docs appear under https://www.mdanalysis.org/mda_cli , and I think it will increase visibility.
EDIT: link to doc issue
In our current implementation the time unit is fixed to picoseconds. This is handy, but users might also want to use a different time unit or even frame numbers. There are two options for handling this:
1. Add a new parameter like -u for giving the unit for skipping etc. Allowed values could be None for frames, "ps" for picoseconds or even "ns" for nanoseconds.
2. Change the current parameters and parse the unit from the user input. I could think of something like
mda_cli xxx -b 10 -e 20 -dt 5 # frames
mda_cli xxx -b 10ps -e 20ps -dt 5ps # time
mda_cli xxx -b 10ps -e 20ns -dt 5ps # even different time units could be allowed
I prefer option 2 but am open to other opinions on this.
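Parsing the unit from the user input could look roughly like this. It is a sketch under assumptions: `parse_time` is a hypothetical helper, only "ps" and "ns" are recognized, and a bare number is interpreted as a frame index.

```python
import re

# Conversion factors to picoseconds; the set of units is an assumption.
TO_PS = {"ps": 1.0, "ns": 1000.0}
_PATTERN = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s*([a-zA-Z]*)\s*$")


def parse_time(text):
    """Return (value, kind), where kind is "frame" or "ps"."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"cannot parse {text!r}")
    number, unit = match.groups()
    if not unit:
        # No unit: interpret the number as a frame index (must be an integer).
        return int(number), "frame"
    if unit not in TO_PS:
        raise ValueError(f"unknown time unit {unit!r}")
    return float(number) * TO_PS[unit], "ps"


begin = parse_time("10")
end = parse_time("20ns")
step = parse_time("5ps")
```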