prolint / prolint2

Prolint2 is an optimized tool for analyzing and visualizing lipid-protein interactions from molecular dynamics trajectories.

Home Page: https://prolint2.readthedocs.io/

License: MIT License

lipid-protein-interactions membrane-proteins molecular-dynamics python-library

Introduction


A web-based tool to analyze and visualize lipid-protein interactions.

Installation

This is the source code for the ProLint webserver.

Install Docker

You need to have Docker installed before you can install ProLint.

Install ProLint

To install ProLint locally, all you have to do is download/clone the repository:

git clone https://github.com/ProLint/ProLint.git

and then execute:

cd ProLint
docker-compose up

The main and only installation command is docker-compose up. It creates a Docker build containing all of the software packages, Python libraries, and environment configuration required by ProLint, so it will take a few minutes to finish.

You may get warnings and notifications about missing files, but they are harmless. If Docker asks to share the image, make sure to click Share It (this appears to be Windows-specific).

When installation finishes, open a browser and navigate to: 127.0.0.1:8000 (not 0.0.0.0:8000).

Note that Docker runs with a default memory allocation. If you submit large (or multiple) files, make sure to increase the memory allocation; otherwise the calculations will fail with the following error: WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).

Docker Installation Tested

The installation process above has been tested on macOS and confirmed to work. Linux should work too, and I assume WSL 2 would also work, although I have not tested it yet.

If you are using Windows directly (e.g., through Command Prompt, PowerShell, or the Anaconda Prompt), you may get errors because of different line-ending conventions. To fix this, you need to tell git to keep line endings as they are. You can do that globally or on a per-repository basis. The instructions below configure git globally; if you want to do it for this repository only, see the git documentation on core.autocrlf.

# Windows users
git config --global core.autocrlf input
docker-compose up --build

Installation from source

You can also install ProLint directly from source. Use the provided environment.yml file to create the conda environment (conda env create -f environment.yml). You will also need GROMACS installed and sourced for g_surf to work (if you do not need the thickness & curvature app, you can skip this). Once you have all dependencies installed, activate your new conda environment and do the following. First, open a terminal and run the Redis server:

# simply execute: 
redis-server

Then you need to open prolint/settings.py and change the CELERY_BROKER_URL and CELERY_RESULT_BACKEND variables like so:

CELERY_BROKER_URL = os.environ.get("CELERY_BROKER", "redis://127.0.0.1:6379")
CELERY_RESULT_BACKEND = os.environ.get("CELERY_BROKER", "redis://127.0.0.1:6379")

After that, open another terminal, navigate to the root Django directory (the directory that contains manage.py) and execute:

# run the celery worker, if you have celery version 4:
celery -A prolint worker -l info

# run the celery worker, if you have celery version 5: 
celery --app prolint worker --loglevel=info

# if you are on Windows (for which Celery dropped support since version 4),
# you will need to install a program like gevent and run the command like this: 
# celery -A prolint worker -l info -P gevent              # v4
# celery --app prolint worker --loglevel=info -P gevent   # v5

Once that is done, you have to open a third terminal, navigate to the root django directory and run the django server:

python manage.py makemigrations
python manage.py migrate
python manage.py runserver

Of course, you can run these commands in the background so that you do not need three open terminals, but for ProLint to work you need at least the following components: the Django web application, Celery to run tasks asynchronously, and a message broker such as Redis to keep track of submitted tasks. This is why the Docker installation is preferred: it reduces installation to a single command.
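For reference, the standard Django/Celery wiring that connects these components looks roughly like the sketch below. This follows the Celery documentation's recommended layout; the actual prolint/celery.py in this repository may differ in detail.

# prolint/celery.py -- a sketch of the standard Django/Celery wiring
import os
from celery import Celery

# Point Celery at the same Django settings module edited above.
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "prolint.settings")

app = Celery("prolint")
# Read all CELERY_* settings (CELERY_BROKER_URL, CELERY_RESULT_BACKEND, ...).
app.config_from_object("django.conf:settings", namespace="CELERY")
# Discover tasks.py modules in the installed Django apps.
app.autodiscover_tasks()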

Important things to know

The following is a list of things I think are important to know:

  • The Docker build will show useful information about your session, and you can use terminals to access the Docker containers.
  • Celery output is saved to the logs/celery.log file, so keep an eye on it when you submit jobs.
  • Submissions are saved in the media/user-data/prolint folder. They are not yet deleted automatically, so please keep an eye on that as well.
  • The functions to download and delete submissions are kept, but their usefulness is limited now that you have the data locally. The reason these buttons are kept will be clear from the development roadmap below.

Roadmap

ProLint is the result of a lot of work and already provides many features. It is also under very active development, and the following is a rough roadmap of what is planned to come to ProLint:

  • A cleaner installation without the notifications and warning messages currently given by Docker.
  • Full support for atomistic simulations. Currently, atomistic data are supported as a beta-feature and we want to fix bugs and add stability to fully handle data at this resolution.
  • Martini 3 should be supported already, but we still need to test it.
  • Allow the user to specify residues manually in the submission form.
  • Provide additional metric support and add the ability for the user to select the preferred ones.
  • Provide support for systems containing multiple different proteins. Currently, support for these systems is partial.
  • Support user-requested features.

Roadmap: support for deployment on local networks.

This can already be done, but the current configuration is not secure. We want to allow people to deploy ProLint on a local network where multiple users/members of research groups can use it. For this reason, ProLint already has a working setup with support for secure user accounts, individual pages to track all submitted jobs, and the ability to make them available to other members of the local network. This allows members of a group, for example, to share data with each other, prepare for group meetings, use the data during presentations (e.g., during Zoom calls), and, in general, collaborate. This functionality has already been implemented in ProLint, but it is currently disabled!

Development

To contribute to the development of ProLint, all you need to do is open the ProLint directory with a code editor such as VS Code. Saving your changes will automatically trigger Docker to reload the build and update the website. These updates are, however, not propagated when you make changes to the calcul app, which is used by Celery. Celery auto-reload on file save is on the to-do list.

Bug report

Please feel free to open an issue or contact us if you encounter any problem or bug while working with ProLint.

Citation

ProLint is research software. If you make use of it in work which you publish, please cite it. The BibTeX reference is

@article{10.1093/nar/gkab409,
    author = {Sejdiu, Besian I and Tieleman, D Peter},
    title = "{ProLint: a web-based framework for the automated data analysis and visualization of lipid–protein interactions}",
    journal = {Nucleic Acids Research},
    year = {2021},
    month = {05},
    issn = {0305-1048},
    doi = {10.1093/nar/gkab409},
    url = {https://doi.org/10.1093/nar/gkab409},
    note = {gkab409},
    eprint = {https://academic.oup.com/nar/advance-article-pdf/doi/10.1093/nar/gkab409/38270894/gkab409.pdf},
}

People

Contributors

bisejdiu, danielpastor97, flop20


Issues

Load server data from local file

We need to create an API that enables storing server data locally and subsequently loading the server from that local data file.
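A minimal sketch of what such an API could look like, assuming the server state can be serialized to JSON (the function names below are illustrative, not an existing prolint2 interface):

# Illustrative sketch: persist server data to a local JSON file and reload it.
import json
from pathlib import Path

def save_server_data(data: dict, path: str = "server_data.json") -> None:
    """Write the server's data to a local JSON file."""
    Path(path).write_text(json.dumps(data, indent=2))

def load_server_data(path: str = "server_data.json") -> dict:
    """Reload previously saved server data from disk."""
    return json.loads(Path(path).read_text())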

Polish docs.

  • Complete command-line section.
  • Complete the server section.
  • Use a homogeneous format for the docs in the API integration.
  • Polish current tutorials and add more.
  • Add Plotters to the tutorials.

Increase the coverage of the tests.

Right now the coverage of the unit tests is only 45%, which is quite low. To demonstrate that the code is reliable for external users, we need to increase this as much as possible.

Update setup.py installation

ufcc installation using setup.py does not work, at least not on macOS.

Can you have a look at that and confirm whether it works on Linux?

prolint2.plotting error

Hello,

While working with prolint2, I was not able to import prolint2.plotting and got this error message:
"ModuleNotFoundError: No module named 'prolint2.plotting'"

Any idea about this error, please?

Add trajectory information

We should make some basic system information very easily accessible. All of this data is already available via various MDAnalysis functions and methods, but it would be nice to have it readily accessible.

Fix command-line version.

The command-line version stopped working due to dependency issues that need to be resolved.

Plotting error

Hello,
This is a repeat of bug #85, but no one has been assigned and there has been no movement on it. The module (prolint2.plotting) will not load at all, even when copying the notebooks that are part of this GitHub page.

ModuleNotFoundError: No module named 'prolint2.plotting'

export_to_prolintpy method

This method needs to be optimized or completely replaced. A first step would be to change line 21 of the w2plp.py file.

Publish v1.0.0

Once #78 has been merged into the main branch, we can upgrade the package version to 1.0, which will be a significant achievement. However, there is still a long way to go. We require extensive (1) testing and (2) the addition of tutorial/example notebooks.

Improvements to the calculated and displayed metrics

We need to define the metrics we will calculate and use. Currently, we output a metric variable that is set to a float value; we need to update that to an iterable. I'm not so sure about the actual metrics, so we can discuss the specifics during our regular meetings.
For now, we can simply use the average contact and the maximum contact.

From a broader view, we may want to group contact types into categories: default contacts, occupancies, and residence types.
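As an illustration, the change from a single float to an iterable of metrics could look like this (the names below are hypothetical, not the prolint2 API):

# Hypothetical sketch: return several contact metrics instead of one float.
import numpy as np

def contact_metrics(contacts_per_frame):
    """Per-residue contact counts over frames -> dict of summary metrics."""
    arr = np.asarray(contacts_per_frame, dtype=float)
    return {"mean_contacts": arr.mean(),   # the average contact
            "max_contacts": arr.max()}     # the maximum contact

print(contact_metrics([3, 5, 4, 6]))  # {'mean_contacts': 4.5, 'max_contacts': 6.0}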

Show compute statistics

When calculating contacts, add useful output regarding performance, e.g., the time it took, the resources used, etc.

This output should be enabled by default, and we should provide rough estimates using current dependencies (i.e., no need to add a new dependency).

At this point, we also don't need to worry about the formatting of the output or any other similar details. A simple example:

Calculation Report
------------------
Resource used: CPU
Time per search: 0.01 seconds
Iterations: 1000
Total time: 10 seconds
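One way to produce such a report using only the standard library (in keeping with the no-new-dependency constraint) is sketched below; the wrapped function and its arguments are placeholders:

# Sketch: time a repeated contact search and print a simple report.
import time

def report_compute_stats(func, *args, iterations=1000, **kwargs):
    start = time.perf_counter()
    for _ in range(iterations):
        func(*args, **kwargs)
    total = time.perf_counter() - start
    print("Calculation Report")
    print("------------------")
    print("Resource used: CPU")
    print(f"Time per search: {total / iterations:.4f} seconds")
    print(f"Iterations: {iterations}")
    print(f"Total time: {total:.2f} seconds")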

Loading examples and example files

We should have a dedicated way of loading and working with example files. For example:

from UFCC.examples import GIRK
# we can then access the files using dot notation, e.g. GIRK.trajectory:
print(GIRK.trajectory)
# $INSTALLATION_DIR/data/GIRK/trajectory.xtc
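One way such an examples accessor could be implemented is sketched below; the directory layout and class are assumptions, not existing code:

# Sketch: expose bundled example files as attributes of a simple object.
from pathlib import Path

_DATA_DIR = Path(__file__).parent / "data"  # assumed installation layout

class _Example:
    def __init__(self, name):
        self.directory = _DATA_DIR / name
        self.trajectory = str(self.directory / "trajectory.xtc")

GIRK = _Example("GIRK")  # usage: from UFCC.examples import GIRK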

Improve API usage.

Improve the way users can work with the software as an API from a Python script.

Heatmap/contact projection error

Hello,
I was trying to visualize the contact projection (heatmap) and got the error below. Could you help me, please?

pl.show_contact_projection(T, bf=js[metrics[0]], cmap="Spectral_r")

TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_3340230/3395143762.py in <module>
      1 # visualize the first metric
----> 2 pl.show_contact_projection(T, bf=js[metrics[0]], cmap="Spectral_r")

/softwares/Anaconda3/2023.07/envs/prolint/lib/python3.7/site-packages/prolintpy/vis/show_contact_projection.py in show_contact_projection(t, bf, protein, residue_list, ngl_repr, cmap)
     69         else:
     70             if len(df.resSeq.unique()) != len(bf):
---> 71                 raise TypeError('When projecting only a subset of residues provide a list of tuples: [(residue_id, value), ...]')
     72             for atom in resseq:
     73                 atomic_bfactors.append(bf[atom-1])

TypeError: When projecting only a subset of residues provide a list of tuples: [(residue_id, value), ...]

I used the data.json file and P454_BB.pdb (a file containing 2 peptides) from the output.
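Based on the error message, when bf covers only a subset of residues it must be passed as a list of (residue_id, value) tuples rather than a plain list of values. A sketch of the workaround, where residue_ids is a hypothetical list of the residue ids actually present in the structure:

# Workaround sketch: pair each residue id with its metric value explicitly.
values = js[metrics[0]]                    # the metric values used above
bf_pairs = list(zip(residue_ids, values))  # [(residue_id, value), ...]
pl.show_contact_projection(T, bf=bf_pairs, cmap="Spectral_r")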

Add a cli version of the code

To make the code easier to use, we should implement a CLI version. Initially, we only need a bare-bones version with support for only the essential components, as sketched below.
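A bare-bones argparse skeleton for such a CLI could look like the following; all option names are illustrative assumptions, not a final interface:

# Sketch of a minimal prolint2 CLI entry point; options are illustrative.
import argparse

def main():
    parser = argparse.ArgumentParser(
        prog="prolint2",
        description="Analyze lipid-protein interactions from MD trajectories.")
    parser.add_argument("structure", help="topology/structure file (e.g. .gro, .pdb)")
    parser.add_argument("trajectory", help="trajectory file (e.g. .xtc)")
    parser.add_argument("--cutoff", type=float, default=7.0,
                        help="contact cutoff distance in Angstroms")
    args = parser.parse_args()
    print(args)  # placeholder for the actual analysis call

if __name__ == "__main__":
    main()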

Create Getting Started notebook.

Include a Getting Started notebook.
Modify the overview in the documentation, including a benchmark figure and at least one snapshot of the new dashboard.

Self-interactions.

Work on the analyses of self-interactions (e.g. lipid-lipid and lipid-protein interactions).

Include parser

Argument parser to use the library directly from the command line.

Implement support for Index Library

Rather than making an artificial distinction between what is and what isn't protein, we should add support for a much more extensive and user-friendly set of groups. We can start by taking the GROMACS make_ndx command as motivation.

Upon loading user data, we retrieve all atom labels and group everything at the residue level. By default, we can define the following labels:

1. System     Size
2. Protein    Size
3. Lipids     Size
4. Water      Size
5. Ions       Size
6. Ligands    Size
7. POPC       Size 
8. POPE       Size
...

Next, we take all non-protein residues and list them.

Now we also need a way to work with this Index Library. One suggestion is to define a make_library() or make_index() function that takes two arguments: selection and action. Selection is a wrapper around the MDAnalysis select_atoms function, but with added support for the default groups we define above, e.g. UFCC.make_index('select 1 and not 2', action='a').
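A sketch of how the default labels could be built on top of MDAnalysis selections follows; the selection strings for water and ions are assumptions and would need to match the force-field naming:

# Sketch: build default index groups from an MDAnalysis Universe.
import MDAnalysis as mda

def build_default_index(universe):
    """Return a dict of label -> AtomGroup, roughly mirroring make_ndx."""
    groups = {
        "System": universe.atoms,
        "Protein": universe.select_atoms("protein"),
        "Water": universe.select_atoms("resname SOL WAT TIP3"),  # assumed names
        "Ions": universe.select_atoms("resname NA CL ION"),      # assumed names
    }
    # List every remaining non-protein residue name as its own group.
    non_protein = universe.atoms - groups["Protein"]
    for resname in set(non_protein.residues.resnames):
        groups[resname] = universe.select_atoms(f"resname {resname}")
    return groups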

Support for logical operations between groups in the interactive selection.

Add support for + and -, which can be mapped onto set operations (e.g., + can be union, | can be intersection, - can be A - (A∩B), and so on) during the interactive selection of the Database and Query groups for the calculation of the contacts.

MDAnalysis already implements many of these. I checked the code, and it should be possible to add AtomGroups directly (and perform other operations as well).

Have a look at this: https://www.codingem.com/python-__add__-method/.
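Since MDAnalysis AtomGroups already support set-style operators, the proposed mapping could be a thin wrapper. A sketch following the proposal above (the class name is an assumption):

# Sketch: map +, |, - on selection groups onto MDAnalysis set operations.
class SelectionGroup:
    def __init__(self, atomgroup):
        self.atoms = atomgroup
    def __add__(self, other):  # '+' mapped to union, as proposed
        return SelectionGroup(self.atoms | other.atoms)
    def __or__(self, other):   # '|' mapped to intersection, as proposed
        return SelectionGroup(self.atoms & other.atoms)
    def __sub__(self, other):  # '-' mapped to A - (A∩B), i.e. difference
        return SelectionGroup(self.atoms - other.atoms)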
