tubiana / ttclust

Clusterize molecular dynamics trajectories (Amber, GROMACS, CHARMM, NAMD, PDB...)

License: Other

Python 100.00%

ttclust's Introduction

TTClust : A molecular simulation clustering program

DESCRIPTION

TTclust is a Python program used to clusterize molecular dynamics simulation trajectories. It only requires a trajectory and a topology file. Thanks to the MDTraj package, these can come from most molecular dynamics packages, such as Amber, GROMACS, CHARMM or NAMD, or be a trajectory in PDB format. The program is easy to use and gives visual feedback on the clustering through a dendrogram. Other graphical representations describe all clusters (see the OUTPUT EXAMPLE part for more details).

Python Compatibility

This program is compatible with Python 2.7.x and 3.x. You may get a matplotlib warning with Python 3.x, but the program still works.

GUI

If you want to use a graphical user interface instead, you can install ttclustgui with pip after installing ttclust.
Link to the GUI: https://github.com/tubiana/ttclustGUI

Installation

Installation with Conda

If you want an easy installation of ttclust, I suggest these steps:

If you don't have conda (or Python) installed, install Miniconda (https://docs.conda.io/en/latest/miniconda.html) with Python 3 (it's the future...)
Install ttclust with the command conda install -c tubiana -c conda-forge ttclust

Installation with PIP

Install numpy and cython with pip: pip install cython numpy
Install ttclust: pip install ttclust

I strongly suggest using conda. With pip, mdtraj has to be compiled, which increases the chance that the installation fails (gcc/Microsoft Visual C++ libraries...).

Installation & usage from sources

Clone this repo: git clone https://github.com/tubiana/ttclust
Install dependencies:
- using pip (installs into your system Python environment)
  sudo pip install -r requirements.txt
- using conda, suggested (uses a virtual conda environment and leaves your Python installation untouched)
  conda env create -f environment.yml
  - Don't forget to activate the environment with conda activate ttclust
Use the ttclust scripts with python {PATH}/ttclust/ttclust.py or python {PATH}/ttclust/ttclustGUI.py

Note: mdtraj is sometimes hard to install. If you use pip, first install cython manually with sudo pip install cython, then run sudo pip install -r requirements.txt.
If you still have issues installing mdtraj, you can install it with conda: conda install mdtraj

Possible issues

For CentOS users

If you have issues with pip, I suggest you install ANACONDA and restart your terminal afterwards. Then install wxPython with conda: conda install wxPython. Finally, you can use the pip command: sudo pip install -r requirements.txt

For Windows users

If pip fails to install mdtraj (Microsoft Visual C++ Build Tools missing), I also suggest you install ANACONDA and restart your terminal afterwards. Then install mdtraj with conda: conda install mdtraj. Finally, you can use the pip command: pip install -r requirements.txt

NOTE: if ttclust was installed with pip, ttclustGUI will not work because of the gooey package (I hope it will be fixed soon).
You can still use the GUI by cloning this repo and running the ttclustGUI.py script.

For Mac users

If you have issues with pip, first try adding the --ignore-installed argument: sudo pip install --ignore-installed -r requirements.txt. If it still doesn't work, System Integrity Protection (SIP) may be the cause. In this case, I suggest you install ANACONDA or MINICONDA and restart your terminal afterwards. The pip command should then work, because your default Python will be the Anaconda (or Miniconda) Python.

If you still have issues with the GUI or missing packages, install with pip:
pip install wxpython==4.0.0b1
pip install pandas
pip install ttclust

To activate autocompletion for the argparse module, run this command (only once): sudo activate-global-python-argcomplete

Dependencies and installation:

The following packages are needed:

argparse
argcomplete (for autocompletion, optional)
cython (for mdtraj)
mdtraj (version >= 0.17)
progressbar
datetime (Python standard library)
glob (Python standard library)
matplotlib
scipy (version >= 0.18)
prettytable
sklearn (version >= 0.18)
RXPY>=0.1.0 (FOR GUI)
wxpython>=4.0.0b1 (FOR GUI)
Pillow>=4.3.0 (FOR GUI)
psutil>=5.4.2 (FOR GUI)
gooey (FOR GUI)

Atom selection

For the selection syntax, use MDTraj's syntax (https://mdtraj.org/1.9.4/atom_selection.html). You can specify different selections for the calculation:

st is used to extract a part of the trajectory (if it is too big). Leave it blank if you want to keep your trajectory intact.
sa is used to align your trajectory on the same reference (e.g. a chain or the backbone).
sr is used for the clustering (the RMSD is calculated on the atoms selected with this string).
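ttclust delegates selection, alignment and RMSD to MDTraj. The sketch below shows why sa and sr are separate. It is a minimal numpy sketch, not ttclust's code. It superposes a frame on a reference using only the sa atoms (Kabsch algorithm), then computes the RMSD over the sr atoms. Coordinates are assumed to be plain (n_atoms, 3) arrays. The index arrays stand in for the atom indices an MDTraj selection string would return.

```python
import numpy as np


def kabsch_superpose(mobile, ref):
    """Best-fit rotation (and centres) mapping `mobile` onto `ref`, both (n, 3)."""
    mob_center = mobile.mean(axis=0)
    ref_center = ref.mean(axis=0)
    # Covariance of the centred coordinates
    h = (mobile - mob_center).T @ (ref - ref_center)
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return rotation, mob_center, ref_center


def rmsd_after_alignment(frame, ref, align_idx, rmsd_idx):
    """Align `frame` on the sa atoms (align_idx), then RMSD on the sr atoms (rmsd_idx)."""
    rot, mob_c, ref_c = kabsch_superpose(frame[align_idx], ref[align_idx])
    # Apply the fitted transform to every atom, not only the aligned ones
    moved = (frame - mob_c) @ rot.T + ref_c
    diff = moved[rmsd_idx] - ref[rmsd_idx]
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

Because the transform is fitted on one subset and applied to all atoms, an atom outside sa can still contribute to the RMSD if it is part of sr.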

NOTE on Nucleic Acids

MDTraj doesn't have nucleic acid keywords yet, so we've implemented some keywords that are translated into DNA/RNA selections. Keywords added:

dna : selection based on the residue name (DA/DT/DC/DG)
rna : selection based on the residue name (A/T/G/C or RA/RT/RG/RC)
backbone_na : backbone of nucleic acid. Selection based on the residue name and atom name (P, O3', O5', C3', C4', C5')
base : selection based on the residue name and atom name. Selects RNA or DNA and excludes backbone_na, sugar atoms and hydrogens
base_rna : same as base but for RNA
base_dna : same as base but for DNA
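One way to picture this translation is as a string rewrite done before the selection reaches MDTraj. The sketch below is hypothetical: the function name and the exact expansions are assumptions built from the residue names listed above, and ttclust's own table may differ. Only dna and rna are expanded, each into plain MDTraj resname clauses.

```python
import re

# Hypothetical keyword table built from the residue names listed above
NA_KEYWORDS = {
    "dna": ["DA", "DT", "DC", "DG"],
    "rna": ["A", "T", "G", "C", "RA", "RT", "RG", "RC"],
}


def expand_na_keywords(selection):
    """Replace the dna/rna keywords with MDTraj 'resname' clauses."""
    def to_clause(match):
        names = NA_KEYWORDS[match.group(0)]
        return "(" + " or ".join(f"resname {n}" for n in names) + ")"

    # \b does not match between "_" and a letter, so base_rna, base_dna and
    # backbone_na are left untouched by this sketch
    pattern = r"\b(" + "|".join(NA_KEYWORDS) + r")\b"
    return re.sub(pattern, to_clause, selection)
```

For example, "protein and not dna" becomes "protein and not (resname DA or resname DT or resname DC or resname DG)".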

Those selection keywords can be used with other MDTRAJ selection keywords, e.g.:

"protein and not dna"
"rna and not type H"

Clustering Methods

With the scipy module, several methods for clustering are available. Change the method used with the -m argument. Methods available are:

single
complete
average
weighted
centroid
median
ward (DEFAULT)
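All seven names above are method names accepted by SciPy's `scipy.cluster.hierarchy.linkage`. Here is a minimal sketch, not ttclust's code, of checking an -m value and building the dendrogram from a condensed distance vector. The helper name `linkage_for` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Values accepted by -m; all of them are SciPy linkage method names
METHODS = ("single", "complete", "average", "weighted", "centroid", "median", "ward")


def linkage_for(condensed, method="ward"):
    """Hierarchical clustering of a condensed distance vector with one -m method."""
    if method not in METHODS:
        raise ValueError(f"unknown method {method!r}; choose one of: {'; '.join(METHODS)}")
    return linkage(np.asarray(condensed, dtype=float), method=method)
```

Running the same distances through several methods and comparing the merge heights in the third column of the result shows how the choice reshapes the dendrogram.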

Four options are available for the calculation:

Autoclustering (default). Autoclustering uses the elbow method to find the optimal number of clusters.
Give the number of clusters you want. E.g. if you want 3 clusters, use the argument
-ng 3
Give a cutoff for the clustering. The final clusters are made from a dendrogram, and this value is used as the distance cutoff. To set this cutoff manually, use the argument
-cc x.x (x.x is the cutoff)
Choose your cutoff by clicking on the matplotlib window (on the dendrogram). In this case, don't use the other arguments. Recommended for the first clustering.
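The -ng and -cc options correspond to two ways of cutting a SciPy dendrogram. The argument help below names the maxclust method for -ng. For -cc, this sketch assumes a plain distance cut. It is not ttclust's function, and `flat_clusters` is a hypothetical name.

```python
from scipy.cluster.hierarchy import fcluster


def flat_clusters(z, ngroup=None, cutoff=None):
    """Turn a SciPy linkage matrix into cluster labels, in the spirit of -ng / -cc."""
    if ngroup is not None:
        # "maxclust": at most `ngroup` flat clusters
        return fcluster(z, t=ngroup, criterion="maxclust")
    if cutoff is not None:
        # "distance": members of a cluster merge below height `cutoff`
        return fcluster(z, t=cutoff, criterion="distance")
    raise ValueError("pass either ngroup or cutoff")
```

With a distance cutoff, a smaller value gives more, tighter clusters. With maxclust, you fix the count and let the cut height follow.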

Distance Matrix

The distance matrix can take a long time to calculate, depending on your trajectory size. That's why this matrix is saved in ".npy" format, so that it can be reused later. The matrix is named after your selection string for clustering (sr). If you use the same selection string for clustering (sr) again, the matrix will be detected. The program will then ask whether you want to use it again (Y), recalculate it (N) or choose another matrix (O). If you want to use the saved matrix without this interactive question, add the argument -i n, which deactivates the interactive prompt.
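Here is a minimal sketch of the reuse-or-recompute idea with numpy's .npy format. It assumes a hypothetical `load_or_compute` helper and a `compute` callback. The `reuse=False` flag plays the role of answering N at the prompt. This is not ttclust's implementation.

```python
from pathlib import Path

import numpy as np


def load_or_compute(path, compute, reuse=True):
    """Reuse a saved .npy distance matrix if present, otherwise compute and save it."""
    path = Path(path)
    if reuse and path.exists():
        return np.load(path)
    matrix = compute()
    # np.save keeps the name as-is when it already ends in ".npy"
    np.save(path, matrix)
    return matrix
```

For example, `load_or_compute("backbone.npy", my_rmsd_function)` only calls the (hypothetical) `my_rmsd_function` the first time.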

ARGUMENTS

  -h, --help            show this help message and exit
  -f TRAJ, --traj TRAJ  trajectory file(s). You can give a list of trajectories (see usage examples)
  -t TOP, --top TOP     topology file
  -s INT, --stride INT  stride, read every Xth frame
  -l LOGFILE, --logfile LOGFILE
                        logfile (default: clustering.log). The name of your
                        output files will be the basename (name before the
                        extension) of this logfile
  -st SELECT_TRAJ, --select_traj SELECT_TRAJ (default: all)
                        selection syntax for trajectory extraction, in QUOTES
  -sa SELECT_ALIGNEMENT, --select_alignement SELECT_ALIGNEMENT (default: backbone)
                        selection syntax for alignment, in QUOTES.
                        If you don't want alignment, use "none"
  -sr SELECT_RMSD, --select_rmsd SELECT_RMSD (default: backbone)
                        selection syntax for RMSD, in QUOTES
  -m METHOD, --method METHOD (default: ward)
                        method for clustering: single; complete; average;
                        weighted; centroid; median and ward
  -cc CUTOFF, --cutoff CUTOFF
                        cutoff for clusterization from hierarchical clustering
                        with SciPy. If you choose to click on the graph, the
                        cutoff will be the clicked value
  -ng NGROUP, --ngroup NGROUP
                        number of groups wanted. The maxclust method is used to
                        clusterize in this case. If you specify "auto", k-means
                        clustering with the elbow algorithm is used to find the
                        optimal number of clusters
  -aa AUTOCLUST, --autoclust AUTOCLUST
                        By default, autoclustering is activated. Autoclustering
                        is deactivated when you specify anything other than "Y",
                        a cutoff value (-cc) or a number of groups (-ng)
  -i INTERACTIVE, --interactive INTERACTIVE
                        Interactive mode for distance matrix (Y/n)
  -axis AXIS, --axis AXIS
                        if something is wrong in the axis of the timed barplot
                        graph (wrong time unit) and you just want 'frame'
                        instead of time, choose 'frame' here.
  -limitmat LIMITMAT, --limitmat LIMITMAT
                        If the distance matrix takes too long to generate,
                        choose a limit here. Default is 100000000
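Command-line values arrive as text. The sketch below uses argparse for a few of the options above, with option names copied from this table. It is not ttclust's actual parser. It shows how `type=` can make -s reach MDTraj as an integer and let -ng accept either an integer or "auto". The stride TypeErrors and the int() error on 'auto' reported in the issues below are the kind of failure this typing guards against. `int_or_auto` and `build_parser` are hypothetical names.

```python
import argparse


def int_or_auto(value):
    """Accept a positive integer or the literal string "auto"."""
    if value == "auto":
        return value
    number = int(value)  # argparse turns a ValueError here into a usage error
    if number < 1:
        raise argparse.ArgumentTypeError("must be >= 1 or 'auto'")
    return number


def build_parser():
    parser = argparse.ArgumentParser(description="subset of ttclust-like options")
    parser.add_argument("-f", "--traj", nargs="+", required=True,
                        help="one or more trajectory files")
    parser.add_argument("-t", "--top", help="topology file")
    # type=int hands an int (not a str) to the trajectory loader
    parser.add_argument("-s", "--stride", type=int, default=1)
    parser.add_argument("-ng", "--ngroup", type=int_or_auto, default=None)
    parser.add_argument("-cc", "--cutoff", type=float, default=None)
    return parser
```

For example, `build_parser().parse_args(["-f", "a.xtc", "-s", "10"]).stride` is the integer 10.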

USAGE:

Here are some usage examples using the example files in the "examples" folder. Please note that the trajectory is reduced to the backbone to keep the git archive small. Caution: you have to put quotes around your selection strings (for the sr, st and sa arguments).

Simple usage (clustering on the backbone, logfile called clustering.log, output folder "clustering"):
python ttclust.py -f examples/example.xtc -t examples/example.pdb
Simple usage, reading every 10th frame:
python ttclust.py -f examples/example.xtc -t examples/example.pdb -s 10
Simple usage with multiple trajectories:
python ttclust.py -f traj1.xtc traj2.xtc -t examples/example.pdb
python ttclust.py -f *.xtc -t examples/example.pdb
Clustering on the backbone of residues 30 to 200:
python ttclust.py -f examples/example.xtc -t examples/example.pdb -sr "residue 30 to 200 and backbone" -l res30-200.log
Clustering on CA atoms, saving this part of the trajectory, with a cutoff of 2.75:
python ttclust.py -f examples/example.xtc -t examples/example.pdb -sr "name CA" -st "name CA" -cc 2.75 -l CA-c2.75.log
Clustering on the protein backbone of chain A with only 10 clusters (note that MDTraj has no chain names, only chain IDs starting from 0):
python ttclust.py -f examples/example.xtc -t examples/example.pdb -sr "protein and backbone and chainid 0" -l backbone-chainA.log -ng 10
Note: for a PDB trajectory, don't use the -t argument:
python TrajectoryClustering.py -f traj.pdb -st "protein" -sr "backbone"

NB (for Mac users)

You need to use pythonw instead of python.

OUTPUT EXAMPLE

Cluster result (structure)

PDBs are saved in a new folder, Cluster_PDB.
Each PDB name contains the cluster number, the cluster size and the representative frame of the cluster (i.e. the frame number of the saved structure).
Example: C1-f11-s44.pdb corresponds to cluster 1, made of 44 structures, whose saved (representative) frame is frame 11.
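If you post-process these files, the three numbers can be read back from the name. Here is a small sketch, assuming every file follows the C<cluster>-f<frame>-s<size>.pdb pattern of the example above. `parse_cluster_pdb` is a hypothetical helper, not part of ttclust.

```python
import re

# Pattern taken from the example name C1-f11-s44.pdb
PDB_NAME = re.compile(r"^C(?P<cluster>\d+)-f(?P<frame>\d+)-s(?P<size>\d+)\.pdb$")


def parse_cluster_pdb(name):
    """Return (cluster, representative_frame, size) from a Cluster_PDB file name."""
    match = PDB_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    return int(match["cluster"]), int(match["frame"]), int(match["size"])
```

Combined with `glob.glob("Cluster_PDB/*.pdb")` and `os.path.basename`, this gives a quick table of cluster sizes.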

Logfile

In the log file you will find all arguments given to the program, along with cluster information:

size: number of structures in the cluster
representative frame: the frame with the lowest RMSD to all other frames of the cluster
Members: all frames belonging to the cluster
spread: mean RMSD between all frames in the cluster
RMSD between clusters: a table with the RMSD between clusters
Average RMSD between clusters: the average RMSD between clusters.
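The representative frame and the spread can both be read from a square RMSD matrix. The sketch below makes two assumptions. The representative is taken as the member with the smallest summed RMSD to the other members (a medoid). The spread is taken as the mean RMSD over all distinct member pairs. ttclust's exact formulas may differ, and `cluster_summary` is a hypothetical name.

```python
import numpy as np


def cluster_summary(rmsd_matrix, members):
    """(representative_frame, spread) of one cluster from a square RMSD matrix.

    Assumptions: representative = medoid (lowest summed RMSD to the other
    members); spread = mean RMSD over distinct member pairs.
    """
    members = np.asarray(members)
    sub = rmsd_matrix[np.ix_(members, members)]
    representative = int(members[sub.sum(axis=1).argmin()])
    n = len(members)
    if n < 2:
        return representative, 0.0
    # Upper triangle without the diagonal = each pair counted once
    upper = sub[np.triu_indices(n, k=1)]
    return representative, float(upper.mean())
```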

Dendrogram

A dendrogram is generated at the end of the clustering, with the corresponding cluster colors. The file has the same name as the logfile, with a ".png" extension. Example: example.log --> example.png
The grey horizontal line is the cutoff value used.

LinearProjection representation

A linear projection of the clusters is made for the trajectory. Each bar represents a frame, and its color represents a cluster number. Note that:

If there are 12 clusters or fewer, a predefined color list is used, in this order: red, blue, lime, gold, darkorchid, orange, deepskyblue, brown, gray, black, darkgreen, navy
Otherwise, the matplotlib "hsv" colormap is used, and the colors change according to the number of clusters.
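The color rule above can be sketched in a few lines. Up to 12 clusters, the named list is used in order. Beyond that, the sketch spreads hues evenly with the standard colorsys module. This is an assumed approximation of sampling matplotlib's "hsv" colormap, not ttclust's code.

```python
import colorsys

# The 12 named colors listed above, in order
FIXED_COLORS = ["red", "blue", "lime", "gold", "darkorchid", "orange",
                "deepskyblue", "brown", "gray", "black", "darkgreen", "navy"]


def cluster_colors(n_clusters):
    """One color per cluster: named colors up to 12, else evenly spaced hues."""
    if n_clusters <= len(FIXED_COLORS):
        return FIXED_COLORS[:n_clusters]
    # Approximation of matplotlib's "hsv" colormap: full saturation and value,
    # hue stepping evenly around the wheel, returned as RGB tuples in [0, 1]
    return [colorsys.hsv_to_rgb(i / n_clusters, 1.0, 1.0) for i in range(n_clusters)]
```

Matplotlib accepts both the color names and the RGB tuples, so either list can be passed straight to a plotting call.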

Barplot representation

A vertical barplot is generated to give an overview of the cluster sizes. Each bar's color matches the cluster's color in the LinearProjection representation and in the dendrogram.

2D distance projection

A 2D projection of the distances (RMSD) between the representative frames of the clusters is made, using the multidimensional scaling (MDS) method from the scikit-learn Python module. The relative distances between the points let us follow how the clusters relate to one another. The points use the same colors as the other graphs (i.e. the cluster colors), and the radius of each point depends on the cluster's spread.
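To show how a distance matrix becomes 2D coordinates, here is a numpy-only sketch of classical (Torgerson) MDS. This is a swapped-in technique. scikit-learn's MDS, which ttclust uses, minimizes stress iteratively (SMACOF) instead, so its coordinates will not match exactly. The input is assumed to be a square, symmetric matrix of RMSDs between representative frames.

```python
import numpy as np


def classical_mds(distances, n_components=2):
    """Embed points in `n_components` dimensions from a square distance matrix."""
    d2 = np.asarray(distances, dtype=float) ** 2
    n = d2.shape[0]
    # Double-centre the squared distances to get a Gram (inner-product) matrix
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ d2 @ centering
    eigvals, eigvecs = np.linalg.eigh(b)
    # Keep the largest eigenvalues; clip tiny negatives caused by round-off
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale
```

When the RMSDs cannot be represented exactly in two dimensions, the plotted distances are only an approximation. Read the 2D plot as a qualitative map.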

Distance matrix plot

A plot of the distance matrix is also made, which makes it easy to visualize the distance between any two frames.

Citation

If you use TTclust in a publication, please use this citation:

Tubiana, T., Carvaillo, J.-C., Boulard, Y., & Bressanelli, S. (2018). TTClust: A Versatile Molecular Simulation Trajectory Clustering Program with Graphical Summaries. Journal of Chemical Information and Modeling, 58(11), 2178–2182. https://doi.org/10.1021/acs.jcim.8b00512

Licence

This program is under the GNU GPLv3 licence. Anyone who distributes this code or a derivative work has to make the source available under the same terms. The licence also provides an express grant of patent rights from contributors to users.

ttclust's Issues

Ligand poses clustering

If I want to cluster the ligand poses instead of the protein backbone, should I select the ligand with -sa, -sr, or -st?

stride issues on Ubuntu 20.4

Hi, I installed from sources so I could use the --stride option but I ran into this problem:

********************************************************
******************  TTCLUST 4.7.2 *********************
********************************************************

======= TRAJECTORY READING =======
/home/pbarletta/anaconda3/envs/ttclust/lib/python3.8/site-packages/scipy/io/netcdf.py:308: RuntimeWarning: Cannot close a netcdf_file opened with mmap=True, when netcdf_variables or arrays referring to its data still exist. All data arrays obtained from such files refer directly to data on disk, and must be copied before the file can be cleanly closed. (See netcdf_file docstring for more information on mmap.)
  warnings.warn((
Traceback (most recent call last):
  File "/home/pbarletta/TTClust/ttclust/ttclust.py", line 1279, in <module>
    main()  # keep trajectory for usage afterwards (in shell, debug etc..)
  File "/home/pbarletta/TTClust/ttclust/ttclust.py", line 1268, in main
    traj = Cluster_analysis_call(args)
  File "/home/pbarletta/TTClust/ttclust/ttclust.py", line 1167, in Cluster_analysis_call
    traj = md.load(trajfile,
  File "/home/pbarletta/anaconda3/envs/ttclust/lib/python3.8/site-packages/mdtraj/core/trajectory.py", line 433, in load
    value = loader(filename, **kwargs)
  File "/home/pbarletta/anaconda3/envs/ttclust/lib/python3.8/site-packages/mdtraj/formats/netcdf.py", line 102, in load_netcdf
    return f.read_as_traj(topology, n_frames=n_frames, atom_indices=atom_indices, stride=stride)
  File "/home/pbarletta/anaconda3/envs/ttclust/lib/python3.8/site-packages/mdtraj/formats/netcdf.py", line 205, in read_as_traj
    xyz, time, cell_lengths, cell_angles = self.read(n_frames=n_frames, stride=stride, atom_indices=atom_indices)
  File "/home/pbarletta/anaconda3/envs/ttclust/lib/python3.8/site-packages/mdtraj/formats/netcdf.py", line 277, in read
    coordinates = self._handle.variables['coordinates'][frame_slice, atom_slice, :]
  File "/home/pbarletta/anaconda3/envs/ttclust/lib/python3.8/site-packages/scipy/io/netcdf.py", line 968, in __getitem__
    return self.data[index]
TypeError: slice indices must be integers or None or have an __index__ method

If I discard the --stride option everything works.
Thanks!

selection of heavy atoms of a ligand

Hi,

I want to cluster the five docking poses of a ligand based on the RMSD of only heavy atoms of the ligand. What should I select for -sr?
Please note that it should calculate the RMSDs for only heavy atoms of the ligand and I wouldn't align the ligand on the protein. I think I should select none for -sa?

Best,

ValueError: xyz must be shape (Any, 2108, 3). I need help with this

Traceback (most recent call last):
  File "/es01/paratera/sce3820/.conda/envs/sce3820/lib/python3.9/site-packages/mdtraj/core/trajectory.py", line 429, in load
    t = loader(tmp_file, **kwargs)
  File "mdtraj/formats/xtc/xtc.pyx", line 168, in mdtraj.formats.xtc.load_xtc
  File "mdtraj/formats/xtc/xtc.pyx", line 175, in mdtraj.formats.xtc.load_xtc
  File "mdtraj/formats/xtc/xtc.pyx", line 346, in mdtraj.formats.xtc.XTCTrajectoryFile.read_as_traj
  File "/es01/paratera/sce3820/.conda/envs/sce3820/lib/python3.9/site-packages/mdtraj/core/trajectory.py", line 1226, in __init__
    self.xyz = xyz
  File "/es01/paratera/sce3820/.conda/envs/sce3820/lib/python3.9/site-packages/mdtraj/core/trajectory.py", line 939, in xyz
    value = ensure_type(value, np.float32, 3, 'xyz', shape=shape,
  File "/es01/paratera/sce3820/.conda/envs/sce3820/lib/python3.9/site-packages/mdtraj/utils/validation.py", line 148, in ensure_type
    raise error
ValueError: xyz must be shape (Any, 2108, 3). You supplied (101, 49157, 3)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/es01/paratera/sce3820/.conda/envs/sce3820/bin/ttclust", line 10, in <module>
    sys.exit(main())
  File "/es01/paratera/sce3820/.conda/envs/sce3820/lib/python3.9/site-packages/ttclust/ttclust.py", line 1329, in main
    traj = Cluster_analysis_call(args)
  File "/es01/paratera/sce3820/.conda/envs/sce3820/lib/python3.9/site-packages/ttclust/ttclust.py", line 1229, in Cluster_analysis_call
    traj = md.load(trajfile,
  File "/es01/paratera/sce3820/.conda/envs/sce3820/lib/python3.9/site-packages/mdtraj/core/trajectory.py", line 448, in load
    raise ValueError('The topology and the trajectory files might not contain the same atoms\n'
ValueError: The topology and the trajectory files might not contain the same atoms
The input topology must contain all atoms even if you want to select a subset of them with atom_indices

Assigning x-axis (simulation time) in bar plots and generating distance matrix

Dear Tubiana,

Thank you for your amazing TTClust program.

First, I want to change the x-axis of the generated bar plots. It seems to count every 1000 frames as 1 ns. However, that may not be the case for my trajectory, so how can I change it?

Second, the distance matrix is not generated with the other default figures, how can I generate it separately?

Thank you.

does water affect the clustering result?

Greetings,

I have a question about TTClust. I use this command to cluster my MDS results:

ttclust -f file.dcd -t file.psf

but I noticed that at the end it gave me this:
<mdtraj.Trajectory with 250 frames, 174503 atoms, 54046 residues, and unitcells>

This number of atoms is the total number of atoms, including water.

So, will this affect the results? And if so, how can I solve it?

Thanks

what is the unit of RMSD in the log file ?

The log file produced by the clustering contains the cutoff for clustering, the RMSD between clusters, and the average RMSD. Can you tell me what the units are for each one?

Replotting the graphs

Hi, is it possible to replot these graphs?
I want to improve them, specifically for publication purposes.
For example, the y-axis scaling differs completely from one graph to another.
This makes the comparison visually unfair.
Could I replot them? Is it possible to get the raw data before the dendrograms are plotted by the software?

Correct installation, but it does not start

Hi,

I have correctly installed TTClust under Mac OSX, but when I try to use the program I get the following error message:

Traceback (most recent call last):
  File "/Users/admin/anaconda2/bin/ttclust", line 7, in <module>
    from ttclust.ttclust import main
  File "/Users/admin/anaconda2/lib/python2.7/site-packages/ttclust/ttclust.py", line 22, in <module>
    sys.executable = shutil.which("pythonw")
AttributeError: 'module' object has no attribute 'which'

Can you help me? Thanks.

Clustering with only .pdb file (without .xtc file)

My simulation was run with OpenMM and didn't generate an .xtc file; only a trajectory.pdb file was produced. In this case, can I use this package for clustering? Thanks in advance for your answer.

ValueError: 'c' argument

Hi,

Do you know how I can fix the below error?

ttclust -f trajectory.xtc -t reference.pdb -sa none -cc 0.6 -sr all

....
AVERAGE RSMD BETWEEN CLUSTERS : 4.90
Traceback (most recent call last):
  File "/home/user/anaconda3/bin/ttclust", line 10, in <module>
    sys.exit(main())
  File "/home/user/anaconda3/lib/python3.7/site-packages/ttclust/ttclust.py", line 1412, in main
    traj = Cluster_analysis_call(args)
  File "/home/user/anaconda3/lib/python3.7/site-packages/ttclust/ttclust.py", line 1365, in Cluster_analysis_call
    plot_2D_distance_projection(RMSD_matrix, clusters_list, colors_list, logname)
  File "/home/user/anaconda3/lib/python3.7/site-packages/ttclust/ttclust.py", line 1147, in plot_2D_distance_projection
    scatter = ax.scatter(x, y, s=radii, c=colors, alpha=0.5)
  File "/home/user/anaconda3/lib/python3.7/site-packages/matplotlib/__init__.py", line 1438, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/matplotlib/cbook/deprecation.py", line 411, in wrapper
    return func(*inner_args, **inner_kwargs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/matplotlib/axes/_axes.py", line 4453, in scatter
    get_next_color_func=self._get_patches_for_fill.get_next_color)
  File "/home/user/anaconda3/lib/python3.7/site-packages/matplotlib/axes/_axes.py", line 4307, in _parse_scatter_color_args
    raise invalid_shape_exception(len(colors), xsize)
ValueError: 'c' argument has 6 elements, which is inconsistent with 'x' and 'y' with size 5.

problem in clustering MDS when the protein moves out from one edge of water box while applying periodic boundary conditions

Hi!
I have clustered an MDS trajectory in which the protein moves out from the water box from one side and enters from the other side. TTClust is not able to cluster it well because the representative frames are like this image

Is there a solution to this problem?

Thanks

Which version?

I downloaded the latest version of TTClust (v4.9.0), but when I checked the log file produced by the clustering, I found that the version is 4.8.4.

Which version is the correct one?

FileNotFoundError: [Errno 2] No such file or directory: 'D:\\CLOUDS\\GoogleDrive\\WORK\\Perso\\DEV\\ttclust\\examples'

I have successfully installed ttclust on my Ubuntu system with conda, using the command conda install -c tubiana -c conda-forge ttclust. Now I would like to install the GUI version of ttclust. For this purpose, I placed ttclustGUI.py in the same directory as ttclust.py (miniconda3/envs/ttclust/lib/python3.10/site-packages/ttclust). When I run the command python ttclustGUI.py, the terminal reports the following error. Could you tell me how to deal with it?

(ttclust) twg@TWG:/miniconda3/envs/ttclust/lib/python3.10/site-packages/ttclust$ python ttclustGUI.py
Traceback (most recent call last):
  File "/home/twg/miniconda3/envs/ttclust/lib/python3.10/site-packages/ttclust/ttclustGUI.py", line 170, in <module>
    main()
  File "/home/twg/miniconda3/envs/ttclust/lib/python3.10/site-packages/ttclust/ttclustGUI.py", line 122, in main
    os.chdir("D:\CLOUDS\GoogleDrive\WORK\Perso\DEV\ttclust\examples")
FileNotFoundError: [Errno 2] No such file or directory: 'D:\CLOUDS\GoogleDrive\WORK\Perso\DEV\ttclust\examples'

Clustering with rmsd calculated for heavy (noh) atoms

Can you provide the possible arguments for -sr? as I want to change it from the (default=backbone) to heavy atoms of both backbone and side chains.

Thank you.

error while doing the clustering

I installed ttclust using Anaconda and ran it with this command:

command used: ttclust -f "step5_production_tri_mut.dcd" -t "step3_charmm2namd.pdb"

but it gave me this error

output:

****************** TTCLUST 4.8.2 *********************

======= TRAJECTORY READING =======
====== Clustering ========
creating distance matrix
NOTE : Extraction of subtrajectory for time optimisation
Interactive mode disabled. I will use the saved matrix

Distance Matrix File Loaded!
Matrix shape: (250, 250)
Scipy linkage in progress. Please wait. It can be long
ERROR : method name given for clustering didn't recognized
: methods are : single; complete; average; weighted; centroid; ward.
: check https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.cluster.hierarchy.linkage.html for more info

RMSD as an input

Hi Thibault,

Is there an option to give RMSD as an input (e.g., I want an RMSD cutoff of 3 angstrom) or do I have to do it manually by changing the cutoff or number of clusters?

Thanks a lot for your help.

With best regards,
Sameer

Saving all frames belonging to a cluster

Hi Thibault,

Is there an option to save all the frames belonging to each cluster?

Thanks a lot for your help.

Sincerely,
Sameer

Clustering multiple trajectories

Hi, I have several trajectories that I want to perform clustering on. Does TTclust do clustering on multiple trajectories? It is not possible to concatenate them into one, hence the question.
Many thanks.

Clustering of host guest systems

Hi there,

I want to align my trajectory on the atoms with serial numbers 1 to 148.
I tried following commands -
ttclust -f 200ns.xtc -t Umb_1.pdb -sa "index 1 to 148"
tclust -f 200ns.xtc -t Umb_1.pdb -sa "index <= 147"

Error-

NOTE : Extraction of subtrajectory for time optimisation
ERROR : there is an error with your selection string
SELECTION STRING :
backbone

Kindly suggest the correct syntax for sa.
Thank you for your help.

parameter -s error

Once I use -s 10, it replies:
Traceback (most recent call last):
  File "/home3/feijunwen/anaconda3/bin/ttclust", line 10, in <module>
    sys.exit(main())
  File "/home3/feijunwen/anaconda3/lib/python3.5/site-packages/ttclust/ttclust.py", line 1268, in main
    traj = Cluster_analysis_call(args)
  File "/home3/feijunwen/anaconda3/lib/python3.5/site-packages/ttclust/ttclust.py", line 1168, in Cluster_analysis_call
    top=topfile, stride=args["stride"])
  File "/home3/feijunwen/anaconda3/lib/python3.5/site-packages/mdtraj/core/trajectory.py", line 430, in load
    value = loader(filename, **kwargs)
  File "mdtraj/formats/dcd/dcd.pyx", line 134, in dcd.load_dcd (mdtraj/formats/dcd/dcd.c:2445)
  File "mdtraj/formats/dcd/dcd.pyx", line 140, in dcd.load_dcd (mdtraj/formats/dcd/dcd.c:2399)
  File "mdtraj/formats/dcd/dcd.pyx", line 382, in dcd.DCDTrajectoryFile.read_as_traj (mdtraj/formats/dcd/dcd.c:4638)
  File "mdtraj/formats/dcd/dcd.pyx", line 457, in dcd.DCDTrajectoryFile.read (mdtraj/formats/dcd/dcd.c:5596)
TypeError: an integer is required

Without the -s parameter, it runs normally. Could you please give me any suggestions?

Using TTClust on Google Colab

Hi,
Can TTClust be used on Google Colab? If so, how can I install it and get it working? I installed condacolab and then used the command !mamba install -c tubiana -c conda-forge ttclust to install TTClust. However, when I then tried to run a TTClust command, it just threw errors. Is there a way to use it on Colab?

Thanks in advance

ValueError: invalid literal for int() with base 10: 'auto'

Hello,
I was trying to cluster two trajectories using the command:
ttclust -f 1_SS_Unglycosylated-T1/step7_production-500ns.xtc 1_SS_Unglycosylated-T2/step7_production-500ns.xtc -t top.pdb -st "protein" -sr "backbone"

I got the following error:
********************************************************
****************** TTCLUST 4.10.4 *********************

NOTE : Per default the clustering is made on the BACKBONE of a PROTEIN
PLEASE READ THE DOCUMENTATION AT https://www.github.com/tubiana/TTClust FOR PROPER USAGE

======= TRAJECTORY READING =======

Several trajectories given. Will concatenate them.
======= EXTRACTION OF SELECTED ATOMS =======
NOTE : 'st' argument given. I will save the subtrajectory in clustering/clustering.xtc and topology file as clustering/clustering.pdb
====== Clustering ========
creating distance matrix
NOTE : Extraction of subtrajectory for time optimisation
|>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>| Time: 0:08:03 |<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<|
Calculation ended - saving distance matrix
Saving distance matrix : backbone.npy
Traceback (most recent call last):
  File "/home/ks1/mambaforge/bin/ttclust", line 8, in <module>
    sys.exit(main())
  File "/home/ks1/mambaforge/lib/python3.10/site-packages/ttclust/ttclust.py", line 1338, in main
    traj = Cluster_analysis_call(args)
  File "/home/ks1/mambaforge/lib/python3.10/site-packages/ttclust/ttclust.py", line 1266, in Cluster_analysis_call
    distances, clusters_labels, linkage, cutoff = create_cluster_table(traj, args)
  File "/home/ks1/mambaforge/lib/python3.10/site-packages/ttclust/ttclust.py", line 712, in create_cluster_table
    ncluster = int(args["ngroup"])
ValueError: invalid literal for int() with base 10: 'auto'

P.S I am using the conda version of TTClust

File name size

OSError: [Errno 36] File name too long: '(residue_340_341_342_343_344_345_346_347_348_372_435_439_440_441_442_443_444_445_446_448_449_450_452_481_516_517_519_520_521_523_524_525_526_527_528_529_530_531_532_594_596_597_598_599_602_603_605_606_609_610_611_612_613_615_616_617_618_619_620_621_622_623)_and_backbone.npy'
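Errno 36 (ENAMETOOLONG) is raised when a single path component exceeds the filesystem limit, commonly 255 bytes on Linux filesystems such as ext4. The sketch below guards against this. Its space-to-underscore rule is only inferred from names like backbone.npy and the one in the error above. The SHA-1 fallback name is a hypothetical choice, not something ttclust does.

```python
import hashlib

MAX_NAME_BYTES = 255  # usual per-component limit on common Linux filesystems


def matrix_filename(selection):
    """Build a .npy name from a selection string, hashing it if it is too long."""
    name = selection.replace(" ", "_") + ".npy"
    if len(name.encode("utf-8")) <= MAX_NAME_BYTES:
        return name
    # Deterministic short name: the same selection always maps to the same file
    digest = hashlib.sha1(selection.encode("utf-8")).hexdigest()[:16]
    return f"selection_{digest}.npy"
```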

Handling PBC

When working with periodic boundary conditions it often happens that a molecule may pass from one side of the box to the other.
When using MDTraj you can use the command image_molecules to make sure that some selected molecules stay within the box throughout the trajectory.
Does TTClust offer the possibility of performing such a step or does it in any way account for PBC during RMSD calculations?
It's very easy to just preprocess the trajectory, but it would ease the process to have the software directly perform this action.

The representative frame

Hello Sir

I hope you're well

I have a trajectory from GROMACS for a single protein. When I did clustering based on specific range of residues, I noticed that the representative frame of Cluster 1 is actually a frame within the cluster 2.
Is it possible?

Thnx

Can I use a trajectory and align it to the same protein structure saved in another PDB file?

Assume that I have 3 xtc files with N frames and I want to align all of them to the same protein structure, but this structure is saved in another PDB file. Can I do this?
Will it work if I add this structure to one of the xtc files as the first frame and then use:

ttclust -f xtc1_with ref xtc2 xtc3 -t top

ttclust.py:726: ClusterWarning: scipy.cluster: The symmetric non-negative hollow observation matrix looks suspiciously like an uncondensed distance matrix

Hello Sir,
I hope you're well
Thank you for such a nice tool for clustering traj frames.
While using this tool using the following command

ttclust -f md_0_200ns_noPBC_noWAT_noION.xtc -t first_frame.pdb -st "protein" -sr "name CA" -m average

I am getting this warning
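SciPy emits this ClusterWarning when `linkage` receives a square, symmetric, zero-diagonal matrix. It then treats each row as an observation and recomputes Euclidean distances between rows, instead of using the RMSD values directly. Whether ttclust passes a square matrix at the line in the warning is not shown here. The sketch below reproduces the warning and shows the condensed form that avoids it. `linkage_from_square` is a hypothetical helper.

```python
import numpy as np
from scipy.cluster.hierarchy import ClusterWarning, linkage
from scipy.spatial.distance import squareform


def linkage_from_square(rmsd_matrix, method="average"):
    """Cluster a square RMSD matrix by first converting it to condensed form."""
    # squareform turns an (n, n) symmetric matrix into its n*(n-1)/2 upper triangle
    return linkage(squareform(rmsd_matrix, checks=False), method=method)
```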