
lpdi-epfl / masif

Stars: 571 · Watchers: 23 · Forks: 151 · Size: 93.66 MB

MaSIF- Molecular surface interaction fingerprints. Geometric deep learning to decipher patterns in molecular surfaces.

License: Apache License 2.0

Python 51.70% Shell 5.59% Jupyter Notebook 42.67% TeX 0.05%
protein-surface molecular-surface geometric-deep-learning

masif's People

Contributors

dcoukos, freyrs, pablogainza


masif's Issues

pdl1_benchmark does not predict the interface between 4ZQK_A and 4ZQK_B

Hi,

I tried to run pdl1_benchmark with pdl1_benchmark_nn.py, but 4ZQK_A and 4ZQK_B are not in the list of top_scores.
They match only when I lower iface_cutoff to 0.5, and only at a single point:
near_points: [1820]
iface: [0.5888073]
diff: [1.696772]

Is the model provided in masif/data/masif_pdl1_benchmark/nn_models/
the same as the one described in the paper, or should I train the model myself to reproduce the results?

Thank you so much!

subprocess.py error while running data_prepare_one

Hi,
I have tried to run data_prepare_one but I get the following error and the program stops within ipdb.

MLC02GC4Z3Q05P:masif_site 4464689$ ./data_prepare_one.sh 1AKJ_AB_DE
:/Users/4464689/Downloads/masif/source/
Structure exists: '/var/folders/mn/7xx5f6314c1glph1lkt4k_9m002gyq/T/pdb1akj.ent' 
--Call--
>    /opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/subprocess.py(875)__del__()
   873             self.wait()
   874 
--> 875     def __del__(self, _maxsize=sys.maxsize, _warn=warnings.warn):
    876         if not self._child_created:
    877             # We didn't get to successfully create a child process.

ipdb>

I appreciate any input,
Aleks

Masif_ppi_search output

Hi, I have run masif_ppi_search on the PDB 6M3M_A and have gotten two output files labeled:
p1_desc_flipped.npy and p1_desc_straight.npy
I cannot find documented what these two files mean and how they relate to finding binders of the input PDB. Any help would be appreciated, thanks!

Failed to place the graph without changing the devices of some resources

Hi.

I tried to train MaSIF-site and got the following. It is still training, but slowly.

2021-11-10 14:06:26.243439: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without   changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:1' assigned_device_name_='' resource_device_name_='/device:GPU:1' supported_device_types_=[CPU] possible_devices_=[]
Identity: CPU XLA_CPU XLA_GPU 
Assign: CPU 
Const: CPU XLA_CPU XLA_GPU 
ApplyAdam: CPU 
VariableV2: CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
fully_connected_3/biases/Initializer/zeros (Const) 
fully_connected_3/biases (VariableV2) /device:GPU:1
fully_connected_3/biases/Assign (Assign) /device:GPU:1
fully_connected_3/biases/read (Identity) /device:GPU:1
fully_connected_3/biases/Adam/Initializer/zeros (Const) /device:GPU:1
fully_connected_3/biases/Adam (VariableV2) /device:GPU:1
fully_connected_3/biases/Adam/Assign (Assign) /device:GPU:1
fully_connected_3/biases/Adam/read (Identity) /device:GPU:1
fully_connected_3/biases/Adam_1/Initializer/zeros (Const) /device:GPU:1
fully_connected_3/biases/Adam_1 (VariableV2) /device:GPU:1
fully_connected_3/biases/Adam_1/Assign (Assign) /device:GPU:1
fully_connected_3/biases/Adam_1/read (Identity) /device:GPU:1
Adam/update_fully_connected_3/biases/ApplyAdam (ApplyAdam) /device:GPU:1
save/Assign_62 (Assign) /device:GPU:1
save/Assign_63 (Assign) /device:GPU:1
save/Assign_64 (Assign) /device:GPU:1
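A note in case it helps others: the warning means TensorFlow 1.x could not honor the requested /device:GPU:1 placement (only CPU kernels are listed as supported for those ops), so everything runs on the CPU. Below is a minimal sketch of relaxing the colocation constraint with soft placement; where MaSIF-site creates its session is an assumption here, and this will not make a CPU-only TensorFlow build use a GPU.

# Hedged sketch (TF1-style API): let TensorFlow fall back to a supported device
# instead of failing the /device:GPU:1 colocation constraint.
import tensorflow as tf

config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=False)
sess = tf.Session(config=config)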

Docker run of masif_ligand/data_prepare_one.sh fails with OSError: File not found: data_preparation/01-benchmark_surfaces//4ZQK_A.ply

I tried the script in Docker, ./masif_ligand/data_prepare_one.sh 4ZQK_A,
and got the following errors:
Downloading PDB structure '4ZQK'...
Traceback (most recent call last):
File "/masif/source//data_preparation/00b-generate_assembly.py", line 3, in
from SBI.structure import PDB
ImportError: No module named SBI.structure
Traceback (most recent call last):
File "/masif/source//data_preparation/00c-save_ligand_coords.py", line 2, in
import numpy as np
ImportError: No module named numpy
Traceback (most recent call last):
File "/masif/source//data_preparation/01-pdb_extract_and_triangulate.py", line 48, in
extractPDB(pdb_filename, out_filename1+".pdb", chain_ids1)
File "/masif/source/input_output/extractPDB.py", line 21, in extractPDB
model = Selection.unfold_entities(struct, "M")[0]
IndexError: list index out of range
4ZQK_A
Reading data from input ply surface files.
Traceback (most recent call last):
File "/masif/source//data_preparation/04-masif_precompute.py", line 74, in
input_feat[pid], rho[pid], theta[pid], mask[pid], neigh_indices[pid], iface_labels[pid], verts[pid] = read_data_from_surface(ply_file[pid], params)
File "/masif/source/masif_modules/read_data_from_surface.py", line 23, in read_data_from_surface
mesh = pymesh.load_mesh(ply_fn)
File "/usr/local/lib/python3.6/site-packages/pymesh/meshio.py", line 21, in load_mesh
raise IOError("File not found: {}".format(filename));
OSError: File not found: data_preparation/01-benchmark_surfaces//4ZQK_A.ply

Can anyone tell me where I went wrong? Thanks!

How can I dock two protein monomers with masif-search?

I want to dock two proteins.
I only have two monomers, not a homomer or heteromer.

My questions about the masif-search pipeline

  • Preparing data for ppi search: when I run data_prepare with PDBID_CHAIN, I don't get the p1_sc_labels.npy file, but if I run it with PDBID_CHAIN1_CHAIN2, I do. So I copied the files:
cd /masif/data/masif_ppi_search/data_preparation/04b-precomputation_12A/precomputation
cp 1A2K_C_A/p1_sc_labels.npy  1A2K_C/p1_sc_labels.npy
cp 1A2K_C_A/p2_sc_labels.npy  1A2K_A/p1_sc_labels.npy
cp 5JYL_A_B/p1_sc_labels.npy 5JYL_A/p1_sc_labels.npy
cp 5JYL_A_B/p2_sc_labels.npy 5JYL_B/p1_sc_labels.npy

but I don't know whether this is correct. Is it right?

  • Converting a RegistrationResult and PointCloud to PDB format: I get the multidock result, but I don't know how to convert it to PDB format. Is there a tool for this?
  • I am a novice at docking and masif-search; if this pipeline has other errors, please point them out and give suggestions. I am not sure whether I am doing it right.

Pipeline

Run masif-site

First, I run masif-site with ./data_prepare_one.sh 2MWS_B, and then ./predict_site.sh 2MWS_B to predict sites. I got four folders:

00-raw_pdbs  01-benchmark_pdbs  01-benchmark_surfaces  04a-precomputation_9A

It ran very well.

Prepare data for ppi search

Next I run the ppi search: cd ../masif_ppi_search. It needs the PDBID_CHAIN1_CHAIN2 format, but I only have two monomers, not a homomer or heteromer, so the command /masif/data/masif_ppi_search/data_prepare_one.sh 3AXY_B doesn't work.

So I downloaded and extracted the PDB files manually and moved them into data_preparation/00-raw_pdbs/:

root@eb92233498e0:/masif/data/masif_ppi_search# ls data_preparation/00-raw_pdbs/
1A2K.pdb  5JYL.pdb
root@eb92233498e0:/masif/data/masif_ppi_search# ls data_preparation/01-benchmark_pdbs/
1A2K_A.pdb  1A2K_C.pdb  5JYL_A.pdb  5JYL_B.pdb

Next I run:

masif_root=$(git rev-parse --show-toplevel)
masif_source=$masif_root/source/
export PYTHONPATH=$PYTHONPATH:$masif_source
PDB_ID='1A2K'
CHAIN1='C'
CHAIN2='A'
# Load your environment here.
python $masif_source/data_preparation/01-pdb_extract_and_triangulate.py $PDB_ID\_$CHAIN1
python $masif_source/data_preparation/01-pdb_extract_and_triangulate.py $PDB_ID\_$CHAIN2
python $masif_source/data_preparation/04-masif_precompute.py masif_site $PDB_ID\_$CHAIN1
python $masif_source/data_preparation/04-masif_precompute.py masif_site $PDB_ID\_$CHAIN2
python $masif_source/data_preparation/04-masif_precompute.py masif_ppi_search $PDB_ID\_$CHAIN1
python $masif_source/data_preparation/04-masif_precompute.py masif_ppi_search $PDB_ID\_$CHAIN2

and get this output:

root@eb92233498e0:/masif/data/masif_ppi_search# ls data_preparation/04a-precomputation_9A/precomputation/1A2K_A/
p1_X.npy  p1_Z.npy             p1_input_feat.npy    p1_mask.npy            p1_theta_wrt_center.npy
p1_Y.npy  p1_iface_labels.npy  p1_list_indices.npy  p1_rho_wrt_center.npy
root@eb92233498e0:/masif/data/masif_ppi_search# ls data_preparation/04a-precomputation_9A/precomputation/1A2K_C
p1_X.npy  p1_Z.npy             p1_input_feat.npy    p1_mask.npy            p1_theta_wrt_center.npy
p1_Y.npy  p1_iface_labels.npy  p1_list_indices.npy  p1_rho_wrt_center.npy

and then compute the descriptors:

./compute_descriptors.sh $PDB_ID\_$CHAIN1
./compute_descriptors.sh $PDB_ID\_$CHAIN2

and get this output:

root@eb92233498e0:/masif/data/masif_ppi_search# ls descriptors/sc05/all_feat/1A2K_C
p1_desc_flipped.npy  p1_desc_straight.npy
root@eb92233498e0:/masif/data/masif_ppi_search# ls descriptors/sc05/all_feat/1A2K_A
p1_desc_flipped.npy  p1_desc_straight.npy

It doesn't raise any exceptions, just some warnings.
I do the same for the other protein:

python $masif_source/data_preparation/01-pdb_extract_and_triangulate.py 5JYL_A
python $masif_source/data_preparation/04-masif_precompute.py masif_site 5JYL_A
python $masif_source/data_preparation/04-masif_precompute.py masif_ppi_search 5JYL_A
./compute_descriptors.sh 5JYL_A

python $masif_source/data_preparation/01-pdb_extract_and_triangulate.py 5JYL_B
python $masif_source/data_preparation/04-masif_precompute.py masif_site 5JYL_B
python $masif_source/data_preparation/04-masif_precompute.py masif_ppi_search 5JYL_B
./compute_descriptors.sh 5JYL_B

nohup sh ./data_prepare_one.sh  5JYL_A_B &
nohup sh ./data_prepare_one.sh 1A2K_C_A &
nohup sh ./compute_descriptors.sh 5JYL_A_B &
nohup sh ./compute_descriptors.sh 1A2K_C_A &

Looking at the file /masif/source/masif_ppi_search/second_stage_alignment_nn.py,
it needs some setup variables:

masif_opts = {}
masif_opts["pdb_chain_dir"] = "data_preparation/01-benchmark_pdbs/"#* 每个蛋白对应的链
masif_opts["ply_chain_dir"] = "data_preparation/01-benchmark_surfaces/"#* 蛋白对应的表面 用site计算出来的
masif_opts["ppi_search"]={}
masif_opts["ppi_search"][
    "masif_precomputation_dir"
] = "data_preparation/04b-precomputation_12A/precomputation/"
masif_opts["ppi_search"]["desc_dir"] = "descriptors/sc05/all_feat/"#*  这里通过compute_description.sh计算得出
masif_opts["ppi_search"]["gif_descriptors_out"] = "gif_descriptors/"#方法gif才需要,masif不需要 空的目录文件夹
masif_opts["site"]={}
masif_opts["site"][
    "masif_precomputation_dir"
] = "data_preparation/04a-precomputation_9A/precomputation/"

These are global variables.
Define my protein list:

root@eb92233498e0:/masif/data/masif_ppi_search# for i in $(ls data_preparation/01-benchmark_pdbs);do echo ${i/.pdb/}; done > lists/mylist.txt
root@eb92233498e0:/masif/data/masif_ppi_search# cat lists/mylist.txt
1A2K_A
1A2K_C
5JYL_A
5JYL_B
import os
import numpy as np
# Location of surface (ply) files. 
data_dir='/masif/data/masif_ppi_search'
surf_dir = os.path.join(data_dir, masif_opts["ply_chain_dir"])
desc_dir = os.path.join(data_dir, masif_opts["ppi_search"]["desc_dir"])
pdb_dir = os.path.join(data_dir, masif_opts["pdb_chain_dir"])
precomp_dir = os.path.join(
    data_dir, masif_opts["ppi_search"]["masif_precomputation_dir"]
)
precomp_dir_9A = os.path.join(
    data_dir, masif_opts["site"]["masif_precomputation_dir"]
)
benchmark_list=os.path.join(data_dir, 'lists','mylist.txt')
pdb_list = open(benchmark_list).readlines()[0:100]
pdb_list = [x.rstrip() for x in pdb_list]
# Read all surfaces.
all_pc = []
all_desc = []

rand_list = np.copy(pdb_list)
#np.random.seed(0)
np.random.shuffle(rand_list)
rand_list = rand_list[0:100]

p2_descriptors_straight = []
p2_point_clouds = []
p2_patch_coords = []
p2_names = []

The file p1_sc_labels.npy is missing:

root@eb92233498e0:/masif/source/masif_ppi_search# ls /masif/data/masif_ppi_search/data_preparation/04b-precomputation_12A/precomputation/5JYL_B/
p1_X.npy  p1_Z.npy             p1_input_feat.npy    p1_mask.npy            p1_theta_wrt_center.npy
p1_Y.npy  p1_iface_labels.npy  p1_list_indices.npy  p1_rho_wrt_center.npy
root@eb92233498e0:/masif/source/masif_ppi_search# ls /masif/data/masif_ppi_search/data_preparation/04b-precomputation_12A/precomputation/5JYL_A_B/
p1_X.npy             p1_input_feat.npy      p1_sc_labels.npy         p2_Z.npy             p2_mask.npy
p1_Y.npy             p1_list_indices.npy    p1_theta_wrt_center.npy  p2_iface_labels.npy  p2_rho_wrt_center.npy
p1_Z.npy             p1_mask.npy            p2_X.npy                 p2_input_feat.npy    p2_sc_labels.npy
p1_iface_labels.npy  p1_rho_wrt_center.npy  p2_Y.npy                 p2_list_indices.npy  p2_theta_wrt_center.npy

I copy the files, but I don't know whether this is right:

cd /masif/data/masif_ppi_search/data_preparation/04b-precomputation_12A/precomputation
cp 1A2K_C_A/p1_sc_labels.npy  1A2K_C/p1_sc_labels.npy
cp 1A2K_C_A/p2_sc_labels.npy  1A2K_A/p1_sc_labels.npy
cp 5JYL_A_B/p1_sc_labels.npy 5JYL_A/p1_sc_labels.npy
cp 5JYL_A_B/p2_sc_labels.npy 5JYL_B/p1_sc_labels.npy

Move the model into the working directory:

cd /masif/source/masif_ppi_search/
cp -r /masif/comparison/masif_ppi_search/masif_descriptors_nn/models .

Run docking with masif-search

cd /masif/source/masif_ppi_search;
Then create a new Python script: touch docking.py

import scipy.sparse as spio
import copy
from Bio.PDB import *
from scipy.spatial import cKDTree
from transformation_training_data.score_nn import ScoreNN
from alignment_utils_masif_search import compute_nn_score, rand_rotation_matrix, \
        get_center_and_random_rotate, get_patch_geo, multidock, test_alignments, \
       subsample_patch_coords
import time
import sklearn.metrics
masif_opts = {}
masif_opts["pdb_chain_dir"] = "data_preparation/01-benchmark_pdbs/"#* 每个蛋白对应的链
masif_opts["ply_chain_dir"] = "data_preparation/01-benchmark_surfaces/"#* 蛋白对应的表面 用site计算出来的
masif_opts["ppi_search"]={}
masif_opts["ppi_search"][
    "masif_precomputation_dir"
] = "data_preparation/04b-precomputation_12A/precomputation/"
masif_opts["ppi_search"]["desc_dir"] = "descriptors/sc05/all_feat/"#*  这里通过compute_description.sh计算得出
masif_opts["ppi_search"]["gif_descriptors_out"] = "gif_descriptors/"#方法gif才需要,masif不需要 空的目录文件夹
masif_opts["site"]={}
masif_opts["site"][
    "masif_precomputation_dir"
] = "data_preparation/04a-precomputation_9A/precomputation/"
nn_model = ScoreNN()
import os
import numpy as np
data_dir='/masif/data/masif_ppi_search'
surf_dir = os.path.join(data_dir, masif_opts["ply_chain_dir"])
desc_dir = os.path.join(data_dir, masif_opts["ppi_search"]["desc_dir"])
pdb_dir = os.path.join(data_dir, masif_opts["pdb_chain_dir"])
precomp_dir = os.path.join(
    data_dir, masif_opts["ppi_search"]["masif_precomputation_dir"]
)
precomp_dir_9A = os.path.join(
    data_dir, masif_opts["site"]["masif_precomputation_dir"]
)
benchmark_list=os.path.join(data_dir, 'lists','mylist.txt')
pdb_list = open(benchmark_list).readlines()[0:100]
pdb_list = [x.rstrip() for x in pdb_list]
# Read all surfaces.
all_pc = []
all_desc = []

rand_list = np.copy(pdb_list)
#np.random.seed(0)
np.random.shuffle(rand_list)
rand_list = rand_list[0:100]

p2_descriptors_straight = []
p2_point_clouds = []
p2_patch_coords = []
p2_names = []

from geometry.open3d_import import *
for i, pdb in enumerate(rand_list):
    print("Loading patch coordinates for {}".format(pdb))
    pdb_id = pdb.split("_")[0]
    chains = pdb.split("_")[1]
    # Descriptors for global matching.
    p2_descriptors_straight.append(
        np.load(os.path.join(desc_dir, pdb, "p1_desc_straight.npy"))
    )
    p2_point_clouds.append(
        read_point_cloud(
            os.path.join(surf_dir, "{}.ply".format(pdb_id + "_" + chains))
        )
    )
    pc = subsample_patch_coords(pdb, "p1", precomp_dir_9A)
    p2_patch_coords.append(pc)
    p2_names.append(pdb)

all_positive_scores = []
all_positive_rmsd = []
all_negative_scores = []
# Match all descriptors.
count_found = 0
all_rankings_desc = []


# Now go through each target (p1 in every case) and dock each 'decoy' binder to it. 
# The target will have flipped (inverted) descriptors.
K=30
ransac_iter=100
ttf=[]
for target_ix, target_pdb in enumerate(rand_list):
    target_pdb_id = target_pdb.split("_")[0]
    chains = target_pdb.split("_")[1]
    # Load target descriptors for global matching.
    target_desc = np.load(os.path.join(desc_dir, target_pdb, "p1_desc_flipped.npy"))
    # Load target point cloud
    target_pc = os.path.join(surf_dir, "{}.ply".format(target_pdb_id + "_" + chains))
    target_pcd = read_point_cloud(target_pc)
    # Read the point with the highest shape compl.
    sc_labels = np.load(os.path.join(precomp_dir, target_pdb, "p1_sc_labels.npy"))
    center_point = np.argmax(np.median(np.nan_to_num(sc_labels[0]), axis=1))
    # Go through each source descriptor, find the top descriptors, store id+pdb
    num_negs = 0
    all_desc_dists = []
    all_pdb_id = []
    all_vix = []
    gt_dists = []
    # This is where the descriptors are actually compared (stage 1 of the MaSIF-search protocol)
    for source_ix, source_pdb in enumerate(rand_list):
        source_desc = p2_descriptors_straight[source_ix]
        desc_dists = np.linalg.norm(source_desc - target_desc[center_point], axis=1)
        all_desc_dists.append(desc_dists)
        all_pdb_id.append([source_pdb] * len(desc_dists))
        all_vix.append(np.arange(len(desc_dists)))
        if source_pdb == target_pdb:
            source_pcd = p2_point_clouds[source_ix]
            eucl_dists = np.linalg.norm(
                np.asarray(source_pcd.points)
                - np.asarray(target_pcd.points)[center_point, :],
                axis=1,
            )
            eucl_closest = np.argsort(eucl_dists)
            gt_dists = desc_dists[eucl_closest[0:50]]
            gt_count = len(source_desc)
    all_desc_dists = np.concatenate(all_desc_dists, axis=0)
    all_pdb_id = np.concatenate(all_pdb_id, axis=0)
    all_vix = np.concatenate(all_vix, axis=0)
    ranking = np.argsort(all_desc_dists)
    # Load target geodesic distances.
    target_coord = subsample_patch_coords(target_pdb, "p1", precomp_dir_9A, [center_point])
    # Get the geodesic patch and descriptor patch for the target.
    target_patch, target_patch_descs = get_patch_geo(
        target_pcd, target_coord, center_point, target_desc, flip=True
    )
    # Make a ckdtree with the target.
    target_ckdtree = cKDTree(target_patch.points)
    ## Load the structures of the target and the source (to get the ground truth).
    parser = PDBParser()
    target_struct = parser.get_structure(
        "{}_{}".format(target_pdb_id, chains[0]),
        os.path.join(pdb_dir, "{}_{}.pdb".format(target_pdb_id, chains)),
    )
    #gt_source_struct = parser.get_structure(
    #    "{}_{}".format(target_pdb_id, chains[1]),
    #    os.path.join(pdb_dir, "{}_{}.pdb".format(target_pdb_id, chains[1])),
    #)
    # Get coordinates of atoms for the ground truth and target.
    target_atom_coords = [atom.get_coord() for atom in target_struct.get_atoms()]
    target_ca_coords = [
        atom.get_coord() for atom in target_struct.get_atoms() if atom.get_id() == "CA"
    ]
    target_atom_coord_pcd = PointCloud()
    target_ca_coord_pcd = PointCloud()
    target_atom_coord_pcd.points = Vector3dVector(np.array(target_atom_coords))
    target_ca_coord_pcd.points = Vector3dVector(np.array(target_ca_coords))
    target_atom_pcd_tree = KDTreeFlann(target_atom_coord_pcd)
    target_ca_pcd_tree = KDTreeFlann(target_ca_coord_pcd)
    found = False
    myrank_desc = float("inf")
    chosen_top = ranking[0:K]
    pos_scores = []
    pos_rmsd = []
    neg_scores = []
    # This is where the matched descriptors are actually aligned.
    for source_ix, source_pdb in enumerate(rand_list):
        viii = chosen_top[np.where(all_pdb_id[chosen_top] == source_pdb)[0]]
        source_vix = all_vix[viii]
        if len(source_vix) == 0:
            continue
        source_desc = p2_descriptors_straight[source_ix]
        source_pcd = copy.deepcopy(p2_point_clouds[source_ix])
        source_coords = p2_patch_coords[source_ix]
        # Randomly rotate and translate.
        random_transformation = get_center_and_random_rotate(source_pcd)
        source_pcd.transform(random_transformation)
        # Dock and score each matched patch. 
        #print({'source_pcd':source_pcd,'source_coords':source_coords,'source_desc':source_desc,'source_vix':source_vix\
        #,'target_patch':target_patch,'target_patch_descs':target_patch_descs,'target_ckdtree':target_ckdtree,'ransac_iter':ransac_iter})
        if source_pdb != target_pdb:  # the same structure does not need docking
            all_results, all_source_patch, all_source_scores = multidock(
                source_pcd,
                source_coords,
                source_desc,
                source_vix,
                target_patch,
                target_patch_descs,
                target_ckdtree,
                nn_model, 
                ransac_iter=ransac_iter
            )
            res={'target_pdb':target_pdb,'source_pdb':source_pdb,'all_results':all_results,\
            'all_source_patch':all_source_patch,'all_source_scores':all_source_scores}
            ttf.append(res)
            # each ttf entry holds several candidate docking alignments; the fingerprint search determines which ones

Looking at ttf[0]:

>>> ttf[0]
{'target_pdb': '1A2K_C', 'source_pdb': '5JYL_B', 'all_results': [registration::RegistrationResult with fitness=0.000000e+00, inlier_rmse=0.000000e+00, and correspondence_set size of 0
Access transformation to get result., registration::RegistrationResult with fitness=0.000000e+00, inlier_rmse=0.000000e+00, and correspondence_set size of 0
Access transformation to get result., registration::RegistrationResult with fitness=2.600000e-01, inlier_rmse=6.074436e-01, and correspondence_set size of 26
Access transformation to get result., registration::RegistrationResult with fitness=0.000000e+00, inlier_rmse=0.000000e+00, and correspondence_set size of 0
Access transformation to get result.], 'all_source_patch': [geometry::PointCloud with 100 points., geometry::PointCloud with 100 points., geometry::PointCloud with 100 points., geometry::PointCloud with 100 points.], 'all_source_scores': [0.0014200918, 0.001413941, 2.9317687e-05, 0.001454716]}

all_results[0] is an <open3d.open3d.registration.RegistrationResult> object; fitness measures the correspondence points, and if it equals 0 the alignment is bad.
But how can I convert a RegistrationResult and PointCloud into PDB format? I want to get the docked structure.
I am a novice at docking, please help me.
Thank you.
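For reference, here is a hedged sketch (not part of MaSIF) of how one might write the best-scoring alignment from a ttf entry as a "docked" PDB. RegistrationResult.transformation is a 4x4 matrix that aligns the randomly moved source onto the target patch, so the same random transformation has to be applied to the structure first. It assumes you also stored random_transformation in the res dict and that Biopython is acceptable for writing the output; variable names follow the script above.

# Hedged sketch: apply a ttf alignment to the source PDB and save the result.
import os
import numpy as np
from Bio.PDB import PDBParser, PDBIO

def apply_4x4(entity, T):
    # Biopython's transform() computes coord.dot(rot) + tran, so pass the
    # rotation transposed to apply new = R @ old + t.
    T = np.asarray(T)
    entity.transform(T[:3, :3].T, T[:3, 3])

res = ttf[0]
best_ix = int(np.argmax(res["all_source_scores"]))  # assumes higher score = better; check the scoring convention
best_T = res["all_results"][best_ix].transformation

parser = PDBParser(QUIET=True)
source_struct = parser.get_structure(
    res["source_pdb"], os.path.join(pdb_dir, res["source_pdb"] + ".pdb")
)
apply_4x4(source_struct, res["random_transformation"])  # assumed to be stored alongside the result
apply_4x4(source_struct, best_T)                         # RANSAC/ICP alignment onto the target patch

io = PDBIO()
io.set_structure(source_struct)
io.save("docked_{}_onto_{}.pdb".format(res["source_pdb"], res["target_pdb"]))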

The complex surface is generated in a confusing way.

When computing the iface labels for the parts of a protein complex that interact with each other, the complex surface should only contain the chains of interest.

For example, Complex_id1 has 5 chains.

We are looking at the interaction between the id1_AB and id1_CD parts; however, the surface for Complex_id1 still contains a fifth chain (E). This way we will wrongly label some parts of the id1_AB and id1_CD surfaces as interfaces because they are covered by the extra chain.

I am assuming we only want to look at interfaces between the parts we know interact, so I am assuming that using the full complex surface is not the right way to label the interfaces.

Meaning of mask in precomputed numpy arrays

Hi,

I am exploring the numpy arrays produced by the precomputation script:
$masif_source/data_preparation/04-masif_precompute.py masif_ppi_search

Could you please explain the meaning of p1_mask.npy and p2_mask.npy?
I understand that the mask is for rho and theta, but I don't understand its meaning. In which cases is the value zero?
Why do we skip some of the neighbors in the patch?

Thank you so much for all your effort!
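Not an authoritative answer, but a small inspection sketch may help, under the assumption that patches are padded to a fixed maximum size and the mask simply flags which slots hold real neighbors (1) versus padding (0); the file names follow the precomputation outputs listed elsewhere on this page.

# Hedged inspection sketch; the padding interpretation of the mask is an assumption.
import numpy as np

rho = np.load("p1_rho_wrt_center.npy")
theta = np.load("p1_theta_wrt_center.npy")
mask = np.load("p1_mask.npy")

print(rho.shape, theta.shape, mask.shape)  # expected: (n_vertices, max_patch_size) each
print(np.allclose(rho[mask == 0], 0.0))    # do masked-out slots coincide with zero padding?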

Error while running on a predicted structure

Hello,

I encountered an error while running data_prepare_one.sh on a predicted protein structure.

The script ran fine on the downloaded PDB file:

Singularity masif_latest:/global/scratch/software/MaSIF/masif/data/masif_site> ./data_prepare_one.sh --file data_preparation/00-raw_pdbs/4ZQK.pdb 4ZQK_A

Running masif site on data_preparation/00-raw_pdbs/4ZQK.pdb
cp: 'data_preparation/00-raw_pdbs/4ZQK.pdb' and 'data_preparation/00-raw_pdbs/4ZQK.pdb' are the same file
Empty
Removing degenerated triangles
Removing degenerated triangles
4ZQK_A
Reading data from input ply surface files.
Dijkstra took 3.65s
Only MDS time: 15.50s
Full loop time: 24.70s
MDS took 24.70s

It also ran fine on a predicted structure I downloaded online:

Singularity masif_latest:/global/scratch/software/MaSIF/masif/data/masif_site> ./data_prepare_one.sh --file data_preparation/00-raw_pdbs/AvrPita.pdb AvrPita_A

Running masif site on data_preparation/00-raw_pdbs/AvrPita.pdb
cp: 'data_preparation/00-raw_pdbs/AvrPita.pdb' and 'data_preparation/00-raw_pdbs/AvrPita.pdb' are the same file
Empty
Removing degenerated triangles
Removing degenerated triangles
AvrPita_A
Reading data from input ply surface files.
Dijkstra took 7.01s
Only MDS time: 29.31s
Full loop time: 46.89s
MDS took 46.89s

However, for this structure predicted on my local machine,

Singularity masif_latest:/global/scratch/software/MaSIF/masif/data/masif_site> ./data_prepare_one.sh --file data_preparation/00-raw_pdbs/MGG-01993-ITASSER.pdb MGG-01993-ITASSER_A

Running masif site on data_preparation/00-raw_pdbs/MGG-01993-ITASSER.pdb
cp: 'data_preparation/00-raw_pdbs/MGG-01993-ITASSER.pdb' and 'data_preparation/00-raw_pdbs/MGG-01993-ITASSER.pdb' are the same file
--Call--

>    /usr/local/lib/python3.6/subprocess.py(758)__del__()
    756             self.wait()
    757
--> 758     def __del__(self, _maxsize=sys.maxsize, _warn=warnings.warn):
    759         if not self._child_created:
    760             # We didn't get to successfully create a child process.

ipdb>

I wasn't so sure what was causing this error.... Thank you in advance!

>head AvrPita.pdb
ATOM 1 N MET A 1 50.404 53.465 89.261 1.00 13.70
ATOM 2 CA MET A 1 49.060 53.953 88.970 1.00 13.70
ATOM 3 HA MET A 1 48.349 53.550 89.692 1.00 13.70
ATOM 4 CB MET A 1 49.107 55.497 89.071 1.00 13.70
ATOM 5 HB1 MET A 1 49.608 55.899 88.190 1.00 13.70
ATOM 6 HB2 MET A 1 49.694 55.787 89.940 1.00 13.70
ATOM 7 CG MET A 1 47.733 56.158 89.212 1.00 13.70
ATOM 8 HG1 MET A 1 47.334 55.866 90.163 1.00 13.70
ATOM 9 HG2 MET A 1 47.047 55.782 88.460 1.00 13.70
ATOM 10 SD MET A 1 47.708 57.971 89.163 1.00 13.70

>head MGG-011730-ITASSER.pdb
ATOM 1 H LEU 1 -30.724 18.366 -0.112 1.00 4.24
ATOM 2 N LEU 1 -30.717 19.328 -0.332 1.00 4.24
ATOM 3 CA LEU 1 -31.127 20.240 0.732 1.00 4.24
ATOM 4 C LEU 1 -30.216 20.107 1.947 1.00 4.24
ATOM 5 O LEU 1 -29.360 19.226 1.982 1.00 4.24
ATOM 6 CB LEU 1 -32.579 19.966 1.133 1.00 4.24
ATOM 7 CG LEU 1 -33.577 20.301 0.018 1.00 4.24
ATOM 8 CD1 LEU 1 -34.987 19.880 0.429 1.00 4.24
ATOM 9 CD2 LEU 1 -33.575 21.804 -0.259 1.00 4.24
ATOM 10 N PRO 2 -30.291 20.947 3.077 1.00 2.47

> tail AvrPita.pdb
ATOM 3585 H CYS A 224 76.364 66.692 47.328 1.00 5.19
ATOM 3586 CA CYS A 224 74.809 66.366 45.907 1.00 5.19
ATOM 3587 HA CYS A 224 74.247 65.434 45.807 1.00 5.19
ATOM 3588 CB CYS A 224 73.982 67.329 46.770 1.00 5.19
ATOM 3589 HB1 CYS A 224 73.157 67.727 46.174 1.00 5.19
ATOM 3590 HB2 CYS A 224 74.606 68.173 47.067 1.00 5.19
ATOM 3591 SG CYS A 224 73.262 66.560 48.246 1.00 5.19
ATOM 3592 C CYS A 224 75.018 66.922 44.489 1.00 5.19
ATOM 3593 O CYS A 224 76.123 67.110 43.980 1.00 5.19
TER

>tail MGG-011730-ITASSER.pdb
ATOM 661 OG1 THR 82 2.127 -6.129 7.772 1.00 2.03
ATOM 662 CG2 THR 82 0.326 -5.954 6.201 1.00 2.03
ATOM 663 N PRO 83 0.144 -8.497 9.877 1.00 3.59
ATOM 664 CA PRO 83 0.505 -9.336 10.943 1.00 3.59
ATOM 665 C PRO 83 0.584 -10.638 10.319 1.00 3.59
ATOM 666 O PRO 83 0.347 -10.751 9.103 1.00 3.59
ATOM 667 CB PRO 83 -0.615 -9.288 11.984 1.00 3.59
ATOM 668 CG PRO 83 -1.888 -9.067 11.196 1.00 3.59
ATOM 669 CD PRO 83 -1.811 -9.989 9.990 1.00 3.59
TER
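One visible difference between the working files and the failing one is that the I-TASSER model's ATOM records carry no chain identifier. Whether that is the cause here is only a guess, but a minimal Biopython sketch for assigning a chain ID before running data_prepare_one.sh is below; the file name is taken from the failing run above.

# Hedged sketch: give the (blank-id) chain an explicit identifier "A".
from Bio.PDB import PDBParser, PDBIO

parser = PDBParser(QUIET=True)
struct = parser.get_structure("model", "MGG-01993-ITASSER.pdb")
for model in struct:
    for chain in model:
        chain.id = "A"  # assumes a single chain per model

io = PDBIO()
io.set_structure(struct)
io.save("MGG-01993-ITASSER_A.pdb")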

multivalue in apbs

I can't find the binary referenced by: export MULTIVALUE_BIN=/path/to/apbs/APBS-1.5-linux64/share/apbs/tools/bin/multivalue

So I do:
export MULTIVALUE_BIN=/home/ubuntu/miniconda3/envs/masif/share/apbs/tools/mesh/multivalue.c
multivalue is a C program, and when I run it I get:
OSError: [Errno 8] Exec format error

Patch extraction

Sorry to bother you, I don't really understand how you divide the mesh surface into patches.

Do you start at one random vertex in the mesh, perform a geodesic distance calculation for a given radius, and then select a new vertex not yet covered by previous calculations, or do you just compute a patch for each vertex in the mesh? The input for the neural network consists of a matrix of size: number of patches x features considered (5).

I cannot find the code lines that explain this patch division.

Best,
Liv

NameError: name 'read_point_cloud' is not defined

When I run masif-search ./second_stage_masif.sh 2000 in Docker, I get an error that this function is not defined, but I do not know where read_point_cloud is supposed to come from:

root@8638240e23fe:/masif/comparison/masif_ppi_search/masif_descriptors_nn# ./second_stage_masif.sh 2000
2021-06-18 05:15:52.852452: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
['/masif/source//masif_ppi_search/second_stage_alignment_nn.py', '../../../data/masif_ppi_search', '2000', '2000', '1000', 'masif']
Loading patch coordinates for 2I32_A_E
Traceback (most recent call last):
  File "/masif/source//masif_ppi_search/second_stage_alignment_nn.py", line 114, in <module>
    read_point_cloud(
NameError: name 'read_point_cloud' is not defined
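A likely cause, offered as a guess: in Open3D 0.8+ the flat namespace was split into submodules, so a bare read_point_cloud no longer exists. A hedged compatibility shim (mirroring what geometry/open3d_import.py appears intended to provide) would be:

# Hedged shim for the Open3D API split; version boundaries are approximate.
try:
    # Older Open3D releases exposed these at the top level.
    from open3d import read_point_cloud, PointCloud, Vector3dVector, KDTreeFlann
except ImportError:
    # Newer releases (e.g. 0.9) moved them into submodules.
    from open3d.io import read_point_cloud
    from open3d.geometry import PointCloud, KDTreeFlann
    from open3d.utility import Vector3dVector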

Generating Geometric Data

Hi,

Thanks for showing your work on MaSIF. I was trying to understand the process a little more in depth, in particular how the angular and radial coordinates for each patch are generated and used. However, in the masif->source->data preparation section, the README mentions the files that I should look for, but the files 02, 03, and 03b from that README are missing from the GitHub repository. Thus I cannot see the code for generating this geometric data.

If you could please upload these missing files, I'd greatly appreciate it!

Thank you
Devesh

Masif_pymol_plugin error

Hi,

I am able to install masif_pymol_plugin into PyMOL, but when I try to load the plugin PyMOL gives me this error:
Unable to initialize plugin 'masif_pymol_plugin' (pmg_tk.startup.masif_pymol_plugin).

I tried writing "import pmg_tk.startup.masif_pymol_plugin" in the pymol command line as well but it gives me an error again:
Traceback (most recent call last):
File "/Applications/PyMOL.app/Contents/lib/python3.7/site-packages/pmg_tk/startup/masif_pymol_plugin/init.py", line 5, in
from loadPLY import *
ModuleNotFoundError: No module named 'loadPLY'

I also looked in the masif_pymol_plugin folder to make sure there was a loadPLY file and there was.
Thank you!
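One plausible cause, offered as a guess rather than a confirmed fix: under Python 3, from loadPLY import * inside a package is an implicit relative import, which is no longer allowed. The explicit relative form in masif_pymol_plugin/__init__.py would be:

# Hypothetical fix sketch for masif_pymol_plugin/__init__.py: Python 3 packages
# require explicit relative imports for sibling modules.
from .loadPLY import *   # instead of: from loadPLY import *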

Ever considered changing the hydrophobicity scale?

I was wondering if you have ever considered using a more detailed treatment of hydrophobicity? I was just thinking about how if a tiny fraction of a leucine residue is solvent-accessible, this is then labelled as extremely hydrophobic. However, in this case it is incorrect - the Kyte-Doolittle scale rates leucine this way due to its size and composition, and if you reduce the (effective / solvent-accessible) size then the hydrophobicity should also be reduced.

For example, this one assigns a +1 or -1 depending on both the residue and the atom type:
A Simple Atomic-Level Hydrophobicity Scale Reveals Protein Interfacial Structure, Kapcha & Rossky, JMB 2014
https://www.sciencedirect.com/science/article/pii/S0022283613006232?via%3Dihub

Also, I notice that the module "triangulation.computeHydrophobicity" has no means of dealing with non-canonical or modified amino acids. I haven't checked it, but it looks like the code will break if it encounters one. Maybe you should at least use the .get() method to access the kd_scale dictionary. It would be best if there was some way of matching non-canonical / modified amino acids to some hydrophobicity scale. However I'm not sure if those terms are used consistently. At least I've never found a look-up table for 3-letter-code to modified amino acid, etc. And of course this should be made clear somewhere, whichever way you go with.
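A minimal sketch of the .get() fallback suggested above, assuming kd_scale maps three-letter residue names to Kyte-Doolittle values as in triangulation.computeHydrophobicity; the 0.0 default for unknown (non-canonical or modified) residues is an illustrative choice, not the authors':

# Hedged sketch: tolerate residue names missing from the hydrophobicity table.
def residue_hydrophobicity(res_name, kd_scale):
    return kd_scale.get(res_name.upper(), 0.0)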

Use PDB structures from modelling software like ITASSER

Hi!!!

I wanted to know whether it's possible to use the output PDB files from software like I-TASSER (or any other structural modeller) as input for MaSIF. My proteins of interest in the PDB have low-resolution structures, and the crystal structures cover only partial sequences.
Hence, I want to create PDB files based on the complete sequences and then use MaSIF to predict the PPI interface. Could you please help me out with this? Thanks in advance for any help.

Regards,

Anupam

Getting "docked" coordinates from masif_ppi

I have two proteins and would like to get the "docked" coordinates from masif_ppi. It looks like source/masif_ppi_search/second_stage_alignment.py performs an alignment of points, but I was wondering if there's another function or script that can apply the best N transformations to the input ligand in order to generate a set of N docked poses.

Pretrained model for the second stage alignment

Hi!

In the MaSIF-search protocol I'm trying to run the second stage alignment using the pretrained model; however, in the directory /masif/data/masif_pdl1_benchmark/models/nn_score there are only .index and .data files. Is it possible to provide the trained_model.hdf5 file that is loaded by the score_nn.py script?

Thank you so much,
Goran
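While waiting for the .hdf5 file, one thing that may be worth trying: the .index/.data pair is a TensorFlow checkpoint, and tf.keras can restore weights directly from the checkpoint prefix. Whether ScoreNN exposes its Keras model as an attribute (assumed below as nn_model.model) would need to be checked against the source.

# Hedged sketch, not verified against score_nn.py: restore weights from the
# TF checkpoint files instead of trained_model.hdf5.
import tensorflow as tf
from transformation_training_data.score_nn import ScoreNN

nn_model = ScoreNN()
ckpt = tf.train.latest_checkpoint("/masif/data/masif_pdl1_benchmark/models/nn_score")
if ckpt is not None:
    nn_model.model.load_weights(ckpt)  # assumes ScoreNN keeps its tf.keras model in .model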

Step 1: PDB download failed

When I run ./data_prepare_one.sh 1MBN_A_ under data/masif_site, the download fails with:

WARNING: The default download format has changed from PDB to PDBx/mmCif

Error handling: missing atoms when calculating hydrogen-bonding potential

Sometimes I've found that when looking for hydrogen bond acceptors, the code will break if the acceptor atom is there but coordinates are missing for its bonded, neighbouring atom. I.e. in "triangulation.computeCharges" line 82, res[acceptorAngleAtom[atom_name]].get_coord() will throw a KeyError exception. I found this on PDB entries 2avn and 1vdn.

My suggested fix was already implemented for acceptorPlaneAtom in the same function:

try:
    a = res[acceptorAngleAtom[atom_name]].get_coord()
except KeyError:
    return 0.0

StrBioInfo problem

Hi,
I have installed StrBioInfo 0.9a0.dev1.

However, when I execute the line "struct_assembly = struct.apply_biomolecule_matrices()[0]" in 00b-generate_assembly.py, it shows the error:
AttributeError: 'PDBFrame' object has no attribute 'apply_biomolecule_matrices'

Will this package get an update?

I tried the Docker image and the "fastest and easiest way to reproduce" route.

2022-07-18 09:09:30.240557: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
['/masif/source//masif_ppi_search/second_stage_alignment_nn.py', '../../../data/masif_ppi_search', '100', '2000', '1000', 'masif']
Loading patch coordinates for 2I32_A_E
Loading patch coordinates for 2P47_A_B
Loading patch coordinates for 1BRS_A_D
Loading patch coordinates for 3P8B_C_D
Loading patch coordinates for 2Y32_B_D
Loading patch coordinates for 3M85_B_E
Loading patch coordinates for 2J12_A_B
Loading patch coordinates for 2ZXW_O_U
Loading patch coordinates for 3TND_B_D
Loading patch coordinates for 3F74_A_B
Loading patch coordinates for 1JZO_A_B
Loading patch coordinates for 2P45_A_B
Loading patch coordinates for 3IBM_A_B
Loading patch coordinates for 3CDW_A_H
Loading patch coordinates for 3S8V_A_X
Loading patch coordinates for 4KGG_C_A
Loading patch coordinates for 1TQ9_A_B
Loading patch coordinates for 1NPO_A_C
Loading patch coordinates for 3OGF_A_B
Loading patch coordinates for 1Z0K_A_C
Loading patch coordinates for 3Q9U_A_C
Loading patch coordinates for 1XUA_A_B
Loading patch coordinates for 3AXY_B_D
Loading patch coordinates for 2QLC_C_B
Loading patch coordinates for 3QWQ_A_B
Loading patch coordinates for 2O8Q_A_B
Loading patch coordinates for 2JI1_C_D
Loading patch coordinates for 1HBT_I_H
Loading patch coordinates for 3QWN_I_J
Loading patch coordinates for 2Z0P_C_D
Loading patch coordinates for 1XDT_T_R
Loading patch coordinates for 1ID5_H_L
Loading patch coordinates for 1PXV_A_C
Loading patch coordinates for 1I4O_B_D
Loading patch coordinates for 2Z7F_E_I
Loading patch coordinates for 2FE8_A_C
Loading patch coordinates for 1LQM_E_F
Loading patch coordinates for 2Z29_A_B
Loading patch coordinates for 3P71_C_T
Loading patch coordinates for 4CJ0_A_B
Loading patch coordinates for 4TQ1_A_B
Loading patch coordinates for 2WQ4_A_C
Loading patch coordinates for 2LBU_E_D
Loading patch coordinates for 3S9C_A_B
Loading patch coordinates for 1AVX_A_B
Loading patch coordinates for 1A2K_C_AB
Loading patch coordinates for 2WAM_A_C
Loading patch coordinates for 3SGB_E_I
Loading patch coordinates for 3B5U_J_L
Loading patch coordinates for 1YLQ_A_B
Loading patch coordinates for 1YY9_A_D
Loading patch coordinates for 2B3Z_C_D
Loading patch coordinates for 3HN6_B_D
Loading patch coordinates for 1T0F_A_B
Loading patch coordinates for 3PGA_1_4
Loading patch coordinates for 2AQX_A_B
Loading patch coordinates for 1SOT_A_C
Loading patch coordinates for 1SHY_A_B
Loading patch coordinates for 3EYD_C_D
Loading patch coordinates for 1UUG_A_B
Loading patch coordinates for 3KZH_A_B
Loading patch coordinates for 2HEK_A_B
Loading patch coordinates for 4YDJ_HL_G
Loading patch coordinates for 3HCG_A_C
Loading patch coordinates for 3K3C_A_B
Loading patch coordinates for 1JKG_A_B
Loading patch coordinates for 5GPG_A_B
Loading patch coordinates for 4AG2_A_C
Loading patch coordinates for 3SLH_A_B
Loading patch coordinates for 3ISM_A_B
Loading patch coordinates for 3KMT_A_B
Loading patch coordinates for 1XPJ_A_D
Loading patch coordinates for 1UGH_E_I
Loading patch coordinates for 1I07_A_B
Loading patch coordinates for 3CEW_C_D
Loading patch coordinates for 2HDP_A_B
Loading patch coordinates for 2G2W_A_B
Loading patch coordinates for 3WN7_A_B
Loading patch coordinates for 3Q0Y_C_B
Loading patch coordinates for 3CG8_C_B
Loading patch coordinates for 1Q5H_A_B
Loading patch coordinates for 2B42_B_A
Loading patch coordinates for 2YZJ_A_C
Loading patch coordinates for 3ECY_A_B
Loading patch coordinates for 3HRD_E_H
Loading patch coordinates for 1ZR0_A_B
Loading patch coordinates for 3E2U_A_E
Loading patch coordinates for 1ERN_A_B
Loading patch coordinates for 1O9Y_A_D
Loading patch coordinates for 3RDZ_A_C
Loading patch coordinates for 1ZVN_A_B
Loading patch coordinates for 3CHW_A_P
Loading patch coordinates for 4M5F_A_B
Loading patch coordinates for 3Q87_A_B
Loading patch coordinates for 3BTV_A_B
Loading patch coordinates for 3FJS_C_D
Loading patch coordinates for 2GKW_A_B
Loading patch coordinates for 2GD4_C_B
Loading patch coordinates for 2A2L_C_B
Loading patch coordinates for 1XT9_A_B
Docking all binders on target: 2I32_A_E 
Traceback (most recent call last):
  File "/masif/source//masif_ppi_search/second_stage_alignment_nn.py", line 198, in <module>
    target_coord = subsample_patch_coords(target_pdb, "p1", precomp_dir_9A, center_point)
  File "/masif/source/masif_ppi_search/alignment_utils_masif_search.py", line 305, in subsample_patch_coords
    for iii, v in enumerate(cv):
TypeError: 'numpy.int64' object is not iterable

What are the roles of cv and center_point?
If subsample_patch_coords were functioning correctly, center_point should be a one-dimensional list; here, however, center_point is an int64.
I don't know how to run the "fastest and easiest" way.

By the way, the Docker image ships a newer version of Open3D (0.9); please fix the code accordingly.
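For what it's worth, the analogous call in the docking script pasted earlier on this page wraps the index in a list, so subsample_patch_coords appears to expect an iterable of vertex indices. A hedged one-line workaround for line 198 of second_stage_alignment_nn.py would be:

# Hedged sketch: pass the center point as a one-element list, matching the call
# used elsewhere on this page.
target_coord = subsample_patch_coords(target_pdb, "p1", precomp_dir_9A, [center_point])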

Cannot download large PDB structures

Bio.PDB.PDBList() disallows the downloading of structures >62 chains or >99999 ATOM lines using the 'pdb' (.ent) format. Attempting this gives a "Desired structure doesn't exist" error.

There are a couple of other file_format options for which this is allowed, but it's not completely clear how to utilize these formats in the downstream data_preparation steps. It would be very helpful to be able to use one of these other formats in the MaSIF pipeline.

masif_ppi_search struggles with rotated chains

Hello,

Thanks for your work on MaSIF and for making the source code available! I'm running MaSIF in the supplied docker container, with some tweaks (for example, I changed the scripts a little to output the high-scoring docked structures for manual review).

I'm running on a very small list of proteins of interest to me. I first ran the proteins through the default pipeline where it downloads the complexes directly from the PDB to recompute the benchmark, followed by the second stage alignment. This was fairly successful.

However, in the next stage, I decided to try something more like a "real world" experiment where I would not necessarily have a native structure; instead, I would have two chains that I suspect interact but don't know how. So I used the same pairs from above and did some standard light protein prep (no backbone changes) and rotation to emulate the perhaps-unknown position of the two proteins to one another. There was a marked decrease in performance, and upon further trials it became clear that the protein preparation was not an issue but that rotation of the chains contributed to a huge decrease in performance, and even an inability to find anything close to a native structure despite previously finding near-native structures with ease.

Based on what I understand from the paper and the code, the interface descriptors and alignment phase should be agnostic to initial rotation of the chain. Is there something I'm missing here?

Thanks!

MaSIF-search second-stage issue

Hi there.
I have some questions about the MaSIF-search second stage: I can't figure out where the code that generates the complexes described in the paper is. It would be great if anyone could give me a clue.

How to install reduce(3.23)

Hi,
Thanks for your great work. I was trying to install the dependencies, but I am not sure how to install reduce (3.23) on a Linux system.
Could you show me how?

Unexpected Errors second stage alignment

In the MaSIF-search process, after computing the descriptors I am unable to run second_stage_alignment_nn.py successfully: the pretrained model doesn't load. Any hints? Can you provide the correct pretrained .hdf5 file? Thanks, hoping for a positive response.

PyMol plugin slow

Dear Masif-Team,

thanks for publishing MaSIF as open-source :-)
I find the PyMol plugin very useful, and I would like to use it also for general visualization tasks. However, I found that the current version of Simple_mesh is loading very slowly when the number of vertices is high. I managed to fix it (at least for my purposes) by changing the following:

        for jj in range(len(self.attributes["vertex_x"])):
            self.vertices = np.vstack(
                [
                    self.attributes["vertex_x"],
                    self.attributes["vertex_y"],
                    self.attributes["vertex_z"],
                ]
            ).T

to

        self.vertices = np.vstack(
            [
                self.attributes["vertex_x"],
                self.attributes["vertex_y"],
                self.attributes["vertex_z"],
            ]
        ).T

This reduces the loading time from >5 minutes (I killed PyMol at some point) to about 1 sec, for a surface with about 100000 vertices. I think this is safe to change because the jj variable is not used anywhere in the loop (maybe one could also do if len(...):?). I don't need any help with this, I just thought I'd report it in case you want to fix it on GitHub.

Best regards,
Franz Waibl

Question regarding PyMOL plugin

I tried to install the provided plugin into the software (PyMOL), but I got the following error message:
[screenshot of the error]
It says that the plugin is installed but the initialization fails. The above image was taken on a Windows machine.
And the same situation happens on my Ubuntu machine.
Does this mean that the plugin you provided only works with the MAC machine?

how to visualize the output files?

pdl1_benchmark_nn.py produces several .pdb files and .vert files. I'm just wondering how to use these files, say, to visualize the docking site in PyMOL? Thank you.

Is a local PDB file possible as input?

Hi,
Can MaSIF take local PDB files (not deposited in the PDB database) as inputs, perhaps by modifying the scripts in data preparation?
Thanks in advance if anyone can give some suggestions!

Problems when extracting chains from PDB files

Hi,

I was using "input_output.extractPDB" to extract PDB chains for my work and two problems popped up. Essentially the code can end up missing a few residues depending on the following:

1:

class NotDisordered(Select):
    def accept_atom(self, atom):
        return not atom.is_disordered() or atom.get_altloc() == "A" 

According to the comment on this class, it is supposed to exclude disordered atoms. However, this class is actually used to save disordered atoms. It appears to be a result of an error in the validation of the X-ray-derived structure (link), whereby two different configurations are given for a set of atoms. What this code is supposed to do is choose one of those two configurations.

Typically the two configurations are labelled 'A' and 'B', however I have noticed that they can be labelled as '1' or '2'. When they are labelled '1' or '2', the class returns False -- the disordered residues are ignored. For my own work, I have used the following function to ensure that the residues are not ignored, even if they are labelled '1' or '2'.

class NotDisordered(Select):
    def accept_atom(self, atom):
        return not atom.is_disordered() or atom.get_altloc() == "A" or atom.get_altloc() == "1" 

2:

Modified / non-canonical amino acids are designated HETATM in the PDB, even though they are clearly amino acids. I added a function which reads the names of modified / non-canonical amino acids from the PDB SEQRES section, and notes the non-standard codes.

from Bio.SeqUtils import IUPACData
PROTEIN_LETTERS = [x.upper() for x in IUPACData.protein_letters_3to1.keys()]

def find_modified_amino_acids(path):
    res_set = set()
    for line in open(path, 'r'):
        if line[:6] == 'SEQRES':
            for res in line.split()[4:]:
                res_set.add(res)
    for res in list(res_set):
        if res in PROTEIN_LETTERS:
            res_set.remove(res)
    return res_set

def extractPDB(
    infilename, outfilename, chain_ids=None, includeWaters=False, invert=False
):
    # extract the chain_ids from infilename and save in outfilename. 
    # includeWaters: deprecated parameter, include the crystallographic waters (should not be used). 
    # invert: Select all chains EXCEPT those in chain_ids.
    parser = PDBParser(QUIET=True)
    struct = parser.get_structure(infilename, infilename)
    model = Selection.unfold_entities(struct, "M")[0]
    chains = Selection.unfold_entities(struct, "C")

    # Select residues to extract and build new structure
    structBuild = StructureBuilder.StructureBuilder()
    structBuild.init_structure("output")
    structBuild.init_seg(" ")
    structBuild.init_model(0)
    outputStruct = structBuild.get_structure()


    # Load a list of non-standard amino acid names -- these are
    # typically listed under HETATM, so they would be typically
    # ignored by the original algorithm
    modified_amino_acids = find_modified_amino_acids(infilename)

    for chain in model:
        if (
            chain_ids == None
            or (chain.get_id() in chain_ids and not invert)
            or invert == True
        ):
            structBuild.init_chain(chain.get_id())
            for residue in chain:
                het = residue.get_id()
                if not invert:
                    if het[0] == " " or (het[0] == "W" and includeWaters):
                        outputStruct[0][chain.get_id()].add(residue)
                    elif het[0][-3:] in modified_amino_acids:
                        print(het[0])
                        outputStruct[0][chain.get_id()].add(residue)
                else:
                    if (het[0] == "W" and includeWaters) or (
                        chain.get_id() not in chain_ids
                    ):
                        outputStruct[0][chain.get_id()].add(residue)
                    elif het[0][-3:] in modified_amino_acids:
                        outputStruct[0][chain.get_id()].add(residue)


    # Output the selected residues
    pdbio = PDBIO()
    pdbio.set_structure(outputStruct)
    pdbio.save(outfilename, select=NotDisordered())

Length assert error while running data_prepare_one.sh 1MBN_A_ under folder data/masif_site

Traceback

Traceback (most recent call last):
  File "/home/masif/source//data_preparation/04-masif_precompute.py", line 67, in <module>
    read_data_from_matfile_full_protein(coord_file, shape_file, ppi_pair_id, params, pid, label_iface=True)
  File "/home/masif/source/masif_modules/read_data_from_matfile.py", line 324, in read_data_from_matfile_full_protein
    assert len(np.unique(rows)) == len(p_s[protein_id]["X"][0])
AssertionError

The lengths are: left 3273, right 3276.

Any clue for this error? Thanks!

Lot of variation in masif_ppi_search outputs?

Hi there - I always get very different answers when doing the pdl1 benchmark with the neural network and I am not sure why.

What is the source of variation, shouldn't the neural network weights for pdl1_benchmark_nn.py be preloaded?

Otherwise a completely random model would be initialized with nn_model = ScoreNN(), which doesn't seem right. Thanks!

second_stage_alignment_nn.py error

Hi there,
I am trying to see if I can train the ppi-search model and use it for new predictions. However, when I was following the instructions and ran ./second_stage_masif.sh 100, I got an error message from Python 3.

File "/masif/source//masif_ppi_search/second_stage_alignment_nn.py", line 114, in
read_point_cloud(
NameError: name 'read_point_cloud' is not defined

Would you mind letting me know how to fix it?
Also, would you kindly describe how to prepare my own data (I have some docking models from Rosetta), as I would like to use them for the ppi_search?
Thanks.
