lpdi-epfl / masif
MaSIF: Molecular surface interaction fingerprints. Geometric deep learning to decipher patterns in molecular surfaces.
License: Apache License 2.0
Hi,
Thanks for sharing this great work. However, the Dockerfile you provided does not seem to include GPU support.
Do you know how I could include GPU support and TensorFlow-GPU in the Docker image?
Best regards!
I'm curious to know why Reduce was chosen over Open Babel for protonation. Would Open Babel work as well?
Hi,
I tried to run pdl1_benchmark with pdl1_benchmark_nn.py, but the 4ZQK_A and 4ZQK_B are not in the list of top_scores.
They matched only when I lower iface_cutoff to 0.5, and only at the single point
near_points: [1820]
iface: [0.5888073]
diff: [1.696772]
Is the model provided in masif/data/masif_pdl1_benchmark/nn_models/
the same as the one described in the paper, or should I train the model myself to reproduce the results?
Thank you so much!
Hi,
I have tried to run data_prepare_one but I get the following error and the program stops within ipdb.
MLC02GC4Z3Q05P:masif_site 4464689$ ./data_prepare_one.sh 1AKJ_AB_DE
:/Users/4464689/Downloads/masif/source/
Structure exists: '/var/folders/mn/7xx5f6314c1glph1lkt4k_9m002gyq/T/pdb1akj.ent'
--Call--
> /opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/subprocess.py(875)__del__()
873 self.wait()
874
--> 875 def __del__(self, _maxsize=sys.maxsize, _warn=warnings.warn):
876 if not self._child_created:
877 # We didn't get to successfully create a child process.
ipdb>
I appreciate any input,
Aleks
Hi, I have run masif_ppi_search on the PDB 6M3M_A and have gotten two output files labeled:
p1_desc_flipped.npy and p1_desc_straight.npy
I cannot find any documentation on what these two files mean or how they relate to finding binders of the input PDB. Any help would be appreciated, thanks!
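For orientation, the second_stage_alignment_nn.py code quoted further down this page loads p1_desc_straight.npy for the candidate binders and p1_desc_flipped.npy for the target, and compares the two by Euclidean distance. Below is a minimal sketch of that comparison. Random arrays stand in for the real descriptor files, and the array shapes, the descriptor length of 80, and the function name rank_binder_patches are assumptions for illustration only.

```python
import numpy as np

def rank_binder_patches(target_flipped, binder_straight, target_patch_ix, top_k=5):
    """Rank binder patches by descriptor distance to one target patch.

    target_flipped:  (n_target_patches, d) array, e.g. loaded from p1_desc_flipped.npy
    binder_straight: (n_binder_patches, d) array, e.g. loaded from p1_desc_straight.npy
    """
    query = target_flipped[target_patch_ix]
    # Smaller distance means the two patches have more similar fingerprints.
    dists = np.linalg.norm(binder_straight - query, axis=1)
    order = np.argsort(dists)[:top_k]
    return order, dists[order]

# Synthetic stand-ins for the .npy files (shapes are assumptions).
rng = np.random.default_rng(0)
target_flipped = rng.normal(size=(50, 80))
binder_straight = rng.normal(size=(40, 80))
binder_straight[7] = target_flipped[3]  # plant one perfect match

best_ix, best_dists = rank_binder_patches(target_flipped, binder_straight, 3)
```

With real data, the two arrays would come from np.load on the files in descriptors/sc05/all_feat/.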
Hi.
I tried to train MaSIF-site and got the following. It is still training, but slowly.
2021-11-10 14:06:26.243439: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:1' assigned_device_name_='' resource_device_name_='/device:GPU:1' supported_device_types_=[CPU] possible_devices_=[]
Identity: CPU XLA_CPU XLA_GPU
Assign: CPU
Const: CPU XLA_CPU XLA_GPU
ApplyAdam: CPU
VariableV2: CPU
Colocation members, user-requested devices, and framework assigned devices, if any:
fully_connected_3/biases/Initializer/zeros (Const)
fully_connected_3/biases (VariableV2) /device:GPU:1
fully_connected_3/biases/Assign (Assign) /device:GPU:1
fully_connected_3/biases/read (Identity) /device:GPU:1
fully_connected_3/biases/Adam/Initializer/zeros (Const) /device:GPU:1
fully_connected_3/biases/Adam (VariableV2) /device:GPU:1
fully_connected_3/biases/Adam/Assign (Assign) /device:GPU:1
fully_connected_3/biases/Adam/read (Identity) /device:GPU:1
fully_connected_3/biases/Adam_1/Initializer/zeros (Const) /device:GPU:1
fully_connected_3/biases/Adam_1 (VariableV2) /device:GPU:1
fully_connected_3/biases/Adam_1/Assign (Assign) /device:GPU:1
fully_connected_3/biases/Adam_1/read (Identity) /device:GPU:1
Adam/update_fully_connected_3/biases/ApplyAdam (ApplyAdam) /device:GPU:1
save/Assign_62 (Assign) /device:GPU:1
save/Assign_63 (Assign) /device:GPU:1
save/Assign_64 (Assign) /device:GPU:1
I tried the script in Docker, ./masif_ligand/data_prepare_one.sh 4ZQK_A
and got the following error:
Downloading PDB structure '4ZQK'...
Traceback (most recent call last):
File "/masif/source//data_preparation/00b-generate_assembly.py", line 3, in
from SBI.structure import PDB
ImportError: No module named SBI.structure
Traceback (most recent call last):
File "/masif/source//data_preparation/00c-save_ligand_coords.py", line 2, in
import numpy as np
ImportError: No module named numpy
Traceback (most recent call last):
File "/masif/source//data_preparation/01-pdb_extract_and_triangulate.py", line 48, in
extractPDB(pdb_filename, out_filename1+".pdb", chain_ids1)
File "/masif/source/input_output/extractPDB.py", line 21, in extractPDB
model = Selection.unfold_entities(struct, "M")[0]
IndexError: list index out of range
4ZQK_A
Reading data from input ply surface files.
Traceback (most recent call last):
File "/masif/source//data_preparation/04-masif_precompute.py", line 74, in
input_feat[pid], rho[pid], theta[pid], mask[pid], neigh_indices[pid], iface_labels[pid], verts[pid] = read_data_from_surface(ply_file[pid], params)
File "/masif/source/masif_modules/read_data_from_surface.py", line 23, in read_data_from_surface
mesh = pymesh.load_mesh(ply_fn)
File "/usr/local/lib/python3.6/site-packages/pymesh/meshio.py", line 21, in load_mesh
raise IOError("File not found: {}".format(filename));
OSError: File not found: data_preparation/01-benchmark_surfaces//4ZQK_A.ply
Can anyone tell me where I went wrong? Thanks.
I want to dock two proteins.
I only have two monomers, not a homopolymer or a heteromer.
When I run the pipeline with the PDBID_CHAIN format, I can't get the p1_sc_labels.npy file, but when I run it with PDBID_CHAIN1_CHAIN2, I do get it. So I copied the files:
cd /masif/data/masif_ppi_search/data_preparation/04b-precomputation_12A/precomputation
cp 1A2K_C_A/p1_sc_labels.npy 1A2K_C/p1_sc_labels.npy
cp 1A2K_C_A/p2_sc_labels.npy 1A2K_A/p1_sc_labels.npy
cp 5JYL_A_B/p1_sc_labels.npy 5JYL_A/p1_sc_labels.npy
cp 5JYL_A_B/p2_sc_labels.npy 5JYL_B/p1_sc_labels.npy
But I don't know whether this is correct. Is it?
I also get a multidock result, but I don't know how to convert it to PDB format. Are there tools for this?
First, I ran MaSIF-site with ./data_prepare_one.sh 2MWS_B, and then ./predict_site.sh 2MWS_B to predict sites, and I got four folders:
00-raw_pdbs 01-benchmark_pdbs 01-benchmark_surfaces 04a-precomputation_9A
It ran very well.
Next, I ran the PPI search (cd ../masif_ppi_search). It needs the format PDBID_CHAIN1_CHAIN2, but I only have two monomers, not a homopolymer or a heteromer. The command /masif/data/masif_ppi_search/data_prepare_one.sh 3AXY_B doesn't work.
So I downloaded and extracted the PDB files manually and moved them into data_preparation/00-raw_pdbs/:
root@eb92233498e0:/masif/data/masif_ppi_search# ls data_preparation/00-raw_pdbs/
1A2K.pdb 5JYL.pdb
root@eb92233498e0:/masif/data/masif_ppi_search# ls data_preparation/01-benchmark_pdbs/
1A2K_A.pdb 1A2K_C.pdb 5JYL_A.pdb 5JYL_B.pdb
Next I ran:
masif_root=$(git rev-parse --show-toplevel)
masif_source=$masif_root/source/
export PYTHONPATH=$PYTHONPATH:$masif_source
PDB_ID='1A2K'
CHAIN1='C'
CHAIN2='A'
# Load your environment here.
python $masif_source/data_preparation/01-pdb_extract_and_triangulate.py $PDB_ID\_$CHAIN1
python $masif_source/data_preparation/01-pdb_extract_and_triangulate.py $PDB_ID\_$CHAIN2
python $masif_source/data_preparation/04-masif_precompute.py masif_site $PDB_ID\_$CHAIN1
python $masif_source/data_preparation/04-masif_precompute.py masif_site $PDB_ID\_$CHAIN2
python $masif_source/data_preparation/04-masif_precompute.py masif_ppi_search $PDB_ID\_$CHAIN1
python $masif_source/data_preparation/04-masif_precompute.py masif_ppi_search $PDB_ID\_$CHAIN2
and got this output:
root@eb92233498e0:/masif/data/masif_ppi_search# ls data_preparation/04a-precomputation_9A/precomputation/1A2K_A/
p1_X.npy p1_Z.npy p1_input_feat.npy p1_mask.npy p1_theta_wrt_center.npy
p1_Y.npy p1_iface_labels.npy p1_list_indices.npy p1_rho_wrt_center.npy
root@eb92233498e0:/masif/data/masif_ppi_search# ls data_preparation/04a-precomputation_9A/precomputation/1A2K_C
p1_X.npy p1_Z.npy p1_input_feat.npy p1_mask.npy p1_theta_wrt_center.npy
p1_Y.npy p1_iface_labels.npy p1_list_indices.npy p1_rho_wrt_center.npy
Then I computed the descriptors:
./compute_descriptors.sh $PDB_ID\_$CHAIN1
./compute_descriptors.sh $PDB_ID\_$CHAIN2
and got this output:
root@eb92233498e0:/masif/data/masif_ppi_search# ls descriptors/sc05/all_feat/1A2K_C
p1_desc_flipped.npy p1_desc_straight.npy
root@eb92233498e0:/masif/data/masif_ppi_search# ls descriptors/sc05/all_feat/1A2K_A
p1_desc_flipped.npy p1_desc_straight.npy
It doesn't raise any exception, just some warnings.
For the other protein, I did the same:
python $masif_source/data_preparation/01-pdb_extract_and_triangulate.py 5JYL_A
python $masif_source/data_preparation/04-masif_precompute.py masif_site 5JYL_A
python $masif_source/data_preparation/04-masif_precompute.py masif_ppi_search 5JYL_A
./compute_descriptors.sh 5JYL_A
python $masif_source/data_preparation/01-pdb_extract_and_triangulate.py 5JYL_B
python $masif_source/data_preparation/04-masif_precompute.py masif_site 5JYL_B
python $masif_source/data_preparation/04-masif_precompute.py masif_ppi_search 5JYL_B
./compute_descriptors.sh 5JYL_B
nohup sh ./data_prepare_one.sh 5JYL_A_B &
nohup sh ./data_prepare_one.sh 1A2K_C_A &
nohup sh ./compute_descriptors.sh 5JYL_A_B &
nohup sh ./compute_descriptors.sh 1A2K_C_A &
I looked at the file /masif/source/masif_ppi_search/second_stage_alignment_nn.py.
It needs some configuration variables to be set:
masif_opts = {}
masif_opts["pdb_chain_dir"] = "data_preparation/01-benchmark_pdbs/"  # chains for each protein
masif_opts["ply_chain_dir"] = "data_preparation/01-benchmark_surfaces/"  # surface for each protein, computed with MaSIF-site
masif_opts["ppi_search"]={}
masif_opts["ppi_search"][
"masif_precomputation_dir"
] = "data_preparation/04b-precomputation_12A/precomputation/"
masif_opts["ppi_search"]["desc_dir"] = "descriptors/sc05/all_feat/"  # computed by compute_descriptors.sh
masif_opts["ppi_search"]["gif_descriptors_out"] = "gif_descriptors/"  # only needed for the GIF method, not for MaSIF; an empty directory
masif_opts["site"]={}
masif_opts["site"][
"masif_precomputation_dir"
] = "data_preparation/04a-precomputation_9A/precomputation/"
These are global variables.
I defined my protein list:
root@eb92233498e0:/masif/data/masif_ppi_search# for i in $(ls data_preparation/01-benchmark_pdbs);do echo ${i/.pdb/}; done > lists/mylist.txt
root@eb92233498e0:/masif/data/masif_ppi_search# cat lists/mylist.txt
1A2K_A
1A2K_C
5JYL_A
5JYL_B
import os
import numpy as np
# Location of surface (ply) files.
data_dir='/masif/data/masif_ppi_search'
surf_dir = os.path.join(data_dir, masif_opts["ply_chain_dir"])
desc_dir = os.path.join(data_dir, masif_opts["ppi_search"]["desc_dir"])
pdb_dir = os.path.join(data_dir, masif_opts["pdb_chain_dir"])
precomp_dir = os.path.join(
data_dir, masif_opts["ppi_search"]["masif_precomputation_dir"]
)
precomp_dir_9A = os.path.join(
data_dir, masif_opts["site"]["masif_precomputation_dir"]
)
benchmark_list=os.path.join(data_dir, 'lists','mylist.txt')
pdb_list = open(benchmark_list).readlines()[0:100]
pdb_list = [x.rstrip() for x in pdb_list]
# Read all surfaces.
all_pc = []
all_desc = []
rand_list = np.copy(pdb_list)
#np.random.seed(0)
np.random.shuffle(rand_list)
rand_list = rand_list[0:100]
p2_descriptors_straight = []
p2_point_clouds = []
p2_patch_coords = []
p2_names = []
The file p1_sc_labels.npy is missing:
root@eb92233498e0:/masif/source/masif_ppi_search# ls /masif/data/masif_ppi_search/data_preparation/04b-precomputation_12A/precomputation/5JYL_B/
p1_X.npy p1_Z.npy p1_input_feat.npy p1_mask.npy p1_theta_wrt_center.npy
p1_Y.npy p1_iface_labels.npy p1_list_indices.npy p1_rho_wrt_center.npy
root@eb92233498e0:/masif/source/masif_ppi_search# ls /masif/data/masif_ppi_search/data_preparation/04b-precomputation_12A/precomputation/5JYL_A_B/
p1_X.npy p1_input_feat.npy p1_sc_labels.npy p2_Z.npy p2_mask.npy
p1_Y.npy p1_list_indices.npy p1_theta_wrt_center.npy p2_iface_labels.npy p2_rho_wrt_center.npy
p1_Z.npy p1_mask.npy p2_X.npy p2_input_feat.npy p2_sc_labels.npy
p1_iface_labels.npy p1_rho_wrt_center.npy p2_Y.npy p2_list_indices.npy p2_theta_wrt_center.npy
I copied the files, but I don't know whether this is correct:
cd /masif/data/masif_ppi_search/data_preparation/04b-precomputation_12A/precomputation
cp 1A2K_C_A/p1_sc_labels.npy 1A2K_C/p1_sc_labels.npy
cp 1A2K_C_A/p2_sc_labels.npy 1A2K_A/p1_sc_labels.npy
cp 5JYL_A_B/p1_sc_labels.npy 5JYL_A/p1_sc_labels.npy
cp 5JYL_A_B/p2_sc_labels.npy 5JYL_B/p1_sc_labels.npy
Copy the model to the working directory:
cd /masif/source/masif_ppi_search/
cp -r /masif/comparison/masif_ppi_search/masif_descriptors_nn/models .
cd /masif/source/masif_ppi_search;
Create a new Python script:
touch docking.py
import scipy.sparse as spio
import copy
from Bio.PDB import *
from scipy.spatial import cKDTree
from transformation_training_data.score_nn import ScoreNN
from alignment_utils_masif_search import compute_nn_score, rand_rotation_matrix, \
get_center_and_random_rotate, get_patch_geo, multidock, test_alignments, \
subsample_patch_coords
import time
import sklearn.metrics
masif_opts = {}
masif_opts["pdb_chain_dir"] = "data_preparation/01-benchmark_pdbs/"  # chains for each protein
masif_opts["ply_chain_dir"] = "data_preparation/01-benchmark_surfaces/"  # surface for each protein, computed with MaSIF-site
masif_opts["ppi_search"]={}
masif_opts["ppi_search"][
"masif_precomputation_dir"
] = "data_preparation/04b-precomputation_12A/precomputation/"
masif_opts["ppi_search"]["desc_dir"] = "descriptors/sc05/all_feat/"  # computed by compute_descriptors.sh
masif_opts["ppi_search"]["gif_descriptors_out"] = "gif_descriptors/"  # only needed for the GIF method, not for MaSIF; an empty directory
masif_opts["site"]={}
masif_opts["site"][
"masif_precomputation_dir"
] = "data_preparation/04a-precomputation_9A/precomputation/"
nn_model = ScoreNN()
import os
import numpy as np
data_dir='/masif/data/masif_ppi_search'
surf_dir = os.path.join(data_dir, masif_opts["ply_chain_dir"])
desc_dir = os.path.join(data_dir, masif_opts["ppi_search"]["desc_dir"])
pdb_dir = os.path.join(data_dir, masif_opts["pdb_chain_dir"])
precomp_dir = os.path.join(
data_dir, masif_opts["ppi_search"]["masif_precomputation_dir"]
)
precomp_dir_9A = os.path.join(
data_dir, masif_opts["site"]["masif_precomputation_dir"]
)
benchmark_list=os.path.join(data_dir, 'lists','mylist.txt')
pdb_list = open(benchmark_list).readlines()[0:100]
pdb_list = [x.rstrip() for x in pdb_list]
# Read all surfaces.
all_pc = []
all_desc = []
rand_list = np.copy(pdb_list)
#np.random.seed(0)
np.random.shuffle(rand_list)
rand_list = rand_list[0:100]
p2_descriptors_straight = []
p2_point_clouds = []
p2_patch_coords = []
p2_names = []
from geometry.open3d_import import *
for i, pdb in enumerate(rand_list):
    print("Loading patch coordinates for {}".format(pdb))
    pdb_id = pdb.split("_")[0]
    chains = pdb.split("_")[1]
    # Descriptors for global matching.
    p2_descriptors_straight.append(
        np.load(os.path.join(desc_dir, pdb, "p1_desc_straight.npy"))
    )
    p2_point_clouds.append(
        read_point_cloud(
            os.path.join(surf_dir, "{}.ply".format(pdb_id + "_" + chains))
        )
    )
    pc = subsample_patch_coords(pdb, "p1", precomp_dir_9A)
    p2_patch_coords.append(pc)
    p2_names.append(pdb)
all_positive_scores = []
all_positive_rmsd = []
all_negative_scores = []
# Match all descriptors.
count_found = 0
all_rankings_desc = []
# Now go through each target (p1 in every case) and dock each 'decoy' binder to it.
# The target will have flipped (inverted) descriptors.
K=30
ransac_iter=100
ttf=[]
for target_ix, target_pdb in enumerate(rand_list):
    target_pdb_id = target_pdb.split("_")[0]
    chains = target_pdb.split("_")[1]
    # Load target descriptors for global matching.
    target_desc = np.load(os.path.join(desc_dir, target_pdb, "p1_desc_flipped.npy"))
    # Load target point cloud
    target_pc = os.path.join(surf_dir, "{}.ply".format(target_pdb_id + "_" + chains))
    target_pcd = read_point_cloud(target_pc)
    # Read the point with the highest shape compl.
    sc_labels = np.load(os.path.join(precomp_dir, target_pdb, "p1_sc_labels.npy"))
    center_point = np.argmax(np.median(np.nan_to_num(sc_labels[0]), axis=1))
    # Go through each source descriptor, find the top descriptors, store id+pdb
    num_negs = 0
    all_desc_dists = []
    all_pdb_id = []
    all_vix = []
    gt_dists = []
    # This is where the descriptors are actually compared (stage 1 of the MaSIF-search protocol)
    for source_ix, source_pdb in enumerate(rand_list):
        source_desc = p2_descriptors_straight[source_ix]
        desc_dists = np.linalg.norm(source_desc - target_desc[center_point], axis=1)
        all_desc_dists.append(desc_dists)
        all_pdb_id.append([source_pdb] * len(desc_dists))
        all_vix.append(np.arange(len(desc_dists)))
        if source_pdb == target_pdb:
            source_pcd = p2_point_clouds[source_ix]
            eucl_dists = np.linalg.norm(
                np.asarray(source_pcd.points)
                - np.asarray(target_pcd.points)[center_point, :],
                axis=1,
            )
            eucl_closest = np.argsort(eucl_dists)
            gt_dists = desc_dists[eucl_closest[0:50]]
            gt_count = len(source_desc)
    all_desc_dists = np.concatenate(all_desc_dists, axis=0)
    all_pdb_id = np.concatenate(all_pdb_id, axis=0)
    all_vix = np.concatenate(all_vix, axis=0)
    ranking = np.argsort(all_desc_dists)
    # Load target geodesic distances.
    target_coord = subsample_patch_coords(target_pdb, "p1", precomp_dir_9A, [center_point])
    # Get the geodesic patch and descriptor patch for the target.
    target_patch, target_patch_descs = get_patch_geo(
        target_pcd, target_coord, center_point, target_desc, flip=True
    )
    # Make a ckdtree with the target.
    target_ckdtree = cKDTree(target_patch.points)
    ## Load the structures of the target and the source (to get the ground truth).
    parser = PDBParser()
    target_struct = parser.get_structure(
        "{}_{}".format(target_pdb_id, chains[0]),
        os.path.join(pdb_dir, "{}_{}.pdb".format(target_pdb_id, chains)),
    )
    #gt_source_struct = parser.get_structure(
    #    "{}_{}".format(target_pdb_id, chains[1]),
    #    os.path.join(pdb_dir, "{}_{}.pdb".format(target_pdb_id, chains[1])),
    #)
    # Get coordinates of atoms for the ground truth and target.
    target_atom_coords = [atom.get_coord() for atom in target_struct.get_atoms()]
    target_ca_coords = [
        atom.get_coord() for atom in target_struct.get_atoms() if atom.get_id() == "CA"
    ]
    target_atom_coord_pcd = PointCloud()
    target_ca_coord_pcd = PointCloud()
    target_atom_coord_pcd.points = Vector3dVector(np.array(target_atom_coords))
    target_ca_coord_pcd.points = Vector3dVector(np.array(target_ca_coords))
    target_atom_pcd_tree = KDTreeFlann(target_atom_coord_pcd)
    target_ca_pcd_tree = KDTreeFlann(target_ca_coord_pcd)
    found = False
    myrank_desc = float("inf")
    chosen_top = ranking[0:K]
    pos_scores = []
    pos_rmsd = []
    neg_scores = []
    # This is where the matched descriptors are actually aligned.
    for source_ix, source_pdb in enumerate(rand_list):
        viii = chosen_top[np.where(all_pdb_id[chosen_top] == source_pdb)[0]]
        source_vix = all_vix[viii]
        if len(source_vix) == 0:
            continue
        source_desc = p2_descriptors_straight[source_ix]
        source_pcd = copy.deepcopy(p2_point_clouds[source_ix])
        source_coords = p2_patch_coords[source_ix]
        # Randomly rotate and translate.
        random_transformation = get_center_and_random_rotate(source_pcd)
        source_pcd.transform(random_transformation)
        # Dock and score each matched patch.
        #print({'source_pcd':source_pcd,'source_coords':source_coords,'source_desc':source_desc,'source_vix':source_vix\
        #,'target_patch':target_patch,'target_patch_descs':target_patch_descs,'target_ckdtree':target_ckdtree,'ransac_iter':ransac_iter})
        if source_pdb!=target_pdb:  # the same structure does not need docking
            all_results, all_source_patch, all_source_scores = multidock(
                source_pcd,
                source_coords,
                source_desc,
                source_vix,
                target_patch,
                target_patch_descs,
                target_ckdtree,
                nn_model,
                ransac_iter=ransac_iter
            )
            res={'target_pdb':target_pdb,'source_pdb':source_pdb,'all_results':all_results,\
                'all_source_patch':all_source_patch,'all_source_scores':all_source_scores}
            ttf.append(res)
# The entries in ttf represent several docking poses; the candidates are determined by the fingerprint search.
Looking at ttf[0]:
>>> ttf[0]
{'target_pdb': '1A2K_C', 'source_pdb': '5JYL_B', 'all_results': [registration::RegistrationResult with fitness=0.000000e+00, inlier_rmse=0.000000e+00, and correspondence_set size of 0
Access transformation to get result., registration::RegistrationResult with fitness=0.000000e+00, inlier_rmse=0.000000e+00, and correspondence_set size of 0
Access transformation to get result., registration::RegistrationResult with fitness=2.600000e-01, inlier_rmse=6.074436e-01, and correspondence_set size of 26
Access transformation to get result., registration::RegistrationResult with fitness=0.000000e+00, inlier_rmse=0.000000e+00, and correspondence_set size of 0
Access transformation to get result.], 'all_source_patch': [geometry::PointCloud with 100 points., geometry::PointCloud with 100 points., geometry::PointCloud with 100 points., geometry::PointCloud with 100 points.], 'all_source_scores': [0.0014200918, 0.001413941, 2.9317687e-05, 0.001454716]}
all_results[0] is an <open3d.open3d.registration.RegistrationResult> object. As I understand it, fitness reflects the corresponding points; if it equals 0, the alignment is bad.
But how can I convert a RegistrationResult and a PointCloud to PDB format? I want to get the docked structure.
I am a novice at docking, so please help me.
Thank you.
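A RegistrationResult exposes a 4x4 homogeneous matrix through its transformation attribute, so one possible route to a docked PDB is to apply that matrix to the atom coordinates of the source chain. The sketch below is an illustration under assumptions, not part of MaSIF: apply_transform_to_pdb_lines is a hypothetical helper that works on fixed-column PDB records. Because the script above first moves the source with random_transformation, the matrix to apply to the original file would presumably be result.transformation @ random_transformation.

```python
import numpy as np

def apply_transform_to_pdb_lines(pdb_lines, transform):
    """Apply a 4x4 homogeneous transform to ATOM/HETATM coordinates.

    Assumes fixed-column PDB records with x, y, z in columns 31-54.
    Lines of any other record type are passed through unchanged.
    """
    transform = np.asarray(transform, dtype=float)
    if transform.shape != (4, 4):
        raise ValueError("expected a 4x4 matrix")
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 54:
            xyz1 = np.array(
                [float(line[30:38]), float(line[38:46]), float(line[46:54]), 1.0]
            )
            x, y, z = (transform @ xyz1)[:3]
            line = "{}{:8.3f}{:8.3f}{:8.3f}{}".format(line[:30], x, y, z, line[54:])
        out.append(line)
    return out

# Example: rotate 90 degrees about z, then translate by (1, 2, 3).
T = np.array([[0.0, -1.0, 0.0, 1.0],
              [1.0,  0.0, 0.0, 2.0],
              [0.0,  0.0, 1.0, 3.0],
              [0.0,  0.0, 0.0, 1.0]])
# Build one correctly aligned ATOM record for the demonstration.
atom = "ATOM  {:5d} {:<4s} {:3s} {:1s}{:4d}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}\n".format(
    1, " CA", "MET", "A", 1, 10.0, 0.0, 0.0, 1.0, 13.7
)
moved = apply_transform_to_pdb_lines([atom], T)
```

In practice the lines would be read from the file in 01-benchmark_pdbs/ and the result written to a new .pdb file, one per pose.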
When computing the iface for parts of the protein complex, which interact with each other, the complex surface should only contain the chains of interest.
For example, Complex_id1 has 5 chains.
We are looking at the interaction between the id1_AB and id1_CD parts, but the surface for Complex_id1 still contains a fifth chain (E). This way we will wrongly label some parts of the id1_AB and id1_CD surfaces as interfaces because they are covered by the extra chain.
I am assuming we only want to look at interfaces between the parts we know that interact, so I am assuming that using the full complex surface is not the right way to label the interfaces.
Is it possible to run masif on small molecule data?
Hi,
I am exploring the numpy arrays produced by the precomputation script:
$masif_source/data_preparation/04-masif_precompute.py masif_ppi_search
Could you please explain the meaning of p1_mask.npy and p2_mask.npy?
I understand that the mask applies to rho and theta, but I don't understand what it means. In which cases is the value zero?
Why do we skip some of the neighbors in the patch?
Thank you so much for all your effort!
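One way to explore the mask empirically is to count, per patch, how many entries are nonzero. The sketch below uses a small synthetic array in place of p1_mask.npy. It assumes the mask has shape (number of patches, maximum vertices per patch), with zeros marking unused slots in patches that contain fewer vertices than the maximum; that interpretation is an assumption here, not something confirmed in this thread.

```python
import numpy as np

def patch_sizes(mask):
    """Number of nonzero (active) entries in each row of the mask."""
    return np.count_nonzero(mask, axis=1)

# Synthetic stand-in for np.load("p1_mask.npy"): 3 patches, up to 6 vertices each.
mask = np.zeros((3, 6))
mask[0, :6] = 1.0
mask[1, :4] = 1.0
mask[2, :2] = 1.0

sizes = patch_sizes(mask)
padded_fraction = 1.0 - sizes / mask.shape[1]

# Multiplying by the mask zeroes out the unused slots of a per-vertex array.
rho = np.full(mask.shape, 5.0)
rho_masked = rho * mask
```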
Hello,
I encountered an error while running data_prepare_one.sh on a predicted protein structure.
The script ran fine on the downloaded PDB file:
Singularity masif_latest:/global/scratch/software/MaSIF/masif/data/masif_site> ./data_prepare_one.sh --file data_preparation/00-raw_pdbs/4ZQK.pdb 4ZQK_A
Running masif site on data_preparation/00-raw_pdbs/4ZQK.pdb
cp: 'data_preparation/00-raw_pdbs/4ZQK.pdb' and 'data_preparation/00-raw_pdbs/4ZQK.pdb' are the same file
Empty
Removing degenerated triangles
Removing degenerated triangles
4ZQK_A
Reading data from input ply surface files.
Dijkstra took 3.65s
Only MDS time: 15.50s
Full loop time: 24.70s
MDS took 24.70s
It also ran fine on a predicted structure I downloaded online:
Singularity masif_latest:/global/scratch/software/MaSIF/masif/data/masif_site> ./data_prepare_one.sh --file data_preparation/00-raw_pdbs/AvrPita.pdb AvrPita_A
Running masif site on data_preparation/00-raw_pdbs/AvrPita.pdb
cp: 'data_preparation/00-raw_pdbs/AvrPita.pdb' and 'data_preparation/00-raw_pdbs/AvrPita.pdb' are the same file
Empty
Removing degenerated triangles
Removing degenerated triangles
AvrPita_A
Reading data from input ply surface files.
Dijkstra took 7.01s
Only MDS time: 29.31s
Full loop time: 46.89s
MDS took 46.89s
However, for this structure predicted on my local machine,
Singularity masif_latest:/global/scratch/software/MaSIF/masif/data/masif_site> ./data_prepare_one.sh --file data_preparation/00-raw_pdbs/MGG-01993-ITASSER.pdb MGG-01993-ITASSER_A
Running masif site on data_preparation/00-raw_pdbs/MGG-01993-ITASSER.pdb
cp: 'data_preparation/00-raw_pdbs/MGG-01993-ITASSER.pdb' and 'data_preparation/00-raw_pdbs/MGG-01993-ITASSER.pdb' are the same file
--Call--
/usr/local/lib/python3.6/subprocess.py(758)__del__()
756 self.wait()
757
--> 758 def __del__(self, _maxsize=sys.maxsize, _warn=warnings.warn):
759 if not self._child_created:
760 # We didn't get to successfully create a child process.
ipdb>
I wasn't so sure what was causing this error.... Thank you in advance!
>head AvrPita.pdb
ATOM 1 N MET A 1 50.404 53.465 89.261 1.00 13.70
ATOM 2 CA MET A 1 49.060 53.953 88.970 1.00 13.70
ATOM 3 HA MET A 1 48.349 53.550 89.692 1.00 13.70
ATOM 4 CB MET A 1 49.107 55.497 89.071 1.00 13.70
ATOM 5 HB1 MET A 1 49.608 55.899 88.190 1.00 13.70
ATOM 6 HB2 MET A 1 49.694 55.787 89.940 1.00 13.70
ATOM 7 CG MET A 1 47.733 56.158 89.212 1.00 13.70
ATOM 8 HG1 MET A 1 47.334 55.866 90.163 1.00 13.70
ATOM 9 HG2 MET A 1 47.047 55.782 88.460 1.00 13.70
ATOM 10 SD MET A 1 47.708 57.971 89.163 1.00 13.70
>head MGG-011730-ITASSER.pdb
ATOM 1 H LEU 1 -30.724 18.366 -0.112 1.00 4.24
ATOM 2 N LEU 1 -30.717 19.328 -0.332 1.00 4.24
ATOM 3 CA LEU 1 -31.127 20.240 0.732 1.00 4.24
ATOM 4 C LEU 1 -30.216 20.107 1.947 1.00 4.24
ATOM 5 O LEU 1 -29.360 19.226 1.982 1.00 4.24
ATOM 6 CB LEU 1 -32.579 19.966 1.133 1.00 4.24
ATOM 7 CG LEU 1 -33.577 20.301 0.018 1.00 4.24
ATOM 8 CD1 LEU 1 -34.987 19.880 0.429 1.00 4.24
ATOM 9 CD2 LEU 1 -33.575 21.804 -0.259 1.00 4.24
ATOM 10 N PRO 2 -30.291 20.947 3.077 1.00 2.47
> tail AvrPita.pdb
ATOM 3585 H CYS A 224 76.364 66.692 47.328 1.00 5.19
ATOM 3586 CA CYS A 224 74.809 66.366 45.907 1.00 5.19
ATOM 3587 HA CYS A 224 74.247 65.434 45.807 1.00 5.19
ATOM 3588 CB CYS A 224 73.982 67.329 46.770 1.00 5.19
ATOM 3589 HB1 CYS A 224 73.157 67.727 46.174 1.00 5.19
ATOM 3590 HB2 CYS A 224 74.606 68.173 47.067 1.00 5.19
ATOM 3591 SG CYS A 224 73.262 66.560 48.246 1.00 5.19
ATOM 3592 C CYS A 224 75.018 66.922 44.489 1.00 5.19
ATOM 3593 O CYS A 224 76.123 67.110 43.980 1.00 5.19
TER
>tail MGG-011730-ITASSER.pdb
ATOM 661 OG1 THR 82 2.127 -6.129 7.772 1.00 2.03
ATOM 662 CG2 THR 82 0.326 -5.954 6.201 1.00 2.03
ATOM 663 N PRO 83 0.144 -8.497 9.877 1.00 3.59
ATOM 664 CA PRO 83 0.505 -9.336 10.943 1.00 3.59
ATOM 665 C PRO 83 0.584 -10.638 10.319 1.00 3.59
ATOM 666 O PRO 83 0.347 -10.751 9.103 1.00 3.59
ATOM 667 CB PRO 83 -0.615 -9.288 11.984 1.00 3.59
ATOM 668 CG PRO 83 -1.888 -9.067 11.196 1.00 3.59
ATOM 669 CD PRO 83 -1.811 -9.989 9.990 1.00 3.59
TER
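Comparing the two excerpts, the I-TASSER file has no chain identifier in its ATOM records, while the command requests chain A. Whether this is the cause of the failure is not confirmed here. Assuming it is, a minimal sketch for filling in a chain ID could look like this (set_chain_id is a hypothetical helper, not part of MaSIF; the chain ID sits in column 22 of fixed-column PDB records):

```python
def set_chain_id(pdb_lines, chain_id="A"):
    """Fill in a blank chain ID (column 22) on ATOM/HETATM records.

    Records that already carry a chain ID are left untouched.
    """
    if len(chain_id) != 1:
        raise ValueError("chain_id must be a single character")
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            newline = "\n" if line.endswith("\n") else ""
            body = line.rstrip("\n").ljust(22)
            if body[21] == " ":
                body = body[:21] + chain_id + body[22:]
            line = body + newline
        out.append(line)
    return out

# Build one correctly aligned ATOM record with a blank chain ID.
record = "ATOM  {:5d} {:<4s} {:3s} {:1s}{:4d}    {:8.3f}{:8.3f}{:8.3f}\n".format(
    1, " N", "LEU", " ", 1, -30.724, 18.366, -0.112
)
fixed = set_chain_id([record, "TER\n"], "A")
```

The fixed lines could then be written to a new file in 00-raw_pdbs/ before running data_prepare_one.sh again.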
I can't find the path needed for: export MULTIVALUE_BIN=/path/to/apbs/APBS-1.5-linux64/share/apbs/tools/bin/multivalue
So I set:
export MULTIVALUE_BIN=/home/ubuntu/miniconda3/envs/masif/share/apbs/tools/mesh/multivalue.c
multivalue is a C program; when I run it, I get:
OSError: [Errno 8] Exec format error
Sorry to bother you, I don't really understand how you divide the mesh surface into patches.
Do you start at one random vertex in the mesh, perform the geodesic distance calculation for a given radius, and then select a new vertex that has not yet been covered by previous calculations, or do you just compute a patch for each vertex in the mesh? The input for the neural network consists of a matrix of size: number of patches x number of features considered (5).
I cannot find the code lines that explain this patch division.
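For reference, a common way to build such patches is to centre one on every vertex and collect all vertices within a geodesic radius, approximating geodesic distance by shortest paths along mesh edges (the preprocessing logs quoted elsewhere on this page mention a Dijkstra step). The sketch below illustrates that idea on a toy graph. It is an assumption about the general approach, not a copy of MaSIF's code, and geodesic_patch is a hypothetical name.

```python
import heapq

def geodesic_patch(adjacency, center, radius):
    """Return {vertex: distance} for all vertices within `radius` of `center`.

    adjacency maps each vertex to a list of (neighbor, edge_length) pairs.
    Distances are shortest-path lengths along edges (Dijkstra with a cutoff).
    """
    dist = {center: 0.0}
    heap = [(0.0, center)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry
        for w, length in adjacency[v]:
            nd = d + length
            if nd <= radius and nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return dist

# Toy "mesh": a path 0-1-2-3-4 with unit-length edges.
adjacency = {i: [] for i in range(5)}
for i in range(4):
    adjacency[i].append((i + 1, 1.0))
    adjacency[i + 1].append((i, 1.0))

# One patch per vertex, so the number of patches equals the number of vertices.
patches = {v: geodesic_patch(adjacency, v, radius=1.5) for v in adjacency}
```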
Best,
Liv
When I run MaSIF-search with ./second_stage_masif.sh 2000 in Docker, I get an error saying a function is not defined, and I do not know where read_point_cloud is supposed to come from:
root@8638240e23fe:/masif/comparison/masif_ppi_search/masif_descriptors_nn# ./second_stage_masif.sh 2000
2021-06-18 05:15:52.852452: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
['/masif/source//masif_ppi_search/second_stage_alignment_nn.py', '../../../data/masif_ppi_search', '2000', '2000', '1000', 'masif']
Loading patch coordinates for 2I32_A_E
Traceback (most recent call last):
File "/masif/source//masif_ppi_search/second_stage_alignment_nn.py", line 114, in <module>
read_point_cloud(
NameError: name 'read_point_cloud' is not defined
Should be ./compute_descriptors.sh -l lists/testing.txt.
Looks like an artifact from matlab. What's the fix for this?
Hi,
Thanks for showing your work on MASIF. I was trying to understand the process a little more in depth, in particular how the angular and radial coordinates for each patch were generated and used. However, in the masif->source->data preparation section, the README mentions the files that I should look for, but the files from 02, 03, 03b from that README are missing in the github. Thus I cannot see the code for generating this geometric data.
If you could please upload these missing files, I'd greatly appreciate it!
Thank you
Devesh
Hi,
I am able to install the masif_pymol_plugin into PyMOL, but when I try to load the plugin, PyMOL gives me this error:
Unable to initialize plugin 'masif_pymol_plugin' (pmg_tk.startup.masif_pymol_plugin).
I tried writing "import pmg_tk.startup.masif_pymol_plugin" in the pymol command line as well but it gives me an error again:
Traceback (most recent call last):
File "/Applications/PyMOL.app/Contents/lib/python3.7/site-packages/pmg_tk/startup/masif_pymol_plugin/__init__.py", line 5, in
from loadPLY import *
ModuleNotFoundError: No module named 'loadPLY'
I also looked in the masif_pymol_plugin folder to make sure there was a loadPLY file and there was.
Thank you!
I was wondering if you have ever considered using a more detailed treatment of hydrophobicity? I was just thinking about how if a tiny fraction of a leucine residue is solvent-accessible, this is then labelled as extremely hydrophobic. However, in this case it is incorrect - the Kyte-Doolittle scale rates leucine this way due to its size and composition, and if you reduce the (effective / solvent-accessible) size then the hydrophobicity should also be reduced.
For example, this one assigns a +1 or -1 depending on both the residue and the atom type:
A Simple Atomic-Level Hydrophobicity Scale Reveals Protein Interfacial Structure, Kapcha & Rossky, JMB 2014
https://www.sciencedirect.com/science/article/pii/S0022283613006232?via%3Dihub
Also, I notice that the module "triangulation.computeHydrophobicity" has no means of dealing with non-canonical or modified amino acids. I haven't checked it, but it looks like the code will break if it encounters one. Maybe you should at least use the .get() method to access the kd_scale dictionary. It would be best if there was some way of matching non-canonical / modified amino acids to some hydrophobicity scale. However I'm not sure if those terms are used consistently. At least I've never found a look-up table for 3-letter-code to modified amino acid, etc. And of course this should be made clear somewhere, whichever way you go with.
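The .get() suggestion could be sketched as follows. The Kyte-Doolittle values are the standard published ones; the fallback value of 0.0, the MODIFIED_TO_PARENT table (with selenomethionine MSE mapped to MET as its only entry), and the function name are illustrative assumptions, not MaSIF's actual behaviour.

```python
# Kyte-Doolittle hydropathy values for the 20 canonical residues.
KD_SCALE = {
    "ILE": 4.5, "VAL": 4.2, "LEU": 3.8, "PHE": 2.8, "CYS": 2.5,
    "MET": 1.9, "ALA": 1.8, "GLY": -0.4, "THR": -0.7, "SER": -0.8,
    "TRP": -0.9, "TYR": -1.3, "PRO": -1.6, "HIS": -3.2, "GLU": -3.5,
    "GLN": -3.5, "ASP": -3.5, "ASN": -3.5, "LYS": -3.9, "ARG": -4.5,
}

# Hypothetical lookup from modified residue to its canonical parent.
MODIFIED_TO_PARENT = {"MSE": "MET"}

def hydrophobicity(resname, default=0.0):
    """Look up a residue's hydropathy without raising on unknown names."""
    name = resname.strip().upper()
    name = MODIFIED_TO_PARENT.get(name, name)
    return KD_SCALE.get(name, default)

values = [hydrophobicity(r) for r in ("LEU", "MSE", "XYZ")]
```

Returning a neutral default keeps the pipeline running, at the cost of silently assigning a value to residues the scale does not cover; logging such cases would make that visible.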
Hi!!!
I wanted to know whether it's possible to use the output PDB files from software like I-TASSER (or any other structural modeller) as input for MaSIF. The PDB entries for my proteins of interest have low resolution, and the crystal structures cover only partial sequences.
Hence, I want to create PDB files based on the complete sequences and then use MaSIF to predict the PPI interface. Could you please help me out with this? Thanks in advance for any help.
Regards,
Anupam
In my project, I want to extract the features computed on the protein surface. How can I get the list of features?
I have two proteins and would like to get the "docked" coordinates from masif_ppi. It looks like source/masif_ppi_search/second_stage_alignment.py performs an alignment of points, but I was wondering if there's another function or script that can apply the best N transformations to the input ligand in order to generate a set of N docked poses.
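One possible way to obtain N poses, sketched under assumptions: keep the transformation matrix and score of every alignment, sort by score, and compose each alignment matrix with any transform that was applied to the ligand before alignment. The names below (top_n_pose_matrices, pre_transform) are hypothetical, plain NumPy arrays stand in for the alignment results, and the sketch assumes that a higher score means a better pose.

```python
import numpy as np

def top_n_pose_matrices(transforms, scores, pre_transform=None, n=3):
    """Return [(index, 4x4 matrix)] for the n best-scoring alignments.

    Each returned matrix maps the ligand's original coordinates to the docked
    frame: first pre_transform (if any), then the alignment transform.
    """
    transforms = [np.asarray(t, dtype=float) for t in transforms]
    pre = np.eye(4) if pre_transform is None else np.asarray(pre_transform, dtype=float)
    order = np.argsort(scores)[::-1][:n]  # highest score first
    return [(int(i), transforms[i] @ pre) for i in order]

def translation(x, y, z):
    """4x4 homogeneous matrix for a pure translation."""
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T

# Three toy alignments and their scores.
candidates = [translation(1, 0, 0), translation(0, 2, 0), translation(0, 0, 3)]
scores = [0.1, 0.9, 0.5]
poses = top_n_pose_matrices(candidates, scores, pre_transform=translation(1, 1, 1), n=2)
```

Each matrix could then be applied to the ligand's atom coordinates to write one PDB file per pose.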
Hi!
In MaSIF search protocol I'm trying to run second stage alignment by using the pretrained model, however in the directory /masif/data/masif_pdl1_benchmark/models/nn_score there are only .index and .data files. Is it possible to provide the trained_model.hdf5 file that is being called by score_nn.py script?
Thank you so much,
Goran
Hi,
Thank you for this great repository, it has been very useful so far. Working with it, I noticed an issue as highlighted below in the following function:
The electrostatics are upper-bounded to 3, but not lower-bounded to -3. I don't think this was intentional.
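If symmetric bounds were intended, the clamping could be written as a single call. A minimal sketch, assuming the charges are held in a NumPy array and that the bound of 3 applies on both sides (clamp_charges is a hypothetical name):

```python
import numpy as np

def clamp_charges(charges, bound=3.0):
    """Clamp values to the closed interval [-bound, +bound]."""
    return np.clip(np.asarray(charges, dtype=float), -bound, bound)

clamped = clamp_charges([-7.5, -1.0, 0.0, 2.0, 12.0])
```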
When I run ./data_prepare_one.sh 1MBN_A_ under data/masif_site, it fails to download and shows:
WARNING: The default download format has changed from PDB to PDBx/mmCif
Sometimes I've found that when looking for hydrogen bond acceptors, the code will break if the acceptor atom is there but coordinates are missing for its bonded, neighbouring atom. I.e. in "triangulation.computeCharges" line 82, res[acceptorAngleAtom[atom_name]].get_coord() will throw a KeyError exception. I found this on PDB entries 2avn and 1vdn.
My suggested fix was already implemented for acceptorPlaneAtom in the same function:
try:
    a = res[acceptorAngleAtom[atom_name]].get_coord()
except KeyError:
    return 0.0
Hi
I have installed StrBioInfo 0.9a0.dev1.
However, when I execute the line "struct_assembly = struct.apply_biomolecule_matrices()[0]" in "00b-generate_assembly.py", it shows this error:
AttributeError: 'PDBFrame' object has no attribute 'apply_biomolecule_matrices'
Will this package be updated?
MSMS, for example, is leaving huge files in the tmp directory. These should be removed.
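One common way to avoid leftover files is to run each external tool inside a scratch directory that is deleted automatically. A minimal sketch using only the standard library; `run_in_scratch_dir` and `fake_msms` are hypothetical names, and the stand-in callable only imitates a tool that writes large intermediate files.

```python
import os
import tempfile


def run_in_scratch_dir(work):
    """Call work(scratch_dir) and delete the directory afterwards, even on error."""
    with tempfile.TemporaryDirectory(prefix="msms_") as scratch_dir:
        return work(scratch_dir)


seen = {}


def fake_msms(scratch_dir):
    # Stand-in for an external tool that writes intermediate files.
    path = os.path.join(scratch_dir, "out.vert")
    with open(path, "w") as handle:
        handle.write("0 0 0\n")
    seen["dir"] = scratch_dir
    # Anything needed later must be read or copied out before returning.
    return os.path.getsize(path)


size = run_in_scratch_dir(fake_msms)
```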
2022-07-18 09:09:30.240557: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
['/masif/source//masif_ppi_search/second_stage_alignment_nn.py', '../../../data/masif_ppi_search', '100', '2000', '1000', 'masif']
Loading patch coordinates for 2I32_A_E
Loading patch coordinates for 2P47_A_B
Loading patch coordinates for 1BRS_A_D
Loading patch coordinates for 3P8B_C_D
Loading patch coordinates for 2Y32_B_D
Loading patch coordinates for 3M85_B_E
Loading patch coordinates for 2J12_A_B
Loading patch coordinates for 2ZXW_O_U
Loading patch coordinates for 3TND_B_D
Loading patch coordinates for 3F74_A_B
Loading patch coordinates for 1JZO_A_B
Loading patch coordinates for 2P45_A_B
Loading patch coordinates for 3IBM_A_B
Loading patch coordinates for 3CDW_A_H
Loading patch coordinates for 3S8V_A_X
Loading patch coordinates for 4KGG_C_A
Loading patch coordinates for 1TQ9_A_B
Loading patch coordinates for 1NPO_A_C
Loading patch coordinates for 3OGF_A_B
Loading patch coordinates for 1Z0K_A_C
Loading patch coordinates for 3Q9U_A_C
Loading patch coordinates for 1XUA_A_B
Loading patch coordinates for 3AXY_B_D
Loading patch coordinates for 2QLC_C_B
Loading patch coordinates for 3QWQ_A_B
Loading patch coordinates for 2O8Q_A_B
Loading patch coordinates for 2JI1_C_D
Loading patch coordinates for 1HBT_I_H
Loading patch coordinates for 3QWN_I_J
Loading patch coordinates for 2Z0P_C_D
Loading patch coordinates for 1XDT_T_R
Loading patch coordinates for 1ID5_H_L
Loading patch coordinates for 1PXV_A_C
Loading patch coordinates for 1I4O_B_D
Loading patch coordinates for 2Z7F_E_I
Loading patch coordinates for 2FE8_A_C
Loading patch coordinates for 1LQM_E_F
Loading patch coordinates for 2Z29_A_B
Loading patch coordinates for 3P71_C_T
Loading patch coordinates for 4CJ0_A_B
Loading patch coordinates for 4TQ1_A_B
Loading patch coordinates for 2WQ4_A_C
Loading patch coordinates for 2LBU_E_D
Loading patch coordinates for 3S9C_A_B
Loading patch coordinates for 1AVX_A_B
Loading patch coordinates for 1A2K_C_AB
Loading patch coordinates for 2WAM_A_C
Loading patch coordinates for 3SGB_E_I
Loading patch coordinates for 3B5U_J_L
Loading patch coordinates for 1YLQ_A_B
Loading patch coordinates for 1YY9_A_D
Loading patch coordinates for 2B3Z_C_D
Loading patch coordinates for 3HN6_B_D
Loading patch coordinates for 1T0F_A_B
Loading patch coordinates for 3PGA_1_4
Loading patch coordinates for 2AQX_A_B
Loading patch coordinates for 1SOT_A_C
Loading patch coordinates for 1SHY_A_B
Loading patch coordinates for 3EYD_C_D
Loading patch coordinates for 1UUG_A_B
Loading patch coordinates for 3KZH_A_B
Loading patch coordinates for 2HEK_A_B
Loading patch coordinates for 4YDJ_HL_G
Loading patch coordinates for 3HCG_A_C
Loading patch coordinates for 3K3C_A_B
Loading patch coordinates for 1JKG_A_B
Loading patch coordinates for 5GPG_A_B
Loading patch coordinates for 4AG2_A_C
Loading patch coordinates for 3SLH_A_B
Loading patch coordinates for 3ISM_A_B
Loading patch coordinates for 3KMT_A_B
Loading patch coordinates for 1XPJ_A_D
Loading patch coordinates for 1UGH_E_I
Loading patch coordinates for 1I07_A_B
Loading patch coordinates for 3CEW_C_D
Loading patch coordinates for 2HDP_A_B
Loading patch coordinates for 2G2W_A_B
Loading patch coordinates for 3WN7_A_B
Loading patch coordinates for 3Q0Y_C_B
Loading patch coordinates for 3CG8_C_B
Loading patch coordinates for 1Q5H_A_B
Loading patch coordinates for 2B42_B_A
Loading patch coordinates for 2YZJ_A_C
Loading patch coordinates for 3ECY_A_B
Loading patch coordinates for 3HRD_E_H
Loading patch coordinates for 1ZR0_A_B
Loading patch coordinates for 3E2U_A_E
Loading patch coordinates for 1ERN_A_B
Loading patch coordinates for 1O9Y_A_D
Loading patch coordinates for 3RDZ_A_C
Loading patch coordinates for 1ZVN_A_B
Loading patch coordinates for 3CHW_A_P
Loading patch coordinates for 4M5F_A_B
Loading patch coordinates for 3Q87_A_B
Loading patch coordinates for 3BTV_A_B
Loading patch coordinates for 3FJS_C_D
Loading patch coordinates for 2GKW_A_B
Loading patch coordinates for 2GD4_C_B
Loading patch coordinates for 2A2L_C_B
Loading patch coordinates for 1XT9_A_B
Docking all binders on target: 2I32_A_E
Traceback (most recent call last):
File "/masif/source//masif_ppi_search/second_stage_alignment_nn.py", line 198, in <module>
target_coord = subsample_patch_coords(target_pdb, "p1", precomp_dir_9A, center_point)
File "/masif/source/masif_ppi_search/alignment_utils_masif_search.py", line 305, in subsample_patch_coords
for iii, v in enumerate(cv):
TypeError: 'numpy.int64' object is not iterable
What are the roles of cv and center_point?
If the subsample_patch_coords function is working correctly, center_point should be a one-dimensional list, but here center_point is an int64.
I don't know the fastest and easiest way to run this.
By the way, the docker image comes with a newer version of Open3D (0.9). Please fix your code.
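If the loop really does expect a sequence of indices, one defensive workaround is to normalise the argument before iterating. This is only a sketch of that idea: `as_index_list` is a hypothetical helper, and whether a scalar `center_point` is the intended input is not something the traceback settles. With NumPy, `np.atleast_1d(center_point)` has a similar effect.

```python
from collections.abc import Iterable


def as_index_list(center_point):
    """Return center_point as a list of ints, wrapping a bare scalar."""
    if isinstance(center_point, Iterable):
        return [int(i) for i in center_point]
    # Scalars (including NumPy integer scalars) are not iterable.
    return [int(center_point)]
```

For example, `as_index_list(7)` gives a one-element list, so a later `for iii, v in enumerate(...)` loop no longer raises `TypeError`.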
Bio.PDB.PDBList() disallows the downloading of structures with >62 chains or >99999 ATOM lines using the 'pdb' (.ent) format. Attempting this gives a "Desired structure doesn't exist" error.
There are a couple of other file_format options for which this is allowed, but it's not completely clear how to utilize these formats in the downstream data_preparation steps. It would be very helpful to be able to use one of these other formats in the MaSIF pipeline.
Hello,
Thanks for your work on MaSIF and for making the source code available! I'm running MaSIF in the supplied docker container, with some tweaks (for example, I changed the scripts a little to output the high-scoring docked structures for manual review).
I'm running on a very small list of proteins of interest to me. I first ran the proteins through the default pipeline where it downloads the complexes directly from the PDB to recompute the benchmark, followed by the second stage alignment. This was fairly successful.
However, in the next stage, I decided to try something more like a "real world" experiment where I would not necessarily have a native structure; instead, I would have two chains that I suspect interact but don't know how. So I used the same pairs from above and did some standard light protein prep (no backbone changes) and rotation to emulate the perhaps-unknown position of the two proteins to one another. There was a marked decrease in performance, and upon further trials it became clear that the protein preparation was not an issue but that rotation of the chains contributed to a huge decrease in performance, and even an inability to find anything close to a native structure despite previously finding near-native structures with ease.
Based on what I understand from the paper and the code, the interface descriptors and alignment phase should be agnostic to initial rotation of the chain. Is there something I'm missing here?
Thanks!
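One way to rule out the perturbation itself is to confirm that the applied transform is a pure rigid rotation, i.e. that it preserves every intra-chain distance. A minimal sketch with hypothetical helper names, working on plain coordinate tuples rather than any MaSIF data structure:

```python
import math


def rotation_matrix(axis, angle):
    """Rodrigues' formula: rotate by `angle` radians about `axis`."""
    x, y, z = axis
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    return [
        [t * x * x + c, t * x * y - s * z, t * x * z + s * y],
        [t * x * y + s * z, t * y * y + c, t * y * z - s * x],
        [t * x * z - s * y, t * y * z + s * x, t * z * z + c],
    ]


def rotate(coords, rot):
    """Apply a 3x3 rotation matrix to a list of (x, y, z) points."""
    return [
        tuple(sum(rot[r][k] * p[k] for k in range(3)) for r in range(3))
        for p in coords
    ]


def pairwise_distances(coords):
    """All distances between distinct pairs of points, in a fixed order."""
    return [
        math.sqrt(sum((a[k] - b[k]) ** 2 for k in range(3)))
        for i, a in enumerate(coords)
        for b in coords[i + 1:]
    ]


coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.0, 1.0), (3.0, -1.0, 4.0)]
rotated = rotate(coords, rotation_matrix((1.0, 2.0, 3.0), 1.1))
```

If the distances before and after agree to numerical precision, the input geometry is unchanged and any drop in performance has to come from a later, orientation-dependent step.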
hi, there.
I have some questions about the MaSIF-search second stage. I can't figure out where the code that generates the complexes described in the paper is. It would be great if anyone could give me a clue.
Goes straight from 1 to 4.
Hi,
Thanks for your great work. I was trying to install dependencies. But I am not sure how to install reduce(3.23) on the Linux system.
Could you show me how?
The repo in git clone https://github.com/Electrostatics/apbs-pdb2pqr is archived and not available anymore so the dockerfile doesn't work. Would it be possible to obtain an updated dockerfile?
In the MaSIF-search process, after computing the descriptors, I am unable to run second_stage_alignment_nn.py successfully: the pretrained model doesn't load. Any hints? Could you provide the correct pretrained .hdf5 file? Thanks, hoping for a positive response.
Following the docker tutorial of MaSIF-site, I can generate an "ID_chain.ply" file. But how can I get per-residue predictions?
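One possible post-processing step is to assign each surface vertex to its nearest atom and aggregate the vertex scores per residue. The sketch below assumes the vertex coordinates and per-vertex scores have already been read from the .ply file and the atoms from the PDB file; the function name, the data layout and the choice of `max` as the aggregate are all assumptions, not MaSIF's own method.

```python
from collections import defaultdict


def per_residue_scores(vertices, vertex_scores, atoms, reduce=max):
    """Aggregate per-vertex scores onto residues via the nearest atom.

    vertices:      list of (x, y, z)
    vertex_scores: list of floats, same length as vertices
    atoms:         list of (residue_key, (x, y, z)), e.g. (("A", 42), ...)
    """
    buckets = defaultdict(list)
    for vertex, score in zip(vertices, vertex_scores):
        # Brute-force nearest atom; a KD-tree would be preferable at scale.
        nearest = min(
            atoms,
            key=lambda atom: sum((p - q) ** 2 for p, q in zip(vertex, atom[1])),
        )
        buckets[nearest[0]].append(score)
    return {residue: reduce(scores) for residue, scores in buckets.items()}


atoms = [(("A", 1), (0.0, 0.0, 0.0)), (("A", 2), (10.0, 0.0, 0.0))]
vertices = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (9.0, 0.0, 0.0)]
scores = per_residue_scores(vertices, [0.2, 0.8, 0.5], atoms)
```

Residues with no surface vertex assigned to them simply do not appear in the result.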
Dear Masif-Team,
thanks for publishing MaSIF as open-source :-)
I find the PyMol plugin very useful, and I would like to use it also for general visualization tasks. However, I found that the current version of Simple_mesh is loading very slowly when the number of vertices is high. I managed to fix it (at least for my purposes) by changing the following:
for jj in range(len(self.attributes["vertex_x"])):
    self.vertices = np.vstack(
        [
            self.attributes["vertex_x"],
            self.attributes["vertex_y"],
            self.attributes["vertex_z"],
        ]
    ).T
to
self.vertices = np.vstack(
    [
        self.attributes["vertex_x"],
        self.attributes["vertex_y"],
        self.attributes["vertex_z"],
    ]
).T
This reduces the loading time from >5 minutes (I killed PyMol at some point) to about 1 sec, for a surface with about 100000 vertices. I think this is safe to change because the jj variable is not used anywhere in the loop (maybe one could also do if len(...): ?). I don't need any help with this, I just thought I'd report it in case you want to fix it on GitHub.
Best regards,
Franz Waibl
Hi there,
Have you released, or are you planning to release, the source code from the recent bioRxiv paper?
Thanks
I tried to install the provided plugin in PyMOL, but I got the following error message.
It says that the plugin is installed but that the initialization fails. The screenshot above was taken on a Windows machine.
The same happens on my Ubuntu machine.
Does this mean that the plugin you provided only works on a Mac?
The pdl1_benchmark_nn.py script produces several .pdb files and .vert files. I am wondering how to use these files, say to visualize the docking site in PyMOL. Thank you.
Hi,
Can MaSIF take local PDB files (not deposited in the PDB database) as inputs? Maybe by modifying the scripts in data preparation?
Thanks in advance if anyone could give some suggestions!
Hi,
I was using "input_output.extractPDB" to extract PDB chains for my work and two problems popped up. Essentially the code can end up missing a few residues depending on the following:
1:
class NotDisordered(Select):
    def accept_atom(self, atom):
        return not atom.is_disordered() or atom.get_altloc() == "A"
According to the comment on this class, it is supposed to exclude disordered atoms. However, this class actually is used to save disordered atoms. It appears to be a result of an error in the validation of the X-ray-derived structure (link), whereby two different configurations are given for a set of atoms. What this code is supposed to do is to choose one out of those two configurations.
Typically the two configurations are labelled 'A' and 'B', however I have noticed that they can be labelled as '1' or '2'. When they are labelled '1' or '2', the class returns False -- the disordered residues are ignored. For my own work, I have used the following function to ensure that the residues are not ignored, even if they are labelled '1' or '2'.
class NotDisordered(Select):
    def accept_atom(self, atom):
        return not atom.is_disordered() or atom.get_altloc() == "A" or atom.get_altloc() == "1"
2:
Modified / non-canonical amino acids are designated HETATM in the PDB, even though they are clearly amino acids. I added a function which reads the names of modified / non-canonical amino acids from the PDB SEQRES section, and notes the non-standard codes.
from Bio.SeqUtils import IUPACData

PROTEIN_LETTERS = [x.upper() for x in IUPACData.protein_letters_3to1.keys()]


def find_modified_amino_acids(path):
    # Collect every residue name listed in the SEQRES records,
    # then drop the standard three-letter codes.
    res_set = set()
    with open(path, "r") as handle:
        for line in handle:
            if line[:6] == "SEQRES":
                for res in line.split()[4:]:
                    res_set.add(res)
    for res in list(res_set):
        if res in PROTEIN_LETTERS:
            res_set.remove(res)
    return res_set


def extractPDB(
    infilename, outfilename, chain_ids=None, includeWaters=False, invert=False
):
    # extract the chain_ids from infilename and save in outfilename.
    # includeWaters: deprecated parameter, include the crystallographic waters (should not be used).
    # invert: Select all chains EXCEPT those in chain_ids.
    parser = PDBParser(QUIET=True)
    struct = parser.get_structure(infilename, infilename)
    model = Selection.unfold_entities(struct, "M")[0]
    chains = Selection.unfold_entities(struct, "C")
    # Select residues to extract and build new structure
    structBuild = StructureBuilder.StructureBuilder()
    structBuild.init_structure("output")
    structBuild.init_seg(" ")
    structBuild.init_model(0)
    outputStruct = structBuild.get_structure()

    # Load a list of non-standard amino acid names -- these are
    # typically listed under HETATM, so they would be typically
    # ignored by the original algorithm
    modified_amino_acids = find_modified_amino_acids(infilename)

    for chain in model:
        if (
            chain_ids is None
            or (chain.get_id() in chain_ids and not invert)
            or invert
        ):
            structBuild.init_chain(chain.get_id())
            for residue in chain:
                het = residue.get_id()
                if not invert:
                    if het[0] == " " or (het[0] == "W" and includeWaters):
                        outputStruct[0][chain.get_id()].add(residue)
                    elif het[0][-3:] in modified_amino_acids:
                        print(het[0])
                        outputStruct[0][chain.get_id()].add(residue)
                else:
                    if (het[0] == "W" and includeWaters) or (
                        chain.get_id() not in chain_ids
                    ):
                        outputStruct[0][chain.get_id()].add(residue)
                    elif het[0][-3:] in modified_amino_acids:
                        outputStruct[0][chain.get_id()].add(residue)

    # Output the selected residues
    pdbio = PDBIO()
    pdbio.set_structure(outputStruct)
    pdbio.save(outfilename, select=NotDisordered())
Traceback
Traceback (most recent call last):
File "/home/masif/source//data_preparation/04-masif_precompute.py", line 67, in <module>
read_data_from_matfile_full_protein(coord_file, shape_file, ppi_pair_id, params, pid, label_iface=True)
File "/home/masif/source/masif_modules/read_data_from_matfile.py", line 324, in read_data_from_matfile_full_protein
assert len(np.unique(rows)) == len(p_s[protein_id]["X"][0])
AssertionError
length left 3273, right 3276
Any clue for this error? Thanks!
Hi there - I always get very different answers when doing the pdl1 benchmark with the neural network and I am not sure why.
What is the source of variation, shouldn't the neural network weights for pdl1_benchmark_nn.py be preloaded?
Otherwise a completely random model would be initialized with nn_model = ScoreNN(), which doesn't seem right. Thanks!
Hi there,
I am trying to see if I can train the ppi-search model and use it for new predictions. However, when I followed the instructions and ran ./second_stage_masif.sh 100, there was an error message from Python 3.
File "/masif/source//masif_ppi_search/second_stage_alignment_nn.py", line 114, in
read_point_cloud(
NameError: name 'read_point_cloud' is not defined
Would you mind letting me know how to fix it?
Also, would you mind describing how to prepare my own data (I have some docking models from Rosetta)? I would like to use them for ppi_search.
Thanks.
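A likely cause of this `NameError` is an Open3D version mismatch: newer Open3D releases expose the reader as `open3d.io.read_point_cloud`, whereas older code expected `read_point_cloud` at the top level of the package. The sketch below shows a small compatibility lookup; `resolve` is a hypothetical helper, and the two `SimpleNamespace` objects only stand in for the old and new module layouts so that the example runs without Open3D installed.

```python
from types import SimpleNamespace


def resolve(module, dotted_names):
    """Return the first attribute path in dotted_names that exists on module."""
    for dotted in dotted_names:
        obj = module
        try:
            for part in dotted.split("."):
                obj = getattr(obj, part)
        except AttributeError:
            continue
        return obj
    raise AttributeError(f"none of {dotted_names} found on {module!r}")


# Stand-ins for the two layouts (not the real library).
old_api = SimpleNamespace(read_point_cloud=lambda path: ("old", path))
new_api = SimpleNamespace(
    io=SimpleNamespace(read_point_cloud=lambda path: ("new", path))
)

names = ["io.read_point_cloud", "read_point_cloud"]
reader_old = resolve(old_api, names)
reader_new = resolve(new_api, names)
```

In a real script, passing the imported `open3d` module as `module` would pick whichever location the installed version provides.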