
camp's People

Contributors

twopin


camp's Issues

About DrugBank Data

Hi, twopin. I wonder whether you used the DrugBank dataset for training, and if so, what data pre-processing you applied?

About preprocessing data

Hi,
According to the data curation instructions, I need to format the peptide-protein data as "protein sequence, peptide sequence, protein_ss, peptide_ss", but in fact preprocess_features.py requires the Protein_pssm_dict, Protein_Intrinsic_dict and Peptide_Intrinsic_dict_v3 files, if I understand correctly.
I can generate pssm_dict with step3_generate_features.py, but how do I obtain the other two files?
Thanks,
LeeLee

Confusion about PLIP analysis

Hi, twopin
I have some confusion about the PLIP analysis. First of all, you did not provide Data_construction.txt, and I'm not sure what the result.txt file is. Is it the XXX_protonated.pdb file?
Then I need to provide analysis files for the different chains of a pdb_id, such as 6dub_e_result.txt and 6dub_f_result.txt, but I found that I cannot specify a particular chain when downloading the PDB file. Is there a problem with my download method or with the analysis method? It would be very helpful if you could provide the PLIP code, if convenient.
Thanks,
LeeLee

How to train my own models

Hi, I want to reimplement the project in PyTorch, but I cannot find any structural information for CAMP.h5. Where can I find the model's hyperparameters, such as the kernel size of the convolution layers, the structure of the feature-extractor encoder, etc.? Thanks a lot.
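In the meantime, one hedged way to recover the architecture is to read the JSON that Keras stores inside the HDF5 file itself, in a 'model_config' attribute. A minimal sketch (list_layers is my name, and the attribute layout assumes a Keras-era .h5 file, which CAMP.h5 appears to be):

```python
import json
import h5py

# Hedged sketch: Keras .h5 files store the architecture as JSON in the
# 'model_config' attribute, so layer types and hyperparameters (kernel
# sizes, filter counts, ...) can be inspected without loading the model.
def list_layers(path):
    with h5py.File(path, 'r') as f:
        raw = f.attrs['model_config']
    if isinstance(raw, bytes):  # h5py may return bytes or str
        raw = raw.decode('utf-8')
    config = json.loads(raw)
    # Functional-API models nest their layers under config['config']['layers'].
    return [(layer['class_name'], layer['config'])
            for layer in config['config']['layers']]
```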

Predict with example code using the test data

Hi, I ran the prediction with the example code on the test data, using the command 'python -u predict_CAMP.py'.
However, the following error is raised:
  File "predict_CAMP.py", line 178, in <module>
    X_pep, X_p, X_SS_pep, X_SS_p, X_2_pep, X_2_p, X_dense_pep, X_dense_p, pep_sequence, prot_sequence = load_example(model_mode)
  File "predict_CAMP.py", line 82, in load_example
    protein_feature_dict = pickle.load(f)
TypeError: a bytes-like object is required, not 'str'

Am I using the features in 'example_data_feature' incorrectly, or is there a problem with a file format somewhere? I look forward to and appreciate your answer.
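For anyone hitting the same TypeError: under Python 3 it usually means the pickle file was opened in text mode ('r'). A minimal, hedged sketch of a loader that avoids it (the function name is mine, not from predict_CAMP.py):

```python
import pickle

# Hedged sketch: pickle.load needs bytes, so the file must be opened in
# binary mode; opening with 'r' under Python 3 yields str chunks and raises
# "TypeError: a bytes-like object is required, not 'str'".
def load_feature_dict(path):
    with open(path, 'rb') as f:  # 'rb', not 'r'
        return pickle.load(f)
```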

Could you check PLIP v2.2.2 output vs v1.4.2? Is the current code only for PLIP v1.4.2?

Dear twopin,

It is my honor to read your paper and code: an excellent idea to merge so many data sources, and an excellent model design. I am trying to reproduce your work.
In data_prepare/step1_pdb_process.py, judging from your example file, PLIP v1.4.2 is used, while the current version is v2.2.2. I compared the same PDB entry, 1A0M, and the v2.2.2 output is quite different from that of v1.4.2. Could you update your code for PLIP v2.2.2, or give us instructions on how to deal with PLIP v2.2.2? Thanks!!
From your code, with PLIP v1.4.2, one PDB code may output multiple result files ('./peptide_result/' + pdb_id + '_' + chain + '_result.txt'), that is, one result file per chain, yes?
With PLIP v2.2.2, one PDB code outputs only a single result file, possibly covering multiple interacting chains.
1A0M-v1.4.2.txt
1A0M-v2.2.2.txt

Confusion about psi-blast

Hello, twopin, I want to know which library (database) file to use when running psi-blast. I could not find the answer in the paper or the Python scripts. Could you give me some clues about it? Thank you!

Interpretation of results

Hello @twopin,

Is it possible to have a clear protocol on how to interpret the results? Which files should we look at? How do we open the files, and what is the significance of each value?
And once this is done, how do we generate a file to see the interactions between the proteins and peptides?

Thank you.

docker or environment.yml

Is there a Docker container or an environment.yml to help folks get this running on a local system or cluster? I am having trouble getting the program installed due to package dependencies.
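Until a maintainer-supplied environment appears, a hedged starting point is an environment.yml pinned to the versions another issue in this thread reports (Python 2.7, Keras 2.0.8 on the Theano backend, TensorFlow 1.2.1, RDKit 2018.09.3, scikit-learn 0.20.3). The channels and package spellings below are guesses, and such old packages may no longer resolve on current conda channels:

```yaml
# Assumption: versions taken from the version printout reported in another
# issue in this thread; channels and exact builds are guesses.
name: camp
channels:
  - conda-forge
  - defaults
dependencies:
  - python=2.7
  - numpy
  - pandas
  - scikit-learn=0.20.3
  - rdkit=2018.09.3
  - theano
  - pip
  - pip:
      - keras==2.0.8
      - tensorflow==1.2.1
```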

error in preprocessing features

Hello,

I am trying to generate the features for a set of peptides and proteins using the preprocess_features.py.
The problem is that I cannot manage to create these two files: protein_ss_feature_dict and peptide_ss_feature_dict.
I also tried with the example_data.tsv file and I still have the same issue.
This is the error I get whenever I run the script:
Traceback (most recent call last):
  File "preprocess_features.py", line 179, in <module>
    feature = label_seq_ss(pep_ss, pad_pep_len, seq_ss_set)
  File "preprocess_features.py", line 66, in label_seq_ss
    X[i] = res_ind[res]
TypeError: 'set' object is not subscriptable

I do not know if there is an error in this step:
def label_seq_ss(line, pad_prot_len, res_ind):
    line = line.strip().split(',')
    X = np.zeros(pad_prot_len)
    for i, res in enumerate(line[:pad_prot_len]):
        X[i] = res_ind[res]
    return X

or if seq_ss_set is not defined properly, like the other sets?

Can you please help resolve this problem?

Thank you,

Mariem
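For reference, the TypeError suggests that the third argument to label_seq_ss must be a dict mapping labels to integer indices, not a set. A hedged sketch of how such a mapping could be built (the three-state label alphabet below is illustrative, not taken from the repository):

```python
import numpy as np

# Assumption: a dict {label -> index} replaces the set; 0 is kept for
# padding, so real labels start at 1. The alphabet is illustrative.
ss_labels = ['H', 'E', 'C']
res_ind = {label: i + 1 for i, label in enumerate(ss_labels)}  # subscriptable

def label_seq_ss(line, pad_len, res_ind):
    tokens = line.strip().split(',')
    X = np.zeros(pad_len)
    for i, res in enumerate(tokens[:pad_len]):
        X[i] = res_ind[res]  # dict lookup works; a set would raise TypeError
    return X
```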

Self_Attention passes Reshape3.0 to Theano, which requires a 1-D or 2-D input

Can anyone help with this? Thank you!

x must be 1-d or 2-d tensor of floats. Got TensorType(float32, 3D)

Software versions:
Python 2.7.18 |Anaconda, Inc.| (default, Jun 4 2021, 14:47:46)
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.

print(keras.__version__, tensorflow.__version__, rdkit.__version__, sklearn.__version__)
('2.0.8', '1.2.1', '2018.09.3', '0.20.3')

$ python -u predict_CAMP.py
Using Theano backend.
2021-09-29 14:25:44.007318: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2021-09-29 14:25:44.007379: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2021-09-29 14:25:44.007404: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2021-09-29 14:25:44.007429: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2021-09-29 14:25:44.007452: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
('Start loading model :', './model/CAMP.h5')
Traceback (most recent call last):
  File "predict_CAMP.py", line 175, in <module>
    model = load_model(model_name,custom_objects={'Self_Attention': Self_Attention})
  File "/home/data/anaconda3/envs/CAMPNoGPU/lib/python2.7/site-packages/keras/models.py", line 239, in load_model
    model = model_from_config(model_config, custom_objects=custom_objects)
  File "/home/data/anaconda3/envs/CAMPNoGPU/lib/python2.7/site-packages/keras/models.py", line 313, in model_from_config
    return layer_module.deserialize(config, custom_objects=custom_objects)
  File "/home/data/anaconda3/envs/CAMPNoGPU/lib/python2.7/site-packages/keras/layers/__init__.py", line 54, in deserialize
    printable_module_name='layer')
  File "/home/data/anaconda3/envs/CAMPNoGPU/lib/python2.7/site-packages/keras/utils/generic_utils.py", line 139, in deserialize_keras_object
    list(custom_objects.items())))
  File "/home/data/anaconda3/envs/CAMPNoGPU/lib/python2.7/site-packages/keras/engine/topology.py", line 2497, in from_config
    process_node(layer, node_data)
  File "/home/data/anaconda3/envs/CAMPNoGPU/lib/python2.7/site-packages/keras/engine/topology.py", line 2454, in process_node
    layer(input_tensors[0], **kwargs)
  File "/home/data/anaconda3/envs/CAMPNoGPU/lib/python2.7/site-packages/keras/engine/topology.py", line 602, in __call__
    output = self.call(inputs, **kwargs)
  File "/home/data/TData/software/Denodo_CAMP/twopin-CAMP-0708396/Self_Attention.py", line 29, in call
    QK = K.softmax(QK)
  File "/home/data/anaconda3/envs/CAMPNoGPU/lib/python2.7/site-packages/keras/backend/theano_backend.py", line 1529, in softmax
    return T.nnet.softmax(x)
  File "/home/data/anaconda3/envs/CAMPNoGPU/lib/python2.7/site-packages/theano/tensor/nnet/nnet.py", line 815, in softmax
    return softmax_op(c)
  File "/home/data/anaconda3/envs/CAMPNoGPU/lib/python2.7/site-packages/theano/gof/op.py", line 615, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "/home/data/anaconda3/envs/CAMPNoGPU/lib/python2.7/site-packages/theano/tensor/nnet/nnet.py", line 430, in make_node
    x.type)
ValueError: x must be 1-d or 2-d tensor of floats. Got TensorType(float32, 3D)
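For anyone debugging this: Theano's T.nnet.softmax only accepts 2-D input, so the usual workaround is to flatten the 3-D attention tensor to 2-D, apply softmax, and reshape back. A hedged NumPy sketch of the idea (inside Self_Attention.call the same trick would use the backend's reshape rather than NumPy):

```python
import numpy as np

# Hedged sketch: apply a row-wise softmax to a (batch, m, n) tensor by
# flattening to (batch*m, n), which is the 2-D shape Theano's softmax
# accepts, then restoring the original 3-D shape.
def softmax_3d(x):
    b, m, n = x.shape
    flat = x.reshape(b * m, n)                     # one softmax per row
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(flat)
    out = e / e.sum(axis=1, keepdims=True)
    return out.reshape(b, m, n)                    # back to 3-D
```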

Help wanted

Hello all,
I read your paper and enjoyed it. I am fairly new to coding; if you could provide the model source instead of, or along with, CAMP.h5, that would be great. Thank you so much!

About blast database

Hello, I have a question. Regarding the psiblast database, I tried both the nr and swissprot databases, but neither reproduces your results. Which database did you choose?

Error in STEP 1

Hi,
Thank you for providing these excellent tools. While using the code to process data, I ran into some trouble. The following is my error:

python ./CAMP-master/data_prepare/step1_pdb_process.py 
Traceback (most recent call last):
  File "./CAMP-master/data_prepare/step1_pdb_process.py", line 40, in <module>
    PDB_chain_lst = [x.split('_')[1].split(' ')[0].lower() for x in raw_list]
IndexError: list index out of range

I am a Python rookie, and I guess the error happened while processing the pdb_seqres_test.txt data, but the data was downloaded from ftp://ftp.wwpdb.org/pub/pdb/derived_data/pdb_seqres.txt.gz. How can I solve this problem?
Thanks,
LeeLee
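For reference, the IndexError means some header line has no '_' to split on. pdb_seqres.txt headers normally look like ">101m_A mol:protein length:154 ...", so a hedged, defensive variant of that list comprehension (the function name is mine) could simply skip anything that does not match:

```python
# Hedged sketch: skip header lines without an underscore, which would
# otherwise raise IndexError in x.split('_')[1]; the rest mirrors the
# expression from step1_pdb_process.py.
def parse_chain_ids(raw_list):
    chains = []
    for x in raw_list:
        parts = x.split('_')
        if len(parts) < 2:  # malformed or unexpected header line
            continue
        chains.append(parts[1].split(' ')[0].lower())
    return chains
```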

Problems in the STEP 2: Generate labels of peptide binding residues

Can anyone help with step 2?

How is the file './pdb/peptide-mapping.txt' generated?
I cannot find "query_mapping.py" and "target_mapping.py".

"#python query_mapping.py #to get peptide sequence vectors (the output is "peptide-mapping.txt ")"
"#python target_mapping.py #to get target sequence vector"

Thank you!

demo data: example output files

We have worked to get CAMP running on our local cluster, and I have successfully run predict_CAMP.py on example_data.csv. This produced 3 .npy files, and I could not find documentation explaining the output and the interpretation of the results. @twopin, can you push the example data output files to the repo and provide more documentation on how to interpret the CAMP output?
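Until that documentation exists, the .npy files can at least be inspected generically; a hedged helper (the actual output file names and their meaning are exactly what this issue asks to be documented):

```python
import numpy as np

# Hedged sketch: report the shape and dtype of a saved NumPy array,
# a first step toward figuring out what each output file contains.
def inspect_npy(path):
    arr = np.load(path)
    return arr.shape, arr.dtype
```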

step1_pdb_process

Can anyone help with step1_pdb_process.py -> "step 2. Get fasta sequence of the predicted interacting chains"?

df_predict_det1 =df_predict_det1.drop(['PDB_id_x','chain_x','PDB_id_y','chain_y'],axis =1)

  1. What are 'PDB_id_x' and 'chain_x'?

df_predict_det1.columns = ['pdb_id','pep_chain','predicted_chain','pep_seq','prot_seq']
2. I think that 'pep_seq' comes from 'PDB_seq' of pdbid_all_fasta, but how do we get 'prot_seq'?
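For context on the first question: '_x'/'_y' are the default suffixes that pandas.merge appends when both frames share column names, so 'PDB_id_x'/'chain_x' presumably come from the left frame of an earlier merge in the script. A small illustration with made-up values:

```python
import pandas as pd

# Hedged sketch: two frames that both carry 'PDB_id' and 'chain' columns;
# after merging on 'key', pandas disambiguates the overlapping columns
# with its default '_x' (left) and '_y' (right) suffixes.
left = pd.DataFrame({'key': [1], 'PDB_id': ['6dub'], 'chain': ['e']})
right = pd.DataFrame({'key': [1], 'PDB_id': ['6dub'], 'chain': ['f']})
merged = left.merge(right, on='key')
print(list(merged.columns))
# columns: ['key', 'PDB_id_x', 'chain_x', 'PDB_id_y', 'chain_y']
```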

How to download the file uniprot2seq from UniProt Website

Hi, the function load_uni_seq in data_prepare/step1_pdb_process.py requires an input called uniprot2seq_file, described as "uniprot2seq from UniProt Website" (a tab-separated file with fields including Uniprot_id, Uniprot Sequence, Protein_name, Protein_families). How do I download this file from the UniProt website? Could you provide a download link for it? Thanks.

Running CAMP on own data

Hi,

I want to try CAMP on my own dataset. I understand that the raw sequences are not enough and that some third-party tools are required, which I need to run online manually.

Could you provide step-by-step instructions on how to format the peptide-protein pairs, what input files to submit to the online services, what output to download, and what scripts to run afterwards?

I understand that technically I am supposed to run step3_generate_features.py, but since all paths are hardcoded, it is a bit hard to figure out how to get there starting completely from scratch.

Thanks for your help!

SSpro calculation for peptides

Hello,
the SSpro program in the SCRATCH-1D suite requires the protein sequence to be at least 30 amino acids long, but many peptides in the provided sample data are shorter than that. I wonder how to handle those peptides with the SSpro program.
Thanks!

Hi yipin, about re-implement your project

Dear Yipin, I wish to re-implement CAMP, but the code you provided here is not complete. If possible, could you please upload the full version of the code or detailed instructions for your implementation? The Nat. Comm. paper you published is very nice, but you need to make it replicable to promote your achievement. I look forward to your reply.

If the process is too complex, I can totally understand it, and I am willing to pay for your effort, please send me your Wechat ID or email address.

dataset

Hello. I recently produced training data from the RCSB PDB and obtained 7604 samples (6701 proteins, 6341 peptides), but I get low AUROC and AUPRC. This is inconsistent with the data in your paper; could you please share the RCSB data?

Questions about creating "query_peptide.fasta" and "target_peptide.fasta"

Hello!

I am working on running step2_pepbdb_pep_bindingsites.py but I am having a few problems with the inputs.
I have successfully run crawl.py and have crawl_results.csv. However, I am confused about how to generate query_peptide.fasta and target_peptide.fasta. I understand that odd-numbered lines should contain information about the peptide and even-numbered lines should contain the sequence. What information should be included for each peptide?

Also, am I correct in saying that crawl.py should be run first, followed by query_mapping.py, then pyssw.py, then target_mapping.py?

Finally, is this the correct link to download the PepBDB database? http://huanglab.phys.hust.edu.cn/pepbdb/db/download/

Thank you for your help!
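For what it's worth, the odd/even layout described above is plain FASTA; a hedged sketch of a writer (pairing an identifier such as pdb_id plus peptide chain as the header is my assumption, not something the repository specifies):

```python
# Hedged sketch: FASTA alternates a '>' header line (odd lines) with its
# sequence line (even lines), matching the layout described in the issue.
def write_fasta(records, path):
    with open(path, 'w') as f:
        for name, seq in records:
            f.write('>%s\n%s\n' % (name, seq))
```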
