flatironinstitute / deepfri

Deep functional residue identification

License: BSD 3-Clause "New" or "Revised" License

Python 96.52% Shell 3.48%

deep-learning machine-learning gene-ontology graph-convolutional-networks protein-data-bank class-activation-maps tensorflow

deepfri's Introduction

DeepFRI

Citing

@article {Gligorijevic2019,
	author = {Gligorijevic, Vladimir and Renfrew, P. Douglas and Kosciolek, Tomasz and Leman,
	Julia Koehler and Cho, Kyunghyun and Vatanen, Tommi and Berenberg, Daniel
	and Taylor, Bryn and Fisk, Ian M. and Xavier, Ramnik J. and Knight, Rob and Bonneau, Richard},
	title = {Structure-Based Function Prediction using Graph Convolutional Networks},
	year = {2019},
	doi = {10.1101/786236},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2019/10/04/786236},
	journal = {bioRxiv}
}

Dependencies

DeepFRI is tested to work under Python 3.7.

The required dependencies for DeepFRI are TensorFlow, Biopython and scikit-learn. To install all dependencies run:

pip install .

Protein function prediction

To predict protein functions, use the predict.py script with the following options:

seq str, Protein sequence as a string
cmap str, Name of a file storing a protein contact map and sequence in *.npz file format (with the following numpy array variables: C_alpha, seqres. See examples/pdb_cmaps/)
pdb str, Name of a PDB file (cleaned)
pdb_dir str, Directory with cleaned PDB files (see examples/pdb_files/)
cmap_csv str, Filename of the catalogue (in *.csv file format) containing the mapping between protein names and the directory with *.npz files (see examples/catalogue_pdb_chains.csv)
fasta_fn str, Fasta filename (see examples/pdb_chains.fasta)
model_config str, JSON file with model filenames (see trained_models/)
ont str, Ontology (mf - Molecular Function, bp - Biological Process, cc - Cellular Component, ec - Enzyme Commission)
output_fn_prefix str, Output filename (the same prefix will be used for predictions and saliency files)
verbose bool, Whether or not to print function prediction results
saliency bool, Whether or not to compute class activation maps (outputs a *.json file)
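The cmap option above expects an *.npz file holding the arrays C_alpha and seqres. The following is a minimal sketch of how such a file could be inspected with NumPy. It assumes that C_alpha is a square matrix of C-alpha distances in angstroms and that seqres holds the sequence as a string; the 10 angstrom cutoff and the helper name load_contact_map are illustrative assumptions, not taken from the DeepFRI code.

```python
import os
import tempfile

import numpy as np


def load_contact_map(npz_path, threshold=10.0):
    """Return (binary contact map, sequence) from a DeepFRI-style *.npz file.

    Assumptions (not confirmed by the README): C_alpha is an L x L matrix of
    C-alpha distances in angstroms and seqres stores the sequence as a string.
    """
    with np.load(npz_path) as data:
        distances = data["C_alpha"]
        sequence = str(data["seqres"])
    # Two residues are treated as "in contact" when closer than the cutoff.
    contacts = (distances < threshold).astype(np.float32)
    return contacts, sequence


# Synthetic three-residue file so the sketch runs without the repository data.
toy_distances = np.array([[0.0, 3.8, 12.0],
                          [3.8, 0.0, 3.8],
                          [12.0, 3.8, 0.0]])
toy_path = os.path.join(tempfile.mkdtemp(), "toy.npz")
np.savez(toy_path, C_alpha=toy_distances, seqres="SMT")

contacts, sequence = load_contact_map(toy_path)
print(sequence, contacts.shape)
```

Checking that the matrix side length matches the sequence length is a quick sanity test before passing a file to predict.py.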

Generated files (see examples/outputs/):

output_fn_prefix_MF_predictions.csv Predictions in the *.csv file format with columns: Protein, GO-term/EC-number, Score, GO-term/EC-number name
output_fn_prefix_MF_pred_scores.json Predictions in the *.json file with keys: pdb_chains, Y_hat, goterms, gonames
output_fn_prefix_MF_saliency_maps.json JSON file storing a dictionary of saliency maps for each predicted function of every protein
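As a rough illustration of how the *_pred_scores.json file could be post-processed, the sketch below filters predictions by a score cutoff. It uses the four keys listed above but assumes that Y_hat is a list with one row of scores per protein, aligned with goterms; that layout, the 0.5 cutoff, and the helper name top_predictions are assumptions. The dictionary is a toy stand-in, not real DeepFRI output.

```python
import json


def top_predictions(scores, cutoff=0.5):
    """Return (protein, GO term, GO name, score) rows with score >= cutoff."""
    rows = []
    for protein, protein_scores in zip(scores["pdb_chains"], scores["Y_hat"]):
        for term, name, score in zip(scores["goterms"], scores["gonames"], protein_scores):
            if score >= cutoff:
                rows.append((protein, term, name, score))
    # Group by protein, highest score first within each protein.
    return sorted(rows, key=lambda row: (row[0], -row[3]))


# Toy stand-in for json.load(open("<prefix>_MF_pred_scores.json")).
toy_scores = json.loads(json.dumps({
    "pdb_chains": ["1S3P-A", "2PE5-B"],
    "goterms": ["GO:0005509", "GO:0003677"],
    "gonames": ["calcium ion binding", "DNA binding"],
    "Y_hat": [[0.99, 0.01], [0.02, 0.53]],
}))

rows = top_predictions(toy_scores)
for protein, term, name, score in rows:
    print(protein, term, f"{score:.5f}", name)
```

The cutoff is only an illustrative value; an appropriate threshold depends on the ontology and on how many false positives are acceptable.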

DeepFRI offers 6 possible options for predicting functions. See examples below.

Option 1: predicting functions of a protein from its contact map

Example: predicting MF-GO terms for Parvalbumin alpha protein using its sequence and contact map (PDB: 1S3P):

>> python predict.py --cmap ./examples/pdb_cmaps/1S3P-A.npz -ont mf --verbose

Output:

Protein GO-term/EC-number Score GO-term/EC-number name
query_prot GO:0005509 0.99824 calcium ion binding

Option 2: predicting functions of a protein from its sequence

Example: predicting MF-GO terms for Parvalbumin alpha protein using its sequence (PDB: 1S3P):

>> python predict.py --seq 'SMTDLLSAEDIKKAIGAFTAADSFDHKKFFQMVGLKKKSADDVKKVFHILDKDKDGFIDEDELGSILKGFSSDARDLSAKETKTLMAAGDKDGDGKIGVEEFSTLVAES' -ont mf --verbose

Output:

Protein GO-term/EC-number Score GO-term/EC-number name
query_prot GO:0005509 0.99769 calcium ion binding

Option 3: predicting functions of proteins from a fasta file

>> python predict.py --fasta_fn examples/pdb_chains.fasta -ont mf -v

Output:

Protein GO-term/EC-number Score GO-term/EC-number name
1S3P-A GO:0005509 0.99769 calcium ion binding
2J9H-A GO:0004364 0.46937 glutathione transferase activity
2J9H-A GO:0016765 0.19910 transferase activity, transferring alkyl or aryl (other than methyl) groups
2J9H-A GO:0097367 0.10537 carbohydrate derivative binding
2PE5-B GO:0003677 0.53502 DNA binding
2W83-E GO:0032550 0.99260 purine ribonucleoside binding
2W83-E GO:0001883 0.99242 purine nucleoside binding
2W83-E GO:0005525 0.99231 GTP binding
2W83-E GO:0019001 0.99222 guanyl nucleotide binding
2W83-E GO:0032561 0.99194 guanyl ribonucleotide binding
2W83-E GO:0032549 0.99149 ribonucleoside binding
2W83-E GO:0001882 0.99135 nucleoside binding
2W83-E GO:0017076 0.98687 purine nucleotide binding
2W83-E GO:0032555 0.98641 purine ribonucleotide binding
2W83-E GO:0035639 0.98611 purine ribonucleoside triphosphate binding
2W83-E GO:0032553 0.98573 ribonucleotide binding
2W83-E GO:0097367 0.98168 carbohydrate derivative binding
2W83-E GO:0003924 0.52355 GTPase activity
2W83-E GO:0016817 0.36863 hydrolase activity, acting on acid anhydrides
2W83-E GO:0016818 0.36683 hydrolase activity, acting on acid anhydrides, in phosphorus-containing anhydrides
2W83-E GO:0017111 0.35465 nucleoside-triphosphatase activity
2W83-E GO:0016462 0.35303 pyrophosphatase activity

Option 4: predicting functions of proteins from contact map catalogue

>> python predict.py --cmap_csv examples/catalogue_pdb_chains.csv -ont mf -v

Output:

Protein GO-term/EC-number Score GO-term/EC-number name
1S3P-A GO:0005509 0.99824 calcium ion binding
2J9H-A GO:0004364 0.84826 glutathione transferase activity
2J9H-A GO:0016765 0.82014 transferase activity, transferring alkyl or aryl (other than methyl) groups
2PE5-B GO:0003677 0.89086 DNA binding
2PE5-B GO:0017111 0.12892 nucleoside-triphosphatase activity
2PE5-B GO:0004386 0.12847 helicase activity
2PE5-B GO:0032553 0.12091 ribonucleotide binding
2PE5-B GO:0097367 0.11961 carbohydrate derivative binding
2PE5-B GO:0016887 0.11331 ATPase activity
2W83-E GO:0097367 0.97069 carbohydrate derivative binding
2W83-E GO:0019001 0.96842 guanyl nucleotide binding
2W83-E GO:0017076 0.96737 purine nucleotide binding
2W83-E GO:0001882 0.96473 nucleoside binding
2W83-E GO:0035639 0.96439 purine ribonucleoside triphosphate binding
2W83-E GO:0032555 0.96294 purine ribonucleotide binding
2W83-E GO:0016818 0.96181 hydrolase activity, acting on acid anhydrides, in phosphorus-containing anhydrides
2W83-E GO:0032550 0.96142 purine ribonucleoside binding
2W83-E GO:0016817 0.96082 hydrolase activity, acting on acid anhydrides
2W83-E GO:0016462 0.95998 pyrophosphatase activity
2W83-E GO:0032553 0.95935 ribonucleotide binding
2W83-E GO:0032561 0.95930 guanyl ribonucleotide binding
2W83-E GO:0032549 0.95877 ribonucleoside binding
2W83-E GO:0003924 0.95453 GTPase activity
2W83-E GO:0001883 0.95271 purine nucleoside binding
2W83-E GO:0005525 0.94635 GTP binding
2W83-E GO:0017111 0.93942 nucleoside-triphosphatase activity
2W83-E GO:0044877 0.64519 protein-containing complex binding
2W83-E GO:0001664 0.31413 G protein-coupled receptor binding
2W83-E GO:0005102 0.20078 signaling receptor binding

Option 5: predicting functions of a protein from a PDB file

>> python predict.py -pdb ./examples/pdb_files/1S3P-A.pdb -ont mf -v

Output:

Protein GO-term/EC-number Score GO-term/EC-number name
query_prot GO:0005509 0.99824 calcium ion binding

Option 6: predicting functions of a protein from a directory with PDB files

>> python predict.py --pdb_dir ./examples/pdb_files -ont mf --saliency --use_backprop

Output:

See files in: examples/outputs/

Training DeepFRI

To train DeepFRI run the following command from the project directory:

>> python train_DeepFRI.py -h

or to launch jobs run the following script:

>> ./run_train_DeepFRI.sh

Output

Generated files:

model_name_prefix_ont_model.hdf5 trained model with architecture and weights saved in HDF5 format
model_name_prefix_ont_pred_scores.pckl pickle file with predicted GO term/EC number scores for test proteins
model_name_prefix_ont_model_params.json JSON file with metadata (GO terms/names, architecture params, etc.)

See examples of pre-trained models (*.hdf5) and model params (*.json) in: trained_models/.

Functional residue identification

To visualize class activation (saliency) maps, use the viz_gradCAM.py script with the following options:

saliency_fn str, JSON filename with saliency maps generated by predict.py script (see Option 6 above)
list_all bool, list all proteins and their predicted GO terms with corresponding class activation (saliency) maps
protein_id str, protein (PDB chain), saliency maps of which are to be visualized for each predicted function
go_id str, GO term, saliency maps of which are to be visualized
go_name str, GO name, saliency maps of which are to be visualized

Generated files:

saliency_fig_PDB-chain_GOterm.png class activation (saliency) map profile over sequence (see fig below, right)
pymol_viz.py pymol script for mapping salient residues onto 3D structure (pymol output is shown in fig below, left)
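For readers who want the most salient residues as numbers rather than a figure, here is a small sketch of how a per-residue saliency profile could be ranked. The JSON layout used here (a dictionary keyed by protein, with the keys GO_ids, sequence and saliency_maps) is an assumption made for illustration and should be checked against a real output file; the helper name top_salient_residues is hypothetical.

```python
def top_salient_residues(saliency, protein_id, go_id, k=5):
    """Return the k highest-scoring residues as (position, residue, score).

    Assumed layout (hypothetical): saliency[protein_id] has the keys
    "GO_ids", "sequence" and "saliency_maps", with one profile per GO term.
    """
    entry = saliency[protein_id]
    profile = entry["saliency_maps"][entry["GO_ids"].index(go_id)]
    ranked = sorted(range(len(profile)), key=lambda i: profile[i], reverse=True)[:k]
    # Report 1-based positions in sequence order.
    return [(i + 1, entry["sequence"][i], profile[i]) for i in sorted(ranked)]


# Toy saliency dictionary with made-up scores.
toy_saliency = {
    "1S3P-A": {
        "GO_ids": ["GO:0005509"],
        "sequence": "SMTDLL",
        "saliency_maps": [[0.1, 0.9, 0.2, 0.8, 0.05, 0.3]],
    }
}

top = top_salient_residues(toy_saliency, "1S3P-A", "GO:0005509", k=2)
print(top)
```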

Example:

>>> python viz_gradCAM.py -i ./examples/outputs/DeepFRI_MF_saliency_maps.json -p 1S3P-A -go GO:0005509

Output:

Data

The training and validation data used to train the DeepFRI model are provided as TensorFlow-specific TFRecord files and can be downloaded from:

PDB	SWISS-MODEL
Gene Ontology(19GB)	Gene Ontology(165GB)
Enzyme Commission(13GB)	Enzyme Commission(117GB)

Pretrained models

Pretrained models can be downloaded from:

Models (use these models if you run DeepFRI on GPU)
Newest Models (use these models if you run DeepFRI on CPU)

Uncompress tar.gz file into the DeepFRI directory (tar xvzf trained_models.tar.gz -C /path/to/DeepFRI).

deepfri's Issues

Getting structural embeddings

Hello,

Is there a way to extract the structural embeddings from the model?

About the dataset

Thank you for your contributions in this field. However, we can't download the dataset and trained models from the URLs in the README file. Do you plan to make the dataset available again in the repo?

Is there a way to download the file bc-95.out In script data_collection.sh?

The link 'https://cdn.rcsb.org/resources/sequence/clusters/bc-95.out' in data_collection.sh does not work. Can you give me some guidance?

DeepFRI project based on pytorch

Could you offer a PyTorch-based version of the DeepFRI project? Thank you!

Saliency output format

I'm noticing that there are pickle files that are the output of deepfri.

There are a few immediate reasons why this may not be advantageous:

Pickle files are specific to Python and can't be parsed in another language.
There are security issues surrounding pickle: unpickling an untrusted file can execute arbitrary code.
There are likely to be version issues due to the choice of Python and NumPy, where one user has numpy=1.10 but the pickle can only be read in 1.17 (these issues do actually happen).

Looking inside of the contents, it looks like this can be easily encoded as a json format, which is a more forgiving format.
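A conversion along the lines suggested above could look like the following sketch. It is not part of DeepFRI; the helper to_jsonable is hypothetical, and the toy payload only imitates the kind of nested dictionary a results pickle might contain. Objects that expose tolist(), such as NumPy arrays, are turned into plain lists.

```python
import json
import pickle


def to_jsonable(obj):
    """Recursively convert an unpickled object into JSON-compatible types."""
    if hasattr(obj, "tolist"):          # e.g. NumPy arrays and scalars
        return obj.tolist()
    if isinstance(obj, dict):
        return {str(key): to_jsonable(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(value) for value in obj]
    return obj


# Toy payload standing in for the contents of a results pickle.
payload = {"goterms": ("GO:0005509",), "Y_hat": [[0.99]]}
blob = pickle.dumps(payload)

# Only unpickle data from a source you trust.
restored = pickle.loads(blob)
text = json.dumps(to_jsonable(restored))
print(text)
```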

Hello,
Thank you for this great work! How can I evaluate the performance of DeepFRI (e.g. get Fmax, AUPR, etc.) on my custom dataset? I tried using your pre-trained models, but I am facing some issues with the JSON files. If you could recommend a way, that would be very helpful. Thanks a lot!
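One common way to compute the protein-centric Fmax mentioned in this question is sketched below in plain Python. It follows the usual CAFA-style definition (precision averaged over proteins with at least one prediction, recall averaged over proteins with at least one annotation) and is not taken from the DeepFRI code; the function name fmax and the toy labels are illustrative assumptions.

```python
def fmax(y_true, y_score, thresholds=None):
    """Protein-centric Fmax over a grid of score thresholds.

    y_true: one list of 0/1 labels per protein.
    y_score: one list of predicted scores per protein, aligned with y_true.
    """
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 100)]
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for truth, scores in zip(y_true, y_score):
            predicted = [score >= t for score in scores]
            n_pred = sum(predicted)
            n_true = sum(truth)
            n_hit = sum(1 for p, y in zip(predicted, truth) if p and y)
            if n_pred:                  # precision only where something is predicted
                precisions.append(n_hit / n_pred)
            if n_true:                  # recall only where something is annotated
                recalls.append(n_hit / n_true)
        if precisions and recalls:
            p = sum(precisions) / len(precisions)
            r = sum(recalls) / len(recalls)
            if p + r:
                best = max(best, 2 * p * r / (p + r))
    return best


# Toy labels and scores for two proteins and three terms.
toy_true = [[1, 0, 1], [0, 1, 0]]
toy_score = [[0.9, 0.2, 0.4], [0.1, 0.8, 0.3]]
score = fmax(toy_true, toy_score)
print(score)
```

For AUPR, scikit-learn (already a DeepFRI dependency) provides average_precision_score in sklearn.metrics, which can be applied per term or on the flattened label and score matrices.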

regarding using of online deepFRI data downloading

Hi,
I am working with DeepFRI for the first time and am trying to download the data. Only the .json file was downloaded, and it also gives an error: I am not able to open the structure in Chimera. I also didn't get the other csv files. What could be the reason for that?

Analysis of PDB files

I was testing the use of DeepFri on PDB files -- so I downloaded the PDB file for 3LZB from the PDB API
I get the following error:
Traceback (most recent call last):
  File "predict.py", line 41, in <module>
    predictor.predict(args.pdb_fn)
  File "/usr/local/DeepFRI/deepfrier/Predictor.py", line 107, in predict
    A, S, seqres = self._load_cmap(test_prot, cmap_thresh=cmap_thresh)
  File "/usr/local/DeepFRI/deepfrier/Predictor.py", line 74, in _load_cmap
    D, seq = load_predicted_PDB(filename)
  File "/usr/local/DeepFRI/deepfrier/utils.py", line 25, in load_predicted_PDB
    two = residues[y]["CA"].get_coord()
  File "/usr/local/lib/python3.8/dist-packages/Bio/PDB/Entity.py", line 45, in __getitem__
    return self.child_dict[id]
KeyError: 'CA'

I was wondering if that was related to having many chains in the file, so I tried to split the file into chains using pdb-tools (pdb_splitchain) and I got a similar error:
/usr/local/lib/python3.8/dist-packages/Bio/SeqIO/PdbIO.py:303: BiopythonParserWarning: 'HEADER' line not found; can't determine PDB ID.
  warnings.warn(
Traceback (most recent call last):
  File "predict.py", line 41, in <module>
    predictor.predict(args.pdb_fn)
  File "/usr/local/DeepFRI/deepfrier/Predictor.py", line 107, in predict
    A, S, seqres = self._load_cmap(test_prot, cmap_thresh=cmap_thresh)
  File "/usr/local/DeepFRI/deepfrier/Predictor.py", line 74, in _load_cmap
    D, seq = load_predicted_PDB(filename)
  File "/usr/local/DeepFRI/deepfrier/utils.py", line 25, in load_predicted_PDB
    two = residues[y]["CA"].get_coord()
  File "/usr/local/lib/python3.8/dist-packages/Bio/PDB/Entity.py", line 45, in __getitem__
    return self.child_dict[id]
KeyError: 'CA'

I'm not quite sure if there is another preferred input file.
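The KeyError: 'CA' in the tracebacks above means that at least one parsed residue has no atom named CA. One plausible cause, offered here as an assumption rather than a confirmed diagnosis, is that the file still contains waters, ligands, or incomplete residues, whereas the README asks for cleaned PDB files. The sketch below scans the fixed-column ATOM/HETATM records with plain Python to list such residues; the helpers residues_without_ca and pdb_atom are hypothetical and not part of DeepFRI.

```python
def residues_without_ca(pdb_lines):
    """Return (chain, residue number, residue name) for residues lacking a CA atom."""
    atoms_by_residue = {}
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        atom_name = line[12:16].strip()     # PDB columns 13-16
        key = (line[21], line[22:27].strip(), line[17:20].strip())
        atoms_by_residue.setdefault(key, set()).add(atom_name)
    return [key for key, atoms in atoms_by_residue.items() if "CA" not in atoms]


def pdb_atom(record, serial, name, res_name, chain, res_seq):
    """Build a minimal fixed-column PDB atom line (hypothetical helper for the demo)."""
    return (f"{record:<6}{serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


# Toy structure: one serine with a CA atom and one water molecule without.
toy_lines = [
    pdb_atom("ATOM", 1, " N", "SER", "A", 1),
    pdb_atom("ATOM", 2, " CA", "SER", "A", 1),
    pdb_atom("HETATM", 3, " O", "HOH", "A", 101),
]

missing = residues_without_ca(toy_lines)
print(missing)
```

Because atom names are stripped of padding, a calcium ion (also named CA) would not be flagged by this check; treat the output as a hint about which records to remove, not as a complete validation.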

Error in saliency maps

Dear DeepFRI developers,
There seems to be an error in the saliency maps when the latest release and the downloadable models are used. For example in the case of 2PE5 in the Nat. Comms paper for the DNA-binding GO term (GO:0003677) the DNA-binding domain of the protein is activated (~1-50 residues, Figure 4C), but with the current version of the tool and models, residues that are not involved in DNA-binding are activated:

Best wishes,
George

-ont bp error on predict.py

Hi, I think there is a problem with the pre-trained file for the bp ontology, can you take a look? Thanks!

Here is the error I got:

OSError: SavedModel file does not exist at: ./trained_models/DeepFRI-MERGED_MultiGraphConv_3x512_fcd_1024_ca_10A_biological_process.hdf5/{saved_model.pbtxt|saved_model.pb}

Text version of SWISS-MODEL dataset

Hi, Can you release the text version of the SWISS-MODEL dataset like the PDB dataset included in the repository? Thanks!

The trained model can't be downloaded!

Dear author,
I find that the model cannot be downloaded now. The error is "503 Service Temporarily Unavailable".
How to fix it?

Thanks very much for your response.
Best

DeepGO baseline

For the baseline DeepGO, I would like to know how to deal with the GO data to fit hierarchically structured classification layers.

Cannot get your data

The link to bc-95.out, https://cdn.rcsb.org/resources/sequence/clusters/bc-95.out, is not available.
Can you update it so that we can use it again? Thanks!

Questions about data

First of all, thank you for sharing such a good project. My question may be a little unprofessional.
1. In the script data_collection.sh, I have no way to download the file bc-95.out. Whenever I visit this URL, a "Not Found" prompt appears.
After that, I downloaded the file at https://cdn.rcsb.org/resources/sequence/clusters/clusters-by-entity-95.txt from the PDB.
Is the file I downloaded the same as the one you used? If it is different, can you suggest another way to get the right file?
2. The number of mf/bp/cc in the file nrPDB-GO_2019.06.18_annot.tsv is not the same as the number in the JSON file of the trained model you provided. Is this expected?

Installation failure

Hi,
I have tried many ways to install DeepFRI, and all have failed.
First I created a conda environment to have python3.7. Then I installed everything with pip.
I get Illegal instruction when running python predict.py -h
and
python -m pip show tensorflow
WARNING: Package(s) not found: tensorflow

Then I installed tensorflow v2.9.3
and the error was :
python predict.py -h
The TensorFlow library was compiled to use AVX instructions, but these aren't available on your machine. Aborted

I tried instead tensorflow v1.15.3 or v 2.5 and the error is back to
Illegal instruction

I instead tried making a python environment with python 3.9 and I get similar errors.
Can you please provide the full list of packages from an environment that worked for you, and in particular an appropriate version of TensorFlow?

URLs for data and models are not available

Hello,

the URLs provided for data and pretrained models on github are not available. With all I have an error message "403 Forbidden". Would it be possible to update them again?

Following, the list of unavailable URLs:

Data

PDB Gene Ontology (19GB)
https://users.flatironinstitute.org/vgligorijevic/public_www/DeepFRI_data/PDB-GO.tar.gz

PDB Enzyme Commission (13GB)
https://users.flatironinstitute.org/vgligorijevic/public_www/DeepFRI_data/PDB-EC.tar.gz

SWISS-MODEL Gene Ontology (165GB)
https://users.flatironinstitute.org/vgligorijevic/public_www/DeepFRI_data/SWISS-MODEL-GO.tar.gz

SWISS-MODEL Enzyme Commission (117GB)
https://users.flatironinstitute.org/vgligorijevic/public_www/DeepFRI_data/SWISS-MODEL-EC.tar.gz

Pretrained models

Models (use these models if you run DeepFRI on GPU)
https://users.flatironinstitute.org/vgligorijevic/public_www/DeepFRI_data/trained_models.tar.gz

Newest Models (use these models if you run DeepFRI on CPU)
https://users.flatironinstitute.org/vgligorijevic/public_www/DeepFRI_data/newest_trained_models.tar.gz

Thank you very much!

Cheers,

Alex

BiopythonParserWarning: 'HEADER' line not found; can't determine PDB ID.

Hi DeepFRI!
Do you have any idea how to troubleshoot the BiopythonParserWarning: 'HEADER' line not found error shown below?

(DeepFRI) johnnytam100@DESKTOP-BDBH5VJ:/mnt/e/test/test_DeepFRI/DeepFRI$ python predict.py --pdb_dir ./allergen_esmfold_domain_pdb -ont mf --saliency
2024-07-08 12:16:36.392951: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2024-07-08 12:16:37.742941: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2024-07-08 12:16:37.904566: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:968] could not open file to read NUMA node: /sys/bus/pci/devices/0000:04:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-07-08 12:16:37.906981: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:04:00.0 name: NVIDIA GeForce RTX 3080 Ti computeCapability: 8.6
coreClock: 1.695GHz coreCount: 80 deviceMemorySize: 12.00GiB deviceMemoryBandwidth: 849.46GiB/s
2024-07-08 12:16:37.907029: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2024-07-08 12:16:37.940296: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2024-07-08 12:16:37.975221: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2024-07-08 12:16:37.978857: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2024-07-08 12:16:38.033696: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2024-07-08 12:16:38.046140: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2024-07-08 12:16:38.047257: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.7/lib64:
2024-07-08 12:16:38.047293: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2024-07-08 12:16:38.047688: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-07-08 12:16:38.056416: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 3700035000 Hz
2024-07-08 12:16:38.058111: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x481ed10 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2024-07-08 12:16:38.058134: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2024-07-08 12:16:38.059240: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2024-07-08 12:16:38.059264: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]
### Computing predictions from directory with PDB files...
/home/johnnytam100/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/Bio/SeqIO/PdbIO.py:292: BiopythonParserWarning: 'HEADER' line not found; can't determine PDB ID.
  BiopythonParserWarning,
Traceback (most recent call last):
  File "predict.py", line 47, in <module>
    predictor.predict_from_PDB_dir(args.pdb_dir)
  File "/mnt/e/test/test_DeepFRI/DeepFRI/deepfrier/Predictor.py", line 144, in predict_from_PDB_dir
    y = self.model([A, S], training=False).numpy()[:, :, 0].reshape(-1)
  File "/home/johnnytam100/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 985, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/home/johnnytam100/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/keras/engine/functional.py", line 386, in call
    inputs, training=training, mask=mask)
  File "/home/johnnytam100/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/keras/engine/functional.py", line 508, in _run_internal_graph
    outputs = node.layer(*args, **kwargs)
  File "/home/johnnytam100/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 985, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/home/johnnytam100/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/keras/engine/functional.py", line 386, in call
    inputs, training=training, mask=mask)
  File "/home/johnnytam100/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/keras/engine/functional.py", line 508, in _run_internal_graph
    outputs = node.layer(*args, **kwargs)
  File "/home/johnnytam100/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/keras/layers/recurrent.py", line 659, in __call__
    return super(RNN, self).__call__(inputs, **kwargs)
  File "/home/johnnytam100/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 985, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/home/johnnytam100/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/keras/layers/cudnn_recurrent.py", line 110, in call
    output, states = self._process_batch(inputs, initial_state)
  File "/home/johnnytam100/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/keras/layers/cudnn_recurrent.py", line 507, in _process_batch
    outputs, h, c, _, _ = gen_cudnn_rnn_ops.cudnn_rnnv2(**args)
  File "/home/johnnytam100/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/ops/gen_cudnn_rnn_ops.py", line 1740, in cudnn_rnnv2
    ctx=_ctx)
  File "/home/johnnytam100/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/ops/gen_cudnn_rnn_ops.py", line 1817, in cudnn_rnnv2_eager_fallback
    attrs=_attrs, ctx=ctx, name=name)
  File "/home/johnnytam100/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.NotFoundError: Could not find device for node: {{node CudnnRNNV2}} = CudnnRNNV2[T=DT_FLOAT, direction="unidirectional", dropout=0, input_mode="linear_input", is_training=true, rnn_mode="lstm", seed=0, seed2=0]
All kernels registered for op CudnnRNNV2:
  device='GPU'; T in [DT_DOUBLE]
  device='GPU'; T in [DT_FLOAT]
  device='GPU'; T in [DT_HALF]
 [Op:CudnnRNNV2]

PDB files prediction

Hello,
I have a problem with prediction on some PDB files. The README gives the command python predict.py --pdb_dir ./examples/pdb_files -ont mf --saliency --use_backprop, but predict.py does not define the argument "use_backprop". How can I deal with this?

Swiss-Model preprocessing script

Firstly thanks for such a well organised and easy to navigate repository. It's been really fun having a play around with it!

I'd like to have a go at training some other architectures on an updated version of the dataset since both PDB and Swiss-prot are updated all the time. I see you helpfully provided a data_collection.sh file. This file only seems to collect PDB files (e.g. pdb_chain_go.tsv.gz). Do you have similar download links for the Swiss-prot dataset you use?

No such file or directory: './trained_models/model_config.json'

Hi developers:

I have installed DeepFRI according to the instructions, but I am not sure the tool was installed successfully. When I test the data in the examples directory with the command python predict.py --fasta_fn examples/pdb_chains.fasta -ont mf -v, I get the following error:
2023-11-10 18:53:00.603407: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
  File "predict.py", line 23, in <module>
    with open(args.model_config) as json_file:
FileNotFoundError: [Errno 2] No such file or directory: './trained_models/model_config.json'
Indeed, I did not find the trained_models directory. Can you please help? Thanks.

Questions about after input structure data

Hi there! After submitting multiple structures to the DeepFRI web server, how can I collect all of the results at once, for example as a single .json file (or similar) that contains the full data?

The download link for the dataset

The download link for the dataset is invalid. Could you please provide a new valid download link? Thank you.

'trained_models' folder not found

Recommended score values

Hi,

I am trying to use DeepFRI on a butterfly proteome. I'm using the most recent model and I was wondering what the recommended threshold for the score would be.

Cheers
F

Need training dataset for CC functions (16020 available from 27000)

Hi,

Thanks a lot for this amazing work. I wanted to ask whether the whole dataset for CC functions could kindly be provided. We have 29,902 data points for MF and BP. However, for CC (PDB-GO), we could only find 16,020 data points in the file (nrPDB-GO_2019.06.18_annot) provided in your pre-processing directory on GitHub.

It would be very helpful to get all 29,902 data points for CC functions as well. Thanks a lot.

map GO to other ontologies

GO is our gold standard (and for a good reason), but many people and applications also provide different ontologies. It would be neat to have a mapping between GO and other ontologies as a post-processing step.

for example (from eggNOG):

trained_model

Hi, is it possible that the pretrained model from lm could be provided?

How are BioLip's activation points added to the chart?

Excuse me, but how are the activation sites from BioLip added to the graph? Where can the data be downloaded?

SavedModel file does not exist

Hi,

I was trying to test your software.

I tried your command line for fasta file:

python predict.py --fasta_fn examples/pdb_chains.fasta -ont mf -v

But I got this?

2023-09-29 12:57:17.302713: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
  File "predict.py", line 40, in <module>
    predictor = Predictor(models[ont], gcn=gcn)
  File "/user/work/tk19812/software/DeepFRI-1.0.0/deepfrier/Predictor.py", line 53, in __init__
    self._load_model()
  File "/user/work/tk19812/software/DeepFRI-1.0.0/deepfrier/Predictor.py", line 56, in _load_model
    self.model = tf.keras.models.load_model(self.model_prefix + '.hdf5',
  File "/user/work/tk19812/software/DeepFRI-1.0.0/venv/lib/python3.8/site-packages/tensorflow/python/keras/saving/save.py", line 186, in load_model
    loader_impl.parse_saved_model(filepath)
  File "/user/work/tk19812/software/DeepFRI-1.0.0/venv/lib/python3.8/site-packages/tensorflow/python/saved_model/loader_impl.py", line 110, in parse_saved_model
    raise IOError("SavedModel file does not exist at: %s/{%s|%s}" %
OSError: SavedModel file does not exist at: ./trained_models/DeepCNN-MERGED_molecular_function.hdf5/{saved_model.pbtxt|saved_model.pb}

Any help?
F
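The traceback above shows the model being loaded from self.model_prefix + '.hdf5', and the OSError names a path under ./trained_models/. A quick way to check whether the files referenced by the model configuration are actually on disk is sketched below. It assumes, without confirmation from the repository, that every string value in the JSON configuration is a model filename prefix; the helper missing_model_files and the toy configuration layout are hypothetical.

```python
import json
import os
import tempfile


def missing_model_files(config_path, suffix=".hdf5"):
    """List '<prefix><suffix>' paths named in a JSON config that do not exist."""
    with open(config_path) as handle:
        config = json.load(handle)
    missing = []

    def walk(node):
        # Assumption: every string in the config is a model filename prefix.
        if isinstance(node, dict):
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for value in node:
                walk(value)
        elif isinstance(node, str):
            path = node + suffix
            if not os.path.isfile(path):
                missing.append(path)

    walk(config)
    return missing


# Toy setup: one model file present, one absent (layout is hypothetical).
tmp_dir = tempfile.mkdtemp()
open(os.path.join(tmp_dir, "present.hdf5"), "w").close()
toy_config = {"cnn": {"models": {"mf": os.path.join(tmp_dir, "present"),
                                 "bp": os.path.join(tmp_dir, "absent")}}}
config_path = os.path.join(tmp_dir, "model_config.json")
with open(config_path, "w") as handle:
    json.dump(toy_config, handle)

missing = missing_model_files(config_path)
print(missing)
```

If the list is not empty, the pretrained models were probably not extracted into the DeepFRI directory, or the script is being run from a different working directory than the relative paths expect.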

Can you provide the PDB file of the test set?

Hi,

Thanks for open-sourcing this wonderful work. The link provided by the project contains only the training and validation data. Can you provide the PDB files of the test set?

Huge differences between DeepFRI server and local predictions

Hello, I am using a local instance of DeepFRI with the Newest Models(CPU) and I've noticed that the predictions I get are totally different when compared with the server.

For example for the 1S3P sequence the server returns a total of 12 predictions with a score over 0.50 while my local instance returns none. I'm running in a Windows Subsystem for Linux environment without GPU but I don't think that should be an issue for the CPU models?

$ python ./predict.py --seq 'SMTDLLSAEDIKKAIGAFTAADSFDHKKFFQMVGLKKKSADDVKKVFHILDKDKDGFIDEDELGSILKGFSSDARDLSAKETKTLMAAGDKDGDGKIGVEEFSTLVAES' -ont mf --verbose
2022-03-21 22:37:37.026507: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2022-03-21 22:37:37.026573: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-03-21 22:37:38.620674: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2022-03-21 22:37:38.620760: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2022-03-21 22:37:38.620818: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host: /proc/driver/nvidia/version does not exist
2022-03-21 22:37:38.621137: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-03-21 22:37:38.635260: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 4001000000 Hz
2022-03-21 22:37:38.637696: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7ffffa70a150 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-03-21 22:37:38.637795: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
### Computing predictions on a single protein...
Protein GO-term/EC-number Score GO-term/EC-number name
query_prot GO:0005509 0.13736 calcium ion binding
query_prot GO:0016788 0.11856 hydrolase activity, acting on ester bonds
### Saving predictions to *.json file...

About train details

Hello, I'm trying to train the model on the same dataset for the "mf" task. Could you provide more details about training? The model converges within one epoch, and the final weights do not perform as well as the trained models you provided.

Incomprehension regarding data processing

Hello,

I have a few questions regarding the way you process the data.

In your code you seem to use nrPDB-GO_2019.06.18_train.txt and nrPDB-GO_2020.06.18_annot.tsv to build the training data, but your data only contains nrPDB-GO_2019.06.18_annot.tsv. Is this expected?
I analyzed your results files (DeepCNN-MERGED_molecular_function_results.pckl, DeepCNN-MERGED_cellular_component_results.pckl), and the size of the test set is the same across ontologies. However, your Supplementary Table says that the size of the test set differs between MF, BP, and CC. Why?
In your Supplementary Table, the train/val/test sets have different sizes depending on MF, BP, and CC. Shouldn't they have the same size?

Is DeepFRI no longer being updated or supported?

I am interested in using DeepFRI in a new pipeline I'm writing, but I am concerned (based on the number and age of unaddressed issues here) that DeepFRI is no longer supported.

Can I input a fasta file and a pdb dir at the same time?

Dear all,

Thanks for your excellent work.
I want to know whether I can input a PDB directory and a fasta file at the same time, because I have both protein sequences and PDB files.

Can I use a command like python predict.py --fasta_fn examples/pdb_chains.fasta -ont mf -v --pdb_dir ./examples/pdb_files --saliency? Will it use both the sequence features and the structure features to make a prediction?

Thanks!

experimental data

The link to the experimental data provided in the paper is no longer valid. Could you please send it to me? I think it would be very helpful for understanding your work. I would really appreciate a response. Thank you very much!

DeepFRI not working on recent versions of CUDA / Tensorflow

I'm unable to get DeepFRI working on my local machine.
My CUDA driver version is 12.3 and I can install tensorflow version 2.16.2
I believe the required versions are 10.1 and 2.3.1, but I'd like to keep my drivers at 12.3.

Details:

With default installation (pip install .), tensorflow-gpu 2.3.1 does not work with my CUDA version.

With Python 3.7:

>>> python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2024-07-11 09:53:57.153691: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2024-07-11 09:53:57.912333: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2024-07-11 09:53:58.038477: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0001:00:00.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 15.57GiB deviceMemoryBandwidth: 298.08GiB/s
2024-07-11 09:53:58.038517: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2024-07-11 09:53:58.040400: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2024-07-11 09:53:58.042116: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2024-07-11 09:53:58.042411: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2024-07-11 09:53:58.044346: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2024-07-11 09:53:58.045340: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2024-07-11 09:53:58.045554: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory
2024-07-11 09:53:58.045570: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]

With Python 3.8:

>>> python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/data/jasper/.local/lib/python3.8/site-packages/tensorflow/__init__.py", line 41, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/data/jasper/.local/lib/python3.8/site-packages/tensorflow/python/__init__.py", line 40, in <module>
    from tensorflow.python.eager import context
  File "/data/jasper/.local/lib/python3.8/site-packages/tensorflow/python/eager/context.py", line 32, in <module>
    from tensorflow.core.framework import function_pb2
  File "/data/jasper/.local/lib/python3.8/site-packages/tensorflow/core/framework/function_pb2.py", line 16, in <module>
    from tensorflow.core.framework import attr_value_pb2 as tensorflow_dot_core_dot_framework_dot_attr__value__pb2
  File "/data/jasper/.local/lib/python3.8/site-packages/tensorflow/core/framework/attr_value_pb2.py", line 16, in <module>
    from tensorflow.core.framework import tensor_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__pb2
  File "/data/jasper/.local/lib/python3.8/site-packages/tensorflow/core/framework/tensor_pb2.py", line 16, in <module>
    from tensorflow.core.framework import resource_handle_pb2 as tensorflow_dot_core_dot_framework_dot_resource__handle__pb2
  File "/data/jasper/.local/lib/python3.8/site-packages/tensorflow/core/framework/resource_handle_pb2.py", line 16, in <module>
    from tensorflow.core.framework import tensor_shape_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__shape__pb2
  File "/data/jasper/.local/lib/python3.8/site-packages/tensorflow/core/framework/tensor_shape_pb2.py", line 36, in <module>
    _descriptor.FieldDescriptor(
  File "/data/jasper/.local/lib/python3.8/site-packages/google/protobuf/descriptor.py", line 553, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
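The traceback above names two workarounds: downgrading protobuf to 3.20.x or lower, or forcing the pure-Python protobuf implementation. A minimal sketch of the second one follows. The helper name force_pure_python_protobuf is made up for illustration, and the only real requirement is that the variable is set before TensorFlow is imported. Per the message itself, parsing will be slower this way.

```python
import os


def force_pure_python_protobuf():
    """Select the pure-Python protobuf backend (hypothetical helper name).

    Must run before the first `import tensorflow`, because protobuf reads
    this environment variable when it is first imported.
    """
    os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"
    return os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"]


force_pure_python_protobuf()

# Only import TensorFlow after the variable is set, for example:
# import tensorflow as tf
```

The same effect can be had by exporting the variable in the shell before launching predict.py.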

Using the most recent version of tensorflow (2.16.2) works fine:

>>> python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2024-07-11 10:04:07.184788: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:10575] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-11 10:04:07.184845: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:479] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-11 10:04:07.186709: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1442] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-11 10:04:07.193000: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

But it fails to load the model weights, since CuDNNLSTM is no longer a Keras layer:

>>> python predict.py --pdb_dir ./examples/pdb_files -ont mf --saliency --use_guided_grads
2024-07-11 10:06:40.479994: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:10575] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-11 10:06:40.480050: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:479] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-11 10:06:40.481994: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1442] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-11 10:06:40.488387: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-07-11 10:06:42.516791: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14653 MB memory:  -> device: 0, name: Tesla T4, pci bus id: 0001:00:00.0, compute capability: 7.5
Traceback (most recent call last):
  File "/data/jasper/PDB-to-Seq-annotation/DeepFRI/predict.py", line 35, in <module>
    predictor = Predictor(models[ont], gcn=gcn)
  File "/data/jasper/PDB-to-Seq-annotation/DeepFRI/deepfrier/Predictor.py", line 61, in __init__
    self._load_model()
  File "/data/jasper/PDB-to-Seq-annotation/DeepFRI/deepfrier/Predictor.py", line 74, in _load_model
    self.model = tf.keras.models.load_model(self.model_prefix + '.hdf5',
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/saving/saving_api.py", line 189, in load_model
    return legacy_h5_format.load_model_from_hdf5(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/legacy/saving/legacy_h5_format.py", line 133, in load_model_from_hdf5
    model = saving_utils.model_from_config(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/legacy/saving/saving_utils.py", line 85, in model_from_config
    return serialization.deserialize_keras_object(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/legacy/saving/serialization.py", line 495, in deserialize_keras_object
    deserialized_obj = cls.from_config(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/models/model.py", line 521, in from_config
    return functional_from_config(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/models/functional.py", line 477, in functional_from_config
    process_layer(layer_data)
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/models/functional.py", line 457, in process_layer
    layer = saving_utils.model_from_config(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/legacy/saving/saving_utils.py", line 85, in model_from_config
    return serialization.deserialize_keras_object(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/legacy/saving/serialization.py", line 495, in deserialize_keras_object
    deserialized_obj = cls.from_config(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/models/model.py", line 521, in from_config
    return functional_from_config(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/models/functional.py", line 477, in functional_from_config
    process_layer(layer_data)
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/models/functional.py", line 457, in process_layer
    layer = saving_utils.model_from_config(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/legacy/saving/saving_utils.py", line 85, in model_from_config
    return serialization.deserialize_keras_object(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/legacy/saving/serialization.py", line 504, in deserialize_keras_object
    deserialized_obj = cls.from_config(cls_config)
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/layers/rnn/lstm.py", line 679, in from_config
    return cls(**config)
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/layers/rnn/lstm.py", line 486, in __init__
    super().__init__(
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/layers/rnn/rnn.py", line 204, in __init__
    super().__init__(**kwargs)
  File "/data/jasper/envs/deepfri/lib/python3.9/site-packages/keras/src/layers/layer.py", line 266, in __init__
    raise ValueError(
ValueError: Unrecognized keyword arguments passed to LSTM: {'time_major': False}

If anyone is able to give more information about a working installation of their own (that doesn't require a downgrade of CUDA) that'd be super useful!

Much appreciated,
Jasper
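The final ValueError comes from the newer LSTM constructor rejecting the time_major entry stored in the saved layer config. One possible direction, sketched below under assumptions, is to drop that key from the architecture config before rebuilding the model. The helper strip_keys and the config fragment are hypothetical; the sketch assumes the architecture is available as a nested dict/list structure (Keras 2 HDF5 files keep it as JSON). Whether the rebuilt model then loads the provided weights correctly has not been tested here.

```python
import json


def strip_keys(node, unwanted):
    """Return a copy of a nested dict/list structure without the unwanted dict keys."""
    if isinstance(node, dict):
        return {k: strip_keys(v, unwanted) for k, v in node.items() if k not in unwanted}
    if isinstance(node, list):
        return [strip_keys(item, unwanted) for item in node]
    return node


# Hypothetical fragment shaped like a saved Keras model config.
config = {
    "class_name": "Functional",
    "config": {
        "layers": [
            {
                "class_name": "LSTM",
                "config": {"units": 512, "return_sequences": True, "time_major": False},
            }
        ]
    },
}

# Remove the keyword that the newer LSTM constructor rejects.
cleaned = strip_keys(config, {"time_major"})
print(json.dumps(cleaned, indent=2))
```

The function returns a new structure and leaves the input untouched, so the original config can still be inspected if the rebuilt model misbehaves.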

(DeepFRI) yuanqm@gpu2:~/protein_function/DeepFRI$ python predict.py --cmap ./examples/pdb_cmaps/1S3P-A.npz -ont mf --verbose
2022-11-07 22:03:37.689676: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2022-11-07 22:03:37.689727: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-11-07 22:03:38.802448: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2022-11-07 22:03:38.839570: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:3e:00.0 name: NVIDIA GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2022-11-07 22:03:38.839863: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 1 with properties: 
pciBusID: 0000:40:00.0 name: NVIDIA GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2022-11-07 22:03:38.840124: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 2 with properties: 
pciBusID: 0000:b1:00.0 name: NVIDIA GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2022-11-07 22:03:38.840397: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 3 with properties: 
pciBusID: 0000:b5:00.0 name: NVIDIA GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2022-11-07 22:03:38.840530: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2022-11-07 22:03:38.840596: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory
2022-11-07 22:03:38.886881: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2022-11-07 22:03:38.887473: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2022-11-07 22:03:38.887753: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory
2022-11-07 22:03:38.887887: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcusparse.so.10'; dlerror: libcusparse.so.10: cannot open shared object file: No such file or directory
2022-11-07 22:03:38.887957: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory
2022-11-07 22:03:38.887973: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2022-11-07 22:03:38.888385: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-07 22:03:38.903619: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 3000000000 Hz
2022-11-07 22:03:38.910226: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3bd2270 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-11-07 22:03:38.910280: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2022-11-07 22:03:38.912764: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-11-07 22:03:38.912826: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      
### Computing predictions on a single protein...
Traceback (most recent call last):
  File "predict.py", line 39, in <module>
    predictor.predict(args.cmap)
  File "/home/yuanqm/protein_function/DeepFRI/deepfrier/Predictor.py", line 109, in predict
    y = self.model([A, S], training=False).numpy()[:, :, 0].reshape(-1)
  File "/home/yuanqm/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 985, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/home/yuanqm/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/keras/engine/functional.py", line 386, in call
    inputs, training=training, mask=mask)
  File "/home/yuanqm/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/keras/engine/functional.py", line 508, in _run_internal_graph
    outputs = node.layer(*args, **kwargs)
  File "/home/yuanqm/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 985, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/home/yuanqm/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/keras/engine/functional.py", line 386, in call
    inputs, training=training, mask=mask)
  File "/home/yuanqm/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/keras/engine/functional.py", line 508, in _run_internal_graph
    outputs = node.layer(*args, **kwargs)
  File "/home/yuanqm/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/keras/layers/recurrent.py", line 659, in __call__
    return super(RNN, self).__call__(inputs, **kwargs)
  File "/home/yuanqm/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 985, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/home/yuanqm/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/keras/layers/cudnn_recurrent.py", line 110, in call
    output, states = self._process_batch(inputs, initial_state)
  File "/home/yuanqm/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/keras/layers/cudnn_recurrent.py", line 507, in _process_batch
    outputs, h, c, _, _ = gen_cudnn_rnn_ops.cudnn_rnnv2(**args)
  File "/home/yuanqm/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/ops/gen_cudnn_rnn_ops.py", line 1740, in cudnn_rnnv2
    ctx=_ctx)
  File "/home/yuanqm/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/ops/gen_cudnn_rnn_ops.py", line 1817, in cudnn_rnnv2_eager_fallback
    attrs=_attrs, ctx=ctx, name=name)
  File "/home/yuanqm/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.NotFoundError: Could not find device for node: {{node CudnnRNNV2}} = CudnnRNNV2[T=DT_FLOAT, direction="unidirectional", dropout=0, input_mode="linear_input", is_training=true, rnn_mode="lstm", seed=0, seed2=0]
All kernels registered for op CudnnRNNV2:
  device='GPU'; T in [DT_DOUBLE]
  device='GPU'; T in [DT_FLOAT]
  device='GPU'; T in [DT_HALF]
 [Op:CudnnRNNV2]
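In the log above, TensorFlow skips registering the GPUs because several CUDA 10 libraries cannot be loaded, and the CudnnRNNV2 op only has GPU kernels, which is why the call fails on the CPU. A quick way to see which of those libraries the dynamic loader can find, before starting predict.py, is sketched below. The library names are copied from the log; the function name missing_libraries is made up for illustration.

```python
import ctypes

# Shared libraries that the TensorFlow build in the log tries to open.
TF_GPU_LIBS = [
    "libcudart.so.10.1",
    "libcublas.so.10",
    "libcufft.so.10",
    "libcurand.so.10",
    "libcusolver.so.10",
    "libcusparse.so.10",
    "libcudnn.so.7",
]


def missing_libraries(names):
    """Return the names that the dynamic loader cannot open."""
    missing = []
    for name in names:
        try:
            ctypes.CDLL(name)
        except OSError:
            # Raised when the library is not on the loader search path.
            missing.append(name)
    return missing


print("Missing libraries:", missing_libraries(TF_GPU_LIBS))
```

An empty list only means the loader found the files; it does not prove that the driver and GPU are compatible with them.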

Your "pos_weights" method doesn't seem to work

In train_DeepCNN.py

# computing weights for imbalanced go classes
class_sizes = counts[args.ontology]
mean_class_size = np.mean(class_sizes)
pos_weights = mean_class_size / class_sizes
pos_weights = np.maximum(1.0, np.minimum(10.0, pos_weights))
pos_weights = np.concatenate([pos_weights.reshape((len(pos_weights), 1)), pos_weights.reshape((len(pos_weights), 1))], axis=-1)
model.train(train_tfrecord_fn, valid_tfrecord_fn, epochs=args.epochs, batch_size=args.batch_size, pad_len=args.pad_len, ont=args.ontology, class_weight=pos_weights)

Also, in DeepCNN.py

    # fit model
    history = self.model.fit(batch_train,
                             epochs=epochs,
                             validation_data=batch_valid,
                             class_weight=class_weight,
                             steps_per_epoch=n_train_records//batch_size,
                             validation_steps=n_valid_records//batch_size,
                             callbacks=[es, mc])

But I found that the "pos_weights" and "class_weight" code shown here doesn't work and raises errors. "class_weight" seems to be important when there are more GO terms. Do you have any ideas?

Thanks!!! XD
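For context, the Keras documentation describes the class_weight argument of fit as a dictionary mapping class indices to weights, so a two-column array like the one built above is outside that documented form. One alternative, which is not the authors' method, is to fold the per-term weights into a weighted binary cross-entropy. The sketch below reproduces the clipping rule from the snippet in plain Python and applies the weights to the positive term of the loss. The class sizes, labels, predictions and function names are made up for illustration.

```python
import math


def positive_class_weights(class_sizes, low=1.0, high=10.0):
    """Mean class size divided by each class size, clipped to [low, high]."""
    mean_size = sum(class_sizes) / len(class_sizes)
    return [min(high, max(low, mean_size / size)) for size in class_sizes]


def weighted_binary_crossentropy(y_true, y_pred, pos_weights, eps=1e-7):
    """Mean binary cross-entropy over labels; positives of label j are scaled by pos_weights[j]."""
    total = 0.0
    for t, p, w in zip(y_true, y_pred, pos_weights):
        p = min(1.0 - eps, max(eps, p))  # keep log() finite
        total += -(w * t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


class_sizes = [1000, 100, 10]  # made-up annotation counts per GO term
weights = positive_class_weights(class_sizes)

y_true = [0, 0, 1]        # made-up labels for one protein
y_pred = [0.5, 0.5, 0.5]  # made-up predicted probabilities
loss = weighted_binary_crossentropy(y_true, y_pred, weights)
print(weights, loss)
```

With these numbers the rarest term gets the maximum weight of 10, so a missed positive on it contributes ten times as much to the loss as it would unweighted.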
