EGRET

This repository contains the official implementation of our paper "EGRET: Edge Aggregated GRaph Attention NETworks and Transfer Learning Improve Protein-Protein Interaction Site Prediction" published in the journal Briefings in Bioinformatics.

If you use any part of this repository, we shall be obliged if you cite our paper.

Usage

PyTorch and DGL installation

We implemented our method using PyTorch and the Deep Graph Library (DGL); both must be installed to run our code. Installation instructions are available from each project's official website. The versions used in our experiments are listed below:

  1. Python 3.7.x
  2. PyTorch 1.6.x
  3. Deep Graph Library 0.4.x
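
As a quick sanity check after installation, the following sketch compares the installed versions with the ones listed above. The helper names (`matches`, `installed_versions`) and the prefix-matching rule are illustrative assumptions and are not part of the EGRET code base.

```python
import sys

# Versions listed above; matching on the "major.minor" prefix is an assumption.
EXPECTED = {"python": "3.7", "torch": "1.6", "dgl": "0.4"}


def matches(actual, expected_prefix):
    """Return True if `actual` equals the prefix or continues it after a dot."""
    return actual == expected_prefix or actual.startswith(expected_prefix + ".")


def installed_versions():
    """Collect version strings; a package that cannot be imported maps to None."""
    versions = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    for name in ("torch", "dgl"):
        try:
            versions[name] = __import__(name).__version__
        except Exception:  # not installed, or the import itself failed
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        if version is None:
            status = "not importable"
        elif matches(version, EXPECTED[name]):
            status = "matches the listed version"
        else:
            status = "differs from the listed version"
        print("{}: {} ({})".format(name, version, status))
```

A mismatch does not necessarily mean the code will fail; it only means the setup differs from the one used in the experiments.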

Download pretrained-model weights:

ProtBERT model weight

  1. Please download the pretrained model weight-file "pytorch_model.bin" from here.
  2. Place this weight-file in the folder "EGRET/inputs/ProtBert_model". If you use this pretrained model for your paper, please cite the paper "ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing".

EGRET model weight

  1. Please download the pretrained model weight-file "egret_model_weight.dat" from here.
  2. Place this weight-file in the folder "EGRET/models".
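
Before running inference, it can help to confirm that both weight files are where the steps above say they should be. The sketch below is a hypothetical helper (`missing_weight_files` is not part of the repository); it assumes the "EGRET" folder is reachable at the path passed as `egret_root`.

```python
import os

# Locations relative to the "EGRET" folder, taken from the steps above.
EXPECTED_WEIGHTS = [
    os.path.join("inputs", "ProtBert_model", "pytorch_model.bin"),
    os.path.join("models", "egret_model_weight.dat"),
]


def missing_weight_files(egret_root="EGRET"):
    """Return the expected weight files that are absent under `egret_root`."""
    return [
        os.path.join(egret_root, relative_path)
        for relative_path in EXPECTED_WEIGHTS
        if not os.path.isfile(os.path.join(egret_root, relative_path))
    ]


if __name__ == "__main__":
    for path in missing_weight_files():
        print("Missing weight file:", path)
```

An empty result means both files were found.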

Input Data

To store the input features, navigate to the folder "EGRET/inputs" and follow these steps:

  1. Store the PDB files of the isolated proteins to be used for prediction in the folder "pdb_files". Name each PDB file in the format "<an arbitrary name>_<chain IDs>"; see the example PDB files provided in this folder. The part after the underscore ("_") must be the real chain IDs, as they appear in the PDB file. (In the provided examples, <an arbitrary name> is the PDB ID of a complex in which the input protein is one of the subunits. This is not mandatory.)
  2. List all the protein names in the file "protein_list.txt".
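
The naming convention above can be checked with a few lines of Python. This is a minimal sketch under two assumptions that the text does not confirm: the chain IDs are everything after the last underscore, and "protein_list.txt" holds one protein name per row. The function names are hypothetical, and "1abc_AB" in the usage note is a made-up name.

```python
def split_protein_name(protein_name):
    """Split '<name>_<chain IDs>' at the last underscore (assumed convention)."""
    name, separator, chain_ids = protein_name.rpartition("_")
    if not separator or not name or not chain_ids:
        raise ValueError(
            "expected '<name>_<chain IDs>', got {!r}".format(protein_name)
        )
    return name, chain_ids


def read_protein_list(path="protein_list.txt"):
    """Read one protein name per row, skipping blank rows."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


if __name__ == "__main__":
    for protein_name in read_protein_list():
        name, chain_ids = split_protein_name(protein_name)
        print("{}: name={}, chain IDs={}".format(protein_name, name, chain_ids))
```

For example, `split_protein_name("1abc_AB")` returns `("1abc", "AB")`, while a name without an underscore raises a `ValueError`.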

Run inference to predict the numeric interaction propensity of each residue

  1. From the command line, navigate to the folder "EGRET" (where the "run_egret.py" file is located).
  2. Run the following command:

    python run_egret.py

  3. The command above will generate the results in the "EGRET/outputs" folder.

Output format

  1. The output generated by running EGRET is stored as a pickle file in the "EGRET/outputs" folder. To open the pickle file, run the following commands in the Python interpreter:

    import pickle
    # The path is relative to the directory that contains the "EGRET" folder.
    with open('EGRET/outputs/prediction_and_attention_scores.pkl', 'rb') as f:
        output = pickle.load(f)

  2. In the commands above, the "output" variable is a Python dictionary with four keys: 'pred', 'protein_info', 'edges', and 'attention_scores'.
  • To access the predicted numeric propensity, please run the following commands:

    prediction = output['pred']  
    protein_index = 0  
    print(prediction[protein_index])  

    In the commands above, "protein_index" is the zero-based index of the protein name in the "protein_list.txt" file. You can set it to any valid index; for example, for the protein name at index 2 (the third row of "protein_list.txt"), set protein_index=2.
    These commands print the predicted numeric propensities of all the residues in the protein at index 0 of "protein_list.txt". The propensities are printed in the order in which the residues appear in the input PDB file of this protein.

  • To access general information about the input proteins, please run the following commands:

    protein_information = output['protein_info']  
    protein_index = 0  
    print(protein_information[protein_index])    

    These commands print a Python dictionary for the protein at index 0 of "protein_list.txt". The dictionary contains the number of residues in the protein under the key 'seq_length'.

  • To access the edges of the graph representations of the input proteins, please run the following commands:

    graph_edges = output['edges']  
    protein_index = 0  
    print(graph_edges[protein_index])  

    These commands print a numpy array for the protein at index 0 of "protein_list.txt". Each row of this array is a neighborhood: it contains the indices of the neighboring nodes (residues) of one residue, which is the center of the neighborhood (please see our paper for more details). The row index is the index of the center residue. The following example command prints the neighborhood (neighboring residue indices) of the residue with index 2:

    center_node = 2  
    print(graph_edges[protein_index][center_node])  

  • To access the attention scores associated with the edges, please run the following commands:

    attention_scores = output['attention_scores']  
    protein_index = 0  
    print(attention_scores[protein_index])  

    These commands print a numpy array for the protein at index 0 of "protein_list.txt". Each row of this array contains the attention scores associated with the edges of the corresponding neighborhood. In the following example, the center of the neighborhood is the residue with index 2, and the command prints the attention scores of the edges from its neighboring residues (nodes) to this residue:

    center_node = 2  
    print(attention_scores[protein_index][center_node])  
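
The access patterns above can be combined into small helpers. The sketch below is illustrative only: the function names are hypothetical, and `ranked_neighbors` assumes that each row of 'attention_scores' holds exactly one scalar score per neighbor, in the same order as the matching row of 'edges'. Please check the array shapes in your own output before relying on it.

```python
import pickle


def load_output(path="EGRET/outputs/prediction_and_attention_scores.pkl"):
    """Load the output dictionary written by run_egret.py."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def predictions_by_name(output, protein_names):
    """Map each protein name to its per-residue propensities.

    `protein_names` must be in the same order as "protein_list.txt".
    """
    return {name: output["pred"][i] for i, name in enumerate(protein_names)}


def ranked_neighbors(output, protein_index, center_node):
    """Pair each neighbor of `center_node` with its attention score.

    Assumes one scalar score per neighbor, aligned with the 'edges' row.
    Returns (neighbor_index, score) tuples, highest score first.
    """
    neighbors = output["edges"][protein_index][center_node]
    scores = output["attention_scores"][protein_index][center_node]
    pairs = [(int(n), float(s)) for n, s in zip(neighbors, scores)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

For example, `ranked_neighbors(load_output(), 0, 2)` lists the neighbors of the residue with index 2 in the first protein, ordered by attention score.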

Please reach out if you face any issues while trying to run the code!

Citation

Sazan Mahbub, Md Shamsuzzoha Bayzid, EGRET: edge aggregated graph attention networks and transfer learning improve protein–protein interaction site prediction, Briefings in Bioinformatics, 2022, bbab578, https://doi.org/10.1093/bib/bbab578

BibTeX:

@article{10.1093/bib/bbab578,
    author = {Mahbub, Sazan and Bayzid, Md Shamsuzzoha},
    title = "{EGRET: edge aggregated graph attention networks and transfer learning improve protein–protein interaction site prediction}",
    journal = {Briefings in Bioinformatics},
    year = {2022},
    month = {01},
    abstract = "{Protein–protein interactions (PPIs) are central to most biological processes. However, reliable identification of PPI sites using conventional experimental methods is slow and expensive. Therefore, great efforts are being put into computational methods to identify PPI sites. We present Edge Aggregated GRaph Attention NETwork (EGRET), a highly accurate deep learning-based method for PPI site prediction, where we have used an edge aggregated graph attention network to effectively leverage the structural information. We, for the first time, have used transfer learning in PPI site prediction. Our proposed edge aggregated network, together with transfer learning, has achieved notable improvement over the best alternate methods. Furthermore, we systematically investigated EGRET’s network behavior to provide insights about the causes of its decisions. EGRET is freely available as an open source project at https://github.com/Sazan-Mahbub/EGRET.}",
    issn = {1477-4054},
    doi = {10.1093/bib/bbab578},
    url = {https://doi.org/10.1093/bib/bbab578},
    note = {bbab578},
    eprint = {https://academic.oup.com/bib/advance-article-pdf/doi/10.1093/bib/bbab578/42350487/bbab578.pdf},
}
