GithubHelp home page GithubHelp logo

lhy1111 / speedppi Goto Github PK

View Code? Open in Web Editor NEW

This project forked from patrickbryant1/speedppi

0.0 0.0 0.0 12.35 MB

Rapid protein-protein interaction network creation from multiple sequence alignments with Deep Learning

License: Other

Shell 2.17% Python 97.83%

speedppi's Introduction

SpeedPPI

This repository contains code for predicting a pairwise protein-protein interaction network from a set of protein sequences or two lists of sequences thought to interact.

The procedure is based on MSA creation and evaluation with AlphaFold2 adapted for pairwise interactions aka FoldDock

AlphaFold2 is available under the Apache License, Version 2.0 and so is SpeedPPI, which is a derivative thereof.
The AlphaFold2 parameters are made available under the terms of the CC BY 4.0 license and have not been modified.
You may not use these files except in compliance with the licenses.
For an example study, see Towards a structurally resolved human protein interaction network where 65'484 human protein interactions were evaluated.

The number of possible pairs grows quadratically with the number of input sequences.
As this number rapidly becomes too large to handle, we apply a number of techniques to speed up the network generation. We also greatly reduce the memory footprint of the dependencies for the protein structure prediction and generated MSAs. Overall, the speedup is 40x (for a set of 1000 proteins, 499500 pairwise interactions) and the disk space is reduced 4000x. The benefits are larger the more proteins are being predicted.

We provide two options for running SpeedPPI:

  1. All-vs-all mode.
  • Runs all proteins in a fasta file against each other.
  1. Some-vs-some mode.
  • Runs all proteins in two different fasta files against each other.

The PPI network is built automatically and all structures above a score threshold that you set are saved to disk.
Follow the instructions below, provide your sequences and relax while your network is being built.
If you like SpeedPPI, please star this repo and if you use it in your research please cite FoldDock and SpeedPPI

Install dependencies

Python packages

There are two options to install the packages used here.

  1. Install all packages to your local environment with python pip. This is recommended - if possible.
bash install_pip_requirements.txt
  1. Install all packages into a conda environment (requires https://docs.conda.io/en/latest/miniconda.html)
conda env create -f speed_ppi.yml

Note that the prediction part here runs on GPU. You will therefore have to install all appropriate CUDA drivers for your system. Otherwise the GPU will not be used.
The installation scripts do install these drivers, but this can fail in some cases.
To check if the gpu is accessed (first activate the conda environment if this was installed: conda activate SpeedPPI), run:

python3 ./src/test_gpu_avail.py

If the output is "gpu" - everything is fine.

HHblits


This is installed from source.

git clone https://github.com/soedinglab/hh-suite.git
mkdir -p hh-suite/build && cd hh-suite/build
cmake -DCMAKE_INSTALL_PREFIX=. ..
make -j 4 && make install
cd ..

Uniclust30


25 Gb download, 87 Gb extracted

wget http://wwwuser.gwdg.de/~compbiol/uniclust/2018_08/uniclust30_2018_08_hhsuite.tar.gz --no-check-certificate
tar -zxvf uniclust30_2018_08_hhsuite.tar.gz -C data/

AlphaFold2 parameters

mkdir data/params
wget https://storage.googleapis.com/alphafold/alphafold_params_2021-07-14.tar
mv alphafold_params_2021-07-14.tar data/params
tar -xf data/params/alphafold_params_2021-07-14.tar
mv params_model_1.npz data/params

Cleanup - remove unnecessary files

rm uniclust30_2018_08_hhsuite.tar.gz
rm data/params/alphafold_params_2021-07-14.tar
rm params_*.npz

Run the pipeline

All-vs-all mode

  • Input paths to the following resources (separated by space)
  1. A fasta file with sequences for all proteins you want to analyse
  2. Path to HHblits (default: hh-suite/build/bin/hhblits)
  3. pDockQ threshold. This determines what structures are saved to disk. (0.5 is recommended)
  4. Output directory

    Try the test case:
bash create_ppi_all_vs_all.sh ./data/dev/test.fasta hh-suite/build/bin/hhblits 0.5 ./data/dev/all_vs_all/

Some-vs-some mode

  • Input paths to the following resources (separated by space)
  1. A fasta file with sequences for the proteins in list 1
  2. A fasta file with sequences for the proteins in list 2
  3. Path to HHblits
  4. Output directory
  5. pDockQ threshold. This determines what structures are saved to disk. (0.5 is recommended)



Try the test case:

bash create_ppi_some_vs_some.sh ./data/dev/test1.fasta ./data/dev/test2.fasta hh-suite/build/bin/hhblits 0.5 ./data/dev/some_vs_some/

Note

If you have a computational cluster available, it will be much faster to run your predictions in parallel. This requires some knowledge of computational infrastructure, however. Steps 2 and 3 in create_ppi_[all_vs_all, some_vs_some].sh are written in individual scripts assuming a SLURM infrastructure in ./src/parallel/. You can copy these, modify the paths and variables and queue them at your cluster to make the predictions even more efficient!

speedppi's People

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.