isabella232 / provis

This project forked from salesforce/provis

Official code repository of "BERTology Meets Biology: Interpreting Attention in Protein Language Models."

Home Page: https://arxiv.org/abs/2006.15222

License: Other

Shell 8.11% Python 85.03% Jupyter Notebook 6.86%

BERTology Meets Biology: Interpreting Attention in Protein Language Models

This repository is the official implementation of BERTology Meets Biology: Interpreting Attention in Protein Language Models.

ProVis Attention Visualizer

This section provides instructions for generating visualizations of attention projected onto 3D protein structure.

Installation

General requirements:

  • Python >= 3.7
pip install biopython==1.77
pip install tape-proteins==0.4
pip install jupyterlab==3.0.14
pip install nglview
jupyter-nbextension enable nglview --py --sys-prefix

If you run into problems installing nglview, please refer to the nglview installation instructions for additional details and options.
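Before opening the notebook, it can help to confirm that the packages installed above are importable. The following is a minimal sketch, not part of the repository: `REQUIRED` and `missing_packages` are hypothetical names, and the mapping from pip package names to import names (for example `biopython` to `Bio`) is stated here as an assumption about how those packages are laid out.

```python
import importlib.util

# Assumed mapping of pip package name -> top-level import name.
REQUIRED = {
    "biopython": "Bio",
    "tape-proteins": "tape",
    "jupyterlab": "jupyterlab",
    "nglview": "nglview",
}


def missing_packages(required):
    """Return the pip names whose import module cannot be located."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


# Example usage (prints an empty list when everything is installed):
# print(missing_packages(REQUIRED))
```

This only checks that each module can be found; it does not verify the pinned versions or that the nglview notebook extension is enabled.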

Execution

cd <project_root>/notebooks
jupyter notebook provis.ipynb

If you get an error running the notebook, you may need to execute the notebook as follows:

jupyter notebook --NotebookApp.iopub_data_rate_limit=10000000

See nglview installation instructions for more details.

You may edit the notebook to choose other proteins, attention heads, etc. The visualization tool is based on the excellent nglview library.


Experiments

This section describes how to reproduce the experiments in the paper.

UPDATE: Some steps below cannot be executed, due to data

Installation

cd <project_root>
python setup.py develop

To download additional required datasets from TAPE:

cd <project_root>/data
wget http://s3.amazonaws.com/songlabdata/proteindata/data_pytorch/secondary_structure.tar.gz
tar -xvf secondary_structure.tar.gz && rm secondary_structure.tar.gz
wget http://s3.amazonaws.com/songlabdata/proteindata/data_pytorch/proteinnet.tar.gz
tar -xvf proteinnet.tar.gz && rm proteinnet.tar.gz
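Where wget is not available, the same download-and-unpack steps can be sketched in Python with the standard library. This is an illustration only: `fetch_and_extract` and its `opener` parameter are hypothetical names introduced here, and the function assumes the archives are trusted, since it extracts them without inspecting their contents.

```python
import tarfile
import urllib.request
from pathlib import Path

# URLs copied from the commands above.
TAPE_URLS = [
    "http://s3.amazonaws.com/songlabdata/proteindata/data_pytorch/secondary_structure.tar.gz",
    "http://s3.amazonaws.com/songlabdata/proteindata/data_pytorch/proteinnet.tar.gz",
]


def fetch_and_extract(url, data_dir, opener=urllib.request.urlretrieve):
    """Download one .tar.gz into data_dir, unpack it there, delete the archive.

    `opener` is called as opener(url, destination_path); it defaults to
    urllib's urlretrieve and can be swapped out for testing.
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    archive = data_dir / url.rsplit("/", 1)[-1]
    opener(url, archive)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(data_dir)
    archive.unlink()  # mirrors the `rm` in the shell commands
    return archive.name


# Example usage, run from <project_root>:
# for url in TAPE_URLS:
#     fetch_and_extract(url, "data")
```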

Attention Analysis

The following steps will reproduce the attention analysis experiments and generate the reports currently found in <project_root>/reports/attention_analysis. This includes all experiments besides the probing experiments (see Probing Analysis).

Before performing these steps, navigate to the appropriate directory:

cd <project_root>/protein_attention/attention_analysis

Tape BERT Model

The following executes the attention analysis (may run for several hours):

sh scripts/compute_all_features_tape_bert.sh

The above script creates a set of extract files in <project_root>/data/cache corresponding to the various properties being analyzed. You may edit the script files to remove properties that you are not interested in. If you wish to run the analysis without a GPU, you must specify the --no_cuda flag.
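For readers unfamiliar with this kind of switch, the sketch below shows one common way a `--no_cuda` flag is wired up with argparse. It is an assumption about the general pattern, not a copy of the repository's code: `build_parser` and `select_device` are hypothetical names, and the actual scripts may handle the flag differently (for example, they may not fall back to the CPU automatically when no GPU is found).

```python
import argparse


def build_parser():
    """Parser for a hypothetical analysis entry point."""
    parser = argparse.ArgumentParser(description="Hypothetical analysis entry point")
    parser.add_argument(
        "--no_cuda",
        action="store_true",
        help="Run on the CPU even if a GPU is present",
    )
    return parser


def select_device(no_cuda, cuda_available):
    """Return "cuda" only when a GPU is available and not disabled by the flag."""
    return "cuda" if cuda_available and not no_cuda else "cpu"


# Example usage; with PyTorch installed, cuda_available would typically
# come from torch.cuda.is_available():
# args = build_parser().parse_args()
# device = select_device(args.no_cuda, cuda_available=True)
```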

The following generates reports based on the files created in the previous step:

sh scripts/report_all_features_tape_bert.sh

If you removed steps from the analysis script above, you will need to update the reporting script accordingly.

ProtTrans Models

To generate reports for the ProtTrans models, follow the same instructions as for the Tape BERT model above, but substitute the following commands:

ProtBert:

sh scripts/compute_all_features_prot_bert.sh
sh scripts/report_all_features_prot_bert.sh

ProtBertBFD:

sh scripts/compute_all_features_prot_bert_bfd.sh
sh scripts/report_all_features_prot_bert_bfd.sh

ProtAlbert:

sh scripts/compute_all_features_prot_albert.sh
sh scripts/report_all_features_prot_albert.sh

ProtXLNet:

sh scripts/compute_all_features_prot_xlnet.sh
sh scripts/report_all_features_prot_xlnet.sh
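Because the compute and report scripts follow one naming pattern across models, the pairs above can be driven from a small Python loop. This is a sketch under that assumption: `MODELS`, `script_pair`, and `run_model` are hypothetical names, and the loop simply calls the same `sh scripts/...` commands listed in this section, stopping at the first failure.

```python
import subprocess

# Model suffixes taken from the script names listed above.
MODELS = ["tape_bert", "prot_bert", "prot_bert_bfd", "prot_albert", "prot_xlnet"]


def script_pair(model):
    """Return the (compute, report) script paths for one model suffix."""
    return (
        f"scripts/compute_all_features_{model}.sh",
        f"scripts/report_all_features_{model}.sh",
    )


def run_model(model, workdir, runner=subprocess.run):
    """Run the compute script, then the report script, from workdir.

    check=True makes subprocess.run raise if a script exits non-zero,
    so the report step is skipped when the compute step fails.
    """
    for script in script_pair(model):
        runner(["sh", script], cwd=workdir, check=True)


# Example usage (each model may take several hours):
# for model in MODELS:
#     run_model(model, "<project_root>/protein_attention/attention_analysis")
```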

Probing Analysis

The following steps will recreate the figures from the probing analysis, currently found in <project_root>/reports/probing.

Navigate to the probing directory:

cd <project_root>/protein_attention/probing

Training

Train diagnostic classifiers. Each script will write out an extract file with evaluation results. Note: each of these scripts may run for several hours.

sh scripts/probe_ss4_0_all
sh scripts/probe_ss4_1_all
sh scripts/probe_ss4_2_all
sh scripts/probe_sites.sh
sh scripts/probe_contacts.sh

Reports

python report.py

License

This project is licensed under the BSD 3-Clause License; see the LICENSE file for details.

Acknowledgments

This project incorporates code from the following repo:

Citation

When referencing this repository, please cite this paper.

@misc{vig2020bertology,
    title={BERTology Meets Biology: Interpreting Attention in Protein Language Models},
    author={Jesse Vig and Ali Madani and Lav R. Varshney and Caiming Xiong and Richard Socher and Nazneen Fatema Rajani},
    year={2020},
    eprint={2006.15222},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2006.15222}
}
