
adaptyvbio / proteinflow

Stars: 172, Watchers: 7, Forks: 8, Size: 60.62 MB

Versatile computational pipeline for processing protein structure data for deep learning applications.

Home Page: https://adaptyvbio.github.io/ProteinFlow/

License: BSD 3-Clause "New" or "Revised" License

Python 90.51% Shell 0.30% Dockerfile 0.38% Mako 8.79% JavaScript 0.01%
bioinformatics deep-learning protein-data-bank protein-design protein-structure dataset

proteinflow's People

Contributors

danielnzg85, elkoz, mawskay, not-a-feature, stochasticribosome

proteinflow's Issues

Attribution request

Hello,

I've noticed that a large portion of your data/utils.py code is borrowed from our repository, sidechainnet. I think it was attributed in the past, but you no longer provide attribution or a citation. Would you mind providing attribution, in this case by linking to our repository (https://github.com/jonathanking/sidechainnet) or paper (https://doi.org/10.1002/prot.26169) and providing my name?

Thank you for your consideration. I'm very happy that I could contribute to your work since ProteinFlow seems quite exciting.

Best,
Jonathan

pip install proteinflow fails in Colab

Hello,

I created a Google Colab notebook for data analysis, and it was working brilliantly. Thank you for creating such a great tool!

However, my Colab notebook no longer works despite zero code changes: pip install proteinflow now fails. I am attaching the two command lines that reproduce the error:
[Screenshots from 2023-10-07 (14:29:17 and 14:29:28); images not reproduced]

classes_to_exclude is not filtering proteins correctly

Hi @elkoz,

While visualizing the proteins, I noticed that setting the classes_to_exclude parameter does not properly filter out the specified classes. It runs without errors, but it still returns homomeric and heteromeric proteins.
It might be removing some of the homomers/heteromers, but I haven't tested whether it does.

Here is the code that I used for filtering and a few of the resulting proteins:

[Screenshots from 2024-01-31 (01:08:55 and 01:08:21); images not reproduced]
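One way to check which excluded classes leak through is to re-label every returned entry and list the ones that should have been dropped. This is a minimal sketch, not ProteinFlow code: it assumes each entry is available as a mapping from chain ID to sequence, that the class names match the classes_to_exclude options ("single_chains", "homomers", "heteromers"), and that chains count as identical only when their sequences match exactly, which may differ from ProteinFlow's own criterion.

```python
from __future__ import annotations


def classify_chains(chain_sequences: dict[str, str]) -> str:
    """Label an entry by its chain composition.

    Assumption: two chains count as the same only if their sequences are
    identical; ProteinFlow's own criterion may differ.
    """
    if len(chain_sequences) <= 1:
        return "single_chains"
    if len(set(chain_sequences.values())) == 1:
        return "homomers"
    return "heteromers"


def find_leaks(entries: dict[str, dict[str, str]], excluded: set[str]) -> list[str]:
    """Return the IDs of entries whose class should have been excluded."""
    return sorted(
        entry_id
        for entry_id, chains in entries.items()
        if classify_chains(chains) in excluded
    )


# Toy entries (made-up IDs and sequences) standing in for a filtered dataset.
entries = {
    "1abc": {"A": "MKTAYIAK"},
    "2def": {"A": "MKTAYIAK", "B": "MKTAYIAK"},
    "3ghi": {"A": "MKTAYIAK", "H": "GSHMLEDP"},
}
print(find_leaks(entries, {"homomers", "heteromers"}))  # ['2def', '3ghi']
```

Running the check over the real filtered output would show whether all, or only some, of the homomers and heteromers survive the filter.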

Add conda build for ARM-based Mac

Currently, installing on an M1 Mac fails because no build is available for the osx-arm64 platform:

PackagesNotFoundError: The following packages are not available from current channels:

  - proteinflow

Current channels:

  - https://conda.anaconda.org/conda-forge/osx-arm64
  - https://conda.anaconda.org/conda-forge/noarch
  - https://conda.anaconda.org/bioconda/osx-arm64
  - https://conda.anaconda.org/bioconda/noarch
  - https://conda.anaconda.org/adaptyvbio/osx-arm64
  - https://conda.anaconda.org/adaptyvbio/noarch
  - https://repo.anaconda.com/pkgs/main/osx-arm64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/r/osx-arm64
  - https://repo.anaconda.com/pkgs/r/noarch
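The osx-arm64 and noarch entries in the channel list are conda platform subdirectories. As a minimal sketch (a hypothetical helper, not part of ProteinFlow or conda), the function below maps an operating system and CPU pair to that subdirectory name, which shows why a native Python on Apple Silicon looks for an osx-arm64 build. Only common platforms are mapped.

```python
from __future__ import annotations

import platform


def conda_subdir(system: str | None = None, machine: str | None = None) -> str:
    """Return the conda platform subdirectory, e.g. "osx-arm64".

    Defaults to the current interpreter's platform. Only common platforms
    are mapped; anything else raises KeyError.
    """
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    os_name = {"Darwin": "osx", "Linux": "linux", "Windows": "win"}[system]
    arch = {
        "arm64": "arm64",      # Apple Silicon (M1/M2) and Windows on ARM
        "aarch64": "aarch64",  # 64-bit ARM as reported by Linux
        "x86_64": "64",        # Intel/AMD on macOS and Linux
        "amd64": "64",         # Intel/AMD as reported by Windows ("AMD64")
    }[machine]
    return f"{os_name}-{arch}"
```

For example, conda_subdir("Darwin", "arm64") returns "osx-arm64", the first subdirectory listed in the error above.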

Calling ProteinEntry.from_pickle(<path>).to_pdb(<target_path>) on the entire dataset reveals errors

Hi Liza,

While trying to create a W&B table visualization of the entire dataset, I noticed that converting the pickle files into PDBs reveals multiple bugs.

Firstly, I got an "UnpicklingError: unpickling stack underflow" from the line "protein_entry = ProteinEntry.from_pickle(pickle_path)". It did not happen with every protein, so I handled that exception, and then found that PDBParser could not properly parse a few of the generated PDB files, raising an error at the line "structure = parser.get_structure(pdb_id, target_path)".

[Screenshots from 2024-01-27 (23:26:16 and 23:26:55); images not reproduced]
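To report every failing entry at once instead of handling the errors one by one, the conversion can be wrapped in a loop that records failures by exception type. This is a minimal sketch: convert_all and proteinflow_roundtrip are hypothetical helpers, and the import location proteinflow.data for ProteinEntry is an assumption. The round trip mirrors the from_pickle, to_pdb, and PDBParser.get_structure steps described in the issue.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def convert_all(
    pickle_paths: list[Path], convert: Callable[[Path], None]
) -> dict[str, list[str]]:
    """Run convert() on every file and group the failures by exception type.

    Each failure is recorded instead of stopping the loop, so one pass over
    the dataset shows every problematic entry.
    """
    failures: dict[str, list[str]] = {}
    for path in pickle_paths:
        try:
            convert(path)
        except Exception as exc:  # broad on purpose: this is a diagnostic pass
            failures.setdefault(type(exc).__name__, []).append(path.name)
    return failures


def proteinflow_roundtrip(path: Path) -> None:
    """Pickle -> PDB -> Biopython parse, mirroring the steps in the issue."""
    # Imported lazily so convert_all can be used without these packages.
    from Bio.PDB import PDBParser
    from proteinflow.data import ProteinEntry  # assumed import location

    target = path.with_suffix(".pdb")
    ProteinEntry.from_pickle(str(path)).to_pdb(str(target))
    PDBParser(QUIET=True).get_structure(path.stem, str(target))
```

A call such as convert_all(sorted(Path(data_dir).glob("*.pickle")), proteinflow_roundtrip), with data_dir a placeholder for the dataset folder, returns a dict mapping each exception name (for instance UnpicklingError) to the files that raised it.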

Support python=3.12

The pinned requirement biotite=0.35 may be too old to support python=3.12: as the solver output below shows, biotite 0.35.0 builds exist only for Python 3.8 through 3.11.

Solving environment: - warning  libmamba Added empty dependency for problem type SOLVER_RULE_UPDATE
failed

LibMambaUnsatisfiableError: Encountered problems while solving:
  - package proteinflow-1.3.6-0 requires biotite 0.35.0, but none of the providers can be installed

Could not solve for environment specs
The following packages are incompatible
├─ pin-1 is installable and it requires
│  └─ python 3.12.* , which can be installed;
└─ proteinflow is not installable because it requires
   └─ biotite 0.35.0  but there are no viable options
      ├─ biotite 0.35.0 would require
      │  └─ python >=3.10,<3.11.0a0 *_cpython, which conflicts with any installable versions previously reported;
      ├─ biotite 0.35.0 would require
      │  └─ python >=3.11,<3.12.0a0 *_cpython, which conflicts with any installable versions previously reported;
      ├─ biotite 0.35.0 would require
      │  └─ python >=3.8,<3.9.0a0 *_cpython, which conflicts with any installable versions previously reported;
      └─ biotite 0.35.0 would require
         └─ python >=3.9,<3.10.0a0 *_cpython, which conflicts with any installable versions previously reported.

Use of own sequences for splitting

Really nice package! One thing I feel is missing is the ability to split based on a user-supplied set of sequences, for example sequences with biophysical properties one is trying to predict with ML methods.

If this already exists, I could not find it.
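Until such an option exists, a custom split can be approximated outside ProteinFlow. The hypothetical helper below groups records with identical sequences so that no sequence appears in more than one split, then assigns the groups to train, validation, and test sets at random with a fixed seed. Exact-match grouping is a deliberate simplification: preventing leakage between similar (not identical) sequences would require a similarity-based clustering step, for example with MMseqs2.

```python
from __future__ import annotations

import random


def split_by_sequence(
    records: dict[str, str],
    fractions: tuple[float, float, float] = (0.8, 0.1, 0.1),
    seed: int = 0,
) -> dict[str, list[str]]:
    """Assign record IDs to train/valid/test, keeping identical sequences together."""
    # Group record IDs by their exact sequence.
    groups: dict[str, list[str]] = {}
    for record_id, seq in records.items():
        groups.setdefault(seq, []).append(record_id)

    # Shuffle the groups (not the records) reproducibly.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    n = len(keys)
    n_train = round(fractions[0] * n)
    n_valid = round(fractions[1] * n)
    split_keys = {
        "train": keys[:n_train],
        "valid": keys[n_train:n_train + n_valid],
        "test": keys[n_train + n_valid:],
    }
    return {
        name: sorted(rid for k in ks for rid in groups[k])
        for name, ks in split_keys.items()
    }
```

Because fractions apply to sequence groups rather than records, split sizes can deviate from the requested ratios when some sequences are heavily duplicated.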

recreate pdbs after generate?

Dear authors,

Thanks a lot for this great pipeline! I was able to run some tests without many problems. I have a question, though: is it possible to recreate the actual PDB files behind the pickle files from within ProteinFlow?

Also, maybe I missed it, but the filtering output nicely shows how many structures were removed and for what reason, yet not as clearly how many there were initially and how many remain at the end.

Great appreciation for your work! Thank you!
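The question about totals can be partly answered by counting files in the output folder. This sketch assumes each processed entry is stored as one .pickle file and that the output directory may contain one subfolder per split; count_entries is a hypothetical helper, not a ProteinFlow command.

```python
from __future__ import annotations

from pathlib import Path


def count_entries(root: str | Path, pattern: str = "*.pickle") -> dict[str, int]:
    """Count files matching pattern in root itself and in each direct subfolder."""
    root = Path(root)
    counts = {".": sum(1 for _ in root.glob(pattern))}
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[sub.name] = sum(1 for _ in sub.glob(pattern))
    return counts
```

Comparing the summed counts with the number of structures that were downloaded gives the "before and after" figures.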

include nanobody?

Hey there! Does the antibody dataset include nanobodies? I have downloaded and processed the data but didn't find any. Maybe an additional key is needed to denote whether an entry is a nanobody.
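A rough way to find candidate nanobodies in processed data is to flag antibody entries that have a heavy chain but no light chain, since nanobodies (VHH) are single-domain, heavy-chain-only antibodies. This sketch assumes each entry's chains carry hypothetical "heavy", "light", and "antigen" labels; it would also flag heavy-chain-only structures that are not true VHH domains, so hits still need manual checking.

```python
from __future__ import annotations


def is_nanobody(chain_types: dict[str, str]) -> bool:
    """Heuristic: heavy chain present, no light chain.

    The "heavy"/"light"/"antigen" labels are hypothetical placeholders for
    however the processed data records chain types.
    """
    types = set(chain_types.values())
    return "heavy" in types and "light" not in types
```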
