volkamerlab / ratar

Read-across the targetome

Home Page: http://gepris.dfg.de/gepris/projekt/391684253?language=en

License: MIT License

Python 99.26% Shell 0.74%

ratar's People

Contributors

dominiquesydow

ratar's Issues

Check units and scaling of moments

First three moments of distribution:

  • Same units: 2nd moment: standard deviation; 3rd moment: cube root of skewness
  • Scaling: none (should fingerprint/moments be normalised?)
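
To make the units question concrete, here is a minimal sketch of the three moments computed so that all of them carry the unit of the input distances. It assumes that "skewness" above means the third central moment (the standardised skewness would be dimensionless); the function name `moments_same_units` and the toy distances are hypothetical and not taken from the ratar code.

```python
import numpy as np


def moments_same_units(distances):
    """Return mean, standard deviation and cube root of the third central moment.

    All three values have the same unit as the input distances.
    """
    d = np.asarray(distances, dtype=float)
    mean = d.mean()
    centered = d - mean
    std = np.sqrt(np.mean(centered ** 2))
    # np.cbrt keeps the sign, so left-skewed distributions give a negative value.
    third = np.cbrt(np.mean(centered ** 3))
    return mean, std, third


# Toy distances (made up for illustration only).
distances = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
m1, m2, m3 = moments_same_units(distances)
print(m1, m2, m3)
```

Because every value scales linearly with the distances, doubling all distances doubles all three moments. Whether the fingerprint should additionally be normalised is still the open question above.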

Add `from_path` class method to all ratar.encoding classes

Write a class method from_path, analogous to from_molecule. Current problem: files can contain multiple molecules, so from_path would return a list of molecule objects instead of a single molecule object, as from_molecule does.

Differing behaviour here will not work well downstream, right? Check this.
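
One possible shape for this, as a sketch only: from_path always returns a list (even for single-molecule files), so downstream code can rely on one return type. The classes `Molecule` and `Encoder` and the helper `read_molecules` are hypothetical stand-ins, not the actual ratar.encoding classes; the only format fact used is that each mol2 record starts with `@<TRIPOS>MOLECULE` followed by the molecule name.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class Molecule:
    """Hypothetical stand-in for a ratar molecule object."""
    code: str
    lines: List[str]


def read_molecules(path) -> List[Molecule]:
    """Split a mol2 file into one Molecule per @<TRIPOS>MOLECULE record."""
    molecules: List[Molecule] = []
    lines = Path(path).read_text().splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "@<TRIPOS>MOLECULE":
            # The line after the record marker holds the molecule name.
            name = lines[i + 1].strip() if i + 1 < len(lines) else ""
            molecules.append(Molecule(code=name, lines=[]))
        if molecules:
            molecules[-1].lines.append(line)
    return molecules


class Encoder:
    """Hypothetical encoding class with both constructors."""

    def __init__(self, molecule: Molecule):
        self.molecule = molecule

    @classmethod
    def from_molecule(cls, molecule: Molecule) -> "Encoder":
        return cls(molecule)

    @classmethod
    def from_path(cls, path) -> List["Encoder"]:
        # Always a list, one encoder per molecule found in the file.
        return [cls.from_molecule(m) for m in read_molecules(path)]


# Usage with a toy two-molecule file.
with tempfile.TemporaryDirectory() as tmp:
    mol2_path = Path(tmp) / "toy.mol2"
    mol2_path.write_text("@<TRIPOS>MOLECULE\nmol_a\n@<TRIPOS>MOLECULE\nmol_b\n")
    encoders = Encoder.from_path(mol2_path)

codes = [e.molecule.code for e in encoders]
print(codes)
```

The trade-off is that callers with single-molecule files have to index the list themselves; in exchange, the return type never depends on the file content.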

Test robustness of reference points

How robust are reference points in a binding site?

  • Within one binding site: how do different reference points affect distance distributions?
  • Across similar binding sites (e.g. kinases): similar reference points?
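
A minimal sketch of the first check, on made-up coordinates: shift one reference point slightly and compare the resulting distance distributions. It assumes the centroid as the reference point, which may differ from the reference points ratar actually uses; `distances_to` and the random toy "binding site" are hypothetical.

```python
import numpy as np


def distances_to(coords: np.ndarray, ref_point: np.ndarray) -> np.ndarray:
    """Euclidean distance of every atom to one reference point."""
    return np.linalg.norm(coords - ref_point, axis=1)


# Toy "binding site": 200 random atom positions (not real data).
rng = np.random.default_rng(seed=0)
coords = rng.normal(scale=5.0, size=(200, 3))

# Assumed reference point: the centroid of all atoms.
centroid = coords.mean(axis=0)
# Perturbed reference point: centroid shifted by 1 unit along x.
shifted = centroid + np.array([1.0, 0.0, 0.0])

d_ref = distances_to(coords, centroid)
d_shifted = distances_to(coords, shifted)

# Change in the first two moments caused by moving the reference point.
delta_mean = abs(d_shifted.mean() - d_ref.mean())
delta_std = abs(d_shifted.std() - d_ref.std())
print(f"shift in mean: {delta_mean:.3f}, shift in std: {delta_std:.3f}")
```

By the triangle inequality, each atom's distance changes by at most the length of the shift, so the mean cannot move by more than that. How much the higher moments change on real binding sites is what needs testing.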

Define binding sites and their size

Binding site definition/size varies between datasets:

  • KLIFS (~1300 atoms)
  • scPDB (~500 atoms)
  • TOUGH (~150 atoms)

What size is needed for good performance of the encoding method?
Can we compare performance on these datasets with each other?

Packages used

Here is a list of the packages used in the refactoring PR #1, queried using https://github.com/volkamerlab/ratar/search?p=1&q=import

biopandas
pandas
numpy
seaborn
pymol

Testing or env scripts:
pytest
yaml

Used, but not needed in the conda env since they are already part of the Python standard library:

  • sys
  • pathlib
  • datetime
  • argparse
  • os
  • glob
  • re
  • shutil
  • subprocess
  • typing
  • contextlib
  • tempfile
  • pickle

Include HETATM entries

Molecules contain atoms that do not belong to standard amino acids:

  • protonated amino acids
  • non-standard amino acids
  • ions
  • water
  • other solvent molecules

For calculations including z-scales, all atoms that do not belong to a standard amino acid are removed.
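
As an illustration of what this filtering amounts to, here is a sketch on a toy pandas DataFrame. The column name `res_name` and the example rows are assumptions for this sketch and may not match the column names ratar or biopandas use.

```python
import pandas as pd

# Three-letter codes of the 20 standard amino acids.
STANDARD_AMINO_ACIDS = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}

# Toy atom table (made up): water, selenomethionine and a zinc ion
# stand in for the non-standard entries listed above.
atoms = pd.DataFrame(
    {
        "res_name": ["ALA", "HOH", "MSE", "LYS", "ZN"],
        "atom_name": ["CA", "O", "SE", "NZ", "ZN"],
    }
)

is_standard = atoms["res_name"].isin(STANDARD_AMINO_ACIDS)
kept = atoms[is_standard]       # atoms used for z-scale based calculations
removed = atoms[~is_standard]   # atoms currently dropped

print(kept)
print(removed)
```

Keeping the `removed` table around makes it easy to see how much of a binding site is lost by the filter.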

What about other encoding methods such as pdbqt?

Required `ratar` updates

Updates needed for this code base:

December 2021

See PR #14 for details.

  • Update environment file as suggested by @t-kimber in #12
  • Check if CLI is still running (now it is again)
  • Update README + installation + usage instructions (update tutorial!!!!)
  • Fill follow-up section below

Follow-up

Methodology / encoding

  • So far only full pocket encoding; we probably need subpocket encoding (overlapping patches)
  • Define binding sites and their size #7
  • Include non-standard amino acids #8
  • Check units of 4th to 6th dimensions of reference points
  • Check units and scaling of moments - First three moments of distribution:
    • Same units: 2nd moment: standard deviation; 3rd moment: cube root of skewness
    • Scaling: none (should fingerprint/moments be normalised?)
  • We started to look into pdbqt files to be added as "physchem" properties to our fingerprint, take a look at this notebook if still of interest
  • We already started to benchmark the method against similar/dissimilar pocket pairs from FuzCav, ProSPECCTs, and TOUGH-M1 (see README)
  • Encoding works fine for mol2 files; pdb files may not work and need revision
  • PyMol functions have not been checked since 2019 and probably do not work anymore; we probably have to move to NGLview anyway.

Testing and CI

  • Add unit tests for similarity module
  • CI: Add back Windows + MacOS support, lint package, format+lint+test docs tutorials

Code

  • Check similarity module - refactoring needed?
  • Address #FIXMEs and #TODOs in code (left-overs of major refactoring in PR #1); to be done after setting up unit tests
  • Remove pymol dependency (not on conda-forge; currently installed from tpeulen)
  • Remove flatten-dict dependency (only pip-installable)
  • Add from_path class method to all ratar.encoding classes: Write class method from_path, analogous to from_molecule. Current problem: files can contain multiple molecules, thus from_path would return a list of molecule objects instead of a molecule object as in the case of from_molecule.
  • We set up a logging.conf file to fine-grain our logging. Include back into the package if of interest.

Packaging

  • Update ratar environment - enable conda packaging
