
peptoneltd / dspp-keras


Protein order and disorder data for the Keras, TensorFlow and Edward frameworks, with an automated update cycle made for continuous learning applications.

Home Page: https://peptone.io/dspp

License: GNU Affero General Public License v3.0

Python 100.00%
protein protein-structure biology biotechnology machine-learning ai tensorflow keras amino-acids polymer

dspp-keras's People

Contributors

jandom, ktamiola


dspp-keras's Issues

Error downloading data with Python 3

Hi.
When downloading the datasets, the dspp.py module raises the following error in Python 3:
TypeError: the JSON object must be str, not 'bytes'

The main reason is explained here.

A quick fix, which works for both Python 2 and Python 3, is given in the first answer here.
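
The linked answer is not reproduced on this page, so the following is only a minimal sketch of one common workaround: decode the downloaded bytes to text before handing them to the JSON parser. The helper name and the sample record are hypothetical and are not taken from dspp.py. Python 3.6 and later accept bytes in json.loads directly, so the error above appears on Python 3.5 and earlier.

```python
import json


def loads_text_or_bytes(payload, encoding="utf-8"):
    """Parse JSON from either a bytes or a str payload.

    Hypothetical helper: decoding first avoids
    "the JSON object must be str, not 'bytes'" on Python <= 3.5.
    """
    if isinstance(payload, bytes):
        payload = payload.decode(encoding)
    return json.loads(payload)


# Made-up record, only to show that bytes and str inputs behave the same way.
raw = b'{"sequence": "MKV", "propensity": [0.1, -0.3, 0.5]}'
record = loads_text_or_bytes(raw)
print(record["sequence"])
```

The same call works unchanged when the payload is already a str, which is why this pattern is safe on both interpreter lines.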

Negative CHI2

Something seems to be quite off: dspp_cnn.py is producing negative CHI2 values.

5120/6054 [========================>.....] - ETA: 0s - loss: 0.0362 - rmsd: 0.3122 - chi2: -23331.6329 R(l/v_l)=0.87
6054/6054 [==============================] - 1s - loss: 0.0366 - rmsd: 0.3147 - chi2: -24054.9850 - val_loss: 0.0420 - val_rmsd: 0.3410 - val_chi2: -39990.6675
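
The metric code from dspp_cnn.py is not shown here, so the cause is not known. As a hedged illustration of one way a chi-squared value can turn negative, the sketch below assumes the metric has the Pearson form, the sum of (observed - expected)^2 / expected. With signed targets the denominator can be negative, which makes individual terms negative. The function name and the numbers are made up for the example.

```python
def pearson_chi2(observed, expected):
    """Pearson-style statistic: sum of (o - e)^2 / e over all points.

    Assumption: this is NOT necessarily how dspp_cnn.py computes chi2.
    """
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


predicted = [0.4, 0.1, 0.6]

# All-positive targets: every term is non-negative.
positive_targets = [0.5, 0.2, 0.8]
chi2_positive = pearson_chi2(predicted, positive_targets)

# Signed targets: a negative denominator flips the sign of the term.
signed_targets = [0.5, -0.2, -0.8]
chi2_signed = pearson_chi2(predicted, signed_targets)

print(chi2_positive, chi2_signed)
```

If the repository's metric does divide by the target value, shifting the targets to a strictly positive range or dividing by a variance estimate instead would avoid the sign flip.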

Restart functionality

We need the ability to restart training. I envision this as an extension of dspp_utils.py, built on the Keras ModelCheckpoint callback.
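
A minimal sketch of the bookkeeping a restart would need, assuming checkpoints are written by ModelCheckpoint with a file name pattern such as "weights.{epoch:02d}.h5". The pattern, the directory layout, and the helper name are assumptions, not existing dspp_utils.py code. The helper finds the newest checkpoint and the epoch to resume from.

```python
import os
import re
import tempfile

# Assumed naming scheme: weights.01.h5, weights.02.h5, ...
CHECKPOINT_PATTERN = re.compile(r"^weights\.(\d+)\.h5$")


def latest_checkpoint(directory):
    """Return (path, epoch) for the highest-numbered checkpoint.

    Returns (None, 0) when the directory holds no checkpoint,
    which means training should start from scratch.
    """
    best_path, best_epoch = None, 0
    for name in os.listdir(directory):
        match = CHECKPOINT_PATTERN.match(name)
        if match and int(match.group(1)) > best_epoch:
            best_epoch = int(match.group(1))
            best_path = os.path.join(directory, name)
    return best_path, best_epoch


# Demonstration with empty placeholder files instead of real weights.
demo_dir = tempfile.mkdtemp()
for epoch in (1, 2, 10):
    open(os.path.join(demo_dir, "weights.%02d.h5" % epoch), "w").close()

path, epoch = latest_checkpoint(demo_dir)
print(os.path.basename(path), epoch)

# With Keras installed, the result would be used roughly like this:
#   if path is not None:
#       model.load_weights(path)
#   model.fit(X, Y, initial_epoch=epoch,
#             callbacks=[ModelCheckpoint("weights.{epoch:02d}.h5",
#                                        save_weights_only=True)])
```

ModelCheckpoint numbers files from 1, so the number in the newest file name equals the count of completed epochs and can be passed directly as initial_epoch.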

Incomplete README.md

  1. Better intro (adapted from the abstract)
  2. Propensity (figure)
  3. Correct reference
  4. Usage license

Decoder needed

We will need a decent decoder to plot the results of recurrent networks with embedding.

The propensity values are not in range [-1,1]

Hi,
As I understand from your README.md, the propensity values should lie between -1 and 1, but the dataset I get from the dspp-keras package does not satisfy this.

Could you please explain what the issue is?

Here is my example:

from dsppkeras.datasets import dspp
x, y = dspp.load_data()
y[0].max()

which results in 2.5.

Could you please explain how you normalize your dataset? With a min-max scaler, or with a normal-distribution (standard) scaler?
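
The issue does not say how the dataset is actually normalized. For reference, here is a minimal sketch of the two scalers named in the question, using only the standard library. The function names and the sample values are illustrative. Min-max scaling confines values to a fixed interval, while standardized (z-score) values have zero mean and unit variance but are not bounded to [-1, 1].

```python
import statistics


def min_max_scale(values, low=-1.0, high=1.0):
    """Linearly map values so that min -> low and max -> high."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        raise ValueError("cannot scale a constant sequence")
    return [low + (high - low) * (v - v_min) / span for v in values]


def standardize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("cannot standardize a constant sequence")
    return [(v - mean) / std for v in values]


# Illustrative values only, not taken from the dataset.
raw = [-2.5, -1.0, 0.0, 0.4, 1.2]
scaled = min_max_scale(raw)
z_scores = standardize(raw)
print(min(scaled), max(scaled))
print(min(z_scores), max(z_scores))
```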

Abnormal Y values

We have a potential showstopper here. The propensity values appear to range from -2.5 to 1.2. This is very worrisome and obviously disrupts training.
[plot]

Obsolete Keras API in multi-GPU code

Obsolete API calls are made in the multi-GPU code (dspp_utils.py). The Merge layers should be made compatible with the latest Keras API.

JSON instead of cPickle

For safety reasons, we should parse JSON rather than cPickle. This was suggested to me by the Keras developers.
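
Unpickling can execute arbitrary code embedded in the file, whereas JSON parsing only yields basic types (dicts, lists, strings, numbers). The sketch below shows what a JSON round trip could look like. The file layout with "sequences" and "propensities" keys and both function names are assumptions made for illustration, not the actual dspp-data format.

```python
import json
import os
import tempfile


def save_dataset(path, sequences, propensities):
    """Write the dataset as plain JSON (hypothetical layout)."""
    with open(path, "w") as handle:
        json.dump({"sequences": sequences, "propensities": propensities}, handle)


def load_dataset(path):
    """Read the dataset back; nothing but plain data is deserialized."""
    with open(path) as handle:
        payload = json.load(handle)
    return payload["sequences"], payload["propensities"]


# Round trip with made-up data.
demo_path = os.path.join(tempfile.mkdtemp(), "database.json")
save_dataset(demo_path, ["MKV", "AC"], [[0.1, -0.3, 0.5], [0.0, 0.2]])
sequences, propensities = load_dataset(demo_path)
print(sequences, propensities)
```

Nested lists come back as Python lists, so a caller that expects NumPy arrays would convert them after loading.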

Outdated dspp-keras pip package

I have updated dspp-keras with

pip install dspp-keras --upgrade

All looks good. However, the example produces:

Using TensorFlow backend.
Downloading data from https://github.com/PeptoneInc/dspp-data/blob/master/database.pkl?raw=true
Traceback (most recent call last):
  File "dspp_cnn.py", line 130, in <module>
    X, Y = dspp.load_data()
  File "/usr/local/lib/python2.7/dist-packages/dsppkeras/datasets/dspp.py", line 15, in load_data
    path = get_file(path, origin='https://github.com/PeptoneInc/dspp-data/blob/master/database.pkl?raw=true')
  File "/usr/local/lib/python2.7/dist-packages/dsppkeras/utils/data_utils.py", line 203, in get_file
    raise Exception(error_msg.format(origin, e.errno, e.reason))
Exception: URL fetch failure on https://github.com/PeptoneInc/dspp-data/blob/master/database.pkl?raw=true: None -- Not Found

h5py dependency

We may need to consider adding h5py as a dependency of the pip package.

Traceback (most recent call last):
  File "dspp_cnn.py", line 148, in <module>
    model.save_weights("model.h5")
  File "/home/kamil/.local/lib/python2.7/site-packages/keras/models.py", line 723, in save_weights
    raise ImportError('`save_weights` requires h5py.')
ImportError: `save_weights` requires h5py.
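
On the packaging side, the usual way to declare the dependency is to list h5py in install_requires in setup.py. Until that is in place, a script can check for the module before saving. The following is a minimal Python 3 sketch of such a guard; the helper names are hypothetical, and the model object is assumed to expose a Keras-style save_weights method.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module can be imported."""
    return importlib.util.find_spec(name) is not None


def save_weights_if_possible(model, path="model.h5"):
    """Call model.save_weights(path) only when h5py is available."""
    if not has_module("h5py"):
        print("h5py is not installed; skipping save. Try: pip install h5py")
        return False
    model.save_weights(path)
    return True


print(has_module("json"))
```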

Suspiciously high CHI2

The CHI2 code is reporting suspiciously high values. We should compare its output with the established power-divergence tests in SciPy.

Embedding with 0 masking

Just an idea! Instead of weights.

From: https://keras.io/layers/embeddings/

mask_zero: Whether or not the input value 0 is a special "padding" value that should be masked out. This is useful when using recurrent layers which may take variable length input. If this is True then all subsequent layers in the model need to support masking or an exception will be raised. If mask_zero is set to True, as a consequence, index 0 cannot be used in the vocabulary (input_dim should equal size of vocabulary + 1).
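
A minimal sketch of what the quoted constraint means for sequence encoding: index 0 is reserved for padding, so residues are numbered from 1 and input_dim becomes the vocabulary size plus one. The 20-letter alphabet of standard amino acids is used here. The helper name and the truncation behaviour are assumptions for illustration, not code from this repository.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes
PAD_INDEX = 0                          # reserved for padding when mask_zero=True
AA_TO_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
INPUT_DIM = len(AMINO_ACIDS) + 1       # vocabulary size + 1, as the docs require


def encode_and_pad(sequences, max_len):
    """Map residues to indices 1..20 and right-pad with 0 up to max_len.

    Sequences longer than max_len are truncated (an illustrative choice).
    """
    batch = []
    for seq in sequences:
        indices = [AA_TO_INDEX[aa] for aa in seq[:max_len]]
        batch.append(indices + [PAD_INDEX] * (max_len - len(indices)))
    return batch


encoded = encode_and_pad(["ACD", "MK"], max_len=4)
print(encoded)

# With Keras installed, the first layer would then look roughly like:
#   Embedding(input_dim=INPUT_DIM, output_dim=8, mask_zero=True)
# where output_dim=8 is an arbitrary example value.
```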

Batch size affects metrics (chi2, rmsd)

So far we have been reporting the collective CHI2 over the whole batch. Consequently, the CHI2 values are pathologically large. We should either pass the batch size as a parameter or get it from the Keras Callback class, and divide the CHI2 by it.
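
A minimal sketch of the proposed normalization, assuming a Pearson-style statistic summed over every sample in the batch. The repository's actual metric is not shown here, and both function names are hypothetical. Dividing by the number of samples makes the reported value independent of batch size.

```python
def batch_chi2(observed, expected):
    """Chi-squared summed over all samples in the batch (grows with batch size)."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        total += sum((o - e) ** 2 / e for o, e in zip(obs_row, exp_row))
    return total


def mean_chi2(observed, expected):
    """Chi-squared per sample: the batch total divided by the batch size."""
    return batch_chi2(observed, expected) / len(observed)


# One sample, then the same sample repeated to mimic a batch twice as large.
observed_1, expected_1 = [[0.5, 1.0]], [[1.0, 2.0]]
observed_2, expected_2 = observed_1 * 2, expected_1 * 2

print(batch_chi2(observed_1, expected_1), batch_chi2(observed_2, expected_2))
print(mean_chi2(observed_1, expected_1), mean_chi2(observed_2, expected_2))
```

The summed value doubles when the batch doubles, while the per-sample value stays the same, which is the behaviour the issue asks for.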
