theislab / batchglm

Fit generalized linear models in Python.

License: BSD 3-Clause "New" or "Revised" License

Languages: Python 98.42%, Dockerfile 0.06%, Makefile 1.52%
Topics: differential-expression, generalized-linear-models, tensorflow

batchglm's People

Contributors

davidsebfischer, hoeze, ilan-gold, kadam0, le-ander, picciama, sabrinarichter


batchglm's Issues

Build errors in documentation

/home/icb/malte.luecken/github_packages/diffxpy/diffxpy/testing/base.py:docstring of diffxpy.api.test.two_sample:10: WARNING: Unexpected indentation.
/home/icb/malte.luecken/github_packages/diffxpy/diffxpy/testing/base.py:docstring of diffxpy.api.test.two_sample:16: WARNING: Block quote ends without a blank line; unexpected unindent.

See #33 (comment)

Memory usage in numerical hessian

I see a peak memory load of 45 GB on 500 cells on a test data set, which I believe occurs during hessian estimation. That should not be the case: there are only 10 or so parameters. We should go over step by step whether really only the relevant parts of the overall hessian are computed.
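
A rough back-of-the-envelope check (a sketch; the gene count of 10,000 is an assumed placeholder, only the "10 or so parameters" and float64 come from this issue) shows that the per-gene Hessian blocks alone cannot explain 45 GB, so the memory must go into intermediate tensors:

n_genes = 10_000          # assumption, not stated in the issue
n_params = 10             # "10 or so parameters" per gene
bytes_per_value = 8       # float64

# memory for one (n_params x n_params) Hessian block per gene
hessian_bytes = n_genes * n_params * n_params * bytes_per_value
print(hessian_bytes / 1e9)  # ~0.008 GB, nowhere near 45 GB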

Convergence criterion: positive loss change

I noticed that Newton-type solvers converge relatively well in some cases but then bounce off to extremely bad objective values within a single step. This might be because they reach plateaus around the MLE where the quadratic approximation is bad. One easy way to mitigate this, which has worked for me in the past, is to evaluate convergence in these cases per step on the full data (no batching!):

  • if the loss change is negative, update the parameters and continue;
  • else do not update the parameters and return.

What do you think @Hoeze ?
I think the way this is coded now, we would have to perform the update step, check this criterion, and revert the parameter update and return if the objective is not improved. A toy sketch of the proposed check follows below.
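
A minimal toy sketch of the accept/reject criterion (a 1-D objective with a hand-written Newton step; the objective and names are illustrative, not batchglm code):

def loss(theta):
    return (theta - 2.0) ** 4

def newton_step(theta, eps=1e-8):
    grad = 4.0 * (theta - 2.0) ** 3
    hess = 12.0 * (theta - 2.0) ** 2
    return theta - grad / (hess + eps)

theta = 5.0
for _ in range(100):
    theta_new = newton_step(theta)
    if loss(theta_new) - loss(theta) < 0:
        theta = theta_new   # loss change negative: accept the update and continue
    else:
        break               # objective not improved: do not update, return
print(theta)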

Adjust gradient calculation in nb_glm estimator

The a_only / b_only optimizers currently compute the gradient of the full params tensor, i.e. using one of those optimizers does not improve runtime compared to the full optimizer.

However, it would be sufficient to calculate only the gradient of a or b in this case.
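
A minimal sketch of the idea with stand-in variables (not the actual estimator code): restricting tf.gradients to the block that is being optimized avoids building the gradient of the other block into the graph.

import tensorflow as tf

a = tf.Variable(tf.zeros([3, 10]), name="a")   # stand-in for the location model block
b = tf.Variable(tf.zeros([2, 10]), name="b")   # stand-in for the scale model block
loss = tf.reduce_sum(tf.square(a)) + tf.reduce_sum(tf.square(b))

# a_only: only the gradient w.r.t. a is computed
grad_a_only = tf.gradients(loss, [a])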

Size factors in computation graph

I had to use this double transpose so that the column-wise addition of the size factor vector to the mean model matrix works:

log_mu = tf.transpose(tf.add(tf.transpose(log_mu), size_factors))

I think this is probably still faster than tiling and is more memory efficient; happy to hear thoughts on this syntax though. @Hoeze you can close if you think it is not too bad.
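
An alternative sketch, assuming log_mu has shape (observations, features) and size_factors has shape (observations,) (the shapes are my assumption): broadcasting over the trailing feature axis avoids the double transpose and does not tile the size factor vector.

import tensorflow as tf

log_mu = tf.placeholder(tf.float32, shape=[None, None])   # (observations, features), assumed
size_factors = tf.placeholder(tf.float32, shape=[None])   # (observations,), assumed

# add the per-observation size factor to every feature column via broadcasting
log_mu_sf = log_mu + tf.expand_dims(size_factors, axis=1)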

Preventing batchglm from using all cores on a server

So far I'm doing this:

from batchglm.pkg_constants import TF_CONFIG_PROTO
TF_CONFIG_PROTO.inter_op_parallelism_threads = 1

But TensorFlow still ends up using the entire server every time.
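
A fuller sketch of what has worked for limiting TensorFlow's thread usage in general (the intra-op setting and the OpenMP variable are additions on my side, not batchglm API beyond pkg_constants; all of this has to run before batchglm creates its session):

import os

os.environ["OMP_NUM_THREADS"] = "1"   # relevant for MKL/OpenMP builds (assumption)

from batchglm.pkg_constants import TF_CONFIG_PROTO
TF_CONFIG_PROTO.inter_op_parallelism_threads = 1
TF_CONFIG_PROTO.intra_op_parallelism_threads = 1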

Also, running diffxpy's Wald test function with size_factors and constraints took about 18 hours on 24 cores for a dataset of ~500 cells. The slowdown is probably due to the huge memory use across 24 cores: I think 272 GB of swap were used on a server with 126 GB of RAM.

Evaluate use of pydata/sparse

Currently, using scipy.sparse.csr_matrix is computationally inefficient and hard to integrate with Dask and xarray.
pydata/sparse aims to replace scipy.sparse with a more modern, easier-to-integrate library. Dask and xarray will depend on it to support sparse formats.

While scipy.sparse.csr_matrix itself currently seems more efficient due to its C/C++ CSR implementation, pydata/sparse could still be the faster option overall because Dask could directly make use of specialized sparse methods.
With scipy.sparse, we cannot use those faster methods without wrapping each operation by hand. Also, we copy a lot of data only to get it from scipy -> dask -> xarray -> TensorFlow.

See also this paper:
http://conference.scipy.org/proceedings/scipy2018/pdfs/hameer_abbasi.pdf
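
A small sketch of the conversion path (assumes pydata/sparse and dask are installed; the matrix size and chunking are arbitrary): a scipy CSR matrix is converted to a pydata/sparse COO array and wrapped in dask, which can then dispatch to sparse-aware methods instead of densifying.

import numpy as np
import scipy.sparse
import sparse              # pydata/sparse
import dask.array as da

csr = scipy.sparse.random(500, 1000, density=0.1, format="csr", dtype=np.float64)
coo = sparse.COO.from_scipy_sparse(csr)       # convert to pydata/sparse COO
x = da.from_array(coo, chunks=(100, 1000))    # dask array backed by sparse chunks
col_sums = x.sum(axis=0).compute()            # reduction stays sparse end to end
print(col_sums.shape)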

tf broadcast_to

Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
  File "/Users/david.fischer/gitDevelopment/diffxpy/diffxpy/base.py", line 770, in test_lrt
    training_strategy=training_strategy,
  File "/Users/david.fischer/gitDevelopment/diffxpy/diffxpy/base.py", line 653, in _fit
    estim = test_model.Estimator(input_data=input_data, init_model=init_model, **constructor_args)
  File "/Users/david.fischer/gitDevelopment/batchglm/batchglm/train/tf/nb_glm/estimator.py", line 811, in __init__
    extended_summary=extended_summary
  File "/Users/david.fischer/gitDevelopment/batchglm/batchglm/train/tf/nb_glm/estimator.py", line 479, in __init__
    param_nonzero_a = tf.broadcast_to(feature_isnonzero, [num_design_loc_params, num_features])
AttributeError: module 'tensorflow' has no attribute 'broadcast_to'

Should this call be tf.contrib.framework.broadcast_to() instead of tf.broadcast_to() ?
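
If both TF versions should be supported, a version-tolerant shim might look like this (a sketch; the usage line just mirrors the call from the traceback):

import tensorflow as tf

# tf.broadcast_to only exists in newer 1.x releases; older ones expose the
# same functionality under tf.contrib.framework.broadcast_to
broadcast_to = getattr(tf, "broadcast_to", None)
if broadcast_to is None:
    broadcast_to = tf.contrib.framework.broadcast_to

# usage as in estimator.py (names taken from the traceback above):
# param_nonzero_a = broadcast_to(feature_isnonzero, [num_design_loc_params, num_features])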

Update to tensorflow 1.10

From /home/hoelzlwimmerf/Masterarbeit/TF-Helmholtz/batchglm/batchglm/impl/tf/nb_glm/estimator.py:370: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.batch(..., drop_remainder=True).
From /home/hoelzlwimmerf/Masterarbeit/TF-Helmholtz/batchglm/batchglm/impl/tf/nb/util.py:46: NegativeBinomial.__init__ (from tensorflow.contrib.distributions.python.ops.negative_binomial) is deprecated and will be removed after 2018-10-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use tfp.distributions instead of tf.contrib.distributions.
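
A sketch of the two suggested replacements (assumes tensorflow_probability is installed; the parameter values are arbitrary):

import tensorflow as tf
import tensorflow_probability as tfp

# batch_and_drop_remainder -> Dataset.batch(..., drop_remainder=True)
dataset = tf.data.Dataset.range(10).batch(4, drop_remainder=True)

# tf.contrib.distributions.NegativeBinomial -> tfp.distributions.NegativeBinomial
nb = tfp.distributions.NegativeBinomial(total_count=5.0, probs=0.3)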

r trained even though quick_scale==True

The dispersion model should not be trained with this call, but it is:
test = de.test.wald(
    data=adata_clust,
    dmat_loc=dmat_astro_loc.data_vars['design'],
    dmat_scale=dmat_astro_scale.data_vars['design'],
    constraints_loc=constraint_mat,
    constraints_scale=None,
    size_factors=adata_clust.obs['size_factors'].values,
    coef_to_test=['condition[T.True]'],
    training_strategy='QUICK',
    quick_scale=True,
    batch_size=None,
    dtype='float64',
)
dmat_scale is only the intercept.

tf deprecation warning: inspect.getargspec()

miniconda3/lib/python3.6/site-packages/tensorflow/python/util/tf_inspect.py:75: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() or inspect.getfullargspec()
return _inspect.getargspec(target)
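
The warning comes from TensorFlow's own tf_inspect module and has to be fixed upstream; until then it can be silenced locally with something like this (a workaround sketch):

import warnings

warnings.filterwarnings(
    "ignore",
    message=r"inspect\.getargspec\(\) is deprecated",
    category=DeprecationWarning,
)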

Switch to closed-form Jacobian

@davidsebfischer The gradient has to be adjusted for:

  • batch_trainers
  • batch_trainers_a_only
  • batch_trainers_b_only
  • full_data_trainers
  • full_data_trainers_a_only
  • full_data_trainers_b_only

It has to be changed, among other places, at the following lines:

loss=batch_model.norm_neg_log_likelihood,
variables=[model_vars.params],

loss=full_data_model.norm_neg_log_likelihood,
variables=[model_vars.params],

to gradients=[...] as in the following part:

gradients=[
    (
        tf.concat([
            tf.gradients(batch_model.norm_neg_log_likelihood, model_vars.a)[0],
            tf.zeros_like(model_vars.b),
        ], axis=0),
        model_vars.params
    ),
],

Investigate direct replacement of gradients

It is possible to directly overwrite the gradient of TensorFlow operations.
See also this example.

Instead of directly specifying the Jacobian for each combination of linker method and noise distribution, we could exchange only the critical parts of the graph.
An example would be to directly specify the gradient of the NegativeBinomial distribution class.

This might also help to keep the gradient from becoming NaN.
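
A minimal sketch of one such mechanism, tf.custom_gradient, applied to a stand-in function rather than the NegativeBinomial class itself: it supplies an analytic, NaN-guarded gradient for a single op while the rest of the graph stays on automatic differentiation.

import tensorflow as tf

@tf.custom_gradient
def safe_log(x):
    y = tf.log(x)

    def grad(dy):
        # clamp the denominator so the gradient cannot blow up to inf/NaN near x == 0
        return dy / tf.maximum(x, 1e-8)

    return y, grad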

batch_size control

Allow the user to set the batch size. Do this as part of the training strategy? The defaults can stay as they are for now.

Reduce graph building overhead of tensorflow

Re-implement models without tensorflow (HIPS/autograd?) if optimization is not necessary, e.g. if only the hessian should be calculated.
Another option is to add a closed-form hessian.
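
A tiny sketch of the autograd route (HIPS/autograd; the objective is a toy stand-in, not a batchglm model): the Hessian of a plain numpy function can be obtained without building any TensorFlow graph.

import autograd.numpy as np
from autograd import hessian

def neg_log_lik(theta, x):
    # toy objective in one parameter, just to illustrate the API
    return np.sum(np.exp(theta[0]) - x * theta[0])

x = np.array([1.0, 2.0, 3.0])
H = hessian(neg_log_lik)(np.array([0.5]), x)
print(H)   # 1x1 Hessian of the toy objective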

Bug: nb_glm estimator.train with NR

The loss increases if training is called via estimator.train(optim_algo="NR"), but not if the Newton-type optimizer is called via train_sequence.

Dependency versions

pip install currently yields tensorflow 1.11, which seems to produce some issues with keras versioning; we should check this. Secondly, we need at least tf 1.10 right now, and this is not listed in the setup. Lastly, we might need newer dask versions; this also popped up as an issue with a user. A sketch of the corresponding pins is below.
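
A sketch of how the constraints above could be encoded in setup.py (the tensorflow and dask entries follow this issue; the remaining package names and the lack of exact pins are assumptions on my side):

from setuptools import setup, find_packages

setup(
    name="batchglm",
    packages=find_packages(),
    install_requires=[
        "tensorflow>=1.10",   # at least tf 1.10 is required at the moment
        "dask",               # newer dask releases reported to be needed
        "xarray",
        "numpy",
        "scipy",
    ],
)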

Consider merging intercept and slope

Pro:

  • Fit non-confounded models as-is (Would resolve theislab/diffxpy#6)
  • reduce complexity: no need to split every variable into *_intercept and *_slope
  • would simplify full-sample operations

Con:

  • Potential problems with more complex models (e.g. RSA)
  • many statistical methods require an intercept

release 0.4

Merge dev into master and do release 0.4 once all public release issues are done.

Extension Error building docs

So I just loaded sphinx v1.8.1 and tried to build the docs using make html, and I got the following error:

Extension error:
Could not import extension sphinx_autodoc_typehints (exception: No module named 'sphinx_autodoc_typehints')
make: *** [Makefile:20: html] Error 2
