theislab / batchglm
Fit generalized linear models in Python.
License: BSD 3-Clause "New" or "Revised" License
/home/icb/malte.luecken/github_packages/diffxpy/diffxpy/testing/base.py:docstring of diffxpy.api.test.two_sample:10: WARNING: Unexpected indentation.
/home/icb/malte.luecken/github_packages/diffxpy/diffxpy/testing/base.py:docstring of diffxpy.api.test.two_sample:16: WARNING: Block quote ends without a blank line; unexpected unindent.
--
See #33 (comment)
I have a peak memory load of 45 GB on 500 cells on a test data set, I believe during Hessian estimation. That should not be the case, I think; it's only 10 or so parameters. We should go over step by step whether really only the relevant parts of the overall Hessian are computed.
Allows us to entirely avoid overhead if the model can be estimated in closed form.
I noticed that Newton-type solvers converge relatively well in some cases but then bounce off to extremely bad objective values within a single step. This might be because they reach plateaus around the MLE where the quadratic approximation is bad. One easy way to mitigate this, which worked for me in the past, is to evaluate convergence in these cases by step (no batching!):
if loss change is negative, update parameters and continue
else do not update parameters and return
What do you think @Hoeze ?
I think the way this is coded now, we would have to perform the update step, check this criterion, and revert the parameter update and return if the objective is not improved.
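A loose sketch of that check (plain Python with placeholder names, not the current TF graph code) could look like this:

# Sketch of the proposed check: evaluate the full-data loss after each Newton
# step and only keep the step if the loss change is negative.
def newton_with_rollback(params, newton_step, full_loss, max_iter=100):
    loss = full_loss(params)                       # evaluated on all data, no batching
    for _ in range(max_iter):
        candidate = params + newton_step(params)   # proposed Newton update
        candidate_loss = full_loss(candidate)
        if candidate_loss - loss < 0:              # loss change negative: accept and continue
            params, loss = candidate, candidate_loss
        else:                                      # otherwise do not update and return
            return params, loss
    return params, loss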
The current implementation of the negative binomial distribution uses log_mu and log_r. Directly providing log_mu and log_r saves computation time and improves numeric stability.
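As an illustration (a minimal numpy sketch, not batchglm's internal code; the function name is made up), the NB log-likelihood can be evaluated directly from log_mu and log_r without exponentiating into unstable ranges:

import numpy as np
from scipy.special import gammaln

def nb_log_prob(x, log_mu, log_r):
    # log(r + mu) computed stably from the log-scale parameters
    log_r_plus_mu = np.logaddexp(log_r, log_mu)
    r = np.exp(log_r)
    return (gammaln(x + r) - gammaln(r) - gammaln(x + 1.0)
            + r * (log_r - log_r_plus_mu)
            + x * (log_mu - log_r_plus_mu))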
The gradient of the a_only / b_only optimizers currently calculates the full params gradient, i.e. using one of those optimizers does not improve runtime compared to the full optimizer. However, it would be sufficient to calculate only the gradient of a or b in this case.
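A minimal TF 1.x-style sketch of the idea (placeholder variables and shapes, not the estimator graph): differentiate the loss only with respect to a or b instead of the full parameter tensor.

import tensorflow as tf

a = tf.Variable(tf.zeros([5, 100]), name="a")    # location model coefficients (placeholder shapes)
b = tf.Variable(tf.zeros([2, 100]), name="b")    # scale model coefficients
loss = tf.reduce_sum(tf.square(a)) + tf.reduce_sum(tf.square(b))  # stand-in loss

grad_a_only = tf.gradients(loss, [a])   # only the a-gradient is computed
grad_b_only = tf.gradients(loss, [b])   # only the b-gradient is computed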
Debug should inherit all flags from info I think?
Apparently, tensorflow via conda comes with MKL, whereas it does not via pip:
https://towardsdatascience.com/stop-installing-tensorflow-using-pip-for-performance-sake-5854f9d9eb0c
This is used in IRLS implementations; check whether it is numerically stable.
I had to use this double transpose so that the column-wise addition of the size factor vector to the mean model matrix works:
I think this is probably still faster than tiling and is more memory efficient; happy to hear thoughts on this syntax though. @Hoeze you can close this if you think it is not too bad.
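For reference, a minimal sketch of the trick (shapes and names are assumptions): TF broadcasting adds along the trailing axis, so the per-observation size factors are added after transposing and the result is transposed back, avoiding a tiled copy.

import numpy as np
import tensorflow as tf

log_mu = tf.constant(np.zeros((500, 100)), dtype=tf.float32)    # observations x features
size_factors = tf.constant(np.zeros(500), dtype=tf.float32)     # one value per observation

# transpose to features x observations, broadcast-add the vector, transpose back
log_mu_sf = tf.transpose(tf.transpose(log_mu) + size_factors)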
So far I'm doing this:
from batchglm.pkg_constants import TF_CONFIG_PROTO
TF_CONFIG_PROTO.inter_op_parallelism_threads = 1
But TensorFlow still uses the entire server every time.
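What has worked for me elsewhere (assuming TF_CONFIG_PROTO is a plain tf.ConfigProto and the conda build is linked against MKL, as mentioned above): also cap the intra-op pool and the MKL/OpenMP thread pools, which are not controlled by the inter-op setting alone.

import os
# MKL/OpenMP maintain their own thread pools (relevant if the MKL-linked conda
# build mentioned above is used); cap them before TensorFlow is imported.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

from batchglm.pkg_constants import TF_CONFIG_PROTO
# Assumption: TF_CONFIG_PROTO is a standard tf.ConfigProto, so both pools can be capped.
TF_CONFIG_PROTO.inter_op_parallelism_threads = 1
TF_CONFIG_PROTO.intra_op_parallelism_threads = 1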
Also, running diffxpy with the Wald test function using size_factors and constraints ran for about 18 hours on 24 cores for a dataset of ~500 cells. The slowdown is probably due to the huge memory use on 24 cores: I think 272 GB of swap was used on a server with 126 GB of RAM.
This is confusing for users of diffxpy who do not want to dive into the details.
This will be faster than the TF ops with the closed-form Jacobians and can be solved similarly to Newton-Raphson.
Currently, using scipy.sparse.csr_matrix is computationally inefficient and hard to integrate with Dask and xarray.
pydata/sparse tries to replace scipy.sparse with a more modern and easier-to-integrate library. Dask and xarray will depend on this library to support sparse formats.
While scipy.sparse.csr_matrix itself currently seems to be more efficient due to its C/C++ CSR format, pydata/sparse could still be the faster option because Dask could directly make use of specialized sparse methods.
Using scipy.sparse, we cannot make use of those faster methods without wrapping every single operation by hand. Also, we are copying a lot of data only to get it from scipy -> dask -> xarray -> TensorFlow.
See also this paper:
http://conference.scipy.org/proceedings/scipy2018/pdfs/hameer_abbasi.pdf
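A small sketch of what that would enable (hypothetical data and chunking, not batchglm code): a pydata/sparse COO array can be chunked by dask directly, so reductions stay sparse instead of densifying.

import scipy.sparse
import sparse                      # pydata/sparse
import dask.array as da

x_csr = scipy.sparse.random(500, 1000, density=0.1, format="csr")   # hypothetical count matrix

x_coo = sparse.COO.from_scipy_sparse(x_csr)                         # no densification
x_dask = da.from_array(x_coo, chunks=(100, 1000), asarray=False)    # keep chunks sparse

gene_sums = x_dask.sum(axis=0).compute()   # reductions dispatch to the sparse implementation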
Highly detailed variance models may be hard to fit. We should issue warnings if extreme values occur so that the user is aware that the fit may have gone wrong.
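A rough sketch of such a check (names and thresholds are made up, not batchglm's API):

import logging
import numpy as np

logger = logging.getLogger("batchglm")

def warn_on_extreme_dispersion(log_r, low=-30.0, high=30.0):
    # Count features whose fitted log-dispersion ran into extreme ranges,
    # which usually indicates that the variance model fit went wrong.
    n_extreme = int(np.sum((log_r < low) | (log_r > high)))
    if n_extreme > 0:
        logger.warning(
            "%d features have extreme dispersion estimates; "
            "their variance model may not have converged.", n_extreme
        )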
This
batchglm/batchglm/train/tf/nb_glm/estimator.py
Line 1062 in b4a335f
This didn't happen before. It happens after the first step.
Right now, size factors are ignored in init and only used in BasicModelGraph. This is important as some closed forms may not hold?
Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
  File "/Users/david.fischer/gitDevelopment/diffxpy/diffxpy/base.py", line 770, in test_lrt
    training_strategy=training_strategy,
  File "/Users/david.fischer/gitDevelopment/diffxpy/diffxpy/base.py", line 653, in _fit
    estim = test_model.Estimator(input_data=input_data, init_model=init_model, **constructor_args)
  File "/Users/david.fischer/gitDevelopment/batchglm/batchglm/train/tf/nb_glm/estimator.py", line 811, in __init__
    extended_summary=extended_summary
  File "/Users/david.fischer/gitDevelopment/batchglm/batchglm/train/tf/nb_glm/estimator.py", line 479, in __init__
    param_nonzero_a = tf.broadcast_to(feature_isnonzero, [num_design_loc_params, num_features])
AttributeError: module 'tensorflow' has no attribute 'broadcast_to'
Should this call be tf.contrib.framework.broadcast_to() instead of tf.broadcast_to()?
This should be fixable. Maybe there are no observations for one parameter?
What is this?
batchglm/batchglm/models/nb_glm/base.py
Line 66 in 8f93f8d
From /home/hoelzlwimmerf/Masterarbeit/TF-Helmholtz/batchglm/batchglm/impl/tf/nb_glm/estimator.py:370: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.batch(..., drop_remainder=True).
From /home/hoelzlwimmerf/Masterarbeit/TF-Helmholtz/batchglm/batchglm/impl/tf/nb/util.py:46: NegativeBinomial.__init__ (from tensorflow.contrib.distributions.python.ops.negative_binomial) is deprecated and will be removed after 2018-10-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use tfp.distributions instead of tf.contrib.distributions.
The dispersion model should not be trained with this call, but it is:
test = de.test.wald(
    data=adata_clust,
    dmat_loc=dmat_astro_loc.data_vars['design'],
    dmat_scale=dmat_astro_scale.data_vars['design'],
    constraints_loc=constraint_mat,
    constraints_scale=None,
    size_factors=adata_clust.obs['size_factors'].values,
    coef_to_test=['condition[T.True]'],
    training_strategy='QUICK',
    quick_scale=True,
    batch_size=None,
    dtype='float64'
)
dmat_scale is only the intercept.
see also #4
miniconda3/lib/python3.6/site-packages/tensorflow/python/util/tf_inspect.py:75: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() or inspect.getfullargspec()
return _inspect.getargspec(target)
Decide whether to keep separate fitting by group or to fit all groups in a single model to reduce overhead.
@davidsebfischer The gradient has to be adjusted for:
It has to be changed, among others, at the following lines:
batchglm/batchglm/train/tf/nb_glm/estimator.py
Lines 321 to 322 in 947e8b0
batchglm/batchglm/train/tf/nb_glm/estimator.py
Lines 364 to 365 in 947e8b0
gradient = [..]
like in the following part: batchglm/batchglm/train/tf/nb_glm/estimator.py
Lines 328 to 336 in 947e8b0
Since this stretches across classes, it would be nice to document it in a separate document, as it is hard to understand from scratch.
It is possible to directly overwrite the gradient of Tensorflow operations.
See also this example.
Instead of directly specifying the Jacobian for each combination of linker method and noise distribution, we could only exchange critical parts of the graph.
An example for this would be to directly specify the gradient of the NegativeBinomial distribution class.
This might also help to keep the gradient from becoming NaN.
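A minimal TF 1.x sketch of the mechanism (stable_log is a made-up example op, not a batchglm function): tf.custom_gradient lets us keep the forward op but replace its backward pass, e.g. to keep it from producing NaN/inf.

import tensorflow as tf

@tf.custom_gradient
def stable_log(x):
    y = tf.log(x)                            # forward pass unchanged (TF 1.x op name)

    def grad(dy):
        # backward pass overridden: clip the denominator so 1/x cannot blow up
        return dy / tf.maximum(x, 1e-8)

    return y, grad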
See https://www.tensorflow.org/performance/xla/jit.
Consider putting certain operations into XLA_jit scopes to fuse them.
When used correctly, this might greatly improve the performance of certain operations.
This becomes possible because TensorFlow 1.12 ships XLA support by default.
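A rough sketch of such a scope (TF 1.x contrib API; the tensors below are placeholders, not the actual model graph): ops created inside the scope are grouped into an XLA cluster and can be fused.

import numpy as np
import tensorflow as tf
from tensorflow.contrib.compiler import jit   # TF 1.x contrib API

x = tf.constant(np.ones((500, 100)), dtype=tf.float32)
mu = tf.constant(np.ones((500, 100)), dtype=tf.float32)
r = tf.constant(np.ones((1, 100)), dtype=tf.float32)

with jit.experimental_jit_scope():
    # element-wise log-likelihood terms are candidates for fusion into one kernel
    ll_term = x * tf.log(mu) - (x + r) * tf.log(mu + r)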
Allow the user to set the batch size. Do this as part of the training strategy? The defaults can stay as they are for now.
Re-implement models without tensorflow (HIPS/autograd?) if optimization is not necessary, e.g. if only the hessian should be calculated.
Another option is to add a closed-form hessian.
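For the second option, a sketch of a closed-form (expected) Hessian for the NB location model with log link, H = X^T diag(w) X with w_i = r * mu_i / (r + mu_i) (variable names are placeholders, not batchglm's API):

import numpy as np

def nb_loc_fisher_info(design, mu, r):
    # expected information of the location coefficients under a log link:
    # weights w_i = r * mu_i / (r + mu_i), H = design^T diag(w) design
    w = r * mu / (r + mu)
    return design.T @ (w[:, None] * design)

design = np.ones((50, 2))           # dummy design matrix
mu = np.full(50, 3.0)               # fitted means
H = nb_loc_fisher_info(design, mu, r=2.0)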
Loss is increasing if training is called via estimator.train(optim_algo="NR") but not if the Newton-type optimizer is called via train_sequence.
Necessary for mini-batch Newton-Raphson
shape mismatches after reformulating this to work after #26 was fixed.
pip install yields tensorflow 1.11 right now; this seems to produce some issues with Keras versioning. We should check this. Secondly, we need at least tf 1.10 right now? This is not listed in the setup. Lastly, we might need newer dask versions; this also popped up as an issue for a user.
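Something along these lines in setup.py would make the requirements explicit (the exact lower bounds are assumptions and would need to be confirmed):

from setuptools import setup, find_packages

setup(
    name="batchglm",
    packages=find_packages(),
    install_requires=[
        "tensorflow>=1.10",   # assumed lower bound, see above
        "dask",               # a recent dask release; exact bound to be determined
        "xarray",
        "numpy",
        "scipy",
    ],
)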
SGD, ADAM, RMS-Prop, ...
Pro:
*_intercept and *_slope
Con:
Why is this a random initialisation within a very small interval rather than a constant? I do not see the benefit of introducing this stochasticity, but it can break reproducibility. As this is additive in log space, I would just set the parameters that did not occur in the full model to zero.
Merge dev into master and do release 0.4 once all public release issues are done.
So I just loaded Sphinx v1.8.1 and tried to build the docs using make html, and I got the following error:
Extension error:
Could not import extension sphinx_autodoc_typehints (exception: No module named 'sphinx_autodoc_typehints')
make: *** [Makefile:20: html] Error 2