theislab / batchglm
Fit generalized linear models in Python.
License: BSD 3-Clause "New" or "Revised" License
/home/icb/malte.luecken/github_packages/diffxpy/diffxpy/testing/base.py:docstring of diffxpy.api.test.two_sample:10: WARNING: Unexpected indentation.
/home/icb/malte.luecken/github_packages/diffxpy/diffxpy/testing/base.py:docstring of diffxpy.api.test.two_sample:16: WARNING: Block quote ends without a blank line; unexpected unindent.
--
See #33 (comment)
I have a peak memory load of 45 GB on 500 cells on a test data set, I believe during Hessian estimation. That should not be the case, I think; it's only 10 or so parameters. We should go over step by step whether really only the relevant parts of the overall Hessian are computed.
Allows us to entirely avoid overhead if the model can be estimated in closed form.
I noticed that Newton-type solvers converge relatively well in some cases but then bounce off to extremely bad objective values within a single step. This might be because they reach plateaus around the MLE where the quadratic approximation is bad. One easy way to mitigate this, which worked for me in the past, is to evaluate convergence in these cases by step (no batching!):
if loss change is negative, update parameters and continue
else do not update parameters and return
What do you think @Hoeze ?
I think the way this is coded now, we would have to perform the update step, check this criterion, and revert the parameter update and return if the objective is not improved.
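A loose sketch of that check (plain Python with placeholder names, not the current TF graph code) could look like this:

# Sketch of the proposed check: evaluate the full-data loss after each Newton
# step and only keep the step if the loss change is negative.
def newton_with_rollback(params, newton_step, full_loss, max_iter=100):
    loss = full_loss(params)                       # evaluated on all data, no batching
    for _ in range(max_iter):
        candidate = params + newton_step(params)   # proposed Newton update
        candidate_loss = full_loss(candidate)
        if candidate_loss - loss < 0:              # loss change negative: accept and continue
            params, loss = candidate, candidate_loss
        else:                                      # otherwise do not update and return
            return params, loss
    return params, loss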
The current implementation of the negative binomial distribution uses log_mu and log_r. Directly providing log_mu and log_r saves computation time and improves numeric stability.
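As an illustration (a minimal numpy sketch, not batchglm's internal code; the function name is made up), the NB log-likelihood can be evaluated directly from log_mu and log_r without exponentiating into unstable ranges:

import numpy as np
from scipy.special import gammaln

def nb_log_prob(x, log_mu, log_r):
    # log(r + mu) computed stably from the log-scale parameters
    log_r_plus_mu = np.logaddexp(log_r, log_mu)
    r = np.exp(log_r)
    return (gammaln(x + r) - gammaln(r) - gammaln(x + 1.0)
            + r * (log_r - log_r_plus_mu)
            + x * (log_mu - log_r_plus_mu))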
The gradient of the a_only / b_only optimizers currently calculates the full params gradient, i.e. using one of those optimizers does not improve runtime compared to the full optimizer. However, it would be sufficient to calculate only the gradient of a or b in this case.
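A minimal TF 1.x-style sketch of the idea (placeholder variables and shapes, not the estimator graph): differentiate the loss only with respect to a or b instead of the full parameter tensor.

import tensorflow as tf

a = tf.Variable(tf.zeros([5, 100]), name="a")    # location model coefficients (placeholder shapes)
b = tf.Variable(tf.zeros([2, 100]), name="b")    # scale model coefficients
loss = tf.reduce_sum(tf.square(a)) + tf.reduce_sum(tf.square(b))  # stand-in loss

grad_a_only = tf.gradients(loss, [a])   # only the a-gradient is computed
grad_b_only = tf.gradients(loss, [b])   # only the b-gradient is computed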
Debug should inherit all flags from info I think?
Apparently, tensorflow via conda comes with MKL, whereas it does not via pip:
https://towardsdatascience.com/stop-installing-tensorflow-using-pip-for-performance-sake-5854f9d9eb0c
This is used in IRLS implementations; check whether it is numerically stable.
I had to use this double transpose so that the column-wise addition of the size factor vector to the mean model matrix works:
I think this is probably still faster than tiling and is more memory efficient; happy to hear thoughts on this syntax though. @Hoeze you can close this if you think it is not too bad.
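For reference, a minimal sketch of the trick (shapes and names are assumptions): TF broadcasting adds along the trailing axis, so the per-observation size factors are added after transposing and the result is transposed back, avoiding a tiled copy.

import numpy as np
import tensorflow as tf

log_mu = tf.constant(np.zeros((500, 100)), dtype=tf.float32)    # observations x features
size_factors = tf.constant(np.zeros(500), dtype=tf.float32)     # one value per observation

# transpose to features x observations, broadcast-add the vector, transpose back
log_mu_sf = tf.transpose(tf.transpose(log_mu) + size_factors)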
So far I'm doing this:
from batchglm.pkg_constants import TF_CONFIG_PROTO
TF_CONFIG_PROTO.inter_op_parallelism_threads = 1
But TensorFlow still uses the entire server every time.
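What has worked for me elsewhere (assuming TF_CONFIG_PROTO is a plain tf.ConfigProto and the conda build is linked against MKL, as mentioned above): also cap the intra-op pool and the MKL/OpenMP thread pools, which are not controlled by the inter-op setting alone.

import os
# MKL/OpenMP maintain their own thread pools (relevant if the MKL-linked conda
# build mentioned above is used); cap them before TensorFlow is imported.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

from batchglm.pkg_constants import TF_CONFIG_PROTO
# Assumption: TF_CONFIG_PROTO is a standard tf.ConfigProto, so both pools can be capped.
TF_CONFIG_PROTO.inter_op_parallelism_threads = 1
TF_CONFIG_PROTO.intra_op_parallelism_threads = 1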
Also, running diffxpy with the Wald test function using size_factors and constraints ran for about 18 hours on 24 cores for a dataset of ~500 cells. The slowdown is probably due to the huge memory use on 24 cores: I think 272 GB of swap was used on a server with 126 GB of RAM.
This is confusing for users of diffxpy who do not want to dive into the details.
This will be faster than the TF ops with the closed-form Jacobians and can be solved similarly to Newton-Raphson.
Currently, using scipy.sparse.csr_matrix is computationally inefficient and hard to integrate with Dask and xarray.
pydata/sparse tries to replace scipy.sparse with a more modern and easier-to-integrate library. Dask and xarray will depend on this library to support sparse formats.
While scipy.sparse.csr_matrix itself currently seems to be more efficient due to its C/C++ CSR format, pydata/sparse could still be the faster option because Dask could directly make use of specialized sparse methods.
Using scipy.sparse, we cannot make use of those faster methods without wrapping every single operation by hand. Also, we are copying a lot of data only to get it from scipy -> dask -> xarray -> TensorFlow.
See also this paper:
http://conference.scipy.org/proceedings/scipy2018/pdfs/hameer_abbasi.pdf
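A small sketch of what that would enable (hypothetical data and chunking, not batchglm code): a pydata/sparse COO array can be chunked by dask directly, so reductions stay sparse instead of densifying.

import scipy.sparse
import sparse                      # pydata/sparse
import dask.array as da

x_csr = scipy.sparse.random(500, 1000, density=0.1, format="csr")   # hypothetical count matrix

x_coo = sparse.COO.from_scipy_sparse(x_csr)                         # no densification
x_dask = da.from_array(x_coo, chunks=(100, 1000), asarray=False)    # keep chunks sparse

gene_sums = x_dask.sum(axis=0).compute()   # reductions dispatch to the sparse implementation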
Highly detailed variance models may be hard to fit. We should issue warnings if extreme values occur so that the user is aware that the fit may have gone wrong.
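A rough sketch of such a check (names and thresholds are made up, not batchglm's API):

import logging
import numpy as np

logger = logging.getLogger("batchglm")

def warn_on_extreme_dispersion(log_r, low=-30.0, high=30.0):
    # Count features whose fitted log-dispersion ran into extreme ranges,
    # which usually indicates that the variance model fit went wrong.
    n_extreme = int(np.sum((log_r < low) | (log_r > high)))
    if n_extreme > 0:
        logger.warning(
            "%d features have extreme dispersion estimates; "
            "their variance model may not have converged.", n_extreme
        )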
This
batchglm/batchglm/train/tf/nb_glm/estimator.py
Line 1062 in b4a335f
This didn't happen before. It happens after the first step.
Right now, size factors are ignored in init and only used in BasicModelGraph. This is important as some closed forms may not hold?
Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
  File "/Users/david.fischer/gitDevelopment/diffxpy/diffxpy/base.py", line 770, in test_lrt
    training_strategy=training_strategy,
  File "/Users/david.fischer/gitDevelopment/diffxpy/diffxpy/base.py", line 653, in _fit
    estim = test_model.Estimator(input_data=input_data, init_model=init_model, **constructor_args)
  File "/Users/david.fischer/gitDevelopment/batchglm/batchglm/train/tf/nb_glm/estimator.py", line 811, in __init__
    extended_summary=extended_summary
  File "/Users/david.fischer/gitDevelopment/batchglm/batchglm/train/tf/nb_glm/estimator.py", line 479, in __init__
    param_nonzero_a = tf.broadcast_to(feature_isnonzero, [num_design_loc_params, num_features])
AttributeError: module 'tensorflow' has no attribute 'broadcast_to'
Should this call be tf.contrib.framework.broadcast_to() instead of tf.broadcast_to()?
This should be fixable. Maybe there are no observations for one parameter?
What is this?
batchglm/batchglm/models/nb_glm/base.py
Line 66 in 8f93f8d
From /home/hoelzlwimmerf/Masterarbeit/TF-Helmholtz/batchglm/batchglm/impl/tf/nb_glm/estimator.py:370: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.batch(..., drop_remainder=True).
From /home/hoelzlwimmerf/Masterarbeit/TF-Helmholtz/batchglm/batchglm/impl/tf/nb/util.py:46: NegativeBinomial.__init__ (from tensorflow.contrib.distributions.python.ops.negative_binomial) is deprecated and will be removed after 2018-10-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use tfp.distributions instead of tf.contrib.distributions.
The dispersion model should not be trained with this call, but it is:
test = de.test.wald(
    data=adata_clust,
    dmat_loc=dmat_astro_loc.data_vars['design'],
    dmat_scale=dmat_astro_scale.data_vars['design'],
    constraints_loc=constraint_mat,
    constraints_scale=None,
    size_factors=adata_clust.obs['size_factors'].values,
    coef_to_test=['condition[T.True]'],
    training_strategy='QUICK',
    quick_scale=True,
    batch_size=None,
    dtype='float64'
)
dmat_scale is only the intercept.
see also #4
miniconda3/lib/python3.6/site-packages/tensorflow/python/util/tf_inspect.py:75: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() or inspect.getfullargspec()
return _inspect.getargspec(target)
Decide whether to keep separate fitting by group or to fit all groups in a single model to reduce overhead.
@davidsebfischer The gradient has to be adjusted for:
It has to be changed, among others, at the following lines:
batchglm/batchglm/train/tf/nb_glm/estimator.py
Lines 321 to 322 in 947e8b0
batchglm/batchglm/train/tf/nb_glm/estimator.py
Lines 364 to 365 in 947e8b0
gradient = [..]
like in the following part: batchglm/batchglm/train/tf/nb_glm/estimator.py
Lines 328 to 336 in 947e8b0
Since this stretches across classes, it would be nice to document it in a separate document, as it is hard to understand from scratch.
It is possible to directly overwrite the gradient of Tensorflow operations.
See also this example.
Instead of directly specifying the Jacobian for each combination of linker method and noise distribution, we could only exchange critical parts of the graph.
An example for this would be to directly specify the gradient of the NegativeBinomial distribution class.
This might also help to keep the gradient from becoming NaN.
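A minimal TF 1.x sketch of the mechanism (stable_log is a made-up example op, not a batchglm function): tf.custom_gradient lets us keep the forward op but replace its backward pass, e.g. to keep it from producing NaN/inf.

import tensorflow as tf

@tf.custom_gradient
def stable_log(x):
    y = tf.log(x)                            # forward pass unchanged (TF 1.x op name)

    def grad(dy):
        # backward pass overridden: clip the denominator so 1/x cannot blow up
        return dy / tf.maximum(x, 1e-8)

    return y, grad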
See https://www.tensorflow.org/performance/xla/jit.
Consider putting certain operations into XLA_jit scopes to fuse them.
When used correctly, this might greatly improve the performance of certain operations.
This becomes possible because TensorFlow 1.12 ships XLA support by default.
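A rough sketch of such a scope (TF 1.x contrib API; the tensors below are placeholders, not the actual model graph): ops created inside the scope are grouped into an XLA cluster and can be fused.

import numpy as np
import tensorflow as tf
from tensorflow.contrib.compiler import jit   # TF 1.x contrib API

x = tf.constant(np.ones((500, 100)), dtype=tf.float32)
mu = tf.constant(np.ones((500, 100)), dtype=tf.float32)
r = tf.constant(np.ones((1, 100)), dtype=tf.float32)

with jit.experimental_jit_scope():
    # element-wise log-likelihood terms are candidates for fusion into one kernel
    ll_term = x * tf.log(mu) - (x + r) * tf.log(mu + r)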
Allow the user to set the batch size. Do this as part of the training strategy? The defaults can stay as they are for now.
Re-implement models without tensorflow (HIPS/autograd?) if optimization is not necessary, e.g. if only the hessian should be calculated.
Another option is to add a closed-form hessian.
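For the second option, a sketch of a closed-form (expected) Hessian for the NB location model with log link, H = X^T diag(w) X with w_i = r * mu_i / (r + mu_i) (variable names are placeholders, not batchglm's API):

import numpy as np

def nb_loc_fisher_info(design, mu, r):
    # expected information of the location coefficients under a log link:
    # weights w_i = r * mu_i / (r + mu_i), H = design^T diag(w) design
    w = r * mu / (r + mu)
    return design.T @ (w[:, None] * design)

design = np.ones((50, 2))           # dummy design matrix
mu = np.full(50, 3.0)               # fitted means
H = nb_loc_fisher_info(design, mu, r=2.0)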
Loss is increasing if training is called via estimator.train(optim_algo="NR") but not if the Newton-type optimizer is called via train_sequence.
Necessary for mini-batch Newton-Raphson
shape mismatches after reformulating this to work after #26 was fixed.
pip install yields tensorflow 1.11 right now; this seems to produce some issues with Keras versioning. We should check this. Secondly, we need at least tf 1.10 right now? This is not listed in the setup. Lastly, we might need newer dask versions; this also popped up as an issue for a user.
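Something along these lines in setup.py would make the requirements explicit (the exact lower bounds are assumptions and would need to be confirmed):

from setuptools import setup, find_packages

setup(
    name="batchglm",
    packages=find_packages(),
    install_requires=[
        "tensorflow>=1.10",   # assumed lower bound, see above
        "dask",               # a recent dask release; exact bound to be determined
        "xarray",
        "numpy",
        "scipy",
    ],
)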
SGD, ADAM, RMS-Prop, ...
Pro:
*_intercept and *_slope
Con:
Why is this a random initialisation within a very small interval rather than a constant? I do not see the benefit of introducing this stochasticity, but it can break reproducibility. As this is additive in log space, I would just set the parameters that did not occur in the full model to zero.
Merge dev into master and do release 0.4 once all public release issues are done.
So I just loaded Sphinx v1.8.1 and tried to build the docs using make html, and I got the following error:
Extension error:
Could not import extension sphinx_autodoc_typehints (exception: No module named 'sphinx_autodoc_typehints')
make: *** [Makefile:20: html] Error 2