pnkraemer / tornadox
Probabilistic ODE solvers are fun, but are they fast?
License: MIT License
Let's try to get a decently fast EK1 implementation with diagonal Jacobians going!
How feasible would it be to implement more efficient functions for the diagonals of the Jacobian for the PDE problems, @schmidtjonathan? Right now it is not possible to use the DiagonalEK1, because the process runs out of memory. This might not be an issue if we are fine with only solving the PDE with the EK0, but if we decide that we should use the DiagonalEK1, we will need such a more efficient function.
In the DiagonalEK1, the Jacobian is still taken as
Jx = jnp.diag(ivp.df(t, y))
which assembles the full Jacobian. In very high dimensions, this is prohibitively expensive.
There is probably a way to get only the diagonal elements via vmap(grad(f_components)), but I have not found a solution yet.
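One way the vmap(grad(f_components)) idea could look, as a sketch (the helper name and the component access f(z)[i] are illustrative, not existing code): each diagonal entry is the i-th partial derivative of the i-th output component, so the dense (d, d) Jacobian never has to be materialized, although this still performs d backward passes through f.

```python
import jax
import jax.numpy as jnp


def jacobian_diagonal(f, x):
    """Diagonal of the Jacobian of f: R^d -> R^d at x.

    Sketch of vmap(grad(f_components)): one reverse-mode gradient per
    output component, reduced to its i-th entry. Avoids storing the
    dense (d, d) Jacobian, but still costs d backward passes.
    """
    d = x.shape[0]

    def ith_diagonal_entry(i):
        # d f_i / d x_i: gradient of the scalar component f_i at x,
        # then pick out its i-th entry.
        return jax.grad(lambda z: f(z)[i])(x)[i]

    return jax.vmap(ith_diagonal_entry)(jnp.arange(d))
```

Whether this actually keeps peak memory low under jit for the PDE problems would need to be benchmarked.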
Both solvers take the same arguments, so they should be ordered the same and share their names. I don't overwrite the __init__ in the EK0, and solve assumes exactly the naming of the EK1, so I'd propose to adjust the ODEFilter accordingly.
It would be great to have a solve(ivp, abstol, reltol, adaptive=True, method="EK1DIAG", etc.) method. Benchmarks would be easier to run and easier to read.
There is a trade-off between an implementation that tracks Cholesky factors of covariance matrices and one that tracks "only" generic matrix square-roots.
The latter can be transformed into the former with a QR decomposition.
This comes at a cost, however.
In future prediction steps, it does not matter whether one started with a Cholesky factor or a generic matrix square-root.
What are the reasons to use Cholesky factors instead of generic matrix square-roots? (I cannot think of many, other than that it is neater, which should not be a strong reason IMO.) I will edit this list if I can come up with more.
Without Cholesky factors, we need a generic solve (jnp.linalg.solve()) for the computation of the Kalman gain or for error estimation/calibration. With Cholesky factors, we can use a triangular solve (jax.scipy.linalg.solve_triangular()). I remember that such a switch used to provide serious numerical stability improvements. As long as we do not need to invert the posterior covariances, matrix square-roots should be fine.
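The QR step and the triangular solve mentioned above can be sketched as follows (function names are illustrative): if Sigma = C C^T and C^T = Q R, then Sigma = R^T R, so L = R^T is a valid lower-triangular factor, and (L L^T) x = z reduces to two triangular solves.

```python
import jax.numpy as jnp
from jax.scipy.linalg import solve_triangular


def sqrtm_to_cholesky(sqrtm):
    """Triangularize a generic square-root C (with C C^T = Sigma) via QR
    of C^T; returns a lower-triangular L with L L^T = Sigma."""
    _, r = jnp.linalg.qr(sqrtm.T)
    return r.T


def cholesky_solve(chol, z):
    """Solve (L L^T) x = z with two triangular solves instead of a
    generic jnp.linalg.solve()."""
    tmp = solve_triangular(chol, z, lower=True)
    return solve_triangular(chol.T, tmp, lower=False)
```

Note that QR only determines R up to row signs, so L may have negative diagonal entries; L L^T = Sigma holds either way, and the triangular solves are unaffected.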
There appears to be too much of a name clash with a different tornado package (Jupyter notebooks, pip). We should keep this in mind and perhaps rename.
Options?
- tornadox ("x" for "jax")
- odefiltron
- filtron
- filtronx
- cojo
- cojoe
- cojox
- xX_T0rn4d0_Xx
The code corresponds to https://arxiv.org/abs/2110.11812. It would be good to have this acknowledged in the README or somewhere.
Maybe we can write some of the code in a way that it is out-of-the-box jittable.
Let's try to get a decently fast EK0 implementation going!
I think life will be easier if we allow different styles of Jacobian arguments in the InitialValueProblem class.
My suggestions are
And perhaps in the future funny things like
The RK data should get to see the full Jacobian, because it is there. Radau needs the information, and this way we can make it avoid internal finite-difference approximations.
It seems to be rather efficient to represent the mean of an ODEFilter state as a (nu+1, d) matrix rather than as a (d(nu+1),) vector.
For example, this turns the prediction from vec_trick(Phi, m) into Phi @ m (where Phi is a (nu+1) x (nu+1) matrix), and, more importantly, the projection from E0 @ mp, respectively vec_trick(E0 @ mp), into m[0] (which is faster, and probably more readable too!).
The above might also be useful for the EK1.
Should we do this? Or what would speak against it?
Maybe optional for now. We will see.
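A quick numerical sanity check of the claim (a sketch; the derivative-major, row-major flattening convention for the long vector is an assumption, and the Phi values are arbitrary):

```python
import jax.numpy as jnp

nu, d = 2, 3
Phi = jnp.array([[1.0, 1.0, 0.5],
                 [0.0, 1.0, 1.0],
                 [0.0, 0.0, 1.0]])  # (nu+1, nu+1) one-dimensional transition
m = jnp.arange((nu + 1) * d, dtype=jnp.float32).reshape(nu + 1, d)

# Long-vector convention: state = m.reshape(-1), transition = kron(Phi, I_d).
# This is what the vec trick computes.
long_pred = (jnp.kron(Phi, jnp.eye(d)) @ m.reshape(-1)).reshape(nu + 1, d)

# Matrix convention: just a small (nu+1, nu+1) matmul.
mat_pred = Phi @ m

# Projection to the zeroth derivative: a slice instead of E0 @ state.
y = mat_pred[0]
```

The Kronecker-product matmul costs O(d^2 (nu+1)^2), the small matmul only O(d (nu+1)^2), so the matrix representation wins clearly as d grows.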
Let's try to get a decently fast EK1 implementation with full Jacobians and truncation going!
I think we should consider storing the mean vectors as matrices for the reference implementations ReferenceEK0 and ReferenceEK1 as well.
This is because benchmarks that use methods rather interchangeably always have code that looks like
if isinstance(solver, (ReferenceEK0, ReferenceEK1)):
    m = solver.P0 @ mean
else:
    m = mean[0]
respectively
try:
    m = solver.P0 @ mean
except AttributeError:  # solver has no P0; the mean is already a matrix
    m = mean[0]
both of which unnecessarily inflate the benchmark code (and make it tedious to use solvers interchangeably).
I would definitely choose the matrix representation over the long-vector representation (I think it is not only much cheaper, but also more readable and less error-prone).
Attempting to solve it leads to a huge stack trace for me. If the issue is on my end feel free to close it, otherwise this needs to be fixed (if we want to include that one in the paper).
The complexity-dominant operation in the truncated EK1 is the computation
jax.scipy.linalg.cho_solve((S_sqrtm, True), z)
which is used to update the mean.
Sparse Jacobians will imply sparse S_sqrtm
(or at least sparse S
), and a Krylov method like CG might make the whole shebang faster.
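The Krylov idea could be sketched matrix-free (the function name is illustrative): S = S_sqrtm @ S_sqrtm.T is symmetric positive definite, so CG applies, and only matvecs with S_sqrtm are needed, which is exactly where a sparse S_sqrtm would pay off.

```python
import jax.numpy as jnp
from jax.scipy.sparse.linalg import cg


def innovation_solve_cg(S_sqrtm, z, tol=1e-10):
    """Solve (S_sqrtm @ S_sqrtm.T) x = z with matrix-free CG.

    Untested at scale; for an ill-conditioned S one would additionally
    need a preconditioner (the M argument of cg).
    """
    def matvec(v):
        # two matvecs with the square-root factor, never forming S
        return S_sqrtm @ (S_sqrtm.T @ v)

    x, _ = cg(matvec, z, tol=tol)
    return x
```

Whether CG actually beats the dense Cholesky route will depend on the sparsity pattern and the conditioning of S.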
Right now we save dense covariance matrices, but when the dimension is high enough this is a bad idea. In fact, to solve a PDE I locally modified the ODEFilter.solve
function to only save the Kronecker factors. How should we best do this?
My ideas:
1. A save_cov=True keyword argument, which we can set to False in order to only save the means. This would be the simplest solution, have the best performance, and could be used for all solvers.

My favorite right now might be 1, just because it would be the fastest and we might not even care about visualizing the uncertainties (especially for the KroneckerEK0). @pnkraemer, what do you think?
There is a smarter first-step selection method in https://github.com/google/jax/blob/main/jax/experimental/ode.py, which implements the actual Hairer/Wanner method, not a simplified version of it.
Because namedtuples are valid pytrees, they are jittable, vmappable, and differentiable. In other words, they behave much better with jax than dataclasses:
https://jax.readthedocs.io/en/latest/pytrees.html
Edit: relates to #37
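A minimal demonstration of the point (names are illustrative): a namedtuple-based state passes through jit and vmap without any custom pytree registration, which a plain dataclass would need.

```python
import collections

import jax
import jax.numpy as jnp

# A toy filter state as a namedtuple; jax treats it as a pytree for free.
State = collections.namedtuple("State", ["mean", "cov_sqrtm"])


@jax.jit
def scale_state(state, factor):
    # jit flattens and unflattens the namedtuple transparently
    return State(mean=factor * state.mean, cov_sqrtm=factor * state.cov_sqrtm)


# vmap over a batch of states works the same way
batched_scale = jax.vmap(scale_state, in_axes=(0, None))
```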
Currently, the EK0 uses a full d x d(nu+1) projection matrix. This is perfectly fine for now, but long-term we should do this more efficiently.
The reason is that this inefficient implementation leads to a solver that costs quadratically in the ODE dimension, whereas the EK0 should only cost at most linearly in d.
There are efficient projection operators in 1d. Let's construct them for higher dimensions and use them there!
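One way the lifted projection could look, assuming the stacked state orders derivatives first (an assumption about the flattening convention, not the package's actual layout): the 1d operator e0 lifts to kron(e0, I_d), but applying it is equivalent to a reshape plus a slice, which is O(d) instead of O(d^2 (nu+1)).

```python
import jax.numpy as jnp

nu, d = 3, 4
x = jnp.arange((nu + 1) * d, dtype=jnp.float32)  # stacked state vector

# Dense route: the (d, d(nu+1)) projection matrix built from the 1d operator.
e0 = jnp.eye(1, nu + 1)            # 1d projection to the 0th derivative
E0 = jnp.kron(e0, jnp.eye(d))      # O(d^2 (nu+1)) memory -- avoid this
dense = E0 @ x

# Cheap route: the same projection as a reshape plus a slice, O(d).
cheap = x.reshape(nu + 1, d)[0]
```

The same trick selects the k-th derivative via x.reshape(nu + 1, d)[k], so no projection matrix ever needs to be stored.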
Taylor-mode initialization is not made for suuuper high dimensions. IIRC, RK initialization scales like the KroneckerEK0. This will probably break a plateau, at least for low(ish) orders. (Is there a use case for solving a 1e6-dimensional ODE with a 10th-order method???)
To evaluate the performance of the solver properly, we should track nfevals, njevals, and num_steps (function evaluations, Jacobian evaluations, number of steps).
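A hedged sketch of how the counters could be collected (class and attribute names are illustrative): wrap the right-hand side and its Jacobian in a counting object before handing them to the solver. Note that Python-side counters like these do not survive jit tracing; inside jitted code the counts would have to be carried through the solver state instead.

```python
class CountedIVP:
    """Wraps an ODE right-hand side f(t, y) and Jacobian df(t, y),
    counting how often each is evaluated."""

    def __init__(self, f, df):
        self.f, self.df = f, df
        self.nfevals = 0    # function evaluations
        self.njevals = 0    # Jacobian evaluations
        self.num_steps = 0  # to be incremented by the step loop

    def __call__(self, t, y):
        self.nfevals += 1
        return self.f(t, y)

    def jac(self, t, y):
        self.njevals += 1
        return self.df(t, y)
```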
I have concerns about the tornado.ivpsolve.solve() function. I have a feeling that it is a probnum relic that we just copied over without thinking about whether it solves an actual problem we have.
More specifically, I am worried about the following:
It also leads to weird duplications:
All in all, I think this is unnecessary duplication.
My counter-proposal would be to remove the ivpsolve() function and do everything with the ODEFilter() objects, i.e., move a solve() method there, as well as a simulate_final_state() method (which discards the intermediate calculations), and add some default behaviour there.
This way we could write code like
solver = KroneckerEK0() # dummy for ODEFilters
solver.solve(ivp)
but at the same time allow
solver = KroneckerEK0(num_derivatives=11, initialization=TaylorMode(), steprule=Adaptive())
solver.simulate_final_state(ivp)
and -- more importantly -- if we refactor functionality there, we only have to touch a single place of code!
Sorry for the long post, I needed to get this out somehow.
I think it is cumbersome to have to pass the ODE dimension as a solver argument.
Instead, we can leave it empty and wait with assembling the prior until initialize() is called -- which sees the ODE dimension!
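A sketch of the lazy construction (class and attribute names are illustrative, not the package API): the solver is created without the dimension and only assembles its prior once initialize() sees the initial value.

```python
import jax.numpy as jnp


class LazyPriorSolver:
    """Defers prior assembly until the ODE dimension is known."""

    def __init__(self, num_derivatives=4):
        self.num_derivatives = num_derivatives
        self.transition = None  # assembled lazily in initialize()

    def initialize(self, y0):
        d = y0.shape[0]  # the ODE dimension, read off the data
        nu = self.num_derivatives
        # Toy stand-in of the right shape; a real implementation would
        # assemble the IWP transition Phi(h) and its process-noise
        # square-root here.
        self.transition = jnp.eye((nu + 1) * d)
        return self.transition
```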
As recently discussed, we want to write the paper for diagonal diffusions (à la "we assume d independent IWP priors"). The theory will be written for this case, at least for the EK0 and DiagonalEK1, and therefore we should also have this in code.
Most importantly, if this ends up working badly we want to figure it out ASAP since we'd need to change the story once more!