pnkraemer / tornadox

Probabilistic ODE solvers are fun, but are they fast?

License: MIT License

Python 97.17% Jupyter Notebook 2.83%
probabilistic-numerics differential-equation-solvers

tornadox's People

Contributors

nathanaelbosch, pnkraemer, schmidtjonathan


tornadox's Issues

EK1 "Diagonal"

Let's try to get a decently fast EK1 implementation with diagonal Jacobians going!

More efficient diagonal Jacobians for the PDEs

How feasible would it be to implement more efficient functions for the diagonals of the Jacobians of the PDE problems, @schmidtjonathan? Right now it is not possible to use the DiagonalEK1, because the process runs out of memory. This might not be an issue if we are fine with solving the PDE only with the EK0, but if we decide that we should use the DiagonalEK1, we will need such a function.

Diagonal EK1 Todo

In the DiagonalEK1, the Jacobian is still taken as

Jx = jnp.diag(ivp.df(t, y))

which assembles the full Jacobian. In mega high dimensions, this is bad.
There is probably a way to get only the diagonal elements via vmap(grad(f_components)), but I have not found a solution yet.
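A minimal sketch of that idea, assuming the vector field has the signature f(t, y) (the helper name jacobian_diagonal is hypothetical): one JVP per standard basis vector, keeping only the matching output entry, so the full Jacobian is never materialized.

import jax
import jax.numpy as jnp

def jacobian_diagonal(f, t, y):
    # The i-th diagonal entry of df/dy is the i-th component of the
    # JVP of f with the i-th standard basis vector.
    def diag_entry(i):
        e_i = jnp.zeros_like(y).at[i].set(1.0)
        _, jvp_out = jax.jvp(lambda y_: f(t, y_), (y,), (e_i,))
        return jvp_out[i]

    # For truly huge d, jax.lax.map instead of vmap trades speed for memory.
    return jax.vmap(diag_entry)(jnp.arange(y.shape[0]))

This still costs d JVP evaluations (like assembling the full Jacobian column by column), but it avoids storing the d x d matrix; if f acts elementwise, vmap(grad(...)) over the components would be cheaper still.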

Inconsistent `__init__` in EK1 and ODEFilter

Both solvers take the same arguments, so they should be ordered the same and share their names. I don't overwrite the __init__ in the EK0, and solve assumes exactly the naming of the EK1, so I'd propose to adjust the ODEFilter accordingly.

solve() method

It would be great to have a solve(ivp, abstol, reltol, adaptive=True, method="EK1DIAG", ...) method. Benchmarks would be easier to run and easier to read.

Cholesky factors or generic matrix-square-roots

There is a trade-off between an implementation that tracks Cholesky factors of covariance matrices and one that tracks "only" generic matrix square-roots.

The latter can be transformed into the former with a QR decomposition.
This costs something, however.
In future prediction steps, it does not matter whether one started with a Cholesky factor or a generic matrix square-root.

What are the reasons to use Cholesky factors instead of generic matrix square-roots? (I cannot think of many, other than that it is neater, which should not be a strong reason IMO.) I will edit this list if I come up with more.

  • Full Jacobians lead to a dense(ish) observation matrix H, and S can be defined either with a generic square-root or with a Cholesky factor. In the former case, we need an LU decomposition (jnp.linalg.solve()) to solve with S when computing the Kalman gain or for error estimation/calibration. With Cholesky factors, we can use a triangular solve, jax.scipy.linalg.solve_triangular(). I remember that such a switch used to provide serious numerical stability improvements (see the sketch below).

As long as we do not need to invert the posterior covariances, generic matrix square-roots should be fine.
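For concreteness, a sketch of the two solve paths (array names as in the bullet above):

import jax.numpy as jnp
import jax.scipy.linalg

# Generic matrix square-root: S = S_sqrtm @ S_sqrtm.T has no triangular
# structure, so solving S x = z needs a general (LU-based) solve.
x = jnp.linalg.solve(S_sqrtm @ S_sqrtm.T, z)

# Lower-triangular Cholesky factor: the same solve splits into two
# cheap triangular solves.
tmp = jax.scipy.linalg.solve_triangular(S_sqrtm, z, lower=True)
x = jax.scipy.linalg.solve_triangular(S_sqrtm.T, tmp, lower=False)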

Package name

There appears to be too much of a clash with a different tornado package (Jupyter notebooks, pip). We should keep this in mind and perhaps rename.

Options?

  • tornadox ("x" for "jax")
  • odefiltron
  • filtron
  • filtronx
  • cojo
  • cojoe
  • cojox
  • xX_T0rn4d0_Xx

EK0

Let's try to get a decently fast EK0 implementation going!

Diagonal Jacobians (and more?)

I think life will be easier if we allow different styles of Jacobian arguments in the InitialValueProblem class.
My suggestions are (see the sketch after this list):

  • df as a function: df(t, y) -> Jacobian (this currently exists)
  • df_diag as a function: df_diag(t, y) -> diagonal of the Jacobian (this makes DiagonalEK1 life easier -- see #61)

And perhaps in the future funny things like

  • jvp: Jacobian-vector product (and vjp as well -- who knows)
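A rough sketch of the extended interface (the df_diag, jvp, and vjp fields are proposals from this issue, not existing code):

from dataclasses import dataclass
from typing import Callable, Optional

import jax.numpy as jnp

@dataclass
class InitialValueProblem:
    f: Callable                          # f(t, y) -> dy/dt
    t0: float
    tmax: float
    y0: jnp.ndarray
    df: Optional[Callable] = None        # df(t, y) -> full Jacobian
    df_diag: Optional[Callable] = None   # df_diag(t, y) -> diagonal of the Jacobian
    # later, perhaps: jvp / vjp callables as well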

RK init + Radau should get the jacobian

The RK data should get to see the full Jacobian, because it is available anyway. Radau needs that information, and this way we can make it avoid finite-difference approximations internally.
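For reference, SciPy's Radau already accepts an analytic Jacobian, so the wiring could look roughly like this (the attributes on ivp are assumptions):

import scipy.integrate

# Handing Radau the analytic Jacobian makes it skip its internal
# finite-difference approximation.
sol = scipy.integrate.solve_ivp(
    ivp.f, (ivp.t0, ivp.tmax), ivp.y0,
    method="Radau",
    jac=ivp.df,
)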

Representation of ODEFilter means?

It seems to be rather efficient to represent the mean of an ODEFilter state as an (nu+1, d) matrix instead of a (d(nu+1),) vector.
For example, this turns the prediction from vec_trick(Phi, m) into Phi @ m (where Phi is an (nu+1) x (nu+1) matrix), and, more importantly, the projection from E0 @ mp (respectively vec_trick(E0, mp)) into m[0] (which is faster, and probably more readable too!).
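A toy comparison, sketching the matrix representation (sizes arbitrary):

import jax.numpy as jnp

d, nu = 4, 2
Phi = jnp.eye(nu + 1)        # (nu+1, nu+1) transition factor
m = jnp.zeros((nu + 1, d))   # mean in matrix representation

mp = Phi @ m                 # prediction: a plain matmul, no vec trick
y = mp[0]                    # projection to the zeroth derivative: a row slice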

The above might also be useful for the EK1.

Should we do this? Or what would speak against it?

EK1 "Truncated"

Let's try to get a decently fast EK1 implementation with full Jacobians and truncation going!

Reference means as matrix

I think we should consider storing the mean vectors as matrices for the reference implementations ReferenceEK0 and ReferenceEK1 as well.
This is because benchmarks that use the solvers more or less interchangeably always contain code that looks like

if isinstance(solver, (ReferenceEK0, ReferenceEK1)):
    m = solver.P0 @ mean
else:
    m = mean[0]

or, alternatively,

try:
    m = solver.P0 @ mean
except AttributeError:  # no projection matrix; the mean is already a matrix
    m = mean[0]

both of which unnecessarily inflate the benchmark code (and make it tedious to use solvers interchangeably).

I would definitely choose the matrix representation over the long-vector representation (I think it is not only much cheaper, but also more readable and less error-prone).

Lorenz96 is not working

Attempting to solve it produces a huge stack trace for me. If the issue is on my end, feel free to close it; otherwise this needs to be fixed (if we want to include that one in the paper).

Krylov methods in truncated EK1

The complexity-dominant operation in the truncated EK1 is the linear solve

jax.scipy.linalg.cho_solve((S_sqrtm, True), z)

which is used to update the mean.

Sparse Jacobians will imply sparse S_sqrtm (or at least sparse S), and a Krylov method like CG might make the whole shebang faster.
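A matrix-free sketch of that idea (array names as above):

import jax.scipy.sparse.linalg

# Solve S x = z with S = S_sqrtm @ S_sqrtm.T, without ever forming S.
matvec = lambda v: S_sqrtm @ (S_sqrtm.T @ v)
x, _ = jax.scipy.sparse.linalg.cg(matvec, z)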

How to save covariances for very high-dimensional problems?

Right now we save dense covariance matrices, but when the dimension is high enough this is a bad idea. In fact, to solve a PDE I locally modified the ODEFilter.solve function to only save the Kronecker factors. How should we best do this?

My ideas:

  1. Add a save_cov=True keyword argument, which we can set to False in order to only save the means. This would be the simplest solution, would have the best performance, and could be used for all solvers (see the sketch below).
  2. Define covariance classes for block-diagonal and Kronecker-structured matrices. This might be the most elegant, but is more implementation effort.
  3. Just hard-code the KroneckerEK0 to save only the Kronecker factor, and handle these later if we ever want to look at the uncertainties (though I'd suspect they are not particularly interesting to look at anyway).

My favorite right now might be 1, just because it would be the fastest and we might not even care about visualizing the uncertainties (especially for the KroneckerEK0). @pnkraemer what do you think?
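Option 1 could be as simple as the following sketch (solution_generator and the state attributes are hypothetical names):

means, covs = [], []
for state in solver.solution_generator(ivp):  # hypothetical iteration API
    means.append(state.mean)
    if save_cov:                              # the proposed keyword argument
        covs.append(state.cov)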

More efficient projection matrices

Currently, the EK0 uses a full d x d(nu+1) projection matrix. This is perfectly fine for now, but long-term we should do this more efficiently.
The reason is that this inefficient implementation leads to a solver whose cost is quadratic in the ODE dimension, whereas the EK0 should cost at most linearly in d.

There are efficient projection operators in 1d. Let's construct them for higher dimensions and use them there!
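In 1d, the projection E0 @ m is just m[0]; for d dimensions, the same effect is a reshape plus a row slice. A sketch, assuming a derivative-major state ordering:

import jax.numpy as jnp

def project_zeroth(m_flat, d, nu):
    # Equivalent to E0 @ m_flat, but O(d) and without ever building
    # the d x d(nu+1) projection matrix.
    return m_flat.reshape(nu + 1, d)[0]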

RK init

Taylor mode is not suited to suuuper high dimensions. IIRC, RK init scales like the KroneckerEK0. This will probably break a plateau, at least for low(ish) orders. (Is there a use case for solving a 1e6-dimensional ODE with a 10th-order method???)

Stats

To evaluate the performance of the solver properly, we should track nfevals, njevals, and num_steps (function evaluations, Jacobian evaluations, and the number of steps).
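For instance, a small counter container along these lines (a sketch, not existing code):

from dataclasses import dataclass

@dataclass
class SolverStats:
    nfevals: int = 0     # right-hand-side evaluations
    njevals: int = 0     # Jacobian evaluations
    num_steps: int = 0   # number of (attempted or accepted) steps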

Solve interface

I have concerns about the tornado.ivpsolve.solve() function. I have a feeling that it is a probnum relic that we just copied over without checking that it solves an actual problem we have.

More specifically, I am worried about the following:

  • There is too much logic hidden in the function! The "infer whether the user wants fixed or adaptive steps" bit alone eats up quite a lot of logic, and it depends on a few parameters being set and a few others not being set at the same time.
  • The output type depends on input parameters: if save_everystep is True, the output is a stack of states; if it is False, it is not. I don't like this.
  • If our experiment code uses the solve() function, but we spontaneously decide that we additionally want to track a different statistic about the solution (like the number of steps), we have to change the whole setup and move from the solve() method to the class setup. It would have been better to be nudged towards the class setup in the first place.

It also leads to weird duplications:

  • If we spontaneously decide to use a different mode of initialisation, we have to change it in the ODE filter as well as in the convenience function, i.e. make the same change in two places.
  • If we add another solver, we not only have to write it, but also add it to the registry in solve(), come up with a string description, and make sure the tests of ivpsolve() understand that there is a new sheriff in town.

All in all, I think this is unnecessary duplication.

My counter-proposal would be to remove the ivpsolve() function and do everything with the ODEFilter() objects, i.e. move a solve() method there, as well as a simulate_final_state() method (which discards the intermediate calculations), and add some default behaviour there:

  • maybe no more ode_dimension (the prior can be assembled in initialize())
  • the init strategy gets a default (I suggest either TM all the time, or RK for nu < 5 and TM for nu >= 5)
  • the steprule gets a default (adaptive, with rtol and atol as in scipy (times 10 each))
  • the order maybe gets a default, too (4?)

This way we could write code like

solver = KroneckerEK0()  # dummy for ODEFilters
solver.solve(ivp)

but at the same time allow

solver = KroneckerEK0(num_derivatives=11, initialization=TaylorMode(), steprule=Adaptive())
solver.simulate_final_state(ivp)

and -- more importantly -- if we refactor functionality there, we only have to touch a single place in the code!

Sorry for the long post, I needed to get this out somehow.

ODE dimension init

I think it is cumbersome to have to pass the ODE dimension as a solver argument.
Instead, we can leave it empty and postpone assembling the prior until initialize() is called -- which sees the ODE dimension!
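A sketch of the deferred construction (build_iwp_prior is a hypothetical helper):

class ODEFilter:
    def __init__(self, num_derivatives=4, steprule=None):
        self.num_derivatives = num_derivatives
        self.steprule = steprule
        self.prior = None  # the ODE dimension is not known yet

    def initialize(self, ivp):
        d = ivp.y0.shape[0]  # now we see the ODE dimension
        self.prior = build_iwp_prior(d, self.num_derivatives)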

Migrate the EK0 and DiagonalEK1 to a diagonal diffusion model

As recently discussed, we want to write the paper for diagonal diffusions (à la "we assume d independent IWP priors"). The theory will be written for this case, at least for the EK0 and DiagonalEK1, and therefore we should also have it in code.

Most importantly, if this ends up working badly we want to figure it out ASAP since we'd need to change the story once more!
