pnkraemer / tornadox

Probabilistic ODE solvers are fun, but are they fast?

License: MIT License

Python 97.17% Jupyter Notebook 2.83%
probabilistic-numerics differential-equation-solvers

tornadox's People

Contributors

nathanaelbosch, pnkraemer, schmidtjonathan


tornadox's Issues

EK1 "Diagonal"

Let's try to get a decently fast EK1 implementation with diagonal Jacobians going!

More efficient diagonal Jacobians for the PDEs

How feasible would it be to implement more efficient functions for the diagonals of the Jacobians of the PDE problems, @schmidtjonathan? Right now it is not possible to use the DiagonalEK1, because the process runs out of memory. This might not be an issue if we are fine with solving the PDE only with the EK0, but if we decide that we should use the DiagonalEK1, we will need such a function.

Diagonal EK1 Todo

In the DiagonalEK1, the Jacobian is still taken as

Jx = jnp.diag(ivp.df(t, y))

which assembles the full Jacobian. In mega high dimensions, this is bad.
There is probably a way to get only the diagonal elements via vmap(grad(f_components)), but I have not found a solution yet.
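A minimal sketch of that idea, assuming the vector field has the signature f(t, y) (the helper name jacobian_diagonal is hypothetical): one JVP per standard basis vector, keeping only the matching output entry, so the full Jacobian is never materialized.

import jax
import jax.numpy as jnp

def jacobian_diagonal(f, t, y):
    # The i-th diagonal entry of df/dy is the i-th component of the
    # JVP of f with the i-th standard basis vector.
    def diag_entry(i):
        e_i = jnp.zeros_like(y).at[i].set(1.0)
        _, jvp_out = jax.jvp(lambda y_: f(t, y_), (y,), (e_i,))
        return jvp_out[i]

    # For truly huge d, jax.lax.map instead of vmap trades speed for memory.
    return jax.vmap(diag_entry)(jnp.arange(y.shape[0]))

This still costs d JVP evaluations (like assembling the full Jacobian column by column), but it avoids storing the d x d matrix; if f acts elementwise, vmap(grad(...)) over the components would be cheaper still.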

Inconsistent `__init__` in EK1 and ODEFilter

Both solvers take the same arguments, so they should be ordered the same and share their names. I don't overwrite the __init__ in the EK0, and solve assumes exactly the naming of the EK1, so I'd propose to adjust the ODEFilter accordingly.

solve() method

It would be great to have a solve(ivp, abstol, reltol, adaptive=True, method="EK1DIAG", ...) method. Benchmarks would be easier to run and easier to read.

Cholesky factors or generic matrix-square-roots

There is a trade-off between an implementation that tracks Cholesky factors of covariance matrices and one that tracks "only" generic matrix square-roots.

The latter can be transformed into the former with a QR decomposition.
This costs something, however.
In future prediction steps, it does not matter whether one started with a Cholesky factor or a generic matrix square-root.

What are the reasons to use Cholesky factors instead of generic matrix square-roots? (I cannot think of many, other than that it is neater, which should not be a strong reason IMO.) I will edit this list if I come up with more.

  • Full Jacobians lead to a dense(ish) observation matrix H, and S can be defined either with a generic square-root or with a Cholesky factor. In the former case, we need an LU decomposition (jnp.linalg.solve()) to solve with S when computing the Kalman gain or for error estimation/calibration. With Cholesky factors, we can use a triangular solve, jax.scipy.linalg.solve_triangular(). I remember that such a switch used to provide serious numerical stability improvements (see the sketch below).

As long as we do not need to invert the posterior covariances, generic matrix square-roots should be fine.
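For concreteness, a sketch of the two solve paths (array names as in the bullet above):

import jax.numpy as jnp
import jax.scipy.linalg

# Generic matrix square-root: S = S_sqrtm @ S_sqrtm.T has no triangular
# structure, so solving S x = z needs a general (LU-based) solve.
x = jnp.linalg.solve(S_sqrtm @ S_sqrtm.T, z)

# Lower-triangular Cholesky factor: the same solve splits into two
# cheap triangular solves.
tmp = jax.scipy.linalg.solve_triangular(S_sqrtm, z, lower=True)
x = jax.scipy.linalg.solve_triangular(S_sqrtm.T, tmp, lower=False)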

Package name

There appears to be too much of a clash with a different tornado package (Jupyter notebooks, pip). We should keep this in mind and perhaps rename.

Options?

  • tornadox ("x" for "jax")
  • odefiltron
  • filtron
  • filtronx
  • cojo
  • cojoe
  • cojox
  • xX_T0rn4d0_Xx

EK0

Let's try to get a decently fast EK0 implementation going!

Diagonal Jacobians (and more?)

I think life will be easier if we allow different styles of Jacobian arguments in the InitialValueProblem class.
My suggestions are (see the sketch after this list):

  • df as a function: df(t, y) -> Jacobian (this currently exists)
  • df_diag as a function: df_diag(t, y) -> diagonal of the Jacobian (this makes DiagonalEK1 life easier -- see #61)

And perhaps in the future funny things like

  • jvp: Jacobian-vector product (and vjp as well -- who knows)
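A rough sketch of the extended interface (the df_diag, jvp, and vjp fields are proposals from this issue, not existing code):

from dataclasses import dataclass
from typing import Callable, Optional

import jax.numpy as jnp

@dataclass
class InitialValueProblem:
    f: Callable                          # f(t, y) -> dy/dt
    t0: float
    tmax: float
    y0: jnp.ndarray
    df: Optional[Callable] = None        # df(t, y) -> full Jacobian
    df_diag: Optional[Callable] = None   # df_diag(t, y) -> diagonal of the Jacobian
    # later, perhaps: jvp / vjp callables as well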

RK init + Radau should get the jacobian

The RK data should get to see the full Jacobian, because it is available anyway. Radau needs that information, and this way we can make it avoid finite-difference approximations internally.
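For reference, SciPy's Radau already accepts an analytic Jacobian, so the wiring could look roughly like this (the attributes on ivp are assumptions):

import scipy.integrate

# Handing Radau the analytic Jacobian makes it skip its internal
# finite-difference approximation.
sol = scipy.integrate.solve_ivp(
    ivp.f, (ivp.t0, ivp.tmax), ivp.y0,
    method="Radau",
    jac=ivp.df,
)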

Representation of ODEFilter means?

It seems to be rather efficient to represent the mean of an ODEFilter state as an (nu+1, d) matrix instead of a (d(nu+1),) vector.
For example, this turns the prediction from vec_trick(Phi, m) into Phi @ m (where Phi is an (nu+1) x (nu+1) matrix), and, more importantly, the projection from E0 @ mp (respectively vec_trick(E0, mp)) into m[0] (which is faster, and probably more readable too!).
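A toy comparison, sketching the matrix representation (sizes arbitrary):

import jax.numpy as jnp

d, nu = 4, 2
Phi = jnp.eye(nu + 1)        # (nu+1, nu+1) transition factor
m = jnp.zeros((nu + 1, d))   # mean in matrix representation

mp = Phi @ m                 # prediction: a plain matmul, no vec trick
y = mp[0]                    # projection to the zeroth derivative: a row slice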

The above might also be useful for the EK1.

Should we do this? Or what would speak against it?

EK1 "Truncated"

Let's try to get a decently fast EK1 implementation with full Jacobians and truncation going!

Reference means as matrix

I think we should consider storing the mean vectors as matrices for the reference implementations ReferenceEK0 and ReferenceEK1 as well.
This is because benchmarks that use the solvers more or less interchangeably always contain code that looks like

if isinstance(solver, (ReferenceEK0, ReferenceEK1)):
    m = solver.P0 @ mean
else:
    m = mean[0]

or, alternatively,

try:
    m = solver.P0 @ mean
except AttributeError:  # no projection matrix; the mean is already a matrix
    m = mean[0]

both of which unnecessarily inflate the benchmark code (and make it tedious to use solvers interchangeably).

I would definitely choose the matrix representation over the long-vector representation (I think it is not only much cheaper, but also more readable and less error-prone).

Lorenz96 is not working

Attempting to solve it produces a huge stack trace for me. If the issue is on my end, feel free to close it; otherwise this needs to be fixed (if we want to include that one in the paper).

Krylov methods in truncated EK1

The complexity-dominant operation in the truncated EK1 is the linear solve

jax.scipy.linalg.cho_solve((S_sqrtm, True), z)

which is used to update the mean.

Sparse Jacobians will imply sparse S_sqrtm (or at least sparse S), and a Krylov method like CG might make the whole shebang faster.
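A matrix-free sketch of that idea (array names as above):

import jax.scipy.sparse.linalg

# Solve S x = z with S = S_sqrtm @ S_sqrtm.T, without ever forming S.
matvec = lambda v: S_sqrtm @ (S_sqrtm.T @ v)
x, _ = jax.scipy.sparse.linalg.cg(matvec, z)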

How to save covariances for very high-dimensional problems?

Right now we save dense covariance matrices, but when the dimension is high enough this is a bad idea. In fact, to solve a PDE I locally modified the ODEFilter.solve function to only save the Kronecker factors. How should we best do this?

My ideas:

  1. Add a save_cov=True keyword argument, which we can set to False in order to only save the means. This would be the simplest solution, would have the best performance, and could be used for all solvers (see the sketch below).
  2. Define covariance classes for block-diagonal and Kronecker-structured matrices. This might be the most elegant, but is more implementation effort.
  3. Just hard-code the KroneckerEK0 to save only the Kronecker factor, and handle these later if we ever want to look at the uncertainties (though I'd suspect they are not particularly interesting to look at anyway).

My favorite right now might be 1, just because it would be the fastest and we might not even care about visualizing the uncertainties (especially for the KroneckerEK0). @pnkraemer what do you think?
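Option 1 could be as simple as the following sketch (solution_generator and the state attributes are hypothetical names):

means, covs = [], []
for state in solver.solution_generator(ivp):  # hypothetical iteration API
    means.append(state.mean)
    if save_cov:                              # the proposed keyword argument
        covs.append(state.cov)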

More efficient projection matrices

Currently, the EK0 uses a full d x d(nu+1) projection matrix. This is perfectly fine for now, but long-term we should do this more efficiently.
The reason is that this inefficient implementation leads to a solver whose cost is quadratic in the ODE dimension, whereas the EK0 should cost at most linearly in d.

There are efficient projection operators in 1d. Let's construct them for higher dimensions and use them there!
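In 1d, the projection E0 @ m is just m[0]; for d dimensions, the same effect is a reshape plus a row slice. A sketch, assuming a derivative-major state ordering:

import jax.numpy as jnp

def project_zeroth(m_flat, d, nu):
    # Equivalent to E0 @ m_flat, but O(d) and without ever building
    # the d x d(nu+1) projection matrix.
    return m_flat.reshape(nu + 1, d)[0]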

RK init

Taylor mode is not suited to suuuper high dimensions. IIRC, RK init scales like the KroneckerEK0. This will probably break a plateau, at least for low(ish) orders. (Is there a use case for solving a 1e6-dimensional ODE with a 10th-order method???)

Stats

To evaluate the performance of the solver properly, we should track nfevals, njevals, and num_steps (function evaluations, Jacobian evaluations, and the number of steps).
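For instance, a small counter container along these lines (a sketch, not existing code):

from dataclasses import dataclass

@dataclass
class SolverStats:
    nfevals: int = 0     # right-hand-side evaluations
    njevals: int = 0     # Jacobian evaluations
    num_steps: int = 0   # number of (attempted or accepted) steps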

Solve interface

I have concerns about the tornado.ivpsolve.solve() function. I have a feeling that it is a probnum relic that we just copied over without checking that it solves an actual problem we have.

More specifically, I am worried about the following:

  • There is too much logic hidden in the function! The "infer whether the user wants fixed or adaptive steps" bit alone eats up quite a lot of logic, and it depends on a few parameters being set and a few others not being set at the same time.
  • The output type depends on input parameters: if save_everystep is True, the output is a stack of states; if it is False, it is not. I don't like this.
  • If our experiment code uses the solve() function, but we spontaneously decide that we additionally want to track a different statistic about the solution (like the number of steps), we have to change the whole setup and move from the solve() method to the class setup. It would have been better to be nudged towards the class setup in the first place.

It also leads to weird duplications:

  • If we spontaneously decide to use a different mode of initialisation, we have to change it in the ODE filter as well as in the convenience function, i.e. make the same change in two places.
  • If we add another solver, we not only have to write it, but also add it to the registry in solve(), come up with a string description, and make sure the tests of ivpsolve() understand that there is a new sheriff in town.

All in all, I think this is unnecessary duplication.

My counter-proposal would be to remove the ivpsolve() function and do everything with the ODEFilter() objects, i.e. move a solve() method there, as well as a simulate_final_state() method (which discards the intermediate calculations), and add some default behaviour there:

  • maybe no more ode_dimension (the prior can be assembled in initialize())
  • the init strategy gets a default (I suggest either TM all the time, or RK for nu < 5 and TM for nu >= 5)
  • the steprule gets a default (adaptive, with rtol and atol as in scipy (times 10 each))
  • the order maybe gets a default, too (4?)

This way we could write code like

solver = KroneckerEK0()  # dummy for ODEFilters
solver.solve(ivp)

but at the same time allow

solver = KroneckerEK0(num_derivatives=11, initialization=TaylorMode(), steprule=Adaptive())
solver.simulate_final_state(ivp)

and -- more importantly -- if we refactor functionality there, we only have to touch a single place in the code!

Sorry for the long post, I needed to get this out somehow.

ODE dimension init

I think it is cumbersome to have to pass the ODE dimension as a solver argument.
Instead, we can leave it empty and postpone assembling the prior until initialize() is called -- which sees the ODE dimension!
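A sketch of the deferred construction (build_iwp_prior is a hypothetical helper):

class ODEFilter:
    def __init__(self, num_derivatives=4, steprule=None):
        self.num_derivatives = num_derivatives
        self.steprule = steprule
        self.prior = None  # the ODE dimension is not known yet

    def initialize(self, ivp):
        d = ivp.y0.shape[0]  # now we see the ODE dimension
        self.prior = build_iwp_prior(d, self.num_derivatives)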

Migrate the EK0 and DiagonalEK1 to a diagonal diffusion model

As recently discussed, we want to write the paper for diagonal diffusions (à la "we assume d independent IWP priors"). The theory will be written for this case, at least for the EK0 and DiagonalEK1, and therefore we should also have it in code.

Most importantly, if this ends up working badly we want to figure it out ASAP since we'd need to change the story once more!
