rgl-epfl / cholespy

An easily integrable Cholesky solver on CPU and GPU

License: BSD 3-Clause "New" or "Revised" License

cholespy's Introduction

What is this repo?

This is a minimalistic, self-contained sparse Cholesky solver that runs on both the CPU and the GPU and is easy to integrate into your tensor pipeline.

When we were working on our "Large Steps in Inverse Rendering of Geometry" paper [1], we found it quite challenging to hook up an existing sparse linear solver to our pipeline. We only managed to do so by adding dependencies on large projects (i.e. cusparse and scikit-sparse), while using just a small part of their functionality. Therefore, we decided to implement our own library that serves one purpose: efficiently solving sparse linear systems on the GPU or CPU, using a Cholesky factorization.

Under the hood, it relies on CHOLMOD for sparse matrix factorization. For the solving phase, it uses CHOLMOD for the CPU version, and uses the result of an analysis step run once when building the solver for fast solving on the GPU [2].

It achieves performance comparable to other frameworks, with the dependencies nicely shipped along.

Benchmark run on a Linux Ryzen 3990X workstation with a TITAN RTX.

The Python bindings are generated with nanobind, which makes it easily interoperable with most tensor frameworks (NumPy, PyTorch, JAX...).

Installing

With PyPI (recommended)

pip install cholespy

From source

git clone --recursive https://github.com/rgl-epfl/cholespy
pip install ./cholespy

Documentation

There is only one class in the module, with two variants: CholeskySolverF, CholeskySolverD. The only difference is that CholeskySolverF solves the system in single precision while CholeskySolverD uses double precision. This is mostly useful for solving on the GPU, as the CPU version relies on CHOLMOD, which only supports double precision anyway.

The most common tensor frameworks (PyTorch, NumPy, TensorFlow...) are supported out of the box. You can pass them directly to the module without any need for manual conversion.

Since both variants have the same signature, we only detail CholeskySolverF below:

cholespy.CholeskySolverF(n_rows, ii, jj, x, type)

Parameters:

n_rows - The number of rows in the (sparse) matrix.
ii - The first array of indices in the sparse matrix representation. If type is COO, then this is the array of row indices. If it is CSC (resp. CSR), then it is the array of column (resp. row) pointers, such that the row (resp. column) indices for column (resp. row) k are stored in jj[ii[k]:ii[k+1]] and the corresponding entries are in x[ii[k]:ii[k+1]].
jj - The second array of indices in the sparse matrix representation. If type is COO, then this is the array of column indices. If it is CSC (resp. CSR), then it is the array of row (resp. column) indices.
x - The array of nonzero entries.
type - The matrix representation type, of type MatrixType. Available types are MatrixType.COO, MatrixType.CSC and MatrixType.CSR.
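
To make the relationship between ii, jj and x concrete, here is a minimal pure-Python sketch that converts COO triplets into CSC arrays. The helper name coo_to_csc is hypothetical and not part of cholespy; in practice a library such as SciPy would do this conversion for you.

```python
def coo_to_csc(n_cols, rows, cols, vals):
    """Convert COO triplets to CSC arrays (col_ptr, row_idx, data).

    Hypothetical helper for illustration only, not part of cholespy.
    """
    # Count the entries stored in each column.
    counts = [0] * n_cols
    for c in cols:
        counts[c] += 1

    # Prefix sums give the start offset of every column.
    col_ptr = [0] * (n_cols + 1)
    for k in range(n_cols):
        col_ptr[k + 1] = col_ptr[k] + counts[k]

    # Scatter each triplet into the next free slot of its column.
    next_slot = col_ptr[:-1].copy()
    row_idx = [0] * len(vals)
    data = [0.0] * len(vals)
    for r, c, v in zip(rows, cols, vals):
        slot = next_slot[c]
        row_idx[slot] = r
        data[slot] = v
        next_slot[c] += 1
    return col_ptr, row_idx, data


# 3x3 symmetric matrix:
# [[4, 1, 0],
#  [1, 3, 0],
#  [0, 0, 2]]
rows = [0, 0, 1, 1, 2]
cols = [0, 1, 0, 1, 2]
vals = [4.0, 1.0, 1.0, 3.0, 2.0]

ii, jj, x = coo_to_csc(3, rows, cols, vals)

# Row indices and values of column k are the slices [ii[k]:ii[k+1]].
k = 1
column_rows = jj[ii[k]:ii[k + 1]]
column_vals = x[ii[k]:ii[k + 1]]
print(ii, column_rows, column_vals)
```

With MatrixType.CSC, the three returned lists play the roles of ii, jj and x in the constructor (after conversion to arrays of the appropriate dtype).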

cholespy.CholeskySolverF.solve(b, x)

Parameters:

b - Right-hand side of the equation to solve. Can be a vector or a matrix. If it is a matrix, it must be of shape (n_rows, n_rhs). It must be on the same device as the tensors passed to the solver constructor. If using CUDA arrays, then the maximum supported value for n_rhs is 128.
x - Placeholder for the solution. It must be on the same device and have the same shape as b.

x and b must have the same dtype as the solver used, i.e. float32 for CholeskySolverF or float64 for CholeskySolverD. Since x is modified in place, implicit type conversion is not supported.
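
The constraints on b and x can be checked before calling solve. The sketch below is a hypothetical helper (solve_arg_problems is not part of cholespy) that works on plain shapes and dtype names, so it does not depend on any tensor framework. It only encodes the rules stated above and does not check that both arrays live on the same device.

```python
MAX_CUDA_RHS = 128  # limit quoted in the description of b above


def solve_arg_problems(n_rows, solver_dtype, b_shape, b_dtype,
                       x_shape, x_dtype, on_cuda):
    """Return a list of problems; an empty list means the call looks valid.

    Hypothetical helper for illustration only, not part of cholespy.
    """
    problems = []
    # b is either a vector (n_rows,) or a matrix (n_rows, n_rhs).
    if len(b_shape) not in (1, 2) or b_shape[0] != n_rows:
        problems.append(
            f"b must have shape ({n_rows},) or ({n_rows}, n_rhs), got {tuple(b_shape)}")
    # x is written in place, so it must match b exactly.
    if tuple(x_shape) != tuple(b_shape):
        problems.append("x must have the same shape as b")
    # No implicit conversion: float32 for CholeskySolverF, float64 for CholeskySolverD.
    if b_dtype != solver_dtype or x_dtype != solver_dtype:
        problems.append(f"b and x must both be {solver_dtype}")
    # The CUDA path limits the number of right-hand sides.
    if on_cuda and len(b_shape) == 2 and b_shape[1] > MAX_CUDA_RHS:
        problems.append(f"at most {MAX_CUDA_RHS} right-hand sides are supported on CUDA")
    return problems


ok = solve_arg_problems(20, "float32", (20, 4), "float32", (20, 4), "float32", on_cuda=True)
bad = solve_arg_problems(20, "float32", (20, 200), "float64", (20, 200), "float32", on_cuda=True)
print(ok)
print(bad)
```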

Example usage

from cholespy import CholeskySolverF, MatrixType
import torch

# Identity matrix
n_rows = 20
rows = torch.arange(n_rows, device='cuda')
cols = torch.arange(n_rows, device='cuda')
data = torch.ones(n_rows, device='cuda')

solver = CholeskySolverF(n_rows, rows, cols, data, MatrixType.COO)

b = torch.ones(n_rows, device='cuda')
x = torch.zeros_like(b)

solver.solve(b, x)
# x = [1, ..., 1]

References

[1] Nicolet, B., Jacobson, A., & Jakob, W. (2021). Large steps in inverse rendering of geometry. ACM Transactions on Graphics (TOG), 40(6), 1-13.

[2] Naumov, M. (2011). Parallel solution of sparse triangular linear systems in the preconditioned iterative methods on the GPU. NVIDIA Corp., Westford, MA, USA, Tech. Rep. NVR-2011, 1.

cholespy's Issues

cholespy no longer compiles since nanobind renamed nb::tensor to nb::ndarray

I had to change the corresponding specifications in main.cpp

#include <nanobind/nanobind.h>
#include <nanobind/ndarray.h>

and

nb::tensor -> nb::ndarray

to make it work again.

Batched implementation

Hey,
First, thanks a lot for the repo!
Is there a way to use cholespy with a batch of matrices? If not, what would you suggest as the best approach?
Thanks!
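
Two approaches may be worth considering, neither confirmed by the maintainers. If the batch shares one matrix and only the right-hand sides differ, the documentation above already allows b to be a matrix of shape (n_rows, n_rhs). If the matrices differ, one possible workaround is to assemble them into a single block-diagonal matrix: a block-diagonal matrix of symmetric positive definite blocks is itself symmetric positive definite, and the right-hand sides are simply concatenated. A minimal sketch of the assembly in COO form, using a hypothetical helper block_diag_coo that is not part of cholespy:

```python
def block_diag_coo(blocks):
    """Stack square COO matrices along the diagonal.

    blocks: list of (n_rows, rows, cols, vals) tuples, one per matrix.
    Returns (total_rows, rows, cols, vals) of the block-diagonal matrix.
    Hypothetical helper for illustration only, not part of cholespy.
    """
    offset = 0
    rows, cols, vals = [], [], []
    for n, r, c, v in blocks:
        # Shift the indices of this block past all previous blocks.
        rows.extend(i + offset for i in r)
        cols.extend(j + offset for j in c)
        vals.extend(v)
        offset += n
    return offset, rows, cols, vals


first = (2, [0, 1], [0, 1], [2.0, 3.0])  # diag(2, 3)
second = (1, [0], [0], [5.0])            # [[5]]

n_rows, rows, cols, vals = block_diag_coo([first, second])
print(n_rows, rows, cols, vals)
```

The result can be passed to the constructor with MatrixType.COO after conversion to arrays. Note that this factorizes everything at once, so memory use grows with the batch size.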

Incorrect results for some problems when using cupy in CholeskySolverF

Thanks for publishing this project! I want to see if I can use it for solving least squares problems directly with A.T.dot(A.dot(x)) = A.T.dot(b). To see if this works, I played around with a simple test using some made-up values:

A = array(
    [[1., 2., 3.],
     [4., 5., 6.],
     [7., 8., 9.],
     [7., 8., 9.]]
)
b = array([0.1, 0.5, -0.2, -0.1])

I then pass the details from a sparse COO representation of A.T.dot(A) to instantiate the CholeskySolverF:

solver = CholeskySolverF(
   3,
   array([0, 0, 0, 1, 1, 1, 2, 2, 2], dtype=int32),
   array([0, 1, 2, 0, 1, 2, 0, 1, 2], dtype=int32),
   array([115., 134., 153., 134., 157., 180., 153., 180., 207.]),
   MatrixType.COO,
)
x = zeros_like(b)
# b = A.T.dot(b)
solver.solve(array([-8.32667268e-17,  3.00000000e-01,  6.00000000e-01]), x)

When I use numpy for this entire procedure it seems to work fine and the test passes. The answer I get is [-0.29087737, 0.11811837, 0.11518325]. For cupy I get a clearly incorrect answer which looks a lot like some overflow issue is going on: [ 65558.29 , -131117.05 , 65558.695]. By using CholeskySolverD both numpy and cupy give me a reasonable answer.

I thought I would report this here. Perhaps, if runaway numerical errors are occurring, the code could give a warning/error?
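
One observation that may explain this, offered as a possible cause rather than a confirmed diagnosis: the columns of A above satisfy col0 - 2*col1 + col2 = 0, so A has rank 2 and A.T.dot(A) is singular, not positive definite. A Cholesky-based solve then has no unique answer, and rounding differences between single and double precision can dominate the result. The sketch below checks this with exact rational arithmetic by computing the pivots of symmetric elimination (the D in an LDL^T factorization); ldl_pivots is a hypothetical helper, not part of cholespy.

```python
from fractions import Fraction


def ldl_pivots(matrix):
    """Return the pivots of symmetric Gaussian elimination (the D of LDL^T).

    Exact rational arithmetic is used so that a zero pivot is exactly zero.
    Elimination stops at the first non-positive pivot.
    """
    m = [[Fraction(v) for v in row] for row in matrix]
    n = len(m)
    pivots = []
    for k in range(n):
        pivot = m[k][k]
        pivots.append(pivot)
        if pivot <= 0:
            break  # the matrix is not positive definite
        for i in range(k + 1, n):
            factor = m[i][k] / pivot
            for j in range(k, n):
                m[i][j] -= factor * m[k][j]
    return pivots


# A.T.dot(A) from the report above, entered as integers.
ata = [[115, 134, 153],
       [134, 157, 180],
       [153, 180, 207]]

pivots = ldl_pivots(ata)
is_positive_definite = len(pivots) == len(ata) and all(p > 0 for p in pivots)
print(pivots, is_positive_definite)
```

The last pivot is exactly zero, so the matrix is only positive semidefinite. A check of this kind (in floating point, with a tolerance) is one way a caller could detect the situation before solving.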

Moving solver between devices (multiple GPUs)

Hi, thank you for the great package.

I'm wondering if it's possible to move a CholeskySolverF between different devices. Currently the solver is placed on the device corresponding to its input data.

In my context I have many Cholesky solver objects in a large dataset, and I need to readily move them to different GPUs inside a torch training loop. Is this possible?

Thanks.

CHOLMOD error: problem too large

I'm trying to reproduce the tutorial (here) of the paper "Large Steps in Inverse Rendering of Geometry" with a mesh that contains 2 million vertices and 21 million triangles. However, when running the Cholesky decomposition to solve the system (with 26 million nonzeros in total), I get the following error:

CHOLMOD error: problem too large. file: /project/ext/suitesparse-metis-for-windows/SuiteSparse/CHOLMOD/Include/../Core/cholmod_change_factor.c line: 536
CHOLMOD error: invalid xtype. file: /project/ext/suitesparse-metis-for-windows/SuiteSparse/CHOLMOD/Include/../Core/cholmod_factor.c line: 618
CHOLMOD error: argument missing. file: /project/ext/suitesparse-metis-for-windows/SuiteSparse/CHOLMOD/Include/../Core/cholmod_transpose.c line: 897

Why is this the case? Is the matrix too "dense" in the end? I've found this issue and one answer suggests changing the storage of indices to long int. However, I'm not sure if that would solve the issue.

Does cholespy keep track of gradients during the solve?

Hello! I am using cholespy to compute the gradient dx/db for backpropagation after solving a sparse linear system Lx=b. However, I notice that x.grad_fn is None after the cholespy solve.

Just wondering, does cholespy support keeping track of the gradients during the solve? If not, do you have any suggestion about other packages that support that? Thanks!

Thank you so much for providing this awesome library!! :)
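
Since solve writes into a preallocated x in place, it presumably runs outside the autograd graph, which would explain the missing grad_fn. The gradient can still be obtained by hand: for a symmetric system matrix A (the L in the question) with x = A^-1 b, the gradient of a scalar loss with respect to b is A^-1 applied to the gradient with respect to x, i.e. one more solve with the same factorization. The pure-Python sketch below demonstrates this on a small dense matrix and checks it against finite differences; the dense cholesky and solve functions here are stand-ins written for this example, not cholespy's API.

```python
import math


def cholesky(a):
    """Dense lower-triangular factor F with a = F F^T (a must be SPD)."""
    n = len(a)
    f = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(f[i][k] * f[j][k] for k in range(j))
            f[i][j] = math.sqrt(s) if i == j else s / f[j][j]
    return f


def solve(f, b):
    """Solve (F F^T) x = b by forward, then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(f[i][k] * y[k] for k in range(i))) / f[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(f[k][i] * x[k] for k in range(i + 1, n))) / f[i][i]
    return x


a = [[4.0, 1.0], [1.0, 3.0]]
factor = cholesky(a)
b = [1.0, 2.0]
w = [0.5, -1.5]  # loss(b) = sum_i w[i] * x[i], so dloss/dx = w


def loss(rhs):
    return sum(wi * xi for wi, xi in zip(w, solve(factor, rhs)))


# Backward pass: solve again with the same factor, using dloss/dx as right-hand side.
grad_b = solve(factor, w)

# Central finite differences as an independent check.
eps = 1e-6
fd = []
for i in range(len(b)):
    plus = b.copy()
    minus = b.copy()
    plus[i] += eps
    minus[i] -= eps
    fd.append((loss(plus) - loss(minus)) / (2 * eps))

print(grad_b, fd)
```

In PyTorch, the same idea would typically be wrapped in a custom torch.autograd.Function whose forward calls the solver and whose backward calls it again on the incoming gradient; that wrapper is not shown here.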

Compile error -> replace nb::any in main.cpp with -1

The removal of the alias nb::any = -1 from nanobind leads to a compile error in cholespy. Replacing nb::any with -1 in main.cpp fixes the problem.

[CPU] cholespy very slow compared to scikit-sparse (factor ~15)

Hi, really nice to have a sparse cholesky solver which is compatible with windows out of the box!

Can you please verify I'm doing everything correctly? My runtime is much longer than with scikit-sparse, by a factor of about 15.

I am not using any TPU / GPU, just plain CPU and numpy / scipy.

I use a lower triangle sparse matrix K_iso in CSC format (I also tested COO, same results) and a sparse load vector f_csc

K_iso
<39624x39624 sparse array of type '<class 'numpy.float64'>'
	with 848667 stored elements in Compressed Sparse Column format>
f_csc
<39624x1 sparse array of type '<class 'numpy.float64'>'
	with 3033 stored elements in Compressed Sparse Column format>

scikit-sparse run takes 0.82 s

from timeit import default_timer
from sksparse.cholmod import cholesky

start_time = default_timer()
factor = cholesky(K_iso)
u_iso = factor.solve_A(f_csc)
print(f"Done ({default_timer() - start_time:.2f} s)")

# Done (0.82 s)

cholespy run (double precision) takes 13.19 s, of which constructing CholeskySolverD takes almost all the time (13.18 s)

from timeit import default_timer
import numpy as np
from cholespy import CholeskySolverD, MatrixType

x = np.empty(K_iso.shape[0])
f = f_csc.todense().squeeze()

start_time = default_timer()
solver = CholeskySolverD(K_iso.shape[0], K_iso.indptr, K_iso.indices, K_iso.data, MatrixType.CSC)
solver.solve(f, x)
print(f"Done ({default_timer() - start_time:.2f} s)")

# Done (13.19 s)

The result is exactly the same

np.allclose(x, u_iso.todense().squeeze())

# True

Saving the decomposition to disk

This is great! Is there support for saving the Cholesky decomposition object to disk, so it can be cached for each sample in a training set, and then loaded into memory to immediately compute the solve without performing the decomposition again for the same training sample?

[BUG REPORT] "an illegal memory access was encountered" and "nanobind leak"

When I used joblib.Parallel with loky backend to launch multiple jobs in parallel, the below error occurred:

cuda_check(): API error = 0700 (CUDA_ERROR_ILLEGAL_ADDRESS): "an illegal memory access was encountered" in /project/src/cholesky_solver.cpp:473.
cuda_check(): API error = 0700 (CUDA_ERROR_ILLEGAL_ADDRESS): "an illegal memory access was encountered" in /project/src/cholesky_solver.cpp:474.
cuda_check(): API error = 0700 (CUDA_ERROR_ILLEGAL_ADDRESS): "an illegal memory access was encountered" in /project/src/cholesky_solver.cpp:475.
cuda_check(): API error = 0700 (CUDA_ERROR_ILLEGAL_ADDRESS): "an illegal memory access was encountered" in /project/src/cholesky_solver.cpp:476.
cuda_check(): API error = 0700 (CUDA_ERROR_ILLEGAL_ADDRESS): "an illegal memory access was encountered" in /project/src/cholesky_solver.cpp:477.
cuda_check(): API error = 0700 (CUDA_ERROR_ILLEGAL_ADDRESS): "an illegal memory access was encountered" in /project/src/cholesky_solver.cpp:478.
cuda_check(): API error = 0700 (CUDA_ERROR_ILLEGAL_ADDRESS): "an illegal memory access was encountered" in /project/src/cholesky_solver.cpp:479.
cuda_check(): API error = 0700 (CUDA_ERROR_ILLEGAL_ADDRESS): "an illegal memory access was encountered" in /project/src/cholesky_solver.cpp:480.
cuda_check(): API error = 0700 (CUDA_ERROR_ILLEGAL_ADDRESS): "an illegal memory access was encountered" in /project/src/cholesky_solver.cpp:481.
cuda_check(): API error = 0700 (CUDA_ERROR_ILLEGAL_ADDRESS): "an illegal memory access was encountered" in /project/src/cholesky_solver.cpp:482.
cuda_check(): API error = 0700 (CUDA_ERROR_ILLEGAL_ADDRESS): "an illegal memory access was encountered" in /project/src/cholesky_solver.cpp:483.
cuda_check(): API error = 0700 (CUDA_ERROR_ILLEGAL_ADDRESS): "an illegal memory access was encountered" in /project/src/cholesky_solver.cpp:484.

Also, the GPU memory allocation was strange: multiple processes allocated memory on GPU 0.

I tried to delete the corresponding code but it did not work 😢.
Would you mind giving any suggestions? Thanks in advance!

ValueError: Sparse CSR matrix: Invalid size for row pointer array

Sorry, no problem now. Please delete.