Comments (13)
The only really feasible way to do this is to set up a second, entirely disjoint instance of LBT. That would have to be an entirely separate package, and probably a JLL too, which seems pretty crazy to me compared to just setting up an `@info` telling the user to use MKL on x86.
Why exactly would we want to give users better performance on just UMFPACK when they could have it for every BLAS call already with LBT?
from sparsearrays.jl.
Is the answer for LinearSolve.jl to just start doing `using MKL`?
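For context, loading MKL.jl swaps the LBT backend at runtime for the whole process. A minimal sketch (assuming Julia ≥ 1.7, where LBT is the default BLAS shim) of how to inspect which backend LBT is currently forwarding to:

```julia
using LinearAlgebra

# Inspect which backend libblastrampoline (LBT) is forwarding to.
for lib in BLAS.get_config().loaded_libs
    println(lib.libname)   # e.g. ".../libopenblas64_.so"
end

# Loading MKL.jl (if installed) re-points LBT at MKL for all subsequent
# BLAS/LAPACK calls, including the ones UMFPACK makes internally:
# using MKL
# BLAS.get_config()   # would now list libmkl_rt instead
```

This is why a single `using MKL` speeds up UMFPACK too: it goes through the same trampoline.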
If it's available for that platform, yeah. I really do think we're shooting ourselves in the foot trying to pull out specific MKL functions when the user probably wants it all. I'm not aware of any functionality where OpenBLAS beats MKL, are you?
Whether that should be done in LinearSolve or whether we should explicitly inform the user to do it themselves, I don't know.
At least for this issue, it would be good to have a simple example that shows where UMFPACK + OpenBLAS is slow. I assume you are using one OpenBLAS thread.
The point is that the threading performance is what is bad, so it scales poorly. There is currently no situation where UMFPACK benchmarks as the best method (https://docs.sciml.ai/SciMLBenchmarksOutput/dev/LinearSolve/SparsePDE/), and the benchmarks show very poor performance for OpenBLAS (https://docs.sciml.ai/SciMLBenchmarksOutput/dev/LinearSolve/LUFactorization/).
My question is: does it get better with OpenBLAS threading turned off?
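One way to answer that empirically is a sketch along these lines (the test matrix here is made up; substitute the matrix from a real report — `lu` on a `SparseMatrixCSC` dispatches to UMFPACK):

```julia
using LinearAlgebra, SparseArrays

# Hypothetical test problem; substitute the matrix from an actual report.
A = sprand(10_000, 10_000, 1e-3) + 100I   # sparse and safely nonsingular
b = rand(10_000)

lu(A) \ b                        # warm-up (compilation)
@time lu(A) \ b                  # UMFPACK with default OpenBLAS threading

BLAS.set_num_threads(1)          # turn OpenBLAS threading off
@time lu(A) \ b                  # compare against the single-threaded run
```

If the single-threaded run wins, that points at OpenBLAS's threading heuristics rather than UMFPACK itself.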
I'm not sure if UMFPACK is multithreaded, @ViralBShah. I think it relies on a threaded BLAS for performance.
Here's Tim's answer on the topic: DrTimothyAldenDavis/SuiteSparse#432 (comment)
Also, a parallel UMFPACK is coming (unrelated to this), and it will have the same performance issue. People who care about this sort of thing should just use MKL in their startup.jl. We can do an `@info` for now in Base.
The only realistic solution is SuiteSparseMKL if you want it all to work automatically and by default. Even then, AMD has its own BLAS, and all of this will fail on Mac ARM. Of course, I feel that the kind of person who wants faster LinearSolve performance generally also wants faster overall BLAS performance, and it may be best to ask people to pick a better BLAS by default. Eventually we should have a global preference to set the BLAS, rather than doing it through startup.jl.
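Until such a global preference exists, the startup.jl route amounts to a fragment like this (MKL.jl must be installed in the default environment; the `try`/`catch` is just defensive):

```julia
# ~/.julia/config/startup.jl
# Opt into MKL for every BLAS/LAPACK call (x86 only).
try
    using MKL
catch err
    @warn "MKL.jl not available; keeping the default BLAS" exception = err
end
```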
Well, I made the LinearSolve.jl change and 🤷 I'm glad we got some information back. I got about 50 happy responses of people saying PDE stuff is magically faster, but two bug reports which effectively blow it all down (SciML/LinearSolve.jl#427) and will force a revert. Pretty sad, so this issue has at least been proven to be unsolvable without changes to the binaries, and a cause of major (100x) slowdowns in many real-world use cases. And it's pretty clear that the coming parallel UMFPACK will be super good but probably still slower than KLU if you use OpenBLAS; it's that bad 🤷. So we really do need a way to build with MKL without LBT, since we cannot use the global trigger in general.
The other path would be to actually fix MKL.jl to cover the whole interface (i.e. fixing JuliaLinearAlgebra/MKL.jl#138) and then having LinearAlgebra.jl do this. LinearAlgebra.jl should just check the CPU and load the "right" BLAS.
> I'm not aware of any functionality where OpenBLAS beats MKL, are you?
Matrix multiplications on AMD Epyc chips. That came up in two responses. Everyone else saw either no change or a speedup.
> The only realistic solution is SuiteSparseMKL if you want it all to work automatically and by default.
I think that is the case.
> Also, a parallel UMFPACK is coming (unrelated to this), and will also have the same performance issue. People who care about this sort of thing should just use MKL in their startup.jl. We can do an `@info` for now in Base.
We should throw a warning if anyone does `A\b` with a sparse matrix and has OpenBLAS on. Tim Davis is pretty explicit that it's effectively not correct behavior in his eyes, and it would be good to tell users that they are missing a ton of performance in a very well-known and fixable way. In the meantime, since we are side-stepping LinearAlgebra anyway, we will at least add the warnings to LinearSolve.jl directly.
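A sketch of what such a check could look like (the helper name is made up; the backend is detected through the LBT config, and `maxlog=1` keeps it to a single warning per session):

```julia
using LinearAlgebra

# Hypothetical helper: warn once when a sparse factorization is about to
# run on top of OpenBLAS, detected via the loaded LBT libraries.
function warn_if_openblas()
    libs = BLAS.get_config().loaded_libs
    if any(lib -> occursin("openblas", lowercase(basename(lib.libname))), libs)
        @warn "UMFPACK scales poorly on OpenBLAS; consider `using MKL`." maxlog = 1
    end
end
```

A package like LinearSolve.jl could call `warn_if_openblas()` once before its first sparse factorization.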
I think those issues are addressable and we should go about fixing them. This is probably the first large-scale push for MKL, and it is not surprising that it uncovered more issues.
Incomplete interface issues should be filed in libblastrampoline. We picked up every single thing we could find (though naturally a few might have been missed), and we even allow for BLAS extensions.
Packages that are not linked against LBT should probably be recompiled. However, if they call OpenBLAS directly, it really shouldn't be an issue. The SLICOT one is a bit puzzling, but I'll try to look into that.
Indeed, this was just the first large attempt to break the shackles of OpenBLAS. We will lick our wounds, fix up a few things, and come back to it with improved packages in a bit. I personally want LinearAlgebra to start shipping MKL on appropriate CPUs within the next year.
Part of the fix might be this: JuliaLinearAlgebra/MKL.jl#140
The OpenBLAS perf issue is resolved and has also been backported. For the SuiteSparse that ships as a stdlib, we won't be able to do this.
If you really want it, maybe there can be an external package, but I have no idea how reliably that will work.