
Comments (13)

rayegun avatar rayegun commented on June 25, 2024

The only really feasible way to do this is to set up a second, entirely disjoint instance of LBT. That would have to be an entirely separate package, and probably a JLL, which seems pretty crazy to me compared to just emitting an @info telling the user to use MKL on x86.

Why exactly would we want to give users better performance on just UMFPACK when they could have it for every BLAS call already with LBT?

from sparsearrays.jl.

ChrisRackauckas avatar ChrisRackauckas commented on June 25, 2024

Is the answer for LinearSolve.jl to just start doing using MKL?


rayegun avatar rayegun commented on June 25, 2024

If it's available for that platform, yeah. I really do think we're shooting ourselves in the foot trying to pull out specific MKL functions when the user probably wants all of it. I'm not aware of any functionality where OpenBLAS beats MKL, are you?

Whether that should be done in LinearSolve or whether we should explicitly tell the user to do it themselves, I don't know.

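For reference, the LBT backend swap being discussed is a one-line change from the user's side. A minimal sketch, assuming MKL.jl is installed and the machine is x86:

```julia
# Sketch: swapping the libblastrampoline (LBT) backend from OpenBLAS to MKL.
# Assumes the MKL.jl package is installed; only applies on x86 platforms.
using LinearAlgebra

BLAS.get_config()   # initially shows LBT forwarding to libopenblas

using MKL           # loading MKL.jl repoints every LBT forward to MKL

BLAS.get_config()   # now shows libmkl_rt as the loaded backend
```

Because the swap happens through LBT's forwarding table, every downstream consumer of BLAS/LAPACK (including UMFPACK's dense kernels) picks up MKL without recompilation.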

ViralBShah avatar ViralBShah commented on June 25, 2024

At least for this issue, it would be good to have a simple example that shows where UMFPACK + OpenBLAS is slow. I assume you are using one OpenBLAS thread.


ChrisRackauckas avatar ChrisRackauckas commented on June 25, 2024

The point is that the threading performance is what's bad, so it scales poorly. There is currently no situation where UMFPACK benchmarks as the best method (https://docs.sciml.ai/SciMLBenchmarksOutput/dev/LinearSolve/SparsePDE/), and the benchmarks show very poor performance for OpenBLAS (https://docs.sciml.ai/SciMLBenchmarksOutput/dev/LinearSolve/LUFactorization/).

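A minimal way to reproduce this kind of comparison locally, sketched with LinearSolve.jl's solver types (`UMFPACKFactorization` depends on dense BLAS kernels, `KLUFactorization` does not, so only the former is sensitive to the OpenBLAS-vs-MKL choice):

```julia
# Sketch: timing UMFPACK against KLU on a random sparse system.
# The matrix here is a stand-in, not the PDE systems from the SciML benchmarks.
using LinearSolve, SparseArrays, LinearAlgebra, Random

Random.seed!(1)
n = 10_000
A = sprandn(n, n, 1e-3) + 10I   # diagonally shifted to keep it well-conditioned
b = randn(n)
prob = LinearProblem(A, b)

@time solve(prob, UMFPACKFactorization())  # BLAS-bound: swap backends and re-time
@time solve(prob, KLUFactorization())      # no dense BLAS in the factorization
```

Running this once under OpenBLAS and once after `using MKL` isolates how much of UMFPACK's time is spent in the dense BLAS calls.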

ViralBShah avatar ViralBShah commented on June 25, 2024

My question is: does it get better with OpenBLAS threading turned off?

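Testing that question is straightforward from the REPL, since LBT exposes the backend's thread count through `LinearAlgebra.BLAS`:

```julia
# Sketch: disabling BLAS threading to check whether OpenBLAS's
# thread scaling is the source of the slowdown.
using LinearAlgebra

BLAS.get_num_threads()   # current OpenBLAS thread count
BLAS.set_num_threads(1)  # force single-threaded BLAS

# ...rerun the UMFPACK benchmark here and compare against the threaded timings...
```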

rayegun avatar rayegun commented on June 25, 2024

I'm not sure if UMFPACK is multithreaded @ViralBShah. I think it relies on threaded BLAS for performance.


ViralBShah avatar ViralBShah commented on June 25, 2024

Here's Tim's answer on the topic: DrTimothyAldenDavis/SuiteSparse#432 (comment)

Also, a parallel UMFPACK is coming (unrelated to this), and will also have the same performance issue. People who care about this sort of thing should just use MKL in their startup.jl. We can do an @info for now in Base.

The only realistic solution is SuiteSparseMKL if you want it to all work automatically and by default. Even then, AMD has its own BLAS, and all this will fail on Mac ARM. Of course, I feel that the kind of person who wants faster LinearSolve performance, also generally wants faster overall BLAS performance, and it may be best to ask people to pick a better BLAS by default. Eventually we should have a global preference to set the BLAS, rather than doing it through startup.jl.

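The "use MKL in their startup.jl" suggestion needs a platform guard, since MKL is unavailable on Apple ARM. A hedged sketch of what such a `~/.julia/config/startup.jl` entry might look like (the `try`/`catch` is a defensive pattern, not anything MKL.jl requires):

```julia
# Sketch for ~/.julia/config/startup.jl: opt into MKL only on x86 CPUs,
# since MKL does not cover Mac ARM (and AMD has its own BLAS story).
if Sys.ARCH in (:x86_64, :i686)
    try
        @eval using MKL
    catch err
        @warn "MKL.jl not installed; staying on the default BLAS backend" err
    end
end
```

A global preference for the BLAS backend, as suggested above, would replace exactly this kind of per-user boilerplate.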

ChrisRackauckas avatar ChrisRackauckas commented on June 25, 2024

Well, I did the LinearSolve.jl change and 🤷 I'm glad we got some information back. I got about 50 happy responses of people saying PDE stuff is magically faster, but 2 bug reports that effectively bring it all down (SciML/LinearSolve.jl#427) and will force a revert. Pretty sad. So this issue has at least been shown to be unsolvable without changes to the binaries, and to cause major (100x) slowdowns in many real-world use cases. And it's pretty clear that the coming parallel UMFPACK will be super good but will probably still be slower than KLU if you use OpenBLAS; it's that bad 🤷. So we really do need a way to build with MKL without LBT, since we cannot use the global trigger in general.

The other path would be to actually fix MKL.jl to cover the whole interface (i.e. fix JuliaLinearAlgebra/MKL.jl#138) and then have LinearAlgebra.jl do this. LinearAlgebra.jl should just check the CPU and load the "right" BLAS.

I'm not aware of any functionality where OpenBLAS beats MKL are you?

Matrix multiplications on AMD Epyc chips. That came up in two responses. Everyone else saw either no change or a speedup.

The only realistic solution is SuiteSparseMKL if you want it to all work automatically and by default.

I think that is the case.

Also, a parallel UMFPACK is coming (unrelated to this), and will also have the same performance issue. People who care about this sort of thing should just use MKL in their startup.jl. We can do an @info for now in Base.

We should throw a warning if anyone does A\b with a sparse matrix while OpenBLAS is loaded. Tim Davis is pretty explicit that, in his eyes, this is effectively incorrect behavior, and it would be good to tell users that they are missing a ton of performance in a well-known and fixable way. In the meantime, since we are side-stepping LinearAlgebra anyway, we will at least add the warnings to LinearSolve.jl directly.

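A sketch of what that warning could look like, using LBT's introspection API. This is a hypothetical helper, not actual SparseArrays or LinearSolve code, and it assumes the backend is identifiable from the loaded library name:

```julia
# Sketch: warn once if LBT is still forwarding to OpenBLAS before a
# sparse factorization. Hypothetical helper, not library code.
using LinearAlgebra

function warn_if_openblas()
    cfg = BLAS.get_config()
    if any(lib -> occursin("openblas", lowercase(lib.libname)), cfg.loaded_libs)
        @warn "Sparse LU is much faster with MKL on x86; consider `using MKL`." maxlog = 1
    end
end
```

Gating the `@warn` with `maxlog = 1` keeps it from spamming tight solve loops.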

ViralBShah avatar ViralBShah commented on June 25, 2024

I think those issues are addressable and we should go about fixing them. This is probably the first large scale push for MKL and it is not surprising it uncovered more issues.

Incomplete interface issues should be filed in libblastrampoline. We had picked every single thing we could find (but naturally there might be a few missed), and we even allow for BLAS extensions.

Packages that are not linked against LBT probably should be recompiled. However, if they call openblas directly, it really shouldn't be an issue. The SLICOT one is a bit puzzling, but I'll try to look into that.


ChrisRackauckas avatar ChrisRackauckas commented on June 25, 2024

Indeed, this was just the first large attempt to break the shackles of OpenBLAS. We will lick our wounds, fix up a few things, and come back with improved packages in a bit. I personally want LinearAlgebra to start shipping MKL on appropriate CPUs within the next year.


ViralBShah avatar ViralBShah commented on June 25, 2024

Part of the fix might be this: JuliaLinearAlgebra/MKL.jl#140


ViralBShah avatar ViralBShah commented on June 25, 2024

The OpenBLAS perf issue is resolved and has also been backported. For the SuiteSparse that ships as a stdlib, we won't be able to do this.

If you really want it, maybe there can be an external package, but I have no idea how reliably that will work.

