Comments (6)

sevagh commented on May 31, 2024

It's probably really easy, actually. Part of my CMakeLists file involves a few different BLAS vendor libraries: https://github.com/sevagh/demucs.cpp/blob/main/CMakeLists.txt#L35

I believe NVBLAS is a cuBLAS wrapper that automatically handles host-device transfers. So if you link against NVBLAS in this codebase, it should work right away.

from demucs.cpp.

sevagh commented on May 31, 2024

OK, not that easy. NVBLAS only accelerates Level 3 BLAS routines such as sgemm (matrix-matrix multiply), not Level 2 routines such as sgemv (matrix-vector multiply), and those are the two BLAS functions demucs.cpp uses, so you have to combine it with a CPU BLAS library.
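For readers less familiar with the two BLAS routines named above, here is a minimal reference sketch of what each one computes. The helper names `gemm_ref` and `gemv_ref` are hypothetical (they are not part of demucs.cpp or any BLAS library), storage is assumed row-major, and the BLAS scaling factors are fixed at alpha = 1 and beta = 0 for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive reference versions of the two operations; hypothetical helpers,
// not the demucs.cpp code and not a real BLAS API.

// C (m x n) = A (m x k) * B (k x n): the operation sgemm performs.
std::vector<float> gemm_ref(const std::vector<float>& A,
                            const std::vector<float>& B,
                            std::size_t m, std::size_t k, std::size_t n) {
    assert(A.size() == m * k && B.size() == k * n);
    std::vector<float> C(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t p = 0; p < k; ++p)
            for (std::size_t j = 0; j < n; ++j)
                C[i * n + j] += A[i * k + p] * B[p * n + j];
    return C;
}

// y (m) = A (m x k) * x (k): the operation sgemv performs.
std::vector<float> gemv_ref(const std::vector<float>& A,
                            const std::vector<float>& x,
                            std::size_t m, std::size_t k) {
    assert(A.size() == m * k && x.size() == k);
    std::vector<float> y(m, 0.0f);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t p = 0; p < k; ++p)
            y[i] += A[i * k + p] * x[p];
    return y;
}
```

Mathematically, a matrix-vector product is a matrix-matrix product whose right-hand side has a single column (n = 1), so `gemm_ref(A, x, m, k, 1)` gives the same values as `gemv_ref(A, x, m, k)`. Whether routing matrix-vector products through a GPU sgemm would pay off here is not something this sketch shows.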

48cbb29

The gains are minuscule: wall time is the same or worse (for both the multithreaded and single-threaded versions), although it uses fewer CPU cores. I don't think it's worth it (but you should try anyway).

sevagh commented on May 31, 2024

It's about what I expected. In a lot of places in demucs.cpp, I use loops instead of broadcasts. That uses less memory because the matrices stay small (vs., say, growing a bias tensor from (512) to (2048, 512)), but it explicitly avoids the kind of work GPUs are good at.
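To make the loop-versus-broadcast trade-off concrete, here is a small sketch of adding a bias vector to every row of a matrix in both styles. The function names are hypothetical and this is not the demucs.cpp code; it only assumes row-major `std::vector<float>` storage. With the shapes mentioned above, the expanded bias holds 2048 × 512 floats (4 MiB) instead of 512 floats (2 KiB).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Loop style: add a length-`cols` bias to each row of a (rows x cols)
// row-major matrix in place. No temporary larger than the bias is needed.
void add_bias_loop(std::vector<float>& x, const std::vector<float>& bias,
                   std::size_t rows, std::size_t cols) {
    assert(x.size() == rows * cols && bias.size() == cols);
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            x[r * cols + c] += bias[c];
}

// Broadcast style, step 1: materialize the bias as a full (rows x cols)
// matrix by repeating it once per row. This is the memory cost.
std::vector<float> expand_bias(const std::vector<float>& bias,
                               std::size_t rows) {
    std::vector<float> full;
    full.reserve(rows * bias.size());
    for (std::size_t r = 0; r < rows; ++r)
        full.insert(full.end(), bias.begin(), bias.end());
    return full;
}

// Broadcast style, step 2: one flat elementwise add over equally sized
// buffers, the kind of uniform bulk operation that maps well onto a GPU.
void add_elementwise(std::vector<float>& x, const std::vector<float>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        x[i] += y[i];
}
```

Both paths produce the same result; the difference is only in how much temporary memory is allocated and how regular the bulk operation is.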

EliteScientist commented on May 31, 2024

Thanks. My POC is in Python using demucs. On my desktop GPU (RTX 3090), it processes a 5-minute song in less than 10 seconds. The same job takes roughly 4 to 8 minutes or more on my laptop's i9 CPU.

According to the docs, running demucs with the default args requires 7 GB of video memory. You can tweak some settings to get it to run with 3 GB of video memory.

I'm using the htdemucs_ft model in my POC.

sevagh commented on May 31, 2024

I can't handwrite code that's faster than PyTorch, nor was that ever the goal of demucs.cpp. I don't know why people keep reminding me that the real PyTorch version of demucs is faster than demucs.cpp - I know this very well! I wrote it by comparing them side by side.

PyTorch Demucs is the real deal. But it can't run in WASM or on an Android phone - this codebase can. Conclude what you want.

EliteScientist commented on May 31, 2024

Hey, sorry, I wasn't trying to compare yours to PyTorch. What I was trying to say is that PyTorch is extremely slow running demucs on the CPU and significantly faster running it on the GPU.

Also, when I ran it in PyTorch on a system that didn't have enough video memory, it defaulted to the CPU.

My point is that there may be a significant speed improvement from using the GPU with your library. I believe your library has the potential to be much faster than PyTorch. This is why I'm trying to move away from PyTorch to a native platform.
