Comments (6)
It's probably really easy, actually. Part of my CMakeLists file involves a few different BLAS vendor libraries: https://github.com/sevagh/demucs.cpp/blob/main/CMakeLists.txt#L35
I believe NVBLAS is a cuBLAS wrapper that automatically handles host-device transfers. So if you link against NVBLAS in this codebase, it should work right away.
from demucs.cpp.
OK, not that easy - NVBLAS (the cuBLAS wrapper) only accelerates Level 3 BLAS routines such as sgemm (matrix-matrix multiply), not Level 2 routines such as sgemv (matrix-vector multiply). Those are the two BLAS functions demucs.cpp uses, so you have to combine NVBLAS with a CPU BLAS library.
The gains are minuscule: wall time is the same or worse (for both the multithreaded and single-threaded versions), though it uses fewer CPU cores. I don't think it's worth it (but you should try anyway).
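For readers unfamiliar with the two routines, here is a minimal sketch of what each one computes, assuming row-major storage and the simplest case (alpha = 1, beta = 0, no transposes). The helper names `matvec` and `matmul` are made up for illustration; this is neither the BLAS API nor how demucs.cpp calls it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference (unoptimized) versions of the two operations.
// Hypothetical helpers for illustration only, not the BLAS interface.

// y = A * x, with A of shape (m, k) and x of length k.
// This is the operation sgemv performs (Level 2 BLAS).
std::vector<float> matvec(const std::vector<float>& a,
                          const std::vector<float>& x,
                          std::size_t m, std::size_t k) {
    assert(a.size() == m * k && x.size() == k);
    std::vector<float> y(m, 0.0f);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t p = 0; p < k; ++p)
            y[i] += a[i * k + p] * x[p];
    return y;
}

// C = A * B, with A of shape (m, k) and B of shape (k, n).
// This is the operation sgemm performs (Level 3 BLAS).
std::vector<float> matmul(const std::vector<float>& a,
                          const std::vector<float>& b,
                          std::size_t m, std::size_t k, std::size_t n) {
    assert(a.size() == m * k && b.size() == k * n);
    std::vector<float> c(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t p = 0; p < k; ++p)
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += a[i * k + p] * b[p * n + j];
    return c;
}
```

A matrix-vector product is the n = 1 case of a matrix-matrix product, so `matmul(a, x, m, k, 1)` returns the same values as `matvec(a, x, m, k)`. BLAS still exposes them as separate routines, which is why a library covering only one of them is not enough on its own.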
from demucs.cpp.
It's about what I expected. In a lot of places in demucs.cpp, I use loops instead of broadcasts. That uses less memory, since the matrices stay small instead of, say, growing a bias tensor from (512) to (2048, 512), but it explicitly avoids the stuff GPUs are good at.
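To make the trade-off concrete, here is a rough sketch of the two styles for a bias add, written with plain `std::vector` rather than the tensor types demucs.cpp actually uses. The function names and the row-major layout are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// x is a row-major (rows, cols) activation; bias has cols entries.
// All names here are hypothetical, chosen for illustration.

// Loop style: add the (cols) bias to each row in place.
// Needs no memory beyond the bias itself.
void add_bias_loop(std::vector<float>& x, const std::vector<float>& bias,
                   std::size_t rows, std::size_t cols) {
    assert(x.size() == rows * cols && bias.size() == cols);
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            x[r * cols + c] += bias[c];
}

// Grow the bias from (cols) to a full (rows, cols) matrix by repeating it.
std::vector<float> expand_bias(const std::vector<float>& bias,
                               std::size_t rows) {
    std::vector<float> full;
    full.reserve(rows * bias.size());
    for (std::size_t r = 0; r < rows; ++r)
        full.insert(full.end(), bias.begin(), bias.end());
    return full;
}

// Broadcast style: materialize the full bias matrix, then do one flat
// elementwise add. Costs rows * cols extra floats of temporary storage.
void add_bias_broadcast(std::vector<float>& x, const std::vector<float>& bias,
                        std::size_t rows, std::size_t cols) {
    assert(x.size() == rows * cols && bias.size() == cols);
    const std::vector<float> full = expand_bias(bias, rows);
    for (std::size_t i = 0; i < x.size(); ++i)
        x[i] += full[i];
}
```

Both functions produce the same result. With 32-bit floats, the (512) bias takes 2 KiB, while the expanded (2048, 512) copy takes 2048 * 512 * 4 bytes = 4 MiB. The second form is one large, uniform elementwise operation, which is the shape of work a GPU handles well; the first keeps memory small at the cost of many tiny steps.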
from demucs.cpp.
Thanks. My POC is in Python using demucs. On my desktop GPU (RTX 3090) it processes a 5-minute song in less than 10 seconds. The same song takes 4 to 8 minutes on my laptop's i9 CPU.
Using demucs with the default args requires 7GB of video memory according to the docs. You can tweak some settings to get it to run with 3GB of video memory.
I'm using the htdemucs_ft model in my POC.
from demucs.cpp.
I can't handwrite code that's faster than PyTorch, nor was that ever the goal of demucs.cpp. I don't know why people keep reminding me that the real PyTorch version of demucs is faster than demucs.cpp - I know this very well! I wrote it by comparing them side by side.
PyTorch Demucs is the real deal. But it can't run in WASM or on an Android phone - this codebase can. Conclude what you want.
from demucs.cpp.
Hey, sorry, I wasn't trying to compare yours to PyTorch. What I was trying to say is that PyTorch is extremely slow running demucs on the CPU and significantly faster running it on the GPU.
Also, when I ran it in PyTorch on a system that didn't have enough video memory, it defaulted to the CPU.
My point is that there may be a significant speed improvement from using the GPU with your library. I believe your library has the potential to be much faster than PyTorch. This is why I'm trying to move away from PyTorch and to a native platform.
from demucs.cpp.