Comments (12)
Could it be related to this?
https://github.com/mitsuba-renderer/mitsuba2/blob/dbcecba782a228fcda134558f9ae57fa91033967/resources/ptx/Makefile#L4
This seems to specify the compute capability of the GPU and 61 matches the P4 indeed... https://en.wikipedia.org/wiki/CUDA
Would it be possible to support more types?
http://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
from enoki.
Hi,
You need a Maxwell-class GPU or newer to run Mitsuba's inverse renderer (e.g. GeForce GTX 1080). The bottleneck here is actually not Enoki (it could likely compile and run with a much lower compute capability), but OptiX, which requires Maxwell/Turing for the "RTGeometryTriangles" primitive that we depend on.
Best,
Wenzel
from enoki.
Hi Wenzel
Glad to receive a reaction from the legend himself!
I was reading this page: http://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
"When you compile CUDA code, you should always compile only one '-arch' flag that matches your most used GPU cards. This will enable faster runtime, because code generation will occur during compilation.
If you only mention '-gencode', but omit the '-arch' flag, the GPU code generation will occur on the JIT compiler by the CUDA driver."
and was wondering if this script: https://github.com/mitsuba-renderer/mitsuba2/blob/dbcecba782a228fcda134558f9ae57fa91033967/resources/ptx/Makefile#L4 shouldn't contain the -arch flag to make it more performant?
from enoki.
We compile to PTX (compute_..) instead of specific device code (sm_..) because this enables the resulting shared library to be moved between systems that potentially have different GPUs. The downside is minimal, a few hundred milliseconds for JIT compilation the first time that Enoki is used. (The resulting native code is cached in ~/.nv in your home directory)
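To make the compute_.. versus sm_.. distinction concrete, here is a minimal sketch that builds the two forms of nvcc's standard -gencode option. The helper name nvcc_gencode_flag is hypothetical, and the thread does not show which flags the Makefile actually passes, so treat this only as an illustration of the flag syntax.

```python
def nvcc_gencode_flag(compute_capability: int, embed_ptx: bool) -> str:
    """Build one nvcc -gencode flag for a compute capability such as 50 or 61.

    embed_ptx=True  -> code=compute_XX (PTX, JIT-compiled by the driver at load time)
    embed_ptx=False -> code=sm_XX      (native device code for that architecture)
    """
    virtual_arch = f"compute_{compute_capability}"
    target = virtual_arch if embed_ptx else f"sm_{compute_capability}"
    return f"-gencode arch={virtual_arch},code={target}"


# Portable PTX for compute capability 5.0 versus native code for 6.1.
portable = nvcc_gencode_flag(50, embed_ptx=True)
native = nvcc_gencode_flag(61, embed_ptx=False)
print(portable)
print(native)
```

The PTX form trades a short one-time JIT step for portability, which matches the reasoning given above.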
from enoki.
Aha ok, thanks for the explanation! You say a Maxwell-class GPU (or newer) is needed for OptiX, but all the GPUs I tested (except the K80) are Maxwell or better...
I have the feeling that the code only runs on a GPU with compute capability 61, such as the P4 or 1080...
from enoki.
Aha, that could well be -- maybe we're setting the flag too strictly. Can you try just manually setting it to something smaller? What is the C.C. of the other Maxwell devices?
from enoki.
I've updated the table in my initial post with the C.C's.
I've changed https://github.com/mitsuba-renderer/mitsuba2/blob/dbcecba782a228fcda134558f9ae57fa91033967/resources/ptx/Makefile (61 -> 50) and ran:
make clean
make all
Now I'm rebuilding mitsuba2 on my laptop (GeForce 940M, Maxwell, with CC=5.0) 🤞
I know this hardware is not ideal, but I just want to do some basic tests at this point...
from enoki.
Still exactly the same error...
PTX linker error:
ptxas fatal : SM version specified by .target is higher than default SM version assumed
cuda_check(): driver API error = 0400 "CUDA_ERROR_INVALID_HANDLE" in ../ext/enoki/src/cuda/jit.cu:253.
I'm wondering where the "SM version" is still defined....
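One way to track down such a mismatch is to look at the .target directive at the top of each generated PTX file and compare it with the device's compute capability. The sketch below assumes the PTX text is already loaded as a string; the function names and the sample PTX header are made up for illustration, and it does not show where Enoki itself sets the value.

```python
import re


def ptx_target_cc(ptx_source: str):
    """Return the compute capability named by the PTX '.target sm_XX' directive, or None."""
    match = re.search(r"^\s*\.target\s+sm_(\d+)", ptx_source, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def ptx_fits_device(ptx_source: str, device_cc: int) -> bool:
    """True if the PTX target does not exceed the device's compute capability."""
    target = ptx_target_cc(ptx_source)
    return target is not None and target <= device_cc


# Hypothetical PTX header, similar in shape to what nvcc emits.
sample_ptx = """
.version 6.3
.target sm_61
.address_size 64
"""

print(ptx_target_cc(sample_ptx))        # compute capability requested by the PTX
print(ptx_fits_device(sample_ptx, 50))  # a CC 5.0 device such as the 940M
```

Any PTX source that still reports a target above the device's compute capability was generated with the old setting.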
from enoki.
In CMake, there's a variable called ENOKI_CUDA_COMPUTE_CAPABILITY which you could try and set to match your change in the Makefile you pointed out:
Line 15 in 02b224d
from enoki.
Hey Merlin
Thanks! That did the job (together with the other fix)!
I can now run the cbox example on my laptop's GeForce 940M! :)
Should I create a PR for this, or do you guys prefer keeping it as is?
from enoki.
Yes, that would be great -- please make a PR that downgrades the compute capabilities to the minimal version known to work.
from enoki.
mitsuba-renderer/mitsuba2#53
#71
from enoki.