
Comments (5)

venovako avatar venovako commented on August 17, 2024

For now, I use CMAKE_CUDA_FLAGS and set it to, e.g., -arch=sm_60.
Please check the attached file (C file uploaded as TXT due to GitHub constraints).
It is a small program that uses only the CUDA Driver API and prints the flags for all NVIDIA GPUs present in the system (there may be several, and not all with the same SM capabilities).
If the CUDA driver cannot be properly initialized, the program prints a message and returns error code 1 (otherwise 0).
The program was integrated into the GNU Autotools build system of another open-source project, but I have no clue how to do the same for CMake (I'm not an expert in build systems at all).
Brief build instructions are in the comment at the top of the file.

nvcc_arch_sm.c.txt

from ginkgo.

venovako avatar venovako commented on August 17, 2024

P.S. Such an approach works fine if the GPU architecture(s) on the build machine are the same as, or a superset of, those on the target machine(s). However, that might not always be the case. A typical example: the frontend node of a cluster has some low-end GPU, while the compute nodes have newer GPUs. Therefore, setting the variable manually has to remain possible at all times. I don't have a clever idea for combining auto-detection by a program similar to the one proposed above with an option to set the flags manually. Maybe there should be a boolean variable, e.g., GINKGO_AUTODETECT_GPUS, that is initially ON and invokes the flag-detection program. If you set it to OFF, then you are on your own to supply the correct architecture flags in CMAKE_CUDA_FLAGS.
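The opt-out boolean described above could look roughly like this in CMake. This is only a sketch: `GINKGO_AUTODETECT_GPUS` is the hypothetical option name proposed here, not an existing Ginkgo variable, and the `try_run` details (source path, linking against `libcuda`) are assumptions about how the detection program would be wired in:

```cmake
# Hypothetical sketch: let the user disable auto-detection entirely.
option(GINKGO_AUTODETECT_GPUS "Detect GPU architectures at configure time" ON)

if(GINKGO_AUTODETECT_GPUS)
    # Compile and run the small Driver API detection program at configure time.
    # It prints the -arch/-gencode flags on stdout and returns 0 on success.
    try_run(DETECT_RAN DETECT_COMPILED
            "${CMAKE_BINARY_DIR}/detect_gpu"
            "${CMAKE_SOURCE_DIR}/cmake/nvcc_arch_sm.c"
            LINK_LIBRARIES cuda            # the Driver API lives in libcuda
            RUN_OUTPUT_VARIABLE DETECTED_ARCH_FLAGS)
    if(DETECT_COMPILED AND DETECT_RAN EQUAL 0)
        string(STRIP "${DETECTED_ARCH_FLAGS}" DETECTED_ARCH_FLAGS)
        set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} ${DETECTED_ARCH_FLAGS}")
    else()
        message(WARNING "GPU detection failed; set CMAKE_CUDA_FLAGS manually")
    endif()
endif()
# With -DGINKGO_AUTODETECT_GPUS=OFF, the user supplies -arch/-gencode themselves.
```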


gflegar avatar gflegar commented on August 17, 2024

That's why I suggested running the automatic detection only if the user doesn't set the flags. So, if you're installing it on your local system, you just don't set the flags, and automatic detection will set them for you. If you're on the head node of a cluster, you just set the architectures to whatever you want, and by doing so you prevent the hardware detector from setting them for you.

P.S. I also already have a small C++ program (using the Runtime API) that prints the flags for the detected architectures (no special flags needed: just compile it with nvcc, or alternatively with any C++ compiler, and link it against the CUDA runtime library).
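For reference, CMake's bundled FindCUDA module ships a helper with similar behavior, `cuda_select_nvcc_arch_flags` (added around CMake 3.7, if I recall correctly), which compiles and runs a small Runtime API program when passed `Auto`. A minimal usage sketch:

```cmake
find_package(CUDA REQUIRED)
# "Auto" queries the GPUs present on the build machine;
# an explicit list such as "3.5 6.0" would skip the detection instead.
cuda_select_nvcc_arch_flags(ARCH_FLAGS Auto)
# ARCH_FLAGS is a semicolon-separated list of -gencode options.
string(REPLACE ";" " " ARCH_FLAGS_STR "${ARCH_FLAGS}")
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} ${ARCH_FLAGS_STR}")
```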

get_cuda_arch.cpp.txt


venovako avatar venovako commented on August 17, 2024

Sure. Please just consider one thing: the user might set CMAKE_CUDA_FLAGS for some other, unrelated reason (e.g., to pass compiler, linker, or ptxas options), and not put -arch or -gencode or anything similar there at all. Should (s)he then lose the auto-detection feature?

The program I proposed is by no means perfect, and please ignore it if you wish (it was only there to illustrate an idea). Using the CUDA Driver API there made sense in case you have only the driver installed, and not the whole CUDA Toolkit (which provides the CUDA Runtime libraries).


gflegar avatar gflegar commented on August 17, 2024

Setting the CMAKE_CUDA_FLAGS shouldn't prevent auto-detection. What I had in mind is adding another flag, which the build system will use to populate CMAKE_CUDA_FLAGS internally. For example, something like cmake -DCUDA_ARCHITECTURES="3.5;6.0;7.0" .... If the flag is not set, the build system uses auto-detection to populate it. Afterwards, it loops through the architectures set in this flag, verifies that each specified architecture is valid and supported by Ginkgo, and adds the required flags to CMAKE_CUDA_FLAGS.
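A sketch of that scheme, under stated assumptions: `detect_installed_gpus` is a hypothetical auto-detection helper (e.g., wrapping one of the programs attached above), and the supported-architecture list is only an example, not Ginkgo's actual one:

```cmake
# Hypothetical sketch of the proposed CUDA_ARCHITECTURES handling.
set(CUDA_ARCHITECTURES "" CACHE STRING
    "Semicolon-separated list of CUDA architectures, e.g. 3.5;6.0;7.0")

if(NOT CUDA_ARCHITECTURES)
    # Not set by the user: fall back to auto-detection (hypothetical helper).
    detect_installed_gpus(CUDA_ARCHITECTURES)
endif()

set(SUPPORTED_ARCHITECTURES "3.0;3.5;6.0;7.0")  # example list only

foreach(arch IN LISTS CUDA_ARCHITECTURES)
    if(NOT arch IN_LIST SUPPORTED_ARCHITECTURES)
        message(FATAL_ERROR "Unsupported CUDA architecture: ${arch}")
    endif()
    # Turn "6.0" into "60" and emit the corresponding -gencode flag.
    string(REPLACE "." "" sm "${arch}")
    set(CMAKE_CUDA_FLAGS
        "${CMAKE_CUDA_FLAGS} -gencode arch=compute_${sm},code=sm_${sm}")
endforeach()
```

Keeping the architecture list in its own variable, as sketched here, is what leaves CMAKE_CUDA_FLAGS free for unrelated compiler or ptxas options.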

As for runtime/driver API: yes, for some projects that don't need the CUDA Toolkit, the Driver API might be the way to go. However, in our case we can rely on the whole CUDA Toolkit being there, as otherwise we cannot build Ginkgo anyway. And using the Runtime API does result in simpler, easier-to-maintain code.

