Hi Matt, please try setting the environment variable HIP_PLATFORM to "hcc" so that HIP will recognize the Nanos.
On Mar 23, 2016, at 11:36 PM, Matthew Macy wrote:
I was able to do the tutorial on gpuopen.com but found that hipGetDeviceCount was only returning 1, so the examples would only run on my primary GPU, a GTX 980 Ti. I also have an R9 Nano and an R9 Fury. The kfd driver exports 3 nodes under topology, so the runtime should let me talk to them. I'm running Ubuntu 15. I was hoping to instrument hip_hcc.cpp to see what it was doing right here:
/*
 * Build a table of valid compute devices.
 */
auto accs = hc::accelerator::get_all();
int deviceCnt = 0;
for (int i=0; i<accs.size(); i++) {
    if (! accs[i].get_is_emulated()) {
        deviceCnt++;
    }
};
printf("actual device count is %d\n", deviceCnt);
// Make sure the hip visible devices are within the deviceCnt range
for (int i = 0; i < g_hip_visible_devices.size(); i++) {
    if(g_hip_visible_devices[i] >= deviceCnt){
        // Make sure any DeviceID after invalid DeviceID will be erased.
        g_hip_visible_devices.resize(i);
        break;
    }
}
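The snippet above counts the non-emulated accelerators and then truncates the visible-device list at the first out-of-range ID. A minimal standalone sketch of that logic, useful for experimenting without HCC installed, might look like the following. `AcceleratorInfo`, `count_physical_devices` and `clamp_visible_devices` are hypothetical names introduced here for illustration; the real code works on `hc::accelerator` objects and `g_hip_visible_devices`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an accelerator entry. The real code asks each
// hc::accelerator whether it is emulated via get_is_emulated().
struct AcceleratorInfo {
    bool is_emulated;
};

// Count the accelerators that are real hardware (not emulated).
int count_physical_devices(const std::vector<AcceleratorInfo>& accs) {
    int count = 0;
    for (const AcceleratorInfo& acc : accs) {
        if (!acc.is_emulated) {
            ++count;
        }
    }
    return count;
}

// Truncate the visible-device list at the first ID that is >= device_count,
// so every ID after an invalid one is dropped as well.
void clamp_visible_devices(std::vector<int>& visible, int device_count) {
    for (std::size_t i = 0; i < visible.size(); ++i) {
        if (visible[i] >= device_count) {
            visible.resize(i);
            break;
        }
    }
}
```

With one emulated and two physical accelerators the count is 2, and a visible list of {0, 1, 5, 0} is cut down to {0, 1}: the trailing 0 is dropped because it follows an invalid ID.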
But I can't even get it to compile:
~/devel/HIP2$ make
./bin/hipcc -I/opt/hcc/include -std=c++11 -I/opt/hsa/include src/hip_hcc.cpp -c -O3 -o src/hip_hcc.o
src/hip_hcc.cpp:52:2: error: #error (USE_AM_TRACKER requries HCC version of 16074 or newer)
#error (USE_AM_TRACKER requries HCC version of 16074 or newer)
^
Died at ./bin/hipcc line 208.
Makefile:20: recipe for target 'src/hip_hcc.o' failed
make: *** [src/hip_hcc.o] Error 1
I made the following change to the Makefile in response to complaints. But it's still not doing anything. And it looks like it's trying to compile the code with nvcc:
mmacy@pandemonium:~/devel/HIP2$ hipcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17
from hip.
I see - that tells it which compiler to use.
hipcc square.cpp
In file included from /home/mmacy/devel/HIP/src/hip_hcc.cpp:42:
In file included from /home/mmacy/devel/HIP/include/hip_runtime.h:54:
In file included from /home/mmacy/devel/HIP/include/hcc_detail/hip_runtime.h:41:
In file included from /home/mmacy/devel/HIP/include/hip_runtime_api.h:196:
/home/mmacy/devel/HIP/include/hcc_detail/hip_runtime_api.h:35:2: error: ("This version of HIP requires a newer version of HCC.");
#error("This version of HIP requires a newer version of HCC.");
^
/home/mmacy/devel/HIP/src/hip_hcc.cpp:2354:17: warning: unused variable 'stream' [-Wunused-variable]
hipStream_t stream = ihipSyncAndResolveStream(hipStreamNull);
^
1 warning and 1 error generated.
remake-deps failed at /home/mmacy/devel/HIP/bin/hipcc line 179.
That doesn't work so well.
I've installed the most recent .deb from https://bitbucket.org/multicoreware/hcc/downloads.
I take it I need to download the hcc sources as well?
I see. Their latest .deb is 16045. Your sources require 16074 or later.
I'm trying the following to see if I get a working hcc:
https://bitbucket.org/multicoreware/hcc/wiki/Developer%20Information
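The mismatch here is between an installed HCC build (16045) and the minimum the HIP sources ask for (16074). As a rough sketch, a build number can be pulled out of a version string and compared against that minimum. This assumes the build number is the third dot-separated field, as in the `0.10.16124-89bbf6f-7e4cd9e` string that `hcc --version` prints further down this thread; `parse_hcc_build` and `hcc_build_is_new_enough` are hypothetical helpers, not part of HCC or HIP.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Minimum HCC build number named in the #error message quoted in this thread.
const long kMinHccBuild = 16074;

// Extract the build number from a version string shaped like
// "0.10.16124-89bbf6f-7e4cd9e". Returns -1 if the string does not have
// two dots followed by at least one digit.
long parse_hcc_build(const std::string& version) {
    std::size_t first = version.find('.');
    if (first == std::string::npos) return -1;
    std::size_t second = version.find('.', first + 1);
    if (second == std::string::npos) return -1;

    std::size_t start = second + 1;
    std::size_t end = start;
    while (end < version.size() && version[end] >= '0' && version[end] <= '9') {
        ++end;
    }
    if (end == start) return -1;  // no digits after the second dot
    return std::strtol(version.substr(start, end - start).c_str(), nullptr, 10);
}

// True when the parsed build number meets the minimum.
bool hcc_build_is_new_enough(const std::string& version) {
    return parse_hcc_build(version) >= kMinHccBuild;
}
```

A string that cannot be parsed yields -1 and is therefore treated as too old, which seems the safer default for a build-time check.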
Progress. I'm running 16124. It looks like you're out of sync with hsa:
mmacy@pandemonium:~/devel/hcc/build$ hcc --version
HCC clang version 3.5.0 (based on HCC 0.10.16124-89bbf6f-7e4cd9e LLVM 3.5.0svn)
Target: x86_64-unknown-linux-gnu
Thread model: posix
mmacy@pandemonium:~/devel/hcc/build$ cd ../..
mmacy@pandemonium:~/devel$ cd HIP/samples/0_Intro/
bit_extract/ square/
mmacy@pandemonium:~/devel$ cd HIP/samples/0_Intro/square/
mmacy@pandemonium:~/devel/HIP/samples/0_Intro/square$ hipcc square.cpp
/home/mmacy/devel/HIP/src/hip_hcc.cpp:2093:22: error: no matching function for call to 'hsa_amd_memory_async_copy'
hsa_status = hsa_amd_memory_async_copy(dstp, _device->_hsa_agent, locked_srcp, _device->_hsa_agent, theseBytes, waitFor ? 1:0, waitFor, _completion_signal[bufferIndex]);
^~~~~~~~~~~~~~~~~~~~~~~~~
/opt/hsa/include/hsa_ext_amd.h:452:5: note: candidate function not viable: requires 7 arguments, but 8 were provided
hsa_amd_memory_async_copy(void* dst, const void* src, size_t size,
^
/home/mmacy/devel/HIP/src/hip_hcc.cpp:2155:35: error: no matching function for call to 'hsa_amd_memory_async_copy'
hsa_status_t hsa_status = hsa_amd_memory_async_copy(dstp, _device->_hsa_agent, _pinnedStagingBuffer[bufferIndex], _device->_hsa_agent, theseBytes, waitFor ? 1:0, waitFor, _completion_signal[bufferIndex]);
^~~~~~~~~~~~~~~~~~~~~~~~~
/opt/hsa/include/hsa_ext_amd.h:452:5: note: candidate function not viable: requires 7 arguments, but 8 were provided
hsa_amd_memory_async_copy(void* dst, const void* src, size_t size,
^
/home/mmacy/devel/HIP/src/hip_hcc.cpp:2208:39: error: no matching function for call to 'hsa_amd_memory_async_copy'
hsa_status_t hsa_status = hsa_amd_memory_async_copy(_pinnedStagingBuffer[bufferIndex], _device->_hsa_agent, srcp0, _device->_hsa_agent, theseBytes, waitFor ? 1:0, waitFor, _completion_signal[bufferIndex]);
^~~~~~~~~~~~~~~~~~~~~~~~~
/opt/hsa/include/hsa_ext_amd.h:452:5: note: candidate function not viable: requires 7 arguments, but 8 were provided
hsa_amd_memory_async_copy(void* dst, const void* src, size_t size,
^
/home/mmacy/devel/HIP/src/hip_hcc.cpp:2333:35: error: no matching function for call to 'hsa_amd_memory_async_copy'
hsa_status_t hsa_status = hsa_amd_memory_async_copy(dst, device->_hsa_agent, src, device->_hsa_agent, sizeBytes, depSignalCnt, depSignalCnt ? &depSignal:0x0, device->_copy_signal);
^~~~~~~~~~~~~~~~~~~~~~~~~
/opt/hsa/include/hsa_ext_amd.h:452:5: note: candidate function not viable: requires 7 arguments, but 8 were provided
hsa_amd_memory_async_copy(void* dst, const void* src, size_t size,
^
/home/mmacy/devel/HIP/src/hip_hcc.cpp:2463:39: error: no matching function for call to 'hsa_amd_memory_async_copy'
hsa_status_t hsa_status = hsa_amd_memory_async_copy(dst, device->_hsa_agent, src, device->_hsa_agent, sizeBytes, depSignalCnt, depSignalCnt ? &depSignal:0x0, ihip_signal->_hsa_signal);
^~~~~~~~~~~~~~~~~~~~~~~~~
/opt/hsa/include/hsa_ext_amd.h:452:5: note: candidate function not viable: requires 7 arguments, but 8 were provided
hsa_amd_memory_async_copy(void* dst, const void* src, size_t size,
^
5 errors generated.
remake-deps failed at /home/mmacy/devel/HIP/bin/hipcc line 179.
I don't know what the situation is with the ROCR_V2 API. The async memcpy in what I assume is the canonical hsa_ext_amd.h:
https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/src/inc/hsa_ext_amd.h
looks more like the old one.
I made the following changes to hip_hcc.cpp to get my square.cpp to compile using hcc as the HIP_PLATFORM:
index 57d55a1..776e7c6 100644
--- a/src/hip_hcc.cpp
+++ b/src/hip_hcc.cpp
@@ -2090,7 +2090,7 @@ void StagingBuffer::CopyHostToDevicePinInPlace(void* dst, const void* src, size_
hsa_signal_store_relaxed(_completion_signal[bufferIndex], 1);
#if USE_ROCR_V2
- hsa_status = hsa_amd_memory_async_copy(dstp, _device->_hsa_agent, locked_srcp, _device->_hsa_agent, theseBytes, waitFor ? 1:0, waitFor, _completion_signal[bufferIndex]);
+ hsa_status = hsa_amd_memory_async_copy(dstp, locked_srcp, theseBytes, _device->_hsa_agent, waitFor ? 1:0, waitFor, _completion_signal[bufferIndex]);
#else
assert(0);
#endif
@@ -2152,7 +2152,7 @@ void StagingBuffer::CopyHostToDevice(void* dst, const void* src, size_t sizeByte
hsa_signal_store_relaxed(_completion_signal[bufferIndex], 1);
#if USE_ROCR_V2
- hsa_status_t hsa_status = hsa_amd_memory_async_copy(dstp, _device->_hsa_agent, _pinnedStagingBuffer[bufferIndex], _device->_hsa_agent, theseBytes, waitFor ? 1:0, waitFor, _completion_signal[bufferIndex]);
+ hsa_status_t hsa_status = hsa_amd_memory_async_copy(dstp, _pinnedStagingBuffer[bufferIndex], theseBytes, _device->_hsa_agent, waitFor ? 1:0, waitFor, _completion_signal[bufferIndex]);
#else
hsa_status_t hsa_status = hsa_amd_memory_async_copy(dstp, _pinnedStagingBuffer[bufferIndex], theseBytes, _device->_hsa_agent, 0, NULL, _completion_signal[bufferIndex]);
#endif
@@ -2205,7 +2205,7 @@ void StagingBuffer::CopyDeviceToHost(void* dst, const void* src, size_t sizeByte
tprintf (TRACE_COPY2, "D2H: bytesRemaining0=%zu async_copy %zu bytes src:%p to staging:%p\n", bytesRemaining0, theseBytes, srcp0, _pinnedStagingBuffer[bufferIndex]);
hsa_signal_store_relaxed(_completion_signal[bufferIndex], 1);
#if USE_ROCR_V2
- hsa_status_t hsa_status = hsa_amd_memory_async_copy(_pinnedStagingBuffer[bufferIndex], _device->_hsa_agent, srcp0, _device->_hsa_agent, theseBytes, waitFor ? 1:0, waitFor, _completion_signal[bufferIndex]);
+ hsa_status_t hsa_status = hsa_amd_memory_async_copy(_pinnedStagingBuffer[bufferIndex], srcp0, theseBytes, _device->_hsa_agent, waitFor ? 1:0, waitFor, _completion_signal[bufferIndex]);
#else
hsa_status_t hsa_status = hsa_amd_memory_async_copy(_pinnedStagingBuffer[bufferIndex], srcp0, theseBytes, _device->_hsa_agent, 0, NULL, _completion_signal[bufferIndex]);
#endif
@@ -2330,7 +2330,7 @@ void ihipSyncCopy(ihipStream_t *stream, void* dst, const void* src, size_t sizeB
#if USE_ROCR_V2
- hsa_status_t hsa_status = hsa_amd_memory_async_copy(dst, device->_hsa_agent, src, device->_hsa_agent, sizeBytes, depSignalCnt, depSignalCnt ? &depSignal:0x0, device->_copy_signal);
+ hsa_status_t hsa_status = hsa_amd_memory_async_copy(dst, src, sizeBytes, device->_hsa_agent, depSignalCnt, depSignalCnt ? &depSignal:0x0, device->_copy_signal);
#else
hsa_status_t hsa_status = hsa_amd_memory_async_copy(dst, src, sizeBytes, device->_hsa_agent, 0, NULL, device->_copy_signal);
#endif
@@ -2460,7 +2460,7 @@ hipError_t hipMemcpyAsync(void* dst, const void* src, size_t sizeBytes, hipMemcp
tprintf (TRACE_SYNC, " copy-async, waitFor=%lu completion=#%lu(%lu)\n", depSignalCnt? depSignal.handle:0x0, ihip_signal->_sig_id, ihip_signal->_hsa_signal.handle);
- hsa_status_t hsa_status = hsa_amd_memory_async_copy(dst, device->_hsa_agent, src, device->_hsa_agent, sizeBytes, depSignalCnt, depSignalCnt ? &depSignal:0x0, ihip_signal->_hsa_signal);
+ hsa_status_t hsa_status = hsa_amd_memory_async_copy(dst, src, sizeBytes, device->_hsa_agent, depSignalCnt, depSignalCnt ? &depSignal:0x0, ihip_signal->_hsa_signal);
#else
hsa_status_t hsa_status = hsa_amd_memory_async_copy(dst, src, sizeBytes, device->_hsa_agent, 0, NULL, ihip_signal->_hsa_signal);
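The patch above rewrites every call site by hand to move from the 8-argument form (one agent per pointer) to the 7-argument form (a single agent after the size). One way to keep such a change in a single place is a thin wrapper selected by a macro. The sketch below is self-contained and uses stand-ins throughout: `Agent`, `Signal`, `copy_single_agent`, `copy_two_agents`, `async_copy_compat` and `USE_TWO_AGENT_COPY` are all hypothetical names, and the stubs copy synchronously with `memcpy`, which the real HSA call does not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-ins for the HSA agent and signal handle types.
struct Agent  { int id; };
struct Signal { long value; };

// Stub with the 7-argument shape: a single agent after the size.
int copy_single_agent(void* dst, const void* src, std::size_t size,
                      Agent agent, unsigned num_deps, const Signal* deps,
                      Signal* completion) {
    (void)agent; (void)num_deps; (void)deps;
    std::memcpy(dst, src, size);
    completion->value = 0;  // stub: mark the copy as finished
    return 0;
}

// Stub with the 8-argument shape: one agent per pointer.
int copy_two_agents(void* dst, Agent dst_agent, const void* src,
                    Agent src_agent, std::size_t size, unsigned num_deps,
                    const Signal* deps, Signal* completion) {
    (void)dst_agent; (void)src_agent; (void)num_deps; (void)deps;
    std::memcpy(dst, src, size);
    completion->value = 0;  // stub: mark the copy as finished
    return 0;
}

#ifndef USE_TWO_AGENT_COPY
#define USE_TWO_AGENT_COPY 0  // hypothetical switch; 0 selects the 7-argument form
#endif

// Single call site for the rest of the code; only this function needs to
// change when the underlying signature does.
inline int async_copy_compat(void* dst, const void* src, std::size_t size,
                             Agent agent, unsigned num_deps,
                             const Signal* deps, Signal* completion) {
#if USE_TWO_AGENT_COPY
    return copy_two_agents(dst, agent, src, agent, size, num_deps, deps,
                           completion);
#else
    return copy_single_agent(dst, src, size, agent, num_deps, deps,
                             completion);
#endif
}
```

The wrapper passes the same agent for both source and destination in the 8-argument case, mirroring the calls in the original hip_hcc.cpp lines shown in the diff.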
Hi Matthew, can you try switching to the "dev" branch of both ROCK-Kernel-Driver and ROCR-Runtime? You should find a newer async_copy API there that works with HIP.
What do I do to just re-build the driver?
Thanks.
And for that matter, how do I rebuild the runtime? There's no Makefile in the root.
Hi Matthew, you don't need to build them. On the "dev" branch of ROCK-Kernel-Driver there is a "package" directory containing Ubuntu and Fedora packages. ROCR-Runtime likewise has pre-built packages under its "package" directory. Please remember to switch to the "dev" branch in both repositories, though.
OK. Great. Thanks. I'll do that in the morning and let you know how that goes. In the meantime the patched version works for me.
I do notice that AMD kernels are much slower than Nvidia kernels:
mmacy@pandemonium:~/devel/HIP/samples/0_Intro/square$ time !!
time ./a.out
deviceCount: 2
info: running on device Fiji
info: allocate host mem ( 7.63 MB)
info: allocate device mem ( 7.63 MB)
info: copy Host2Device
info: launch 'vector_square' kernel
info: copy Device2Host
info: check result
PASSED!
real 0m1.203s
user 0m0.088s
sys 0m0.184s
mmacy@pandemonium:~/devel/HIP.old/samples/0_Intro/square$ time ./square.hip.out
deviceCount: 1
info: running on device GeForce GTX 980 Ti
info: allocate host mem ( 7.63 MB)
info: allocate device mem ( 7.63 MB)
info: copy Host2Device
info: launch 'vector_square' kernel
info: copy Device2Host
info: check result
PASSED!
real 0m0.273s
user 0m0.028s
sys 0m0.244s
Is that fundamental? Or does your job dispatch interface just need refinement?
Thanks.
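Note that the `time` figures above cover the whole process, including runtime start-up, allocation and both copies, so they do not isolate the kernel itself. A small sketch of timing one phase at a time with `std::chrono` follows. `time_region_ms` is a hypothetical helper introduced here; because GPU launches are typically asynchronous, the callable passed in would need to synchronize with the device before returning for the number to mean anything.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Run `work` once and return the elapsed wall-clock time in milliseconds.
// steady_clock is used because it is monotonic and unaffected by clock
// adjustments.
double time_region_ms(const std::function<void()>& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Wrapping initialization, the host-to-device copy, the kernel launch plus synchronization, and the device-to-host copy in separate calls would show which phase accounts for the difference between the two runs.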
Hi Matthew, there is a lot of ongoing work to optimize all aspects of the stack. Please stay tuned. :)
OK. I updated both the kernel and the runtime to the 316 build. When running the square.cpp example with HIP_PLATFORM=hcc (nvcc still works fine) I now get a kernel oops:
Mar 24 11:28:34 pandemonium kernel: [ 639.895604] nvidia_uvm: Loaded the UVM driver, major device number 245
Mar 24 11:29:06 pandemonium kernel: [ 671.693636] amdgpu: vram aperture is out of 40bit address base: 0x383fc0000000 limit 0x383fd0000000
Mar 24 11:29:06 pandemonium kernel: [ 671.693749] amdgpu: vram aperture is out of 40bit address base: 0x383fe0000000 limit 0x383ff0000000
Mar 24 11:29:06 pandemonium kernel: [ 671.696239] amdgpu: vram aperture is out of 40bit address base: 0x383fc0000000 limit 0x383fd0000000
Mar 24 11:29:06 pandemonium kernel: [ 671.734321] amdgpu: vram aperture is out of 40bit address base: 0x383fe0000000 limit 0x383ff0000000
Mar 24 11:29:06 pandemonium kernel: [ 671.776858] BUG: unable to handle kernel paging request at ffffc90019ecd000
Mar 24 11:29:06 pandemonium kernel: [ 671.776863] IP: [] set_trap_handler+0x1a/0x30 [amdkfd]
Mar 24 11:29:06 pandemonium kernel: [ 671.776879] PGD ffec8f067 PUD ffeca0067 PMD fcf5f2067 PTE 0
Mar 24 11:29:06 pandemonium kernel: [ 671.776883] Oops: 0002 [#1] SMP
Mar 24 11:29:06 pandemonium kernel: [ 671.776886] Modules linked in: nvidia_uvm(POE) vmw_vsock_vmci_transport vsock vmw_vmci rfcomm bnep binfmt_misc hid_logitech_hidpp btusb btbcm btintel bluetooth b43 mac80211 nls_iso8859_1 cfg80211 ssb intel_rapl iosf_mbi x86_pkg_temp_thermal eeepc_wmi intel_powerclamp coretemp asus_wmi sparse_keymap video mxm_wmi kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel nvidia(POE) aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd serio_raw sb_edac edac_core snd_hda_codec_realtek snd_usb_audio snd_hda_codec_generic snd_usbmidi_lib snd_seq_midi hid_logitech_dj snd_seq_midi_event snd_hda_codec_hdmi snd_rawmidi snd_hda_intel snd_hda_controller snd_hda_codec snd_seq snd_hda_core snd_hwdep snd_seq_device bcma snd_pcm snd_timer snd mei_me lpc_ich soundcore mei shpchp wmi tpm_infineon mac_hid parport_pc ppdev lp parport autofs4 hid_generic usbhid hid amdkfd amd_iommu_v2 amdgpu psmouse amd_gnb_bus i2c_algo_bit e1000e ttm drm_kms_helper ahci ptp libahci drm pps_core
Mar 24 11:29:06 pandemonium kernel: [ 671.776940] CPU: 0 PID: 4040 Comm: a.out Tainted: P OE 4.1.0-201603162000-kfd-build-obsidian-82-generic #82
Mar 24 11:29:06 pandemonium kernel: [ 671.776943] Hardware name: iXsystems CSE-COR-AIR540/RAMPAGE V EXTREME, BIOS 1902 12/18/2015
Mar 24 11:29:06 pandemonium kernel: [ 671.776944] task: ffff880e71d93250 ti: ffff880ea4020000 task.ti: ffff880ea4020000
Mar 24 11:29:06 pandemonium kernel: [ 671.776946] RIP: 0010:[] [] set_trap_handler+0x1a/0x30 [amdkfd]
Mar 24 11:29:06 pandemonium kernel: [ 671.776955] RSP: 0018:ffff880ea4023d48 EFLAGS: 00010286
Mar 24 11:29:06 pandemonium kernel: [ 671.776956] RAX: ffffc90019ecd000 RBX: ffff880ff0f92e00 RCX: 0000000000000000
Mar 24 11:29:06 pandemonium kernel: [ 671.776957] RDX: 0000000002400000 RSI: ffff880ff3251e20 RDI: ffff880ff0f92a00
Mar 24 11:29:06 pandemonium kernel: [ 671.776959] RBP: ffff880ea4023d48 R08: ffff880ff0f92a00 R09: 0000000000000000
Mar 24 11:29:06 pandemonium kernel: [ 671.776960] R10: ffff880f962af800 R11: 00007ffc39269e80 R12: ffff880ea4023dc0
Mar 24 11:29:06 pandemonium kernel: [ 671.776961] R13: ffff880fb1275018 R14: ffff880fb1275000 R15: ffff880ea4023dc0
Mar 24 11:29:06 pandemonium kernel: [ 671.776963] FS: 00007f8005ebc740(0000) GS:ffff880fff200000(0000) knlGS:0000000000000000
Mar 24 11:29:06 pandemonium kernel: [ 671.776965] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 24 11:29:06 pandemonium kernel: [ 671.776966] CR2: ffffc90019ecd000 CR3: 0000000f3f50b000 CR4: 00000000001407f0
Mar 24 11:29:06 pandemonium kernel: [ 671.776968] Stack:
Mar 24 11:29:06 pandemonium kernel: [ 671.776969] ffff880ea4023d78 ffffffffc03d8883 ffff880ea4023dc0 fffffffffffffff2
Mar 24 11:29:06 pandemonium kernel: [ 671.776972] 000000000000001a 00000000fffffff2 ffff880ea4023e78 ffffffffc03d9ebf
Mar 24 11:29:06 pandemonium kernel: [ 671.776975] ffff880f962af800 ffffffffc03d8810 ffff880fb1275000 00007ffc39269e80
Mar 24 11:29:06 pandemonium kernel: [ 671.776977] Call Trace:
Mar 24 11:29:06 pandemonium kernel: [ 671.776986] [] kfd_ioctl_set_trap_handler+0x73/0xc0 [amdkfd]
Mar 24 11:29:06 pandemonium kernel: [ 671.776994] [] kfd_ioctl+0x2bf/0x4d0 [amdkfd]
Mar 24 11:29:06 pandemonium kernel: [ 671.777001] [] ? kfd_ioctl_get_process_apertures+0x2e0/0x2e0 [amdkfd]
Mar 24 11:29:06 pandemonium kernel: [ 671.777010] [] ? pte_alloc_one+0x30/0x50
Mar 24 11:29:06 pandemonium kernel: [ 671.777015] [] ? __pte_alloc+0xcc/0x180
Mar 24 11:29:06 pandemonium kernel: [ 671.777019] [] do_vfs_ioctl+0x2f8/0x510
Mar 24 11:29:06 pandemonium kernel: [ 671.777023] [] ? __do_page_fault+0x1b6/0x450
Mar 24 11:29:06 pandemonium kernel: [ 671.777026] [] SyS_ioctl+0x81/0xa0
Mar 24 11:29:06 pandemonium kernel: [ 671.777028] [] ? do_page_fault+0x30/0x80
Mar 24 11:29:06 pandemonium kernel: [ 671.777032] [] system_call_fastpath+0x16/0x75
Mar 24 11:29:06 pandemonium kernel: [ 671.777034] Code: 00 0f 1f 44 00 00 55 31 c0 48 89 e5 5d c3 0f 1f 00 0f 1f 44 00 00 55 48 8b 46 f0 48 89 e5 8b 80 f4 01 00 00 48 03 86 e0 00 00 00 <48> 89 10 48 89 48 08 31 c0 5d c3 66 66 2e 0f 1f 84 00 00 00 00
Mar 24 11:29:06 pandemonium kernel: [ 671.777061] RIP [] set_trap_handler+0x1a/0x30 [amdkfd]
Mar 24 11:29:06 pandemonium kernel: [ 671.777068] RSP
Mar 24 11:29:06 pandemonium kernel: [ 671.777069] CR2: ffffc90019ecd000
Mar 24 11:29:06 pandemonium kernel: [ 671.777072] ---[ end trace 53807749a7eb2ed3 ]---
Should I go back to the 1/25 version of driver/runtime with my local patch or is this likely to be fixed?
I'm happy to provide more info if need be.
I created an issue with ROCK, as that is probably where the current problem belongs.
Hi,
Make sure you install the Debian packages from the ROCK dev branch.
For the runtime, make sure to install the ROCR dev branch.
Test the sample in /opt/hsa/sample.
If the sample does not pass, ROCR or ROCK is not working as it should.
If it passes, get the compiler (HCC and LLVM) by following https://github.com/RadeonOpenCompute/LLVM-AMDGPU-Assembler-Extra. Make sure you run the conformance test given in the wiki for that repo.
Then set HSA_PATH to /opt/hsa and HCC_PATH to /opt/hcc. Likewise, add their bin directories to PATH and their lib directories to LD_LIBRARY_PATH.
Get HIP, set HIP_PATH to its project directory, and add the hipcc directory to PATH.
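Several steps above depend on environment variables being set, and a missing one tends to show up only as a confusing compiler error. A minimal sketch of a pre-flight check is given below; `missing_env_vars` is a hypothetical helper, and the variable names to pass in (HIP_PLATFORM, HSA_PATH, HCC_PATH, HIP_PATH) are the ones mentioned in this thread.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Return the names from `required` that are unset or empty in the
// current process environment.
std::vector<std::string> missing_env_vars(
        const std::vector<std::string>& required) {
    std::vector<std::string> missing;
    for (const std::string& name : required) {
        const char* value = std::getenv(name.c_str());
        if (value == nullptr || value[0] == '\0') {
            missing.push_back(name);
        }
    }
    return missing;
}
```

Calling it with the four names above before invoking hipcc and printing whatever comes back would make an incomplete setup visible immediately. It only checks that the variables are non-empty, not that the paths they hold actually exist.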
See previous comment "OK. I updated both the kernel and the runtime to the 316 build." That's the dev kernel. I also installed the dev runtime so that hip_hcc.cpp will compile with the ROCR_V2 copy interface. And that is what is causing this panic.
My sample passed fine until I tried the latest kernel and runtime. So all the other options are correct.
Can you try running the HSA sample?
I'm no longer able to boot the dev kernel. It also complains that it cannot properly detect my graphics hardware and so needs to run in low resolution, but then it never displays a login prompt. I'm not sure what I need to do to recover at this point. The default Ubuntu kernel still works OK.
Looking at the logs, it seems I'm now seeing further oopses at boot:
Mar 24 11:53:42 pandemonium rsyslogd: rsyslogd's userid changed to 104
Mar 24 11:53:43 pandemonium kernel: [ 13.184444] NVRM: Your system is not currently configured to drive a VGA console
Mar 24 11:53:43 pandemonium kernel: [ 13.184446] NVRM: on the primary VGA device. The NVIDIA Linux graphics driver
Mar 24 11:53:43 pandemonium kernel: [ 13.184447] NVRM: requires the use of a text-mode VGA console. Use of other console
Mar 24 11:53:43 pandemonium kernel: [ 13.184448] NVRM: drivers including, but not limited to, vesafb, may result in
Mar 24 11:53:43 pandemonium kernel: [ 13.184449] NVRM: corruption and stability problems, and is not supported.
Mar 24 11:53:43 pandemonium kernel: [ 13.312533] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
Mar 24 11:53:43 pandemonium kernel: [ 13.312537] IP: [] amdgpu_bo_list_set+0x196/0x3d0 [amdgpu]
Mar 24 11:53:43 pandemonium kernel: [ 13.312554] PGD 0
Mar 24 11:53:43 pandemonium kernel: [ 13.312556] Oops: 0000 [#1] SMP
Mar 24 11:53:43 pandemonium kernel: [ 13.312558] Modules linked in: vmw_vsock_vmci_transport vsock vmw_vmci b43 mac80211 intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp eeepc_wmi asus_wmi sparse_keymap cfg80211 video kvm ssb mxm_wmi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel snd_hda_codec_realtek snd_hda_codec_generic aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd snd_hda_codec_hdmi serio_raw snd_hda_intel sb_edac snd_hda_controller nvidia(POE) snd_hda_codec edac_core snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer snd lpc_ich bcma mei_me soundcore mei shpchp bnep wmi bluetooth tpm_infineon mac_hid binfmt_misc parport_pc ppdev lp parport nls_iso8859_1 amdkfd amd_iommu_v2 amdgpu psmouse amd_gnb_bus i2c_algo_bit ttm drm_kms_helper e1000e ahci libahci drm ptp pps_core
Mar 24 11:53:43 pandemonium kernel: [ 13.312584] CPU: 1 PID: 1335 Comm: Xorg Tainted: P OE 4.1.0-201603162000-kfd-build-obsidian-82-generic #82
Mar 24 11:53:43 pandemonium kernel: [ 13.312586] Hardware name: iXsystems CSE-COR-AIR540/RAMPAGE V EXTREME, BIOS 1902 12/18/2015
Mar 24 11:53:43 pandemonium kernel: [ 13.312587] task: ffff880fcf760000 ti: ffff880ff34d4000 task.ti: ffff880ff34d4000
Mar 24 11:53:43 pandemonium kernel: [ 13.312587] RIP: 0010:[] [] amdgpu_bo_list_set+0x196/0x3d0 [amdgpu]
Mar 24 11:53:43 pandemonium kernel: [ 13.312595] RSP: 0018:ffff880ff34d7c68 EFLAGS: 00010286
Mar 24 11:53:43 pandemonium kernel: [ 13.312596] RAX: ffff880ff17a83c0 RBX: ffff880ff0d7d400 RCX: 0000000000000001
Mar 24 11:53:43 pandemonium kernel: [ 13.312597] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff880ff17a8cc0
Mar 24 11:53:43 pandemonium kernel: [ 13.312597] RBP: ffff880ff34d7cd8 R08: 000000000001a920 R09: ffff880ff17a83c0
Mar 24 11:53:43 pandemonium kernel: [ 13.312598] R10: ffffffffc01cc11d R11: 00000000c0186443 R12: ffff880ff4a239c0
Mar 24 11:53:43 pandemonium kernel: [ 13.312599] R13: ffff880ff31ee110 R14: ffff880ff5c16200 R15: ffff880ff06f0000
Mar 24 11:53:43 pandemonium kernel: [ 13.312599] FS: 00007f705b545980(0000) GS:ffff880fff240000(0000) knlGS:0000000000000000
Mar 24 11:53:43 pandemonium kernel: [ 13.312600] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 24 11:53:43 pandemonium kernel: [ 13.312601] CR2: 0000000000000010 CR3: 0000000ff6cd5000 CR4: 00000000001407e0
Mar 24 11:53:43 pandemonium kernel: [ 13.312602] Stack:
Mar 24 11:53:43 pandemonium kernel: [ 13.312602] ffff880ff34d7cd8 ffff880fe0b0c400 ffff880fe0b0c000 ffff880fe0b0c800
Mar 24 11:53:43 pandemonium kernel: [ 13.312604] 0000000200000000 ffff880ff382ee00 ffff880ff17a83c0 0000000000000002
Mar 24 11:53:43 pandemonium kernel: [ 13.312605] 0000000000000000 ffff880ff31ee110 ffff880ff382ee00 ffff880ff4a239c0
Mar 24 11:53:43 pandemonium kernel: [ 13.312606] Call Trace:
Mar 24 11:53:43 pandemonium kernel: [ 13.312615] [] amdgpu_bo_list_ioctl+0x279/0x3f0 [amdgpu]
Mar 24 11:53:43 pandemonium kernel: [ 13.312623] [] drm_ioctl+0x379/0x6a0 [drm]
Mar 24 11:53:43 pandemonium kernel: [ 13.312630] [] ? amdgpu_bo_list_free+0x90/0x90 [amdgpu]
Mar 24 11:53:43 pandemonium kernel: [ 13.312635] [] amdgpu_drm_ioctl+0x4b/0x80 [amdgpu]
Mar 24 11:53:43 pandemonium kernel: [ 13.312638] [] do_vfs_ioctl+0x2f8/0x510
Mar 24 11:53:43 pandemonium kernel: [ 13.312640] [] ? __do_page_fault+0x1b6/0x450
Mar 24 11:53:43 pandemonium kernel: [ 13.312642] [] SyS_ioctl+0x81/0xa0
Mar 24 11:53:43 pandemonium kernel: [ 13.312643] [] ? do_page_fault+0x30/0x80
Mar 24 11:53:43 pandemonium kernel: [ 13.312645] [] system_call_fastpath+0x16/0x75
Mar 24 11:53:43 pandemonium kernel: [ 13.312646] Code: 00 00 48 8b bb 08 01 00 00 e8 57 67 fe ff 84 c0 0f 85 2f ff ff ff 8b 45 b0 8d 48 01 48 8d 04 80 48 c1 e0 04 48 03 45 c0 48 8b 10 <8b> 52 10 83 fa 04 89 50 30 0f 84 2b 01 00 00 89 50 34 48 89 18
Mar 24 11:53:43 pandemonium kernel: [ 13.312659] RIP [] amdgpu_bo_list_set+0x196/0x3d0 [amdgpu]
Mar 24 11:53:43 pandemonium kernel: [ 13.312665] RSP
Mar 24 11:53:43 pandemonium kernel: [ 13.312666] CR2: 0000000000000010
Mar 24 11:53:43 pandemonium kernel: [ 13.312667] ---[ end trace df8106f7c32c327f ]---
Mar 24 11:53:43 pandemonium nvidia-persistenced: The daemon no longer has permission to remove its runtime data directory /var/run/nvidia-persistenced
Mar 24 11:53:43 pandemonium nvidia-persistenced: Shutdown (1372)
Again moments later:
Mar 24 11:53:48 pandemonium nvidia-persistenced: Started (1640)
Mar 24 11:53:49 pandemonium kernel: [ 19.067885] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
Mar 24 11:53:49 pandemonium kernel: [ 19.067889] IP: [] amdgpu_bo_list_set+0x196/0x3d0 [amdgpu]
Mar 24 11:53:49 pandemonium kernel: [ 19.067905] PGD 0
Mar 24 11:53:49 pandemonium kernel: [ 19.067907] Oops: 0000 [#2] SMP
Mar 24 11:53:49 pandemonium kernel: [ 19.067908] Modules linked in: hid_logitech_hidpp snd_usb_audio hid_logitech_dj snd_usbmidi_lib btusb btbcm btintel hid_generic usbhid hid vmw_vsock_vmci_transport vsock vmw_vmci b43 mac80211 intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp eeepc_wmi asus_wmi sparse_keymap cfg80211 video kvm ssb mxm_wmi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel snd_hda_codec_realtek snd_hda_codec_generic aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd snd_hda_codec_hdmi serio_raw snd_hda_intel sb_edac snd_hda_controller nvidia(POE) snd_hda_codec edac_core snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer snd lpc_ich bcma mei_me soundcore mei shpchp bnep wmi bluetooth tpm_infineon mac_hid binfmt_misc parport_pc ppdev lp parport nls_iso8859_1 amdkfd amd_iommu_v2 amdgpu psmouse amd_gnb_bus i2c_algo_bit ttm drm_kms_helper e1000e ahci libahci drm ptp pps_core
Mar 24 11:53:49 pandemonium kernel: [ 19.067935] CPU: 0 PID: 1637 Comm: Xorg Tainted: P D OE 4.1.0-201603162000-kfd-build-obsidian-82-generic #82
Mar 24 11:53:49 pandemonium kernel: [ 19.067937] Hardware name: iXsystems CSE-COR-AIR540/RAMPAGE V EXTREME, BIOS 1902 12/18/2015
Mar 24 11:53:49 pandemonium kernel: [ 19.067938] task: ffff880fcf79c670 ti: ffff880ff6354000 task.ti: ffff880ff6354000
Mar 24 11:53:49 pandemonium kernel: [ 19.067939] RIP: 0010:[] [] amdgpu_bo_list_set+0x196/0x3d0 [amdgpu]
Mar 24 11:53:49 pandemonium kernel: [ 19.067946] RSP: 0018:ffff880ff6357c68 EFLAGS: 00010286
Mar 24 11:53:49 pandemonium kernel: [ 19.067947] RAX: ffff880ff14b9cc0 RBX: ffff880ff61dc800 RCX: 0000000000000001
Mar 24 11:53:49 pandemonium kernel: [ 19.067948] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff880ff14b9d80
Mar 24 11:53:49 pandemonium kernel: [ 19.067948] RBP: ffff880ff6357cd8 R08: 000000000001a920 R09: ffff880ff14b9cc0
Mar 24 11:53:49 pandemonium kernel: [ 19.067949] R10: ffffffffc01cc11d R11: 00000000c0186443 R12: ffff880ff4370780
Mar 24 11:53:49 pandemonium kernel: [ 19.067950] R13: ffff880fefd27d90 R14: ffff880ff0ef0700 R15: ffff880ff06f0000
Mar 24 11:53:49 pandemonium kernel: [ 19.067951] FS: 00007fb06dc05980(0000) GS:ffff880fff200000(0000) knlGS:0000000000000000
Mar 24 11:53:49 pandemonium kernel: [ 19.067952] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 24 11:53:49 pandemonium kernel: [ 19.067952] CR2: 0000000000000010 CR3: 0000000ff61e0000 CR4: 00000000001407f0
Mar 24 11:53:49 pandemonium kernel: [ 19.067953] Stack:
Mar 24 11:53:49 pandemonium kernel: [ 19.067954] ffff880ff6357cd8 ffff880fe0b0c400 ffff880fe0b0c000 ffff880fe0b0c800
Mar 24 11:53:49 pandemonium kernel: [ 19.067955] 0000000200000000 ffff880ff2704e00 ffff880ff14b9cc0 0000000000000002
Mar 24 11:53:49 pandemonium kernel: [ 19.067956] 0000000000000000 ffff880fefd27d90 ffff880ff2704e00 ffff880ff4370780
Mar 24 11:53:49 pandemonium kernel: [ 19.067958] Call Trace:
Mar 24 11:53:49 pandemonium kernel: [ 19.067967] [] amdgpu_bo_list_ioctl+0x279/0x3f0 [amdgpu]
Mar 24 11:53:49 pandemonium kernel: [ 19.067976] [] drm_ioctl+0x379/0x6a0 [drm]
Mar 24 11:53:49 pandemonium kernel: [ 19.067983] [] ? amdgpu_bo_list_free+0x90/0x90 [amdgpu]
Mar 24 11:53:49 pandemonium kernel: [ 19.067988] [] amdgpu_drm_ioctl+0x4b/0x80 [amdgpu]
Mar 24 11:53:49 pandemonium kernel: [ 19.067991] [] do_vfs_ioctl+0x2f8/0x510
Mar 24 11:53:49 pandemonium kernel: [ 19.067993] [] ? __do_page_fault+0x1b6/0x450
Mar 24 11:53:49 pandemonium kernel: [ 19.067995] [] SyS_ioctl+0x81/0xa0
Mar 24 11:53:49 pandemonium kernel: [ 19.067996] [] ? do_page_fault+0x30/0x80
Mar 24 11:53:49 pandemonium kernel: [ 19.067998] [] system_call_fastpath+0x16/0x75
Mar 24 11:53:49 pandemonium kernel: [ 19.067999] Code: 00 00 48 8b bb 08 01 00 00 e8 57 67 fe ff 84 c0 0f 85 2f ff ff ff 8b 45 b0 8d 48 01 48 8d 04 80 48 c1 e0 04 48 03 45 c0 48 8b 10 <8b> 52 10 83 fa 04 89 50 30 0f 84 2b 01 00 00 89 50 34 48 89 18
Mar 24 11:53:49 pandemonium kernel: [ 19.068012] RIP [] amdgpu_bo_list_set+0x196/0x3d0 [amdgpu]
Mar 24 11:53:49 pandemonium kernel: [ 19.068019] RSP
Mar 24 11:53:49 pandemonium kernel: [ 19.068019] CR2: 0000000000000010
Mar 24 11:53:49 pandemonium kernel: [ 19.068031] ---[ end trace df8106f7c32c3280 ]---
Mar 24 11:53:49 pandemonium nvidia-persistenced: The daemon no longer has permission to remove its runtime data directory /var/run/nvidia-persistenced
Mar 24 11:53:49 pandemonium nvidia-persistenced: Shutdown (1640)
Mar 24 11:53:54 pandemonium nvidia-persistenced: Started (1703)
Mar 24 11:53:54 pandemonium nvidia-persistenced: Failed to open PID file: File exists
Mar 24 11:53:54 pandemonium nvidia-persistenced: Shutdown (1710)
Mar 24 11:53:54 pandemonium nvidia-persistenced: The daemon no longer has permission to remove its runtime data directory /var/run/nvidia-persistenced
Mar 24 11:53:54 pandemonium nvidia-persistenced: Shutdown (1703)
Mar 24 11:53:54 pandemonium nvidia-persistenced: Started (1741)
Mar 24 11:53:55 pandemonium kernel: [ 25.105818] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
Mar 24 11:53:55 pandemonium kernel: [ 25.105822] IP: [] amdgpu_bo_list_set+0x196/0x3d0 [amdgpu]
Mar 24 11:53:55 pandemonium kernel: [ 25.105838] PGD 0
Mar 24 11:53:55 pandemonium kernel: [ 25.105839] Oops: 0000 [#3] SMP
Mar 24 11:53:55 pandemonium kernel: [ 25.105841] Modules linked in: hid_logitech_hidpp snd_usb_audio hid_logitech_dj snd_usbmidi_lib btusb btbcm btintel hid_generic usbhid hid vmw_vsock_vmci_transport vsock vmw_vmci b43 mac80211 intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp eeepc_wmi asus_wmi sparse_keymap cfg80211 video kvm ssb mxm_wmi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel snd_hda_codec_realtek snd_hda_codec_generic aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd snd_hda_codec_hdmi serio_raw snd_hda_intel sb_edac snd_hda_controller nvidia(POE) snd_hda_codec edac_core snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer snd lpc_ich bcma mei_me soundcore mei shpchp bnep wmi bluetooth tpm_infineon mac_hid binfmt_misc parport_pc ppdev lp parport nls_iso8859_1 amdkfd amd_iommu_v2 amdgpu psmouse amd_gnb_bus i2c_algo_bit ttm drm_kms_helper e1000e ahci libahci drm ptp pps_core
Mar 24 11:53:55 pandemonium kernel: [ 25.105869] CPU: 0 PID: 1738 Comm: Xorg Tainted: P D OE 4.1.0-201603162000-kfd-build-obsidian-82-generic #82
Mar 24 11:53:55 pandemonium kernel: [ 25.105870] Hardware name: iXsystems CSE-COR-AIR540/RAMPAGE V EXTREME, BIOS 1902 12/18/2015
Mar 24 11:53:55 pandemonium kernel: [ 25.105871] task: ffff880034a11e30 ti: ffff8800a6aec000 task.ti: ffff8800a6aec000
Mar 24 11:53:55 pandemonium kernel: [ 25.105872] RIP: 0010:[] [] amdgpu_bo_list_set+0x196/0x3d0 [amdgpu]
Mar 24 11:53:55 pandemonium kernel: [ 25.105880] RSP: 0018:ffff8800a6aefc68 EFLAGS: 00010286
Mar 24 11:53:55 pandemonium kernel: [ 25.105881] RAX: ffff880fefd90f00 RBX: ffff880ff0e2d800 RCX: 0000000000000001
Mar 24 11:53:55 pandemonium kernel: [ 25.105881] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff880fefd90840
Mar 24 11:53:55 pandemonium kernel: [ 25.105882] RBP: ffff8800a6aefcd8 R08: 000000000001a920 R09: ffff880fefd90f00
Mar 24 11:53:55 pandemonium kernel: [ 25.105883] R10: ffffffffc01cc11d R11: 00000000c0186443 R12: ffff880ff47c9120
Mar 24 11:53:55 pandemonium kernel: [ 25.105883] R13: ffff880fefd27e90 R14: ffff880ff0d68100 R15: ffff880ff06f0000
Mar 24 11:53:55 pandemonium kernel: [ 25.105884] FS: 00007fbc416d6980(0000) GS:ffff880fff200000(0000) knlGS:0000000000000000
Mar 24 11:53:55 pandemonium kernel: [ 25.105885] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 24 11:53:55 pandemonium kernel: [ 25.105886] CR2: 0000000000000010 CR3: 0000000ff6ea4000 CR4: 00000000001407f0
Mar 24 11:53:55 pandemonium kernel: [ 25.105886] Stack:
Mar 24 11:53:55 pandemonium kernel: [ 25.105887] ffff8800a6aefcd8 ffff880fe0b0c400 ffff880fe0b0c000 ffff880fe0b0c800
Mar 24 11:53:55 pandemonium kernel: [ 25.105889] 0000000200000000 ffff880ff363b400 ffff880fefd90f00 0000000000000002
Mar 24 11:53:55 pandemonium kernel: [ 25.105890] 0000000000000000 ffff880fefd27e90 ffff880ff363b400 ffff880ff47c9120
Mar 24 11:53:55 pandemonium kernel: [ 25.105891] Call Trace:
Mar 24 11:53:55 pandemonium kernel: [ 25.105900] [] amdgpu_bo_list_ioctl+0x279/0x3f0 [amdgpu]
And so on for all CPUs.
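When a log like the one above repeats per CPU, it can help to reduce each oops to the faulting symbol, the module that owns it, and the oops counter. The following is only a minimal sketch, assuming syslog-format lines shaped like those quoted in this thread; `summarize_oops` and the two regular expressions are hypothetical helpers written for illustration, not part of any AMD or kernel tooling.

```python
import re

# Matches the faulting symbol on an oops "IP:" / "RIP:" line, e.g.
#   amdgpu_bo_list_set+0x196/0x3d0 [amdgpu]
# The trailing "[module]" is optional because built-in symbols carry no module tag.
SYMBOL_RE = re.compile(
    r"\b(?:RIP|IP):.*?(?P<func>[A-Za-z_]\w*)"
    r"\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

# Matches the oops header, e.g. "Oops: 0000 [#3] SMP"; the number in
# brackets counts how many oopses have occurred since boot.
OOPS_RE = re.compile(r"Oops: \w+ \[#(?P<count>\d+)\]")


def summarize_oops(lines):
    """Return the first faulting symbol and the last oops counter seen in `lines`."""
    summary = {"function": None, "module": None, "offset": None, "oops_count": None}
    for line in lines:
        sym = SYMBOL_RE.search(line)
        if sym and summary["function"] is None:
            summary["function"] = sym.group("func")
            summary["module"] = sym.group("module")
            summary["offset"] = int(sym.group("off"), 16)
        oops = OOPS_RE.search(line)
        if oops:
            summary["oops_count"] = int(oops.group("count"))
    return summary


if __name__ == "__main__":
    # Two lines copied from the log quoted in this thread.
    sample = [
        "Mar 24 11:53:55 pandemonium kernel: [ 25.105822] IP: [] amdgpu_bo_list_set+0x196/0x3d0 [amdgpu]",
        "Mar 24 11:53:55 pandemonium kernel: [ 25.105839] Oops: 0000 [#3] SMP",
    ]
    print(summarize_oops(sample))
```

Fed the full log, this would point at `amdgpu_bo_list_set` in the `amdgpu` module, which fits the crash being triggered from the Xorg process rather than from the NVIDIA driver.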
from hip.
Do you have the GTX 980Ti, the R9 Nano and the R9 Fury all installed in the same system? If so, did you install the drivers for the GTX card before or after you installed the ROCK packages?
from hip.
They're all in the same system. I installed the GTX card a couple of weeks ago. The R9s date back to yesterday. I have made no changes to the Nvidia software/hardware configuration in a couple of weeks - i.e. well before doing anything with AMD.
from hip.
Attaching system profile info.
from hip.
The current status AFAICT is that the development driver won't work except in console-mode because Xorg's probing causes it to crash. So can anyone give me an ETA on when that will be fixed on github?
Thanks.
from hip.
Hi,
You can revert to a previous release commit.
from hip.
It's not clear to me where the problem was introduced. Can you hazard a guess at which changeset to try? The last time packages were updated was Jan 26th which corresponds to what's in master. So I'll need to build my own kernel - which is fine with me provided Kconfig is complete.
from hip.
Hi,
You can try the master branch package, obsidian 62 (if you badly want to run hcc).
from hip.
Closing this since the original issue should not be occurring anymore.
@mattmacy Please try with a clean setup and reopen the issue if you face any problems.
from hip.