rocm / rock-kernel-driver

AMDGPU Driver with KFD used by the ROCm project. Also contains the current Linux Kernel that matches this base driver.

License: Other

Makefile 0.20% C 98.37% Assembly 0.72% C++ 0.02% Shell 0.35% Perl 0.09% Awk 0.01% Python 0.19% Yacc 0.01% Lex 0.01% UnrealScript 0.01% Gherkin 0.01% XS 0.01% Roff 0.01% Clojure 0.01% M4 0.01% sed 0.01% SmPL 0.01% Raku 0.01% MATLAB 0.01%

rock-kernel-driver's Introduction

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result from upgrading your kernel.

rock-kernel-driver's Issues

kernel oops when running hip kernel with dev branch ROCR/ROCK

When running the square.cpp sample I now see the following:

Mar 24 11:28:34 pandemonium kernel: [ 639.895604] nvidia_uvm: Loaded the UVM driver, major device number 245
Mar 24 11:29:06 pandemonium kernel: [ 671.693636] amdgpu: vram aperture is out of 40bit address base: 0x383fc0000000 limit 0x383fd0000000
Mar 24 11:29:06 pandemonium kernel: [ 671.693749] amdgpu: vram aperture is out of 40bit address base: 0x383fe0000000 limit 0x383ff0000000
Mar 24 11:29:06 pandemonium kernel: [ 671.696239] amdgpu: vram aperture is out of 40bit address base: 0x383fc0000000 limit 0x383fd0000000
Mar 24 11:29:06 pandemonium kernel: [ 671.734321] amdgpu: vram aperture is out of 40bit address base: 0x383fe0000000 limit 0x383ff0000000
Mar 24 11:29:06 pandemonium kernel: [ 671.776858] BUG: unable to handle kernel paging request at ffffc90019ecd000
Mar 24 11:29:06 pandemonium kernel: [ 671.776863] IP: [] set_trap_handler+0x1a/0x30 [amdkfd]
Mar 24 11:29:06 pandemonium kernel: [ 671.776879] PGD ffec8f067 PUD ffeca0067 PMD fcf5f2067 PTE 0
Mar 24 11:29:06 pandemonium kernel: [ 671.776883] Oops: 0002 [#1] SMP
Mar 24 11:29:06 pandemonium kernel: [ 671.776886] Modules linked in: nvidia_uvm(POE) vmw_vsock_vmci_transport vsock vmw_vmci rfcomm bnep binfmt_misc hid_logitech_hidpp btusb btbcm btintel bluetooth b43 mac80211 nls_iso8859_1 cfg80211 ssb intel_rapl iosf_mbi x86_pkg_temp_thermal eeepc_wmi intel_powerclamp coretemp asus_wmi sparse_keymap video mxm_wmi kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel nvidia(POE) aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd serio_raw sb_edac edac_core snd_hda_codec_realtek snd_usb_audio snd_hda_codec_generic snd_usbmidi_lib snd_seq_midi hid_logitech_dj snd_seq_midi_event snd_hda_codec_hdmi snd_rawmidi snd_hda_intel snd_hda_controller snd_hda_codec snd_seq snd_hda_core snd_hwdep snd_seq_device bcma snd_pcm snd_timer snd mei_me lpc_ich soundcore mei shpchp wmi tpm_infineon mac_hid parport_pc ppdev lp parport autofs4 hid_generic usbhid hid amdkfd amd_iommu_v2 amdgpu psmouse amd_gnb_bus i2c_algo_bit e1000e ttm drm_kms_helper ahci ptp libahci drm pps_core
Mar 24 11:29:06 pandemonium kernel: [ 671.776940] CPU: 0 PID: 4040 Comm: a.out Tainted: P OE 4.1.0-201603162000-kfd-build-obsidian-82-generic #82
Mar 24 11:29:06 pandemonium kernel: [ 671.776943] Hardware name: iXsystems CSE-COR-AIR540/RAMPAGE V EXTREME, BIOS 1902 12/18/2015
Mar 24 11:29:06 pandemonium kernel: [ 671.776944] task: ffff880e71d93250 ti: ffff880ea4020000 task.ti: ffff880ea4020000
Mar 24 11:29:06 pandemonium kernel: [ 671.776946] RIP: 0010:[] [] set_trap_handler+0x1a/0x30 [amdkfd]
Mar 24 11:29:06 pandemonium kernel: [ 671.776955] RSP: 0018:ffff880ea4023d48 EFLAGS: 00010286
Mar 24 11:29:06 pandemonium kernel: [ 671.776956] RAX: ffffc90019ecd000 RBX: ffff880ff0f92e00 RCX: 0000000000000000
Mar 24 11:29:06 pandemonium kernel: [ 671.776957] RDX: 0000000002400000 RSI: ffff880ff3251e20 RDI: ffff880ff0f92a00
Mar 24 11:29:06 pandemonium kernel: [ 671.776959] RBP: ffff880ea4023d48 R08: ffff880ff0f92a00 R09: 0000000000000000
Mar 24 11:29:06 pandemonium kernel: [ 671.776960] R10: ffff880f962af800 R11: 00007ffc39269e80 R12: ffff880ea4023dc0
Mar 24 11:29:06 pandemonium kernel: [ 671.776961] R13: ffff880fb1275018 R14: ffff880fb1275000 R15: ffff880ea4023dc0
Mar 24 11:29:06 pandemonium kernel: [ 671.776963] FS: 00007f8005ebc740(0000) GS:ffff880fff200000(0000) knlGS:0000000000000000
Mar 24 11:29:06 pandemonium kernel: [ 671.776965] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 24 11:29:06 pandemonium kernel: [ 671.776966] CR2: ffffc90019ecd000 CR3: 0000000f3f50b000 CR4: 00000000001407f0
Mar 24 11:29:06 pandemonium kernel: [ 671.776968] Stack:
Mar 24 11:29:06 pandemonium kernel: [ 671.776969] ffff880ea4023d78 ffffffffc03d8883 ffff880ea4023dc0 fffffffffffffff2
Mar 24 11:29:06 pandemonium kernel: [ 671.776972] 000000000000001a 00000000fffffff2 ffff880ea4023e78 ffffffffc03d9ebf
Mar 24 11:29:06 pandemonium kernel: [ 671.776975] ffff880f962af800 ffffffffc03d8810 ffff880fb1275000 00007ffc39269e80
Mar 24 11:29:06 pandemonium kernel: [ 671.776977] Call Trace:
Mar 24 11:29:06 pandemonium kernel: [ 671.776986] [] kfd_ioctl_set_trap_handler+0x73/0xc0 [amdkfd]
Mar 24 11:29:06 pandemonium kernel: [ 671.776994] [] kfd_ioctl+0x2bf/0x4d0 [amdkfd]
Mar 24 11:29:06 pandemonium kernel: [ 671.777001] [] ? kfd_ioctl_get_process_apertures+0x2e0/0x2e0 [amdkfd]
Mar 24 11:29:06 pandemonium kernel: [ 671.777010] [] ? pte_alloc_one+0x30/0x50
Mar 24 11:29:06 pandemonium kernel: [ 671.777015] [] ? __pte_alloc+0xcc/0x180
Mar 24 11:29:06 pandemonium kernel: [ 671.777019] [] do_vfs_ioctl+0x2f8/0x510
Mar 24 11:29:06 pandemonium kernel: [ 671.777023] [] ? __do_page_fault+0x1b6/0x450
Mar 24 11:29:06 pandemonium kernel: [ 671.777026] [] SyS_ioctl+0x81/0xa0
Mar 24 11:29:06 pandemonium kernel: [ 671.777028] [] ? do_page_fault+0x30/0x80
Mar 24 11:29:06 pandemonium kernel: [ 671.777032] [] system_call_fastpath+0x16/0x75
Mar 24 11:29:06 pandemonium kernel: [ 671.777034] Code: 00 0f 1f 44 00 00 55 31 c0 48 89 e5 5d c3 0f 1f 00 0f 1f 44 00 00 55 48 8b 46 f0 48 89 e5 8b 80 f4 01 00 00 48 03 86 e0 00 00 00 <48> 89 10 48 89 48 08 31 c0 5d c3 66 66 2e 0f 1f 84 00 00 00 00
Mar 24 11:29:06 pandemonium kernel: [ 671.777061] RIP [] set_trap_handler+0x1a/0x30 [amdkfd]
Mar 24 11:29:06 pandemonium kernel: [ 671.777068] RSP
Mar 24 11:29:06 pandemonium kernel: [ 671.777069] CR2: ffffc90019ecd000
Mar 24 11:29:06 pandemonium kernel: [ 671.777072] ---[ end trace 53807749a7eb2ed3 ]---

ROCm Kernel (Using Head with GNOME X11/GTK) not working with Ubuntu 16.04 (RX480)

Hi folks,

I was running ROCm for a few months on this box, but in the last week it's started crashing more and more often, and yesterday it just started freezing shortly after boot-up, particularly if I touched the display settings for my monitors.

Because I wasn't confident the install was clean to begin with, and ROCm had been unstable and unpredictable for compute all that time, I decided to try a fresh install before reporting a bug.

First, I tried avoiding ROCm and just using amdgpu-pro, because I just want an OpenCL 1.2 compatible runtime for now (so, although Mesa supports Polaris and doesn't destroy my system, it won't do as it's 1.1 only :( ). That rendered the system so unstable I couldn't even login, so I wiped and reinstalled again.

So, I have installed Ubuntu GNOME 16.04 fresh, and then followed the ROCm install. But this time I had the presence of mind to make sure I could boot from grub with a stock kernel. When I re-started, the system hung again when I opened the display app, exactly as before.

What information can I collect about this hanging issue to help debug it? It has been consistently reproducible this past week: I just boot, log in, and open the screen-orientation app to move my two monitors around, and it freezes. It also freezes for no apparent reason at other times, but the display app seems to trigger it every time. I can boot into the ROCm kernel, crash out, and get whatever information is needed to help fix this.

GPU Error 147 - Running ROCM Kernel 4.11.0-kfd-compute-rocm-rel-1.6-148

I am running a cryptocurrency miner with six AMD RX480 cards, and I see the errors below CONTINUOUSLY on the console when running the kernel named in the subject. The errors degrade hashing performance on one card. If I remove that card, there are no errors. I have tried different cards and different slots; all show the same issue with this kernel. I have also had amdgpu-pro 17.10 and 17.30 on the system, with the same results on this kernel. Other Ubuntu 16.04 kernels work fine.

See photo:
gpu_error_147

journalctl spammed with strange messages

Hi!
I'm running ROCm on Debian Unstable (aka Sid). Because the DKMS module doesn't work on my default kernel, I got ROCm working by installing everything from the Ubuntu repo except DKMS and compiling the kernel from here instead. Everything seems to work fine, but when I check journalctl I see messages like the one below being spammed about 10 times per second. This does not happen when I boot my distro's default kernel.

mar 23 16:26:44 kristoffer-debian-desktop kernel: evbug: Event. Dev: input3, Type: 0, Code: 0, Value: 0

How to reset driver/gpu

How can I reset the driver or GPU without rebooting the machine (which is very inconvenient for me)?

A bad kernel has crashed my GPU. The vector_copy sample hangs for 10 seconds, then crashes with "Creating the queue failed."

I tried sudo rmmod amdgpu and sudo rmmod amdkfd:

"sudo rmmod amdgpu" terminated by signal SIGSEGV (Address boundary error)
"sudo rmmod amdkfd" terminated by signal SIGKILL (Forced quit)

after which vector_copy immediately reports Getting a gpu agent failed.

DKMS compilation error on 4.13.0

My system:

Linux stout 4.13.0-1-amd64 #1 SMP Debian 4.13.13-1 (2017-11-16) x86_64 GNU/Linux

gives me this compilation error in the DKMS log file:

/var/lib/dkms/rock/1.7.60-ubuntu/build/include/kcl/kcl_acpi.h:8:49: error: operator '<=' has no left operand
 #if (defined OS_NAME_RHEL) && (OS_VERSION_MAJOR <= 6)
                                                 ^~
In file included from /var/lib/dkms/rock/1.7.60-ubuntu/build/amd/amdgpu/../backport/backport.h:20:0,
                 from <command-line>:0:
/var/lib/dkms/rock/1.7.60-ubuntu/build/include/kcl/kcl_hwmon.h: In function ‘kcl_hwmon_device_register_with_groups’:
/var/lib/dkms/rock/1.7.60-ubuntu/build/include/kcl/kcl_hwmon.h:15:49: error: operator '<=' has no left operand

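It seems the issue is that OS_VERSION_MAJOR has no value here, which leaves the '<=' without a left operand. Either making sure it has a value, or restructuring the conditional so the comparison is only evaluated when OS_NAME_RHEL is defined, like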

#ifdef OS_NAME_RHEL
#if OS_VERSION_MAJOR <= 6
#define KCL_NEED_RHEL_6_WORKAROUND
#endif
#endif

should do the trick.

ROC without IOMMU

Is it possible to use ROC without using amd_iommu_v2?

In the current implementation, amdkfd depends on the amd_iommu_v2 kernel module, which provides IOMMU support. Because of this dependency, I cannot use ROC in a KVM virtual machine, as QEMU doesn't support IOMMU emulation and VFIO passthrough at the same time.

Is it possible to use the framework without iommu module functionality?

Thanks.

ROCK-1.5 fails to boot

Trying to boot the recent ROC-1.5 kernel results in a string of error messages:

AMD-Vi: Event logged
AMD-Vi: Completion-Wait loop timed out

Most devices failed to operate (ATA link errors, disk storage errors, ...).
I used the Fedora config and enabled DC during configuration.
Note that the stock Fedora kernel fails the same way on this machine [0].
The ROCK kernel produces a few extra errors in dmesg during startup (might be unrelated):

[    0.020470] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: 10 CPUID: 0
[    0.020476] [Firmware Bug]: CPU0: Using firmware package id 1 instead of 0
[    0.020479] Last level iTLB entries: 4KB 512, 2MB 1024, 4MB 512
[    0.020480] Last level dTLB entries: 4KB 1024, 2MB 1024, 4MB 512, 1GB 0
[    0.021477] Freeing SMP alternatives memory: 32K (ffffffffa1197000 - ffffffffa119f000)
[    0.025234] ftrace: allocating 31008 entries in 122 pages
[    0.037531] smpboot: APIC(10) Converting physical 1 to logical package 0
[    0.037533] smpboot: Max logical packages: 2
[    0.037940] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.149160] smpboot: CPU0: AMD FX-9800P RADEON R7, 12 COMPUTE CORES 4C+8G (family: 0x15, model: 0x65, stepping: 0x1)
[    0.149164] Performance Events: Fam15h core perfctr, AMD PMU driver.
[    0.149169] ... version:                0
[    0.149169] ... bit width:              48
[    0.149170] ... generic registers:      6
[    0.149170] ... value mask:             0000ffffffffffff
[    0.149171] ... max period:             00007fffffffffff
[    0.149171] ... fixed-purpose events:   0
[    0.149171] ... event mask:             000000000000003f
[    0.150000] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
[    0.150137] x86: Booting SMP configuration:
[    0.150139] .... node  #0, CPUs:      #1
[    0.150300] [Firmware Bug]: CPU1: APIC id mismatch. Firmware: 11 CPUID: 1
[    0.150302] [Firmware Bug]: CPU1: Using firmware package id 1 instead of 0
[    0.162359]  #2
[    0.162575] [Firmware Bug]: CPU2: APIC id mismatch. Firmware: 12 CPUID: 2
[    0.162576] [Firmware Bug]: CPU2: Using firmware package id 1 instead of 0
[    0.175608]  #3
[    0.175804] [Firmware Bug]: CPU3: APIC id mismatch. Firmware: 13 CPUID: 3
[    0.175805] [Firmware Bug]: CPU3: Using firmware package id 1 instead of 0

The machine is an Acer Aspire E 15 (E5-553G-F55F) using the latest BIOS update (2017/04/25, v1.16).

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1448121

When to expect DKMS support

What is the status of DKMS support, so that ROCm can be used with pre-compiled kernels? A related question is when (if ever) ROCm will be in mainline.

An additional request is to make the ROCm module compatible with GrSecurity patches.

checkkconfigsymbols.py - Shell Injection

Shell code can be injected through the %s-formatted values in the commands passed to the execute function, because shell is set to True.

Please set shell to False and pass cmd as a list of arguments to subprocess.
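
The suggestion above can be sketched as follows. This is a minimal illustration, not the script's actual code: the function name execute only mirrors the one mentioned in the issue, its body is an assumption, and the child command is a hypothetical stand-in (a tiny Python process that echoes its argument) so that the example runs anywhere.

```python
import subprocess
import sys


def execute(cmd_args):
    """Run a command given as a list of arguments and return its stdout.

    No shell is involved (shell=False is the default for a list), so shell
    metacharacters inside an argument are passed to the program literally.
    """
    result = subprocess.run(
        cmd_args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


# A value that would start a second command if pasted into a shell string.
hostile = "HEAD; echo injected"

# Hypothetical stand-in for a real tool: a child process that prints its
# first argument. The hostile value arrives as one literal argument.
echoed = execute([sys.executable, "-c", "import sys; print(sys.argv[1])", hostile])
```

With a list and no shell, each value is handed to the program as a single argument, so there is nothing for a shell to interpret. Commands that rely on shell features such as pipes or globbing would need to be restructured rather than simply switched over.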

Kernel oops on amdgpu load (Polaris / Vega)

On a ppc64el system with this kernel and a WX7100 (Polaris) card, loading the amdgpu module results in a kernel oops. Note that the upstream Linux 4.15 amdgpu module works and allows a full graphical environment to load; the oops is specific to the AMD 4.13 kernel. Oops follows:

[   89.848698] checking generic (600c280010000 500000) vs hw (6000000000000 10000000)
[   89.848800] amdgpu 0000:01:00.0: enabling device (0140 -> 0142)
[   89.915446] [drm] initializing kernel modesetting (POLARIS10 0x1002:0x67C4 0x1002:0x0B0D 0x00).
[   89.965406] [drm] register mmio base: 0x00000000
[   89.965458] [drm] register mmio size: 262144
[   89.965502] [drm] PCI I/O BAR is not found.
[   89.965540] [drm] probing gen 2 caps for device 1014:4c1 = 300104/180001e
[   89.965584] [drm] probing mlw for device 1014:4c1 = 300104
[   89.965631] [drm] UVD is enabled in VM mode
[   89.965658] [drm] VCE enabled in VM mode
[   90.299090] [drm] PCI I/O BAR is not found. Using MMIO to access ATOM BIOS
[   90.299092] ATOM BIOS: 113-C9540101-100
[   90.299103] [drm] GPU post is not needed
[   90.299130] [drm] vm size is 64 GB, block size is 13-bit, fragment size is 9-bit
[   90.299147] amdgpu: No suitable DMA available
[   92.836890] amdgpu 0000:01:00.0: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
[   92.836969] amdgpu 0000:01:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[   92.837021] [drm] Detected VRAM RAM=8192M, BAR=256M
[   92.837056] [drm] RAM width 256bits GDDR5
[   92.837183] [TTM] Zone  kernel: Available graphics memory: 7471346 kiB
[   92.837227] [TTM] Initializing pool allocator
[   92.837289] [drm] amdgpu: 8192M of VRAM memory ready
[   92.837325] [drm] amdgpu: 8192M of GTT memory ready.
[   92.837383] [drm] GART: num cpu pages 65536, num gpu pages 65536
[   92.837555] [drm] PCIE GART of 256M enabled (table at 0x000000F400040000).
[   92.837607] amdgpu 0000:01:00.0: (-12) failed to allocate kernel bo
[   92.837651] amdgpu 0000:01:00.0: (-12) create WB bo failed
[   92.837829] [drm:amdgpu_device_init [amdgpu]] *ERROR* amdgpu_wb_init failed -12
[   92.837912] amdgpu 0000:01:00.0: amdgpu_init failed
[   92.838002] Unable to handle kernel paging request for data at address 0xc00c000085a80000
[   92.838066] Faulting instruction address: 0xc008000005a2f1cc
[   92.838122] Oops: Kernel access of bad area, sig: 11 [#1]
[   92.838166] SMP NR_CPUS=2048
[   92.838168] NUMA
[   92.838200] PowerNV
[   92.838257] Modules linked in: amdgpu(+) mfd_core ttm drm_kms_helper drm syscopyarea sysfillrect sysimgblt fb_sys_fops i2c_algo_bit i2c_dev ghash_generic gf128mul ecb snd_hda_codec_hdmi snd_hda_intel xts snd_hda_codec joydev ofpart ctr evdev ipmi_powernv powernv_flash ipmi_devintf cbc snd_hda_core vmx_crypto mtd snd_hwdep ipmi_msghandler at24 opal_prd binfmt_misc snd_aloop snd_pcm snd_timer snd soundcore parport_pc lp parport ip_tables x_tables autofs4 nfsv3 nfs_acl nfs lockd grace sunrpc fscache hid_generic usbhid hid xhci_pci xhci_hcd usbcore tg3 ptp pps_core libphy
[   92.838719] CPU: 0 PID: 971 Comm: kworker/0:1 Not tainted 4.13.0+ #1
[   92.838778] Workqueue: events work_for_cpu_fn
[   92.838823] task: c0000001d35c4700 task.stack: c0000001d35c8000
[   92.838876] NIP: c008000005a2f1cc LR: c0080000059a036c CTR: c008000005a2f178
[   92.838940] REGS: c0000001d35cb4c0 TRAP: 0300   Not tainted  (4.13.0+)
[   92.838993] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
[   92.839003]   CR: 28002288  XER: 20040000
[   92.839079] CFAR: c008000005a2f1ac DAR: c00c000085a80000 DSISR: 42000000 SOFTE: 1
               GPR00: c0080000059a036c c0000001d35cb740 c008000005c5bde0 c000000009be0000
               GPR04: c00c000085a80000 0000000000000000 0000000000080000 0000000000000000
               GPR08: 0000000000000001 c008000005a2f178 0000000000000001 c008000004a1e5d8
               GPR12: c008000005a2f178 c00000000fb80000 c000000000128568 c000000009be2f20
               GPR16: c000000009be2f28 c000000009be2f18 c000000009be2f38 c000000009be2f40
               GPR20: c000000009be2f30 0000000000008000 0000000000000400 c000000009be2f38
               GPR24: c000000009be2f40 c000000009be2f30 c000000009be2f18 0000000000000000
               GPR28: 0000000000000000 0000000000000000 c00c000085a80000 0000000000080000
[   92.839738] NIP [c008000005a2f1cc] gmc_v8_0_gart_set_pte_pde+0x54/0x90 [amdgpu]
[   92.839914] LR [c0080000059a036c] amdgpu_gart_unbind+0xa4/0x130 [amdgpu]
[   92.839968] Call Trace:
[   92.839992] [c0000001d35cb740] [c000000009be2720] 0xc000000009be2720 (unreliable)
[   92.840138] [c0000001d35cb780] [c0080000059a036c] amdgpu_gart_unbind+0xa4/0x130 [amdgpu]
[   92.840290] [c0000001d35cb800] [c0080000059a06e8] amdgpu_gart_fini+0x40/0x70 [amdgpu]
[   92.840447] [c0000001d35cb830] [c008000005a30b98] gmc_v8_0_sw_fini+0x50/0x90 [amdgpu]
[   92.840593] [c0000001d35cb860] [c00800000597f1d0] amdgpu_fini+0x208/0x560 [amdgpu]
[   92.840741] [c0000001d35cb910] [c008000005985b5c] amdgpu_device_init+0xcc4/0x1590 [amdgpu]
[   92.840889] [c0000001d35cba30] [c0080000059880fc] amdgpu_driver_load_kms+0xb4/0x2d0 [amdgpu]
[   92.840976] [c0000001d35cbab0] [c0080000044cab7c] drm_dev_register+0x1d4/0x290 [drm]
[   92.841121] [c0000001d35cbb50] [c00800000597d880] amdgpu_pci_probe+0x128/0x1f0 [amdgpu]
[   92.841228] [c0000001d35cbbd0] [c0000000005d851c] local_pci_probe+0x6c/0x140
[   92.841296] [c0000001d35cbc60] [c0000000001199d8] work_for_cpu_fn+0x38/0x60
[   92.843968] [c0000001d35cbc90] [c00000000011ead8] process_one_work+0x248/0x520
[   92.848119] [c0000001d35cbd30] [c00000000011f030] worker_thread+0x280/0x5d0
[   92.851012] [c0000001d35cbdc0] [c00000000012870c] kthread+0x1ac/0x1c0
[   92.851102] [c0000001d35cbe30] [c00000000000bae0] ret_from_kernel_thread+0x5c/0x7c
[   92.851209] Instruction dump:
[   92.851231] 7cdf3378 7c9e2378 7cbd2b78 7cfc3b78 48000008 e8410018 7be6c6c4 7bbd1828
[   92.852725] 78c64602 7fdeea14 7cc6e378 7c0004ac <f8de0000> 39200001 38600000 992d019c
[   92.852815] ---[ end trace 2915333da62340c0 ]---

EDIT: lspci output for the AMD card:

        Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon Pro WX 7100]
        Flags: fast devsel, IRQ 24, NUMA node 0
        Memory at 6000000000000 (64-bit, prefetchable) [size=256M]
        Memory at 6000010000000 (64-bit, prefetchable) [size=2M]
        I/O ports at <unassigned> [disabled]
        Memory at 600c000000000 (32-bit, non-prefetchable) [size=256K]
        Expansion ROM at 600c000040000 [disabled] [size=128K]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Legacy Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150] Advanced Error Reporting
        Capabilities: [200] #15
        Capabilities: [270] #19
        Capabilities: [2b0] Address Translation Service (ATS)
        Capabilities: [2c0] Page Request Interface (PRI)
        Capabilities: [2d0] Process Address Space ID (PASID)
        Capabilities: [320] Latency Tolerance Reporting
        Capabilities: [328] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [370] L1 PM Substates
        Kernel driver in use: amdgpu
        Kernel modules: amdgpu

linux-firmware rpm issue

There seems to be a problem with the AMD HSA firmware package for Fedora (I did not check whether this also occurs in Ubuntu):
http://packages.amd.com/rocm/yum/rpm/linux-firmware-201705121727-1.noarch.rpm contains exclusively AMD related firmware, and is thus no replacement or update to the standard linux-firmware rpm (currently linux-firmware-20170313-72.git695f2d6d.fc25.noarch.rpm). Installing (or seemingly just updating) this package leaves the system devoid of any non-AMD firmware.
Due to file name collisions, the two rpms cannot be installed simultaneously either. There should be a single rpm with the latest firmware from both rpms.

Separated driver from sources as a patch?

Hello, I would like to include the driver in my Gentoo kernel sources, and I would like to automate that.

Is it possible to provide the driver implementation as patches, or somehow as includes?

KFD working and patches released on latest intel nightly for beaver crack

AMD IOMMU initializing GPU failed at if(!dev_data->passthrough) when amdkfd loads GPU device

Hi,

I've been trying to get the Vega GPU in a Raven Ridge APU running (on a new HP ENVY 360). So far, I have tracked down where it fails: kfd fails to initialize the iommu because the device struct data indicates that passthrough is not supported. Specifically, it fails at line 2306 of drivers/iommu/amd_iommu.c.

Any suggestions on fixing this?

dmesg snippets are attached, showing that kfd failed on the iommu. Oddly, the GPU is recognized as a discrete one.

[    1.096291] AMD IOMMUv2 driver by Joerg Roedel <[email protected]>
[    1.097325] Parsing CRAT table with 1 nodes
[    1.097327] Ignoring ACPI CRAT on non-APU system
[    1.097328] Virtual CRAT table created for CPU
[    1.097329] Parsing CRAT table with 1 nodes
[    1.097329] Creating topology SYSFS entries
[    1.097336] Topology: Add CPU node
[    1.097336] Finished initializing topology
[    1.099346] kfd kfd: Initialized module
[    1.099494] checking generic (e0000000 7f0000) vs hw (e0000000 10000000)
[    1.099495] fb: switching to amdgpudrmfb from VESA VGA
[    1.837632] kfd kfd: Allocated 3969056 bytes on gart
[    1.837693] Virtual CRAT table created for GPU
[    1.837693] Parsing CRAT table with 1 nodes
[    1.837696] Creating topology SYSFS entries
[    1.837742] Topology: Add dGPU node [0x15dd:0x1002]
[    1.837862] kfd kfd: Reserved 2 pages for cwsr.
[    1.837871] kfd kfd: failed to initialize iommu
[    1.837874] kfd kfd: Error resuming kfd
[    1.837886] Creating topology SYSFS entries
[    1.838132] kfd kfd: device 1002:15dd NOT added due to errors
[    1.838142] [drm] Initialized amdgpu 3.20.0 20150101 for 0000:04:00.0 on minor 0

In addition, here is my vector_copy output:

Initializing the hsa runtime succeeded.
Checking finalizer 1.0 extension support succeeded.
Generating function table for finalizer succeeded.
Getting a gpu agent failed.

I fixed a few problems in ROCT and ROCR here and there to make it reach this point of adding gpu agent. However, gpu_agents_ is a vector of size 0 because GPU is not initialized successfully.

Any help is appreciated! Thanks!

disabling compute cores

Hi,

I want to know whether the driver provides a way to turn off compute cores on demand. I am doing some power studies, and it would be good if this could be done. Thank you.

remind user to boot to the kfd kernel

If the grub2 loader is used, Ubuntu still boots the old default kernel after the kfd kernel driver is installed, so the ROCR runtime won't work and the samples will crash. To load the kfd kernel, users need to choose "Advanced Options for Ubuntu" in the grub menu and then choose the kfd kernel.

Maybe README.md should add a note about that, otherwise it causes surprise for inexperienced users.
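
Besides a README note, a reminder like this could be automated. The sketch below is only an illustration under a stated assumption: it treats any kernel release string containing "kfd" as a kfd kernel, a heuristic based on the kernel names quoted on this page (for example 4.11.0-kfd-compute-rocm-rel-1.6-148). The function names is_kfd_kernel and boot_reminder are hypothetical.

```python
import platform


def is_kfd_kernel(release):
    """Return True if the release string looks like a kfd kernel build.

    Assumption: kfd kernels carry "kfd" somewhere in their release name.
    """
    return "kfd" in release.lower()


def boot_reminder(release):
    """Return a reminder message, or an empty string if none is needed."""
    if is_kfd_kernel(release):
        return ""
    return (
        "Running kernel '%s' does not look like a kfd kernel. "
        "Reboot and pick the kfd kernel under 'Advanced Options for Ubuntu' "
        "in the grub menu." % release
    )


if __name__ == "__main__":
    # platform.release() reports the release of the running kernel.
    message = boot_reminder(platform.release())
    if message:
        print(message)
```

A sample or install script could call boot_reminder before touching the runtime, so the user sees the hint instead of a crash.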

amdgpu support (slightly off topic - sorry)

I apologize in advance - I know that this isn't the right place for this. However, it's also a critical intermediate step to getting ROCK going on my R9 Nano and Fury X. And I've had success here before.

Googling is not bringing up an obvious amd equivalent of intel-gfx. Is there such a thing?

I'm trying to bring up the Linux 4.6 amdgpu driver essentially unmodified on FreeBSD using FreeBSD's linuxkpi compatibility shims on a Carrizo (thinkpad e545). I've gotten the 4.6 i915 driver working virtually everywhere using the same approach so there aren't glaring holes in the shims. Nonetheless, the bugs are in the shims, but manifest as driver failures and it's often difficult to trace problems back to linuxkpi shortcomings.

At this instant uvd_v6_0_ring_test_ring is failing - see output below. The driver has made it through the first 10 ring tests and uvd_v6_0_start. At first glance there's nothing obviously different about uvd_v6. Any pointers / suggestions would be much appreciated.

Thanks in advance.

Jun 6 08:00:03 trainwreck kernel: [drm] initializing kernel modesetting (CARRIZO 0x1002:0x9874 0x0000:0x0000 0x00).
Jun 6 08:00:03 trainwreck kernel: [drm] register mmio base: 0xF1400000
Jun 6 08:00:03 trainwreck kernel: [drm] register mmio size: 262144
Jun 6 08:00:03 trainwreck kernel: [drm] doorbell mmio base: 0xF0000000
Jun 6 08:00:03 trainwreck kernel: [drm] doorbell mmio size: 8388608
Jun 6 08:00:03 trainwreck kernel: [drm] probing gen 2 caps for device 1002:9874 = 0/0
Jun 6 08:00:03 trainwreck kernel: [drm] probing mlw for device 1002:9874 = 0
Jun 6 08:00:03 trainwreck kernel: [drm:amdgpu_get_bios] ATOMBIOS detected
Jun 6 08:00:03 trainwreck kernel: ATOM BIOS: BR46529.100
Jun 6 08:00:03 trainwreck kernel: [drm:amdgpu_atom_allocate_fb_scratch] atom firmware requested 000fffe0 32kb
Jun 6 08:00:03 trainwreck kernel: [drm:gmc_v8_0_init_microcode]
Jun 6 08:00:03 trainwreck kernel: drmn0: VRAM: 1024M 0x0000000000000000 - 0x000000003FFFFFFF (1024M used)
Jun 6 08:00:03 trainwreck kernel: drmn0: GTT: 1024M 0x0000000040000000 - 0x000000007FFFFFFF
Jun 6 08:00:03 trainwreck kernel: [drm] Detected VRAM RAM=1024M, BAR=256M
Jun 6 08:00:03 trainwreck kernel: [drm] RAM width 64bits UNKNOWN
Jun 6 08:00:03 trainwreck kernel: Zone kernel: Available graphics memory: 3625188 kiB
Jun 6 08:00:03 trainwreck kernel: Zone dma32: Available graphics memory: 2097152 kiB
Jun 6 08:00:03 trainwreck kernel: Initializing pool allocator
Jun 6 08:00:03 trainwreck kernel: [drm] amdgpu: 1024M of VRAM memory ready
Jun 6 08:00:03 trainwreck kernel: [drm] amdgpu: 1024M of GTT memory ready.
Jun 6 08:00:03 trainwreck kernel: [drm] GART: num cpu pages 262144, num gpu pages 262144
Jun 6 08:00:03 trainwreck kernel: [drm] PCIE GART of 1024M enabled (table at 0x0000000000040000).
Jun 6 08:00:03 trainwreck kernel: [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
Jun 6 08:00:03 trainwreck kernel: [drm] Driver supports precise vblank timestamp query.
Jun 6 08:00:03 trainwreck kernel: drmn0: amdgpu: using MSI.
Jun 6 08:00:03 trainwreck kernel: [drm:drm_irq_install] irq=43
Jun 6 08:00:03 trainwreck kernel: [drm] amdgpu: irq initialized.
Jun 6 08:00:03 trainwreck kernel: [drm] amdgpu: dpm initialized
Jun 6 08:00:03 trainwreck kernel: [drm] Connector eDP-1: get mode from tunables:
Jun 6 08:00:03 trainwreck kernel: [drm] - kern.vt.fb.modes.eDP-1
Jun 6 08:00:03 trainwreck kernel: [drm] - kern.vt.fb.default_mode
Jun 6 08:00:03 trainwreck kernel: [drm:drm_sysfs_connector_add] adding "eDP-1" to sysfs
Jun 6 08:00:03 trainwreck kernel: [drm:drm_sysfs_hotplug_event] generating hotplug event
Jun 6 08:00:03 trainwreck kernel: [drm] Connector DP-1: get mode from tunables:
Jun 6 08:00:03 trainwreck kernel: [drm] - kern.vt.fb.modes.DP-1
Jun 6 08:00:03 trainwreck kernel: [drm] - kern.vt.fb.default_mode
Jun 6 08:00:03 trainwreck kernel: [drm:drm_sysfs_connector_add] adding "DP-1" to sysfs
Jun 6 08:00:03 trainwreck kernel: [drm:drm_sysfs_hotplug_event] generating hotplug event
Jun 6 08:00:03 trainwreck kernel: [drm] Connector DP-2: get mode from tunables:
Jun 6 08:00:03 trainwreck kernel: [drm] - kern.vt.fb.modes.DP-2
Jun 6 08:00:03 trainwreck kernel: [drm] - kern.vt.fb.default_mode
Jun 6 08:00:03 trainwreck kernel: [drm:drm_sysfs_connector_add] adding "DP-2" to sysfs
Jun 6 08:00:03 trainwreck kernel: [drm:drm_sysfs_hotplug_event] generating hotplug event
Jun 6 08:00:03 trainwreck kernel: [drm] AMDGPU Display Connectors
Jun 6 08:00:03 trainwreck kernel: [drm] Connector 0:
Jun 6 08:00:03 trainwreck kernel: [drm] eDP-1
Jun 6 08:00:03 trainwreck kernel: [drm] HPD1
Jun 6 08:00:03 trainwreck kernel: [drm] DDC: 0x4868 0x4868 0x4869 0x4869 0x486a 0x486a 0x486b 0x486b
Jun 6 08:00:03 trainwreck kernel: [drm] Encoders:
Jun 6 08:00:03 trainwreck kernel: [drm] LCD1: INTERNAL_UNIPHY
Jun 6 08:00:03 trainwreck kernel: [drm] Connector 1:
Jun 6 08:00:03 trainwreck kernel: [drm] DP-1
Jun 6 08:00:03 trainwreck kernel: [drm] HPD2
Jun 6 08:00:03 trainwreck kernel: [drm] DDC: 0x486c 0x486c 0x486d 0x486d 0x486e 0x486e 0x486f 0x486f
Jun 6 08:00:03 trainwreck kernel: [drm] Encoders:
Jun 6 08:00:03 trainwreck kernel: [drm] DFP1: INTERNAL_UNIPHY
Jun 6 08:00:03 trainwreck kernel: [drm] Connector 2:
Jun 6 08:00:03 trainwreck kernel: [drm] DP-2
Jun 6 08:00:03 trainwreck kernel: [drm] HPD3
Jun 6 08:00:03 trainwreck kernel: [drm] DDC: 0x4870 0x4870 0x4871 0x4871 0x4872 0x4872 0x4873 0x4873
Jun 6 08:00:03 trainwreck kernel: [drm] Encoders:
Jun 6 08:00:03 trainwreck kernel: [drm] DFP2: INTERNAL_UNIPHY1
Jun 6 08:00:03 trainwreck kernel: [drm:gfx_v8_0_init_microcode]
Jun 6 08:00:03 trainwreck kernel: drmn0: fence driver on ring 0 use gpu addr 0x0000000040000010, cpu addr 0x0xfffff801240db010
Jun 6 08:00:03 trainwreck kernel: drmn0: fence driver on ring 1 use gpu addr 0x0000000040000020, cpu addr 0x0xfffff801240db020
Jun 6 08:00:03 trainwreck kernel: drmn0: fence driver on ring 2 use gpu addr 0x0000000040000030, cpu addr 0x0xfffff801240db030
Jun 6 08:00:03 trainwreck kernel: drmn0: fence driver on ring 3 use gpu addr 0x0000000040000040, cpu addr 0x0xfffff801240db040
Jun 6 08:00:03 trainwreck kernel: drmn0: fence driver on ring 4 use gpu addr 0x0000000040000050, cpu addr 0x0xfffff801240db050
Jun 6 08:00:03 trainwreck kernel: drmn0: fence driver on ring 5 use gpu addr 0x0000000040000060, cpu addr 0x0xfffff801240db060
Jun 6 08:00:03 trainwreck kernel: drmn0: fence driver on ring 6 use gpu addr 0x0000000040000070, cpu addr 0x0xfffff801240db070
Jun 6 08:00:03 trainwreck kernel: drmn0: fence driver on ring 7 use gpu addr 0x0000000040000080, cpu addr 0x0xfffff801240db080
Jun 6 08:00:03 trainwreck kernel: drmn0: fence driver on ring 8 use gpu addr 0x0000000040000090, cpu addr 0x0xfffff801240db090
Jun 6 08:00:03 trainwreck kernel: [drm:sdma_v3_0_init_microcode]
Jun 6 08:00:03 trainwreck kernel: drmn0: fence driver on ring 9 use gpu addr 0x00000000400000a0, cpu addr 0x0xfffff801240db0a0
Jun 6 08:00:03 trainwreck kernel: drmn0: fence driver on ring 10 use gpu addr 0x00000000400000b0, cpu addr 0x0xfffff801240db0b0
Jun 6 08:00:03 trainwreck kernel: [drm] Found UVD firmware Version: 1.80 Family ID: 11
Jun 6 08:00:03 trainwreck kernel: drmn0: fence driver on ring 11 use gpu addr 0x0000000000281ac0, cpu addr 0x0xfffff800e0281ac0
Jun 6 08:00:03 trainwreck kernel: [drm] Found VCE firmware Version: 50.17 Binary ID: 3
Jun 6 08:00:03 trainwreck kernel: drmn0: fence driver on ring 12 use gpu addr 0x00000000400000d0, cpu addr 0x0xfffff801240db0d0
Jun 6 08:00:03 trainwreck kernel: drmn0: fence driver on ring 13 use gpu addr 0x00000000400000e0, cpu addr 0x0xfffff801240db0e0
Jun 6 08:00:03 trainwreck kernel: [drm:amdgpu_ih_process] amdgpu_ih_process: rptr 0, wptr 48
Jun 6 08:00:03 trainwreck kernel: [drm:dce_v11_0_pageflip_irq] amdgpu_crtc->pflip_status = 0 != AMDGPU_FLIP_SUBMITTED(2)
Jun 6 08:00:03 trainwreck last message repeated 2 times
Jun 6 08:00:03 trainwreck kernel: [drm] ring test on 0 succeeded in 4 usecs
Jun 6 08:00:03 trainwreck kernel: [drm] ring test on 1 succeeded in 8 usecs
Jun 6 08:00:03 trainwreck kernel: [drm] ring test on 2 succeeded in 4 usecs
Jun 6 08:00:03 trainwreck kernel: [drm] ring test on 3 succeeded in 2 usecs
Jun 6 08:00:03 trainwreck kernel: [drm] ring test on 4 succeeded in 2 usecs
Jun 6 08:00:03 trainwreck kernel: [drm] ring test on 5 succeeded in 2 usecs
Jun 6 08:00:03 trainwreck kernel: [drm] ring test on 6 succeeded in 2 usecs
Jun 6 08:00:03 trainwreck kernel: [drm] ring test on 7 succeeded in 1 usecs
Jun 6 08:00:03 trainwreck kernel: [drm] ring test on 8 succeeded in 2 usecs
Jun 6 08:00:03 trainwreck kernel: [drm] ring test on 9 succeeded in 2 usecs
Jun 6 08:00:03 trainwreck kernel: [drm] ring test on 10 succeeded in 2 usecs
Jun 6 08:00:57 trainwreck syslogd: kernel boot file is /boot/kernel/kernel
Jun 6 08:00:57 trainwreck kernel: [drm:0xffffffff829a41a7s] ERROR amdgpu: ring 11 test failed (0xCAFEDEAD)
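
When a log like the one above grows long, it can help to separate the rings that passed their test from the ones that failed. The following is a minimal sketch, assuming the two message shapes seen in this log ("ring test on N succeeded in M usecs" and "ring N test failed (0x...)"); other driver versions may word these lines differently, and the function name is hypothetical.

```python
import re

# Patterns mirror the message shapes in the log above (an assumption for
# any other driver version).
RING_OK = re.compile(r"ring test on (\d+) succeeded in (\d+) usecs")
RING_FAIL = re.compile(r"ring (\d+) test failed \((0x[0-9A-Fa-f]+)\)")


def summarize_ring_tests(lines):
    """Return (passed, failed).

    passed maps ring number -> test duration in microseconds.
    failed maps ring number -> the value printed in parentheses.
    """
    passed, failed = {}, {}
    for line in lines:
        ok = RING_OK.search(line)
        if ok:
            passed[int(ok.group(1))] = int(ok.group(2))
            continue
        bad = RING_FAIL.search(line)
        if bad:
            failed[int(bad.group(1))] = bad.group(2)
    return passed, failed
```

Feeding it the lines of a saved log (for example `summarize_ring_tests(open("messages"))`, where the file name is a placeholder) gives two dictionaries that can be compared between a working and a failing boot.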

Question: 1.3 kernel version

Hey,

I've been having a lot of problems with this kernel on both Ubuntu and Debian Jessie. Can you tell me what kernel version the next release will be?

Also, can you tell me why I need a whole kernel at all? I like to use custom kernels for non-mainline schedulers and it really doesn't mix with that.

Cheers

ROC with KVM

Created a new issue from here, as the topic is slightly different.

I am trying to use ROC on a virtualized guest machine.

My machine is
CPU: Intel Core i7 6700
GPU: AMD Radeon RX 480
which satisfies AMD ROC requirements (PCIe 3.0, AtomicOps), and it actually works on the host.

Here, ROCm 1.3 was officially announced as supporting KVM passthrough virtual machines. In my setup, however, it does not work: I get the following error when I run the sample on the guest machine.

kfd kfd: skipped device 1002:67df, PCI rejects atomics
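
To compare host and guest, it may be useful to pull every such rejection out of the kernel log on both sides. This is a small sketch that assumes the message always has the shape quoted above ("kfd kfd: skipped device VVVV:DDDD, reason"); the function name is hypothetical.

```python
import re

# Shape taken from the single message quoted above; treat it as an assumption.
SKIPPED = re.compile(
    r"kfd kfd: skipped device ([0-9a-fA-F]{4}):([0-9a-fA-F]{4}), (.+)"
)


def find_skipped_devices(dmesg_text):
    """Return a list of (vendor_id, device_id, reason) tuples."""
    found = []
    for line in dmesg_text.splitlines():
        match = SKIPPED.search(line)
        if match:
            vendor, device, reason = match.groups()
            found.append((vendor.lower(), device.lower(), reason.strip()))
    return found
```

An empty list on the host and a non-empty one in the guest would point at the virtual PCIe topology rather than at the card itself.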

Virtualization setup is as follows.
Hypervisor: QEMU 2.7.92 + KVM (kernel 4.11)
QEMU script:

sudo vfio-bind 0000:01:00.0 0000:01:00.1 &&

sudo qemu-system-x86_64 -enable-kvm -cpu host,kvm=off \
-m 8192 -smp 8 -M q35 \
-vga none -nographic \
-usb \
-device usb-host,hostbus=1,hostaddr=2 \
-device vfio-pci,host=01:00.0,multifunction=on,x-vga=on \
-device vfio-pci,host=01:00.1 \
-drive if=pflash,format=raw,readonly,file=OVMF_CODE.fd \
-drive if=pflash,format=raw,file=OVMF_VARS.fd \
-hda ubuntu.qcow2 \
-net nic \
-net user,hostfwd=tcp::54321-:22

I didn't use the ioh3420 virtualized PCIe controller because, when it is present in the system, both the guest and the host machine freeze while the amdgpu driver is loading.

But AMD says it officially supports virtualized environments. Could you tell me which part of the setup I am missing?

Thanks!

I2C Interface like ATOMbios but through driver/cmdline

Is it possible to access the I2C interfaces that control the voltage regulator through the i2c-dev interface?

This should return the ID of the installed VRM on a Sapphire Nitro+ (43):
let a=0; while [ $a -lt 20 ] ; do let a=$a+1 ; i2cget -y $a 0x30 0x92 b ; done
but it returns 00 (or 27 if used with a 2-byte return).
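
The loop above increments `a` before calling `i2cget`, so it reads buses 1 through 20 and never bus 0. The same probe can be sketched in a way that records which bus produced which answer. This is only an illustration: the chip address 0x30 and register 0x92 come from the command above, the function names are hypothetical, and `i2cget` from i2c-tools is assumed to be installed.

```python
import subprocess


def build_i2cget_commands(first_bus=1, last_bus=20, chip="0x30",
                          register="0x92", mode="b"):
    """Build one i2cget argument list per bus, first_bus..last_bus inclusive.

    mode "b" reads a byte and "w" reads a 16-bit word, as in i2cget itself.
    """
    return [
        ["i2cget", "-y", str(bus), chip, register, mode]
        for bus in range(first_bus, last_bus + 1)
    ]


def probe(commands):
    """Run each command; map bus number -> printed value, or None on failure."""
    results = {}
    for argv in commands:
        bus = int(argv[2])
        try:
            done = subprocess.run(argv, capture_output=True, text=True,
                                  check=False)
        except FileNotFoundError:
            # i2cget is not installed on this machine.
            results[bus] = None
            continue
        results[bus] = done.stdout.strip() if done.returncode == 0 else None
    return results
```

Calling `probe(build_i2cget_commands(first_bus=0))` would also cover bus 0, which the original loop skips.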

Update README.md to explain the diff with 4.15rc2

Hi,

Now with the latest 4.15rc2 updates, it's a bit confusing to understand how to use ROCm. What is required to get OpenCL working with 4.15rc2 for example?

What is the kfd situation currently?
There are comments in README.md saying things like "not in upstream". These should be changed into "not in upstream as of " because when I see that the README was updated in August I know that this is probably untrue, and if the information can't be trusted then it loses value.

amdkfd module not on initrd by default

I have applied the ROCK-specific patches on top of the 4.4 openSUSE
Tumbleweed kernel, but they did not allow running HSA kernels because
that only works if the amdkfd module is loaded before amdgpu, which did
not happen because amdkfd was not on the initrd (modprobing amdkfd,
rebuilding the initrd and rebooting worked, but is of course bad).

After discussing this with Michal Marek, he came up with a patch that is
now submitted upstream as
https://lists.freedesktop.org/archives/dri-devel/2016-August/116708.html
and which fixes the issue; my (experimental) SUSE RPMs work out of the
box with it. Therefore, I'd suggest including it in the next round
of updates (unless you get it from upstream, of course).
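
Whether amdkfd actually made it onto the initrd can be checked from a file listing of the image. The sketch below assumes the listing has one path per line, possibly preceded by `ls -l` style columns, as a tool such as `lsinitrd` might print it; both that format and the function name are assumptions.

```python
import re


def initrd_has_module(listing, module):
    """True if the listing contains <module>.ko, optionally compressed."""
    pattern = re.compile(
        r"(^|/)" + re.escape(module) + r"\.ko(\.(gz|xz|zst))?$"
    )
    return any(pattern.search(line.strip()) for line in listing.splitlines())
```

If `initrd_has_module(listing, "amdgpu")` is true while the same call for `"amdkfd"` is false, the image matches the situation described above, where amdgpu loads from the initrd before amdkfd is available.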

Windows support (driver, runtime etc)

See subj.

Wondering why AMD always releases their frameworks on Linux / Unix platforms only.

Guys, do you have in mind that GPGPU developers can't build their products for Windows users (98% of them in mass markets)?

This problem relates to the problem of compiling C++11-compatible code for AMD GPUs as well. I see that AMD puts a lot of effort into using the HIP infrastructure to build nvcc-compatible code (with its own C++ implementation), but all of these efforts are useless as long as there is no Windows support. Yet.

roc-1.6.x (and master) kernel crashes while loading amdgpu with Radeon RX Vega 64

config.txt
lspci.txt

Steps to reproduce

  1. Build the kernel using the attached configuration and commit cb19309257dfd95496e1a2b569b55f6f3881949f.
  2. Blacklist amdgpu kernel module from loading at boot time. If amdgpu is not blacklisted, then the kernel will crash at boot time.
  3. Connect a USB to serial adaptor (to receive kernel messages up to as late as possible), and append the following kernel arguments: systemd.journald.forward_to_console=1 console=ttyUSB0,115200. Boot the machine. My kernel configuration builds in the pl2303 driver, as this is the chip my adaptor uses.
  4. Run modprobe amdgpu.

Expected result

The kernel module should be loaded without crashing the kernel.

Actual results

Upon running modprobe amdgpu:

  • After a few seconds, the fan on the graphics card speeds up. Then it very briefly loses power roughly once every second.
  • The kernel freezes after a few seconds (at around the time the card's fan starts exhibiting the aforementioned symptoms). It does not respond to ICMP ping, or any other input I've tried.
  • Output such as the following is written to the serial port:
[   74.673193] AMD IOMMUv2 driver by Joerg Roedel <[email protected]>
[   74.711966] Parsing CRAT table with 1 nodes
[   74.711979] Ignoring ACPI CRAT on non-APU system
[   74.711982] Virtual CRAT table created for CPU
[   74.711983] Parsing CRAT table with 1 nodes
[   74.711985] Creating topology SYSFS entries
[   74.712001] Topology: Add CPU node
[   74.712002] Finished initializing topology
[   74.713313] kfd kfd: Initialized module
[   74.713504] fb: switching to amdgpudrmfb from VESA VGA
[   74.713548] Console: switching to colour dummy device 80x25
[   74.713779] [drm] initializing kernel modesetting (VEGA10 0x1002:0x687F 0x1002:0x6B76 0xC1).
[   74.713782] amdgpu 0000:2a:00.0: valid rang is between 4 and 9
[   74.713801] [drm] register mmio base: 0xFE700000
[   74.713802] [drm] register mmio size: 524288
[   74.714218] [drm] probing gen 2 caps for device 1022:1471 = 700d03/e
[   74.714221] [drm] probing mlw for device 1022:1471 = 700d03
[   74.714229] [drm] UVD is enabled in VM mode
[   74.714230] [drm] UVD ENC is enabled in VM mode
[   74.714231] [drm] VCE enabled in VM mode
[   74.714242] amdgpu 0000:2a:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
[   74.714258] ATOM BIOS: 113-D0500100-103
[   74.714263] [drm] GPU post is not needed
[   74.714278] [drm] vm size is 262144 GB, block size is 9-bit,fragment size is 9-bit
[   74.714283] amdgpu 0000:2a:00.0: VRAM: 8176M 0x000000F400000000 - 0x000000F5FEFFFFFF (8176M used)
[   74.714285] amdgpu 0000:2a:00.0: GTT: 256M 0x000000F5FF000000 - 0x000000F60EFFFFFF
[   74.714288] [drm] Detected VRAM RAM=8176M, BAR=256M
[   74.714289] [drm] RAM width 2048bits HBM
[   74.714341] [TTM] Zone  kernel: Available graphics memory: 30879975 kiB
[   74.714342] [TTM] Initializing pool allocator
[   74.714345] [TTM] Initializing DMA pool allocator
[   74.714358] [drm] amdgpu: 8176M of VRAM memory ready
[   74.714360] [drm] amdgpu: 32166M of GTT memory ready.
[   74.714366] [drm] GART: num cpu pages 65536, num gpu pages 65536
[   74.714479] [drm] PCIE GART of 256M enabled (table at 0x000000F400800000).
[   74.714566] amdgpu 0000:2a:00.0: amdgpu: using MSI.
[   74.714639] [drm] amdgpu: irq initialized.
[   74.747722] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[   74.749544] amdgpu 0000:2a:00.0: fence driver on ring 0 use gpu addr 0x000000f5ff400008, cpu addr 0xffff8b9337c21008
[   74.749594] amdgpu 0000:2a:00.0: fence driver on ring 1 use gpu addr 0x000000f5ff400010, cpu addr 0xffff8b9337c21010
[   74.749636] amdgpu 0000:2a:00.0: fence driver on ring 2 use gpu addr 0x000000f5ff400018, cpu addr 0xffff8b9337c21018
[   74.749679] amdgpu 0000:2a:00.0: fence driver on ring 3 use gpu addr 0x000000f5ff400028, cpu addr 0xffff8b9337c21028
[   74.749714] amdgpu 0000:2a:00.0: fence driver on ring 4 use gpu addr 0x000000f5ff400030, cpu addr 0xffff8b9337c21030
[   74.749750] amdgpu 0000:2a:00.0: fence driver on ring 5 use gpu addr 0x000000f5ff400038, cpu addr 0xffff8b9337c21038
[   74.749784] amdgpu 0000:2a:00.0: fence driver on ring 6 use gpu addr 0x000000f5ff400048, cpu addr 0xffff8b9337c21048
[   74.749819] amdgpu 0000:2a:00.0: fence driver on ring 7 use gpu addr 0x000000f5ff400050, cpu addr 0xffff8b9337c21050
[   74.749853] amdgpu 0000:2a:00.0: fence driver on ring 8 use gpu addr 0x000000f5ff400058, cpu addr 0xffff8b9337c21058
[   74.749871] amdgpu 0000:2a:00.0: fence driver on ring 9 use gpu addr 0x000000f5ff40006c, cpu addr 0xffff8b9337c2106c
[   74.750372] [drm] use_doorbell being set to: [true]
[   74.750405] amdgpu 0000:2a:00.0: fence driver on ring 10 use gpu addr 0x000000f5ff400074, cpu addr 0xffff8b9337c21074
[   74.750419] [drm] use_doorbell being set to: [true]
[   74.750450] amdgpu 0000:2a:00.0: fence driver on ring 11 use gpu addr 0x000000f5ff40007c, cpu addr 0xffff8b9337c2107c
[   74.751262] [drm] Found UVD firmware Version: 1.68 Family ID: 17
[   74.751270] [drm] PSP loading UVD firmware
[   74.751471] amdgpu 0000:2a:00.0: fence driver on ring 12 use gpu addr 0x000000f400911a60, cpu addr 0xffffb0d608d5aa60
[   74.751510] amdgpu 0000:2a:00.0: fence driver on ring 13 use gpu addr 0x000000f5ff4000ac, cpu addr 0xffff8b9337c210ac
[   74.751534] amdgpu 0000:2a:00.0: fence driver on ring 14 use gpu addr 0x000000f5ff4000bc, cpu addr 0xffff8b9337c210bc
[   74.751885] [drm] Found VCE firmware Version: 53.32 Binary ID: 4
[   74.751894] [drm] PSP loading VCE firmware
[   74.751923] amdgpu 0000:2a:00.0: fence driver on ring 15 use gpu addr 0x000000f5ff4000d4, cpu addr 0xffff8b9337c210d4
[   74.751964] amdgpu 0000:2a:00.0: fence driver on ring 16 use gpu addr 0x000000f5ff4000ec, cpu addr 0xffff8b9337c210ec
[   74.751990] amdgpu 0000:2a:00.0: fence driver on ring 17 use gpu addr 0x000000f5ff4000fc, cpu addr 0xffff8b9337c210fc
[   75.482436] amdgpu: [powerplay] Failed to send message: 0x5c
[   75.684165] amdgpu: [powerplay] [ACG_Enable] ACG BTC Returned Failed Status!
[   75.816447] AMD-Vi: Completion-Wait loop timed out
[   76.421612] AMD-Vi: Completion-Wait loop timed out
[   76.692748] clocksource: timekeeping watchdog on CPU4: Marking clocksource 'tsc' as unstable because the skew is too large:
[   76.692752] clocksource:                       'hpet' wd_now: 41b56863 wd_last: 4131334e mask: ffffffff
[   76.692755] clocksource:                       'tsc' cs_now: 4399a5f60e cs_last: 4351a54bf2 mask: ffffffffffffffff
[   76.692759] sched_clock: Marking unstable (76691535305, 1210257)<-(76983845838, -291100276)
[   76.692762] tsc: Marking TSC unstable due to clocksource watchdog
[   77.297902] amdgpu: [powerplay] Failed to send message: 0x4
[   77.297910] AMD-Vi: Event logged [
[   77.297913] IOTLB_INV_TIMEOUT device=2a:00.0 address=0x00000007faf3b450]
[   78.709929] AMD-Vi: Event logged [
[   78.709931] IOTLB_INV_TIMEOUT device=2a:00.0 address=0x00000007faf3b4f0]
[   78.911642] amdgpu: [powerplay] Failed to send message: 0x4
[   80.525382] amdgpu: [powerplay] Failed to send message: 0x4
[   82.139016] amdgpu: [powerplay] Failed to send message: 0x4
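
The serial output above mixes two kinds of failure: powerplay messages that could not be sent and IOMMU invalidation timeouts. A short script can count both and report when the first one appeared. This is a sketch that assumes the bracketed-timestamp layout and the exact message wording shown above; the function name is hypothetical.

```python
import re
from collections import Counter

TIMESTAMPED = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")
SMU_FAIL = re.compile(r"\[powerplay\] Failed to send message: (0x[0-9a-fA-F]+)")
IOMMU_TIMEOUT = re.compile(r"IOTLB_INV_TIMEOUT device=(\S+)")


def summarize_failures(log_text):
    """Return (smu, iommu, first).

    smu counts failed powerplay messages by message ID, iommu counts
    IOTLB invalidation timeouts by device, and first is the timestamp
    (in seconds) of the earliest line of either kind, or None.
    """
    smu, iommu, first = Counter(), Counter(), None
    for raw in log_text.splitlines():
        stamped = TIMESTAMPED.match(raw.strip())
        if not stamped:
            continue
        when, body = float(stamped.group(1)), stamped.group(2)
        hit = False
        failed = SMU_FAIL.search(body)
        if failed:
            smu[failed.group(1)] += 1
            hit = True
        timeout = IOMMU_TIMEOUT.search(body)
        if timeout:
            iommu[timeout.group(1)] += 1
            hit = True
        if hit and first is None:
            first = when
    return smu, iommu, first
```

Run over the output above, it would count one failed 0x5c message, four failed 0x4 messages and two timeouts for device 2a:00.0, with the first failure at 75.482436.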

Environment

  • Radeon RX Vega 64 GPU
  • Asus ROG Strix X370-F gaming motherboard
  • AMD Ryzen 1700 CPU
  • Arch Linux

Attached files

  • config.txt: The kernel configuration used.
  • lspci.txt: The result of running lspci -v.

Notes

  • The same (or a very similar) bug occurs if commit bd5536a8fd36b2ad5037712e58c7808fc8b5a377 is used.
  • I found this bug while trying to run HIP programs on the GPU. I concluded that this kernel is required in order to avoid getting an assert failure from the runtime:
    ROCR-Runtime/src/core/runtime/runtime.cpp:162: void core::Runtime::RegisterAgent(core::Agent*): Assertion `system_regions_fine_.size() > 0' failed.
    This kernel's amdkfd is required over the upstream kernel's amdkfd (I tried 4.13, and upstream master from a few days ago) in order to give a non-zero mem_banks_count value from /sys/devices/virtual/kfd/kfd/topology/nodes/0/properties and to get past that assert.
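
The mem_banks_count check mentioned in the last note can be scripted so that different kernels are easy to compare. This is a minimal sketch, assuming the properties file holds one "name value" pair per line with integer values; the path is the one given above and the function names are hypothetical.

```python
from pathlib import Path

# Path quoted in the note above; node 0 is the node used there.
PROPERTIES = Path("/sys/devices/virtual/kfd/kfd/topology/nodes/0/properties")


def parse_properties(text):
    """Parse 'name value' lines into a dict of ints, skipping anything else."""
    props = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            props[parts[0]] = int(parts[1])
    return props


def has_memory_banks(text):
    """True if mem_banks_count is present and greater than zero."""
    return parse_properties(text).get("mem_banks_count", 0) > 0
```

On a machine where the file exists, `has_memory_banks(PROPERTIES.read_text())` returning False would correspond to the zero mem_banks_count described above.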

Poor Display with kernel

I'm using an RX 480 with the ROCK kernel.
When I boot with the ROCK kernel, the display is very poor, with a very low resolution.
This problem also applies to the console environment (tty1~5).

Null pointer dereference in update_stream_scaling_settings

This isn't really compute related but it happened with the ROCK kernel so I thought it would be best to report it:

Jun 14 06:27:34 mammut kernel: [25320.162048] [drm] Atomic commit: SET crtc id 0: [ffff9c40b1acd000]
Jun 14 06:27:34 mammut kernel: [25320.162050] [drm] dc_commit_streams: 1 streams
Jun 14 06:27:34 mammut kernel: [25320.162052] [drm] core_stream 0x62476400: src: 0, 0, 3840, 2160; dst: 0, 0, 3840, 2160, colorSpace:1
Jun 14 06:27:34 mammut kernel: [25320.162053] [drm] pix_clk_khz: 533250, h_total: 4000, v_total: 2222, pixelencoder:1, displaycolorDepth:2
Jun 14 06:27:34 mammut kernel: [25320.162053] [drm] sink name: U24E850, serial: 810373974
Jun 14 06:27:34 mammut kernel: [25320.162054] [drm] link: 1
Jun 14 06:27:34 mammut kernel: [25320.162091] [drm] dce_get_required_clocks_state: clocks unsupported
Jun 14 06:27:34 mammut kernel: [25320.163545] [drm] Link: 1 eDP panel mode supported: 0 eDP panel mode enabled: 0
Jun 14 06:27:34 mammut kernel: [25320.167649] [drm] [LKTN] [DP][ConnIdx:1] HBR2x4 pass VS=1, PE=0^
Jun 14 06:27:34 mammut kernel: [25320.168288] [drm] [Mode] [DP][ConnIdx:1] {3840x2160, 4000x2222@533250Khz}^
Jun 14 06:27:34 mammut kernel: [25320.179277] [drm] Atomic commit: SET crtc id 1: [ffff9c40b1ace000]
Jun 14 06:27:34 mammut kernel: [25320.179280] [drm] dc_commit_streams: 2 streams
Jun 14 06:27:34 mammut kernel: [25320.179281] [drm] core_stream 0x62476400: src: 0, 0, 3840, 2160; dst: 0, 0, 3840, 2160, colorSpace:1
Jun 14 06:27:34 mammut kernel: [25320.179282] [drm] pix_clk_khz: 533250, h_total: 4000, v_total: 2222, pixelencoder:1, displaycolorDepth:2
Jun 14 06:27:34 mammut kernel: [25320.179283] [drm] sink name: U24E850, serial: 810373974
Jun 14 06:27:34 mammut kernel: [25320.179283] [drm] link: 1
Jun 14 06:27:34 mammut kernel: [25320.179284] [drm] core_stream 0x2d771400: src: 0, 0, 3840, 2160; dst: 0, 0, 3840, 2160, colorSpace:1
Jun 14 06:27:34 mammut kernel: [25320.179285] [drm] pix_clk_khz: 533250, h_total: 4000, v_total: 2222, pixelencoder:1, displaycolorDepth:2
Jun 14 06:27:34 mammut kernel: [25320.179285] [drm] sink name: DELL P2415Q, serial: 808925260
Jun 14 06:27:34 mammut kernel: [25320.179285] [drm] link: 2
Jun 14 06:27:34 mammut kernel: [25320.179325] [drm] dce_get_required_clocks_state: clocks unsupported
Jun 14 06:27:34 mammut kernel: [25320.180762] [drm] Link: 2 eDP panel mode supported: 0 eDP panel mode enabled: 0
Jun 14 06:27:34 mammut kernel: [25320.184879] [drm] [LKTN] [DP][ConnIdx:2] HBR2x4 pass VS=1, PE=0^
Jun 14 06:27:34 mammut rtkit-daemon[1785]: Supervising 7 threads of 2 processes of 2 users.
Jun 14 06:27:34 mammut rtkit-daemon[1785]: Successfully made thread 3781 of process 2465 (n/a) owned by '1000' RT at priority 5.
Jun 14 06:27:34 mammut rtkit-daemon[1785]: Supervising 8 threads of 2 processes of 2 users.
Jun 14 06:27:34 mammut kernel: [25320.195473] [drm] GSL: Setting-up...
Jun 14 06:27:34 mammut kernel: [25320.195477] [drm] GSL: enabling trigger-reset
Jun 14 06:27:34 mammut kernel: [25320.195480] [drm] GSL: waiting for reset to occur.
Jun 14 06:27:34 mammut kernel: [25320.195505] [drm] GSL: reset occurred at wait count: 1
Jun 14 06:27:34 mammut kernel: [25320.195505] [drm] GSL: disabling trigger-reset.
Jun 14 06:27:34 mammut kernel: [25320.195507] [drm] GSL: Restoring register states.
Jun 14 06:27:34 mammut kernel: [25320.195509] [drm] GSL: Set-up complete.
Jun 14 06:27:34 mammut kernel: [25320.196985] [drm] [Mode] [DP][ConnIdx:1] {3840x2160, 4000x2222@533250Khz}^
Jun 14 06:27:34 mammut kernel: [25320.196986] [drm] [Mode] [DP][ConnIdx:2] {3840x2160, 4000x2222@533250Khz}^
Jun 14 06:27:34 mammut kernel: [25320.257333] [drm] link=2, dc_sink_in= (null) is now Disconnected
Jun 14 06:27:34 mammut kernel: [25320.257334] [drm] DCHPD: connector_id=2: Old sink=ffff9c40b011ac00 New sink= (null)
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): EDID vendor "SAM", prod id 3279
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Using hsync ranges from config file
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Using vrefresh ranges from config file
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Printing DDC gathered Modelines:
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Modeline "3840x2160"x0.0 533.25 3840 3888 3920 4000 2160 2163 2168 2222 +hsync -vsync (133.3 kHz eP)
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Modeline "1920x1080"x0.0 148.50 1920 2008 2052 2200 1080 1084 1089 1125 +hsync +vsync (67.5 kHz e)
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Modeline "2560x1440"x0.0 241.50 2560 2608 2640 2720 1440 1443 1448 1481 +hsync -vsync (88.8 kHz e)
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Modeline "800x600"x0.0 40.00 800 840 968 1056 600 601 605 628 +hsync +vsync (37.9 kHz e)
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Modeline "800x600"x0.0 36.00 800 824 896 1024 600 601 603 625 +hsync +vsync (35.2 kHz e)
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Modeline "640x480"x0.0 25.18 640 656 752 800 480 490 492 525 -hsync -vsync (31.5 kHz e)
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Modeline "1024x768"x0.0 65.00 1024 1048 1184 1344 768 771 777 806 -hsync -vsync (48.4 kHz e)
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Modeline "1280x800"x0.0 83.50 1280 1352 1480 1680 800 803 809 831 -hsync +vsync (49.7 kHz e)
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Modeline "1280x720"x59.9 74.50 1280 1344 1472 1664 720 723 728 748 -hsync +vsync (44.8 kHz e)
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Modeline "1280x1024"x0.0 108.00 1280 1328 1440 1688 1024 1025 1028 1066 +hsync +vsync (64.0 kHz e)
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Modeline "1600x900"x59.9 118.25 1600 1696 1856 2112 900 903 908 934 -hsync +vsync (56.0 kHz e)
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Modeline "1680x1050"x0.0 146.25 1680 1784 1960 2240 1050 1053 1059 1089 -hsync +vsync (65.3 kHz e)
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Modeline "1440x900"x0.0 106.50 1440 1520 1672 1904 900 903 909 934 -hsync +vsync (55.9 kHz e)
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Modeline "720x576"x0.0 27.00 720 732 796 864 576 581 586 625 -hsync -vsync (31.2 kHz e)
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): EDID vendor "SAM", prod id 3279
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Using hsync ranges from config file
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Using vrefresh ranges from config file
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Printing DDC gathered Modelines:
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Modeline "3840x2160"x0.0 533.25 3840 3888 3920 4000 2160 2163 2168 2222 +hsync -vsync (133.3 kHz eP)
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Modeline "1920x1080"x0.0 148.50 1920 2008 2052 2200 1080 1084 1089 1125 +hsync +vsync (67.5 kHz e)
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Modeline "2560x1440"x0.0 241.50 2560 2608 2640 2720 1440 1443 1448 1481 +hsync -vsync (88.8 kHz e)
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Modeline "800x600"x0.0 40.00 800 840 968 1056 600 601 605 628 +hsync +vsync (37.9 kHz e)
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Modeline "800x600"x0.0 36.00 800 824 896 1024 600 601 603 625 +hsync +vsync (35.2 kHz e)
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Modeline "640x480"x0.0 25.18 640 656 752 800 480 490 492 525 -hsync -vsync (31.5 kHz e)
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Modeline "1024x768"x0.0 65.00 1024 1048 1184 1344 768 771 777 806 -hsync -vsync (48.4 kHz e)
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Modeline "1280x800"x0.0 83.50 1280 1352 1480 1680 800 803 809 831 -hsync +vsync (49.7 kHz e)
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Modeline "1280x720"x59.9 74.50 1280 1344 1472 1664 720 723 728 748 -hsync +vsync (44.8 kHz e)
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Modeline "1280x1024"x0.0 108.00 1280 1328 1440 1688 1024 1025 1028 1066 +hsync +vsync (64.0 kHz e)
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Modeline "1600x900"x59.9 118.25 1600 1696 1856 2112 900 903 908 934 -hsync +vsync (56.0 kHz e)
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Modeline "1680x1050"x0.0 146.25 1680 1784 1960 2240 1050 1053 1059 1089 -hsync +vsync (65.3 kHz e)
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Modeline "1440x900"x0.0 106.50 1440 1520 1672 1904 900 903 909 934 -hsync +vsync (55.9 kHz e)
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Modeline "720x576"x0.0 27.00 720 732 796 864 576 581 586 625 -hsync -vsync (31.2 kHz e)
Jun 14 06:27:34 mammut kernel: [25320.262186] [drm] Atomic commit: RESET. crtc id 1:[ffff9c40b1ace000]
Jun 14 06:27:34 mammut kernel: [25320.262202] [drm] dc_commit_streams: 1 streams
Jun 14 06:27:34 mammut kernel: [25320.262204] [drm] core_stream 0x62476400: src: 0, 0, 3840, 2160; dst: 0, 0, 3840, 2160, colorSpace:1
Jun 14 06:27:34 mammut kernel: [25320.262204] [drm] pix_clk_khz: 533250, h_total: 4000, v_total: 2222, pixelencoder:1, displaycolorDepth:2
Jun 14 06:27:34 mammut kernel: [25320.262205] [drm] sink name: U24E850, serial: 810373974
Jun 14 06:27:34 mammut kernel: [25320.262205] [drm] link: 1
Jun 14 06:27:34 mammut kernel: [25320.269479] [drm] link=1, dc_sink_in= (null) is now Disconnected
Jun 14 06:27:34 mammut kernel: [25320.269480] [drm] DCHPD: connector_id=1: Old sink=ffff9c40b0118c00 New sink= (null)
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): Allocate new frame buffer 3840x2160
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: (II) AMDGPU(0): => pitch 15360 bytes
Jun 14 06:27:34 mammut kernel: [25320.312179] [drm] dce_get_required_clocks_state: clocks unsupported
Jun 14 06:27:34 mammut kernel: [25320.312200] [drm] [Mode] [DP][ConnIdx:1] {3840x2160, 4000x2222@533250Khz}^
Jun 14 06:27:34 mammut kernel: [25320.312575] [drm] Atomic commit: RESET. crtc id 1:[ffff9c40b1ace000]
Jun 14 06:27:34 mammut kernel: [25320.314525] [drm] Atomic commit: RESET. crtc id 0:[ffff9c40b1acd000]
Jun 14 06:27:34 mammut kernel: [25320.314538] [drm] dc_commit_streams: 0 streams
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: New FB's refcnt was 0 in drmmode_fb_reference
Jun 14 06:27:34 mammut /usr/lib/gdm3/gdm-x-session[2118]: Old FB's refcnt was 0 in drmmode_fb_reference
Jun 14 06:27:34 mammut org.gnome.Shell.desktop[2445]: Window manager warning: Configuring CRTC 79 with mode 90 (3840 x 2160 @ 59,996624) at position 0, 0 and transform 0 failed
Jun 14 06:27:34 mammut kernel: [25320.362744] [drm:create_stream_for_sink [amdgpu]] ERROR Failed to create stream for sink!
Jun 14 06:27:34 mammut kernel: [25320.362792] [drm:create_stream_for_sink [amdgpu]] ERROR Failed to create stream for sink!
Jun 14 06:27:34 mammut kernel: [25320.362794] [drm] Atomic commit: SET crtc id 0: [ffff9c40b1acd000]
Jun 14 06:27:34 mammut kernel: [25320.362814] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] ERROR dm_dc_surface_commit: Failed to obtain stream on crtc (0)!
Jun 14 06:27:34 mammut kernel: [25320.362847] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] ERROR dm_dc_surface_commit: Failed to obtain stream on crtc (0)!
Jun 14 06:27:34 mammut kernel: [25320.363023] [drm:dc_stream_set_cursor_attributes [amdgpu]] ERROR DC: dc_stream is NULL!
Jun 14 06:27:34 mammut kernel: [25320.363043] [drm:dm_set_cursor [amdgpu]] ERROR DC failed to set cursor attributes
Jun 14 06:27:34 mammut kernel: [25320.363061] [drm:dc_stream_set_cursor_position [amdgpu]] ERROR DC: dc_stream is NULL!
Jun 14 06:27:34 mammut kernel: [25320.363079] [drm:dm_set_cursor [amdgpu]] ERROR DC failed to set cursor position
Jun 14 06:27:34 mammut kernel: [25320.363533] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
Jun 14 06:27:34 mammut kernel: [25320.363553] IP: [] update_stream_scaling_settings+0x42/0x190 [amdgpu]
Jun 14 06:27:34 mammut kernel: [25320.363589] PGD 0
Jun 14 06:27:34 mammut kernel: [25320.363594]
Jun 14 06:27:34 mammut kernel: [25320.363598] Oops: 0000 [#1] SMP
Jun 14 06:27:34 mammut kernel: [25320.363606] Modules linked in: binfmt_misc eeepc_wmi asus_wmi sparse_keymap video joydev input_leds kvm irqbypass crct10dif_pclmul crc32_pclmul snd_hda_codec_generic ghash_clmulni_intel aesni_intel aes_x86_64 lrw snd_hda_codec_hdmi gf128mul glue_helper ablk_helper cryptd snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm serio_raw snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer snd soundcore ccp shpchp i2c_piix4 mac_hid i2c_designware_platform 8250_dw i2c_designware_core sbs sbshc max6650 parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs xor raid6_pq hid_generic usbhid hid amdkfd mxm_wmi amd_iommu_v2 amdgpu ttm drm_kms_helper syscopyarea sysfillrect sysimgblt psmouse fb_sys_fops igb drm dca ptp pps_core i2c_algo_bit ahci libahci
Jun 14 06:27:34 mammut kernel: [25320.363798] gpio_amdpt wmi fjes gpio_generic
Jun 14 06:27:34 mammut kernel: [25320.363809] CPU: 0 PID: 2120 Comm: Xorg Not tainted 4.9.0-kfd+ #2
Jun 14 06:27:34 mammut kernel: [25320.363820] Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 0803 06/05/2017
Jun 14 06:27:34 mammut kernel: [25320.363838] task: ffff9c4084a42ac0 task.stack: ffffaa4d08b98000
Jun 14 06:27:34 mammut kernel: [25320.363850] RIP: 0010:[] [] update_stream_scaling_settings+0x42/0x190 [amdgpu]
Jun 14 06:27:34 mammut kernel: [25320.363888] RSP: 0018:ffffaa4d08b9baf8 EFLAGS: 00010286
Jun 14 06:27:34 mammut kernel: [25320.363898] RAX: 0000000000000000 RBX: ffff9c40b1acd000 RCX: 0000000000000000
Jun 14 06:27:34 mammut kernel: [25320.363912] RDX: 0000000000000000 RSI: ffff9c3ffe97fac0 RDI: ffff9c4064c95d00
Jun 14 06:27:34 mammut kernel: [25320.363925] RBP: ffffaa4d08b9bb28 R08: 0000000000000001 R09: ffff9c40af8a0000
Jun 14 06:27:34 mammut kernel: [25320.363939] R10: 0000000000000f00 R11: 0000000000000870 R12: ffff9c40b0109000
Jun 14 06:27:34 mammut kernel: [25320.363952] R13: ffff9c40b4e0dae0 R14: 0000000000000000 R15: ffff9c4064c95c00
Jun 14 06:27:34 mammut kernel: [25320.363966] FS: 00007fb60d1e7a40(0000) GS:ffff9c40be600000(0000) knlGS:0000000000000000
Jun 14 06:27:34 mammut kernel: [25320.363981] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 14 06:27:34 mammut kernel: [25320.363992] CR2: 0000000000000010 CR3: 00000007c4978000 CR4: 00000000003406f0
Jun 14 06:27:34 mammut kernel: [25320.364006] Stack:
Jun 14 06:27:34 mammut kernel: [25320.364010] ffffaa4d08b9bb28 0000000000000000 0000000000000000 ffff9c4000000000
Jun 14 06:27:34 mammut kernel: [25320.364027] ffff9c4000000000 ffff9c40b1acd000 ffffaa4d08b9bc10 ffffffffc050390d
Jun 14 06:27:34 mammut kernel: [25320.364044] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Jun 14 06:27:34 mammut kernel: [25320.364061] Call Trace:
Jun 14 06:27:34 mammut kernel: [25320.364084] [] amdgpu_dm_atomic_commit_tail+0x66d/0xbe0 [amdgpu]
Jun 14 06:27:34 mammut kernel: [25320.364101] [] ? kmem_cache_alloc_trace+0x181/0x190
Jun 14 06:27:34 mammut kernel: [25320.364117] [] ? drm_atomic_helper_setup_commit+0x2f7/0x340 [drm_kms_helper]
Jun 14 06:27:34 mammut kernel: [25320.364136] [] ? drm_atomic_helper_wait_for_dependencies+0xb6/0x180 [drm_kms_helper]
Jun 14 06:27:34 mammut kernel: [25320.364157] [] commit_tail+0x3e/0x60 [drm_kms_helper]
Jun 14 06:27:34 mammut kernel: [25320.364172] [] drm_atomic_helper_commit+0x9c/0xe0 [drm_kms_helper]
Jun 14 06:27:34 mammut kernel: [25320.364193] [] drm_atomic_commit+0x49/0x50 [drm]
Jun 14 06:27:34 mammut kernel: [25320.364208] [] drm_atomic_helper_connector_set_property+0x87/0xc0 [drm_kms_helper]
Jun 14 06:27:34 mammut kernel: [25320.364231] [] drm_mode_connector_set_obj_prop+0x3c/0x70 [drm]
Jun 14 06:27:34 mammut kernel: [25320.364250] [] drm_mode_obj_set_property_ioctl+0xfc/0x140 [drm]
Jun 14 06:27:34 mammut kernel: [25320.364269] [] drm_mode_connector_property_set_ioctl+0x30/0x40 [drm]
Jun 14 06:27:34 mammut kernel: [25320.364289] [] drm_ioctl+0x1f9/0x480 [drm]
Jun 14 06:27:34 mammut kernel: [25320.364305] [] ? drm_mode_connector_set_obj_prop+0x70/0x70 [drm]
Jun 14 06:27:34 mammut kernel: [25320.364321] [] ? ep_ptable_queue_proc+0xa0/0xa0
Jun 14 06:27:34 mammut kernel: [25320.364346] [] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
Jun 14 06:27:34 mammut kernel: [25320.364359] [] do_vfs_ioctl+0x94/0x5c0
Jun 14 06:27:34 mammut kernel: [25320.364371] [] ? __sys_recvmsg+0x62/0x80
Jun 14 06:27:34 mammut kernel: [25320.364383] [] SyS_ioctl+0x79/0x90
Jun 14 06:27:34 mammut kernel: [25320.364394] [] entry_SYSCALL_64_fastpath+0x1e/0xad
Jun 14 06:27:34 mammut kernel: [25320.364407] Code: 00 48 c7 44 24 08 00 00 00 00 48 c7 44 24 10 00 00 00 00 4c 8b 48 28 0f 84 d1 00 00 00 8b 46 20 44 8b 57 5c 48 89 d1 44 8b 5f 70 <44> 8b 42 10 8b 7a 28 83 f8 03 44 89 54 24 10 44 89 5c 24 14 44
Jun 14 06:27:34 mammut kernel: [25320.364487] RIP [] update_stream_scaling_settings+0x42/0x190 [amdgpu]
Jun 14 06:27:34 mammut kernel: [25320.364520] RSP
Jun 14 06:27:34 mammut kernel: [25320.364527] CR2: 0000000000000010
Jun 14 06:27:34 mammut kernel: [25320.383596] ---[ end trace 395a6a2c42173531 ]---
Jun 14 06:27:35 mammut kernel: [25321.188572] [drm] [Detect] [DP][ConnIdx:2] Rx Caps: 12 14 C4 01 01 00 01 82 02 02 06 00 00 00 00 ^
Jun 14 06:27:35 mammut kernel: [25321.196036] [drm] [LKTN] [DP][ConnIdx:2] HBR2x4 pass VS=2, PE=1^
Jun 14 06:27:35 mammut kernel: [25321.207215] [drm] [Detect] [DP][ConnIdx:2] DELL P2415Q: [Block 0] 00 FF FF FF FF FF FF 00 10 AC BE A0 4C 38 37 30 02 19 01 04 B5 35 1E 78 3A E2 45 A8 55 4D A3 26 0B 50 54 A5 4B 00 71 4F 81 80 A9 C0 A9 40 D1 C0 E1 00 D1 00 01 01 4D D0 00 A0 F0 70 3E 80 3E 30 35 00 0F 28 21 00 00 1A 00 00 00 FF 00 50 32 50 43 32 35 31 35 30 37 38 4C 0A 00 00 00 FC 00 44 45 4C 4C 20 50 32 34 31 35 51 0A 20 00 00 00 FD 00 1D 4C 1E 8C 36 00 0A 20 20 20 20 20 20 01 4B ^
Jun 14 06:27:35 mammut kernel: [25321.207224] [drm] [Detect] [DP][ConnIdx:2] DELL P2415Q: [Block 1] 02 03 1D F1 50 90 05 04 02 07 16 01 14 1F 12 13 20 21 22 03 06 23 09 07 07 83 01 00 00 02 3A 80 18 71 38 2D 40 58 2C 25 00 0F 28 21 00 00 1E 01 1D 80 18 71 1C 16 20 58 2C 25 00 0F 28 21 00 00 9E 56 5E 00 A0 A0 A0 29 50 30 20 35 00 0F 28 21 00 00 1A 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 3D ^
Jun 14 06:27:35 mammut kernel: [25321.207226] [drm] dc_link_detect: manufacturer_id = AC10, product_id = A0BE, serial_number = 3037384C, manufacture_week = 2, manufacture_year = 25, display_name = DELL P2415Q, speaker_flag = 1, audio_mode_count = 1
Jun 14 06:27:35 mammut kernel: [25321.207228] [drm] dc_link_detect: mode number = 0, format_code = 1, channel_count = 1, sample_rate = 7, sample_size = 7
Jun 14 06:27:35 mammut kernel: [25321.207229] [drm] link=2, dc_sink_in=ffff9c4064c90400 is now Connected
Jun 14 06:27:35 mammut kernel: [25321.207230] [drm] DCHPD: connector_id=2: Old sink= (null) New sink=ffff9c4064c90400
Jun 14 06:27:35 mammut kernel: [25321.568020] [drm] [Detect] [DP][ConnIdx:1] Rx Caps: 12 14 C4 01 01 00 01 C0 02 02 06 00 00 00 00 ^
Jun 14 06:27:35 mammut kernel: [25321.575465] [drm] [LKTN] [DP][ConnIdx:1] HBR2x4 pass VS=2, PE=1^
Jun 14 06:27:35 mammut kernel: [25321.586328] [drm] [Detect] [DP][ConnIdx:1] U24E850: [Block 0] 00 FF FF FF FF FF FF 00 4C 2D CF 0C 56 53 4D 30 16 19 01 04 A5 34 1D 78 3B 12 55 A9 54 4D 9F 25 0C 50 54 23 08 00 81 00 81 C0 81 80 A9 C0 B3 00 95 00 01 01 01 01 4D D0 00 A0 F0 70 3E 80 30 20 35 00 09 25 21 00 00 1A 00 00 00 FD 00 28 3C 87 87 3C 01 0A 20 20 20 20 20 20 00 00 00 FC 00 55 32 34 45 38 35 30 0A 20 20 20 20 20 00 00 00 FF 00 48 54 48 47 35 30 30 32 35 32 0A 20 20 01 C2 ^
Jun 14 06:27:35 mammut kernel: [25321.586337] [drm] [Detect] [DP][ConnIdx:1] U24E850: [Block 1] 02 03 0E F0 41 10 23 09 07 07 83 01 00 00 02 3A 80 18 71 38 2D 40 58 2C 45 00 09 25 21 00 00 1E 56 5E 00 A0 A0 A0 29 50 30 20 35 00 09 25 21 00 00 1A 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 D3 ^
Jun 14 06:27:35 mammut kernel: [25321.586339] [drm] dc_link_detect: manufacturer_id = 2D4C, product_id = CCF, serial_number = 304D5356, manufacture_week = 22, manufacture_year = 25, display_name = U24E850, speaker_flag = 1, audio_mode_count = 1
Jun 14 06:27:35 mammut kernel: [25321.586340] [drm] dc_link_detect: mode number = 0, format_code = 1, channel_count = 1, sample_rate = 7, sample_size = 7
Jun 14 06:27:35 mammut kernel: [25321.586341] [drm] link=1, dc_sink_in=ffff9c4064c93400 is now Connected
Jun 14 06:27:35 mammut kernel: [25321.586342] [drm] DCHPD: connector_id=1: Old sink= (null) New sink=ffff9c4064c93400
Jun 14 06:27:37 mammut kernel: [25323.207791] [drm] amdgpu_dm_irq_schedule_work FAILED src 4
Jun 14 06:27:42 mammut kernel: [25328.829002] [drm] amdgpu_dm_irq_schedule_work FAILED src 4
Jun 14 06:27:43 mammut kernel: [25329.847442] [drm] amdgpu_dm_irq_schedule_work FAILED src 4
Jun 14 06:27:49 mammut kernel: [25335.083962] [drm] amdgpu_dm_irq_schedule_work FAILED src 4
Jun 14 06:27:50 mammut kernel: [25336.097016] [drm] amdgpu_dm_irq_schedule_work FAILED src 4
Jun 14 06:27:55 mammut kernel: [25341.327560] [drm] amdgpu_dm_irq_schedule_work FAILED src 4
Jun 14 06:27:56 mammut kernel: [25342.347293] [drm] amdgpu_dm_irq_schedule_work FAILED src 4
Jun 14 06:28:01 mammut kernel: [25347.580611] [drm] amdgpu_dm_irq_schedule_work FAILED src 4
Jun 14 06:28:02 mammut kernel: [25348.597300] [drm] amdgpu_dm_irq_schedule_work FAILED src 4
Jun 14 06:28:07 mammut kernel: [25353.835894] [drm] amdgpu_dm_irq_schedule_work FAILED src 4
Jun 14 06:28:08 mammut kernel: [25354.856766] [drm] amdgpu_dm_irq_schedule_work FAILED src 4
Jun 14 06:28:14 mammut kernel: [25360.099215] [drm] amdgpu_dm_irq_schedule_work FAILED src 4
Jun 14 06:28:15 mammut kernel: [25361.116691] [drm] amdgpu_dm_irq_schedule_work FAILED src 4
Jun 14 06:28:20 mammut kernel: [25366.342524] [drm] amdgpu_dm_irq_schedule_work FAILED src 4
Jun 14 06:28:21 mammut kernel: [25367.356617] [drm] amdgpu_dm_irq_schedule_work FAILED src 4
Jun 14 06:28:26 mammut kernel: [25372.595842] [drm] amdgpu_dm_irq_schedule_work FAILED src 4
Jun 14 06:28:27 mammut kernel: [25373.616542] [drm] amdgpu_dm_irq_schedule_work FAILED src 4
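
For reference, the register dump in the oops above shows CR2 (the faulting address) as 0x10 and RDX as 0, and the marked bytes `<44> 8b 42 10` in the Code: line appear to decode as a 32-bit load from offset 0x10 off RDX. That would be consistent with a NULL pointer being dereferenced inside update_stream_scaling_settings, although this is a reading of the log rather than a confirmed diagnosis. A minimal Python sketch for pulling those values out of an oops excerpt follows; the helper name `parse_registers` and the trimmed sample are illustrative assumptions, not part of any kernel tooling.

```python
import re

# Trimmed excerpt of the register dump from the oops above.
OOPS = """\
RAX: 0000000000000000 RBX: ffff9c40b1acd000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9c3ffe97fac0 RDI: ffff9c4064c95d00
CR2: 0000000000000010 CR3: 00000007c4978000 CR4: 00000000003406f0
"""

# Matches pairs such as "RDX: 0000000000000000" (hypothetical helper regex).
REGISTER = re.compile(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b")

def parse_registers(text):
    """Return {register_name: integer_value} for every 'NAME: hex' pair."""
    return {name: int(value, 16) for name, value in REGISTER.findall(text)}

regs = parse_registers(OOPS)
# If RDX held the base pointer, this is the field offset that was accessed.
offset = regs["CR2"] - regs["RDX"]
print(f"fault address {regs['CR2']:#x}, RDX {regs['RDX']:#x}, offset {offset:#x}")
```

A faulting address this close to zero, combined with a zero base register, is the usual signature of a small struct-field offset applied to a NULL pointer.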

GPU fan noise after ROCK install

Hello,

I have an Intel Haswell + R9 Nano machine running Ubuntu 14.04. After a clean install of the OS, I followed the instructions to install ROCK. After rebooting into the kfd kernel, my GPU fan always runs at high speed and the noise is annoying.

The same machine is set up for dual boot with Windows. On the Windows side, the problem does NOT happen with the Catalyst drivers.

Also, in the past I used the same machine with Ubuntu Linux and an earlier version of the Boltzmann stack (from January) without issues.

I would like a solution to this fan problem. Is this a known issue? Is there a fix?

X11 problem with 4.11.0-kfd kernel

The latest 4.11.0-kfd kernel has a problem with X on my two Ryzen 1700X / Radeon 480 boxes (running Fedora 26). There is no such issue with the previous 4.9.0-kfd+ kernel. Xorg.0.log says:

...
[    19.157] (II) AMDGPU: Driver for AMD Radeon:
	All GPUs supported by the amdgpu kernel driver
[    19.160] (II) [KMS] drm report modesetting isn't supported.
[    19.160] (EE) Screen 0 deleted because of no matching config section.
[    19.160] (II) UnloadModule: "amdgpu"
[    19.160] (EE) Device(s) detected, but none match those in the config file.
[    19.160] (EE) Fatal server error:
[    19.160] (EE) no screens found(EE) 
...

xorg.conf Device section:

...
Section "Device"
	Identifier  "AMD"
	Driver      "amdgpu"
	Option	"AccelMethod" "glamor"
	Option	"DRI3" "1"
	Option	"TearFree" "on"
EndSection
...

The amdgpu and amdkfd modules are present:

# lsmod | grep amd
amdkfd                208896  1
amd_iommu_v2           20480  1 amdkfd
amdgpu               2490368  0
edac_mce_amd           28672  0
ttm                    98304  1 amdgpu
drm_kms_helper        143360  1 amdgpu
kvm_amd              2179072  0
kvm                   581632  1 kvm_amd
drm                   344064  3 amdgpu,ttm,drm_kms_helper
gpio_amdpt             16384  0
gpio_generic           16384  1 gpio_amdpt
i2c_algo_bit           16384  2 igb,amdgpu

dmesg says

...
[    6.330267] [drm] amdgpu kernel modesetting enabled.
[    6.337185] AMD IOMMUv2 driver by Joerg Roedel <[email protected]>
[    6.382485] Parsing CRAT table with 0 nodes
[    6.382540] Virtual CRAT table created for CPU
[    6.382592] Parsing CRAT table with 1 nodes
[    6.382645] Creating topology SYSFS entries
[    6.382706] Topology: Add CPU node
[    6.382756] Finished initializing topology
[    6.384510] kfd kfd: Initialized module
[    6.384928] [drm] initializing kernel modesetting (POLARIS10 0x1002:0x67DF 0x148C:0x2372 0xC7).
[    6.385005] [drm] register mmio base: 0xFE900000
[    6.385058] [drm] register mmio size: 262144
[    6.385117] [drm] probing gen 2 caps for device 1022:1453 = 733903/e
[    6.385174] [drm] probing mlw for device 1022:1453 = 733903
[    6.385234] [drm] UVD is enabled in VM mode
[    6.385286] [drm] VCE enabled in VM mode
[    6.385508] amdgpu 0000:29:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
[    6.385582] ATOM BIOS: xxx-xxx-xxx
[    6.385637] [drm] GPU post is not needed
[    6.385699] [drm] vm size is 128 GB, block size is 14-bit
[    6.386535] amdgpu 0000:29:00.0: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
[    6.386601] amdgpu 0000:29:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[    6.386665] [drm] Detected VRAM RAM=8192M, BAR=256M
[    6.386718] [drm] RAM width 256bits GDDR5
[    6.386817] [TTM] Zone  kernel: Available graphics memory: 30818988 kiB
[    6.386876] [TTM] Initializing pool allocator
[    6.386931] [TTM] Initializing DMA pool allocator
[    6.386996] [drm] amdgpu: 8192M of VRAM memory ready
[    6.387051] [drm] amdgpu: 32103M of GTT memory ready.
[    6.387113] [drm] GART: num cpu pages 65536, num gpu pages 65536
[    6.387239] [drm] PCIE GART of 256M enabled (table at 0x000000F400040000).
[    6.387341] amdgpu 0000:29:00.0: amdgpu: using MSI.
[    6.387409] [drm] amdgpu: irq initialized.
[    6.558790] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[    6.562838] amdgpu 0000:29:00.0: fence driver on ring 0 use gpu addr 0x0000000000400008, cpu addr 0xffff8807e4c8a008
[    6.562944] amdgpu 0000:29:00.0: fence driver on ring 1 use gpu addr 0x0000000000400018, cpu addr 0xffff8807e4c8a018
[    6.563043] amdgpu 0000:29:00.0: fence driver on ring 2 use gpu addr 0x0000000000400028, cpu addr 0xffff8807e4c8a028
[    6.563140] amdgpu 0000:29:00.0: fence driver on ring 3 use gpu addr 0x0000000000400038, cpu addr 0xffff8807e4c8a038
[    6.563234] amdgpu 0000:29:00.0: fence driver on ring 4 use gpu addr 0x0000000000400048, cpu addr 0xffff8807e4c8a048
[    6.563332] amdgpu 0000:29:00.0: fence driver on ring 5 use gpu addr 0x0000000000400058, cpu addr 0xffff8807e4c8a058
[    6.563423] amdgpu 0000:29:00.0: fence driver on ring 6 use gpu addr 0x0000000000400068, cpu addr 0xffff8807e4c8a068
[    6.563519] amdgpu 0000:29:00.0: fence driver on ring 7 use gpu addr 0x0000000000400078, cpu addr 0xffff8807e4c8a078
[    6.563610] amdgpu 0000:29:00.0: fence driver on ring 8 use gpu addr 0x0000000000400088, cpu addr 0xffff8807e4c8a088
[    6.563691] amdgpu 0000:29:00.0: fence driver on ring 9 use gpu addr 0x000000000040009c, cpu addr 0xffff8807e4c8a09c
[    6.564062] amdgpu 0000:29:00.0: fence driver on ring 10 use gpu addr 0x00000000004000ac, cpu addr 0xffff8807e4c8a0ac
[    6.564157] amdgpu 0000:29:00.0: fence driver on ring 11 use gpu addr 0x00000000004000bc, cpu addr 0xffff8807e4c8a0bc
[    6.565147] [drm] Found UVD firmware Version: 1.79 Family ID: 16
[    6.565467] amdgpu 0000:29:00.0: fence driver on ring 12 use gpu addr 0x000000f4002ad420, cpu addr 0xffffc90008e5a420
[    6.566011] [drm] Found VCE firmware Version: 52.4 Binary ID: 3
[    6.566135] amdgpu 0000:29:00.0: fence driver on ring 13 use gpu addr 0x00000000004000dc, cpu addr 0xffff8807e4c8a0dc
[    6.566238] amdgpu 0000:29:00.0: fence driver on ring 14 use gpu addr 0x00000000004000ec, cpu addr 0xffff8807e4c8a0ec
[    6.612194] amdgpu: [powerplay] Can't find requested voltage id in vdd_dep_on_sclk table!
[    6.622844] [drm] DAL is enabled
[    6.623012] [drm] DM_PPLIB: values for Engine clock
[    6.623069] [drm] DM_PPLIB:	 30000
[    6.623123] [drm] DM_PPLIB:	 60800
[    6.623177] [drm] DM_PPLIB:	 91900
[    6.623230] [drm] DM_PPLIB:	 108800
[    6.623284] [drm] DM_PPLIB:	 115600
[    6.623338] [drm] DM_PPLIB:	 120300
[    6.623391] [drm] DM_PPLIB:	 124800
[    6.623445] [drm] DM_PPLIB:	 127900
[    6.623499] [drm] DM_PPLIB: Warning: using default validation clocks!
[    6.623559] [drm] DM_PPLIB: Validation clocks:
[    6.623616] [drm] DM_PPLIB:    engine_max_clock: 72000
[    6.623673] [drm] DM_PPLIB:    memory_max_clock: 80000
[    6.623730] [drm] DM_PPLIB:    level           : 0
[    6.623787] [drm] DM_PPLIB: reducing engine clock level from 8 to 2
[    6.623847] [drm] DM_PPLIB: values for Memory clock
[    6.623904] [drm] DM_PPLIB:	 30000
[    6.623958] [drm] DM_PPLIB:	 200000
[    6.624011] [drm] DM_PPLIB: Warning: using default validation clocks!
[    6.624071] [drm] DM_PPLIB: Validation clocks:
[    6.624127] [drm] DM_PPLIB:    engine_max_clock: 72000
[    6.624184] [drm] DM_PPLIB:    memory_max_clock: 80000
[    6.624241] [drm] DM_PPLIB:    level           : 0
[    6.624298] [drm] DM_PPLIB: reducing memory clock level from 2 to 1
[    6.624358] [drm] DC: create_links: connectors_num: physical:5, virtual:0
[    6.624421] [drm] Connector[0] description:signal 32
[    6.624480] [drm] Using channel: CHANNEL_ID_DDC1 [1]
[    6.624552] [drm] Connector[1] description:signal 32
[    6.624610] [drm] Using channel: CHANNEL_ID_DDC3 [3]
[    6.624681] [drm] Connector[2] description:signal 32
[    6.624739] [drm] Using channel: CHANNEL_ID_DDC2 [2]
[    6.624810] [drm] Connector[3] description:signal 4
[    6.624868] [drm] Using channel: CHANNEL_ID_DDC4 [4]
[    6.624939] [drm] Connector[4] description:signal 2
[    6.624997] [drm] Using channel: CHANNEL_ID_DDC6 [6]
[    6.635261] [drm] Display Core initialized
[    6.635352] [drm] amdgpu: freesync_module init done ffff8807fa70dcc0.
[    6.635651] [drm] link=0, dc_sink_in=          (null) is now Disconnected
[    6.635713] [drm] DCHPD: connector_id=0: dc_sink didn't change.
[    6.635879] [drm] link=1, dc_sink_in=          (null) is now Disconnected
[    6.635941] [drm] DCHPD: connector_id=1: dc_sink didn't change.
[    6.648146] [drm] [Detect]	[DP][ConnIdx:2] Rx Caps: 12 14 C4 01 01 00 01 00 02 02 06 00 00 00 00 ^
[    6.659828] [drm] [LKTN]	[DP][ConnIdx:2] HBR2x4 pass VS=1, PE=1^
[    6.669581] [drm] [Detect]	[DP][ConnIdx:2] PL4071UH: [Block 0] 00 FF FF FF FF FF FF 00 26 CD 09 00 01 01 01 01 00 19 01 04 B5 58 31 78 3A 0D 4D AB 4F 42 A6 26 0E 47 4A BF EF 80 E1 C0 D1 00 D1 C0 B3 00 A9 40 A9 C0 81 80 81 00 4D D0 00 A0 F0 70 3E 80 30 20 35 00 6E E5 31 00 00 1A 04 74 80 18 71 70 5A 80 58 2C 8A 00 6E E5 31 00 00 1E 00 00 00 FD 00 17 4C 0F 8A 3C 00 0A 20 20 20 20 20 20 00 00 00 FC 00 50 4C 34 30 37 31 55 48 0A 20 20 20 20 01 3B ^
[    6.669733] [drm] [Detect]	[DP][ConnIdx:2] PL4071UH: [Block 1] 02 03 2E F3 53 10 1F 05 14 04 13 03 02 12 11 07 06 16 15 20 01 DD DE DF 23 09 7F 07 83 01 00 00 6D B9 14 00 04 00 B8 6E 20 00 60 01 02 03 04 74 00 30 F2 70 5A 80 B0 58 8A 00 6E E5 31 00 00 1E 56 5E 00 A0 A0 A0 29 50 30 20 35 00 6E E5 31 00 00 1A 02 3A 80 18 71 38 2D 40 58 2C 45 00 6E E5 31 00 00 1E 01 1D 00 72 51 D0 1E 20 6E 28 55 00 6E E5 31 00 00 1E 00 00 00 00 00 00 00 00 00 D4 ^
[    6.669878] [drm] dc_link_detect: manufacturer_id = CD26, product_id = 9, serial_number = 1010101, manufacture_week = 0, manufacture_year = 25, display_name = PL4071UH, speaker_flag = 1, audio_mode_count = 1
[    6.669969] [drm] dc_link_detect: mode number = 0, format_code = 1, channel_count = 1, sample_rate = 127, sample_size = 7
[    6.670043] [drm] link=2, dc_sink_in=ffff8807ee630400 is now Connected
[    6.670103] [drm] DCHPD: connector_id=2: Old sink=          (null) New sink=ffff8807ee630400
[    6.670694] [drm] link=3, dc_sink_in=          (null) is now Disconnected
[    6.670755] [drm] DCHPD: connector_id=3: dc_sink didn't change.
[    6.670893] [drm] link=4, dc_sink_in=          (null) is now Disconnected
[    6.670959] [drm] DCHPD: connector_id=4: dc_sink didn't change.
[    6.671027] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    6.671088] [drm] Driver supports precise vblank timestamp query.
[    6.671147] [drm] KMS initialized.
[    6.672600] [drm] ring test on 0 succeeded in 11 usecs
[    6.673286] [drm] ring test on 9 succeeded in 7 usecs
[    6.673364] [drm] ring test on 1 succeeded in 7 usecs
[    6.673430] [drm] ring test on 2 succeeded in 2 usecs
[    6.673496] [drm] ring test on 3 succeeded in 2 usecs
[    6.673561] [drm] ring test on 4 succeeded in 2 usecs
[    6.673627] [drm] ring test on 5 succeeded in 2 usecs
[    6.673693] [drm] ring test on 6 succeeded in 2 usecs
[    6.673759] [drm] ring test on 7 succeeded in 2 usecs
[    6.673825] [drm] ring test on 8 succeeded in 2 usecs
[    6.673959] [drm] ring test on 10 succeeded in 5 usecs
[    6.674025] [drm] ring test on 11 succeeded in 5 usecs
[    7.690209] [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[    8.700434] [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[    9.710656] [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[   10.720882] [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[   11.731101] [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[   12.741335] [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[   13.751560] [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[   14.761781] [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[   15.772007] [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[   16.782237] [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[   16.802139] [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, giving up!!!
[   17.093900] [drm:uvd_v6_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 12 test failed (0xCAFEDEAD)
[   17.093993] [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <uvd_v6_0> failed -22
[   17.094062] amdgpu 0000:29:00.0: amdgpu_init failed
...
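
The Xorg message "drm report modesetting isn't supported" is consistent with the end of the dmesg output above, where amdgpu_init fails after the UVD block stops responding (-22 is -EINVAL). A small Python sketch for pulling the error lines and the failing IP block out of a saved dmesg follows; the helper names `error_lines` and `failed_ip_blocks` are hypothetical, and the sample is a trimmed copy of the log above.

```python
import re

# Trimmed copy of the dmesg tail shown above.
DMESG = """\
[    6.671147] [drm] KMS initialized.
[   16.802139] [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, giving up!!!
[   17.093900] [drm:uvd_v6_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 12 test failed (0xCAFEDEAD)
[   17.093993] [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <uvd_v6_0> failed -22
[   17.094062] amdgpu 0000:29:00.0: amdgpu_init failed
"""

# "[  17.094062] message" -> timestamp and message text.
LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")

def error_lines(text):
    """Return (timestamp, message) for lines that report an error or failure."""
    found = []
    for raw in text.splitlines():
        match = LINE.match(raw)
        if match and ("*ERROR*" in match.group(2) or "failed" in match.group(2)):
            found.append((float(match.group(1)), match.group(2)))
    return found

def failed_ip_blocks(text):
    """Return (block_name, error_code) pairs from 'hw_init of IP block' lines."""
    return re.findall(r"hw_init of IP block <([^>]+)> failed (-?\d+)", text)

for stamp, message in error_lines(DMESG):
    print(f"{stamp:10.6f}  {message}")
print(failed_ip_blocks(DMESG))
```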

1.3 OpenCL support

Hey,

I read that the 1.3 kernel/Boltzmann/HCC/etc. should support OpenCL. Is this true?

AMD R9 M280X FOR ASUS N551ZU

The hardware is reported incorrectly.


Asus website
https://www.asus.com/Laptops/N551ZU/

lspci -mm

00:00.0 "Host bridge" "Advanced Micro Devices, Inc. [AMD]" "Family 15h (Models 30h-3fh) Processor Root Complex" "Advanced Micro Devices, Inc. [AMD]" "Family 15h (Models 30h-3fh) Processor Root Complex"
00:00.2 "IOMMU" "Advanced Micro Devices, Inc. [AMD]" "Family 15h (Models 30h-3fh) I/O Memory Management Unit" "Advanced Micro Devices, Inc. [AMD]" "Family 15h (Models 30h-3fh) I/O Memory Management Unit"
00:01.0 "VGA compatible controller" "Advanced Micro Devices, Inc. [AMD/ATI]" "Kaveri [Radeon R7 Graphics]" "ASUSTeK Computer Inc." "Kaveri [Radeon R7 Graphics]"
00:01.1 "Audio device" "Advanced Micro Devices, Inc. [AMD/ATI]" "Kaveri HDMI/DP Audio Controller" "ASUSTeK Computer Inc." "Kaveri HDMI/DP Audio Controller"
00:02.0 "Host bridge" "Advanced Micro Devices, Inc. [AMD]" "Device 1424" "" ""
00:02.1 "PCI bridge" "Advanced Micro Devices, Inc. [AMD]" "Device 1425" "" ""
00:03.0 "Host bridge" "Advanced Micro Devices, Inc. [AMD]" "Device 1424" "" ""
00:03.1 "PCI bridge" "Advanced Micro Devices, Inc. [AMD]" "Family 15h (Models 30h-3fh) Processor Root Port" "" ""
00:03.2 "PCI bridge" "Advanced Micro Devices, Inc. [AMD]" "Family 15h (Models 30h-3fh) Processor Root Port" "" ""
00:04.0 "Host bridge" "Advanced Micro Devices, Inc. [AMD]" "Device 1424" "" ""
00:10.0 "USB controller" "Advanced Micro Devices, Inc. [AMD]" "FCH USB XHCI Controller" -r09 -p30 "ASUSTeK Computer Inc." "FCH USB XHCI Controller"
00:10.1 "USB controller" "Advanced Micro Devices, Inc. [AMD]" "FCH USB XHCI Controller" -r09 -p30 "ASUSTeK Computer Inc." "FCH USB XHCI Controller"
00:11.0 "SATA controller" "Advanced Micro Devices, Inc. [AMD]" "FCH SATA Controller [IDE mode]" -r40 -p01 "ASUSTeK Computer Inc." "FCH SATA Controller [IDE mode]"
00:12.0 "USB controller" "Advanced Micro Devices, Inc. [AMD]" "FCH USB OHCI Controller" -r11 -p10 "ASUSTeK Computer Inc." "FCH USB OHCI Controller"
00:12.2 "USB controller" "Advanced Micro Devices, Inc. [AMD]" "FCH USB EHCI Controller" -r11 -p20 "ASUSTeK Computer Inc." "FCH USB EHCI Controller"
00:13.0 "USB controller" "Advanced Micro Devices, Inc. [AMD]" "FCH USB OHCI Controller" -r11 -p10 "ASUSTeK Computer Inc." "FCH USB OHCI Controller"
00:13.2 "USB controller" "Advanced Micro Devices, Inc. [AMD]" "FCH USB EHCI Controller" -r11 -p20 "ASUSTeK Computer Inc." "FCH USB EHCI Controller"
00:14.0 "SMBus" "Advanced Micro Devices, Inc. [AMD]" "FCH SMBus Controller" -r16 "ASUSTeK Computer Inc." "FCH SMBus Controller"
00:14.2 "Audio device" "Advanced Micro Devices, Inc. [AMD]" "FCH Azalia Controller" -r01 "ASUSTeK Computer Inc." "FCH Azalia Controller"
00:14.3 "ISA bridge" "Advanced Micro Devices, Inc. [AMD]" "FCH LPC Bridge" -r11 "ASUSTeK Computer Inc." "FCH LPC Bridge"
00:14.4 "PCI bridge" "Advanced Micro Devices, Inc. [AMD]" "FCH PCI Bridge" -r40 -p01 "" ""
00:18.0 "Host bridge" "Advanced Micro Devices, Inc. [AMD]" "Family 15h (Models 30h-3fh) Processor Function 0" "" ""
00:18.1 "Host bridge" "Advanced Micro Devices, Inc. [AMD]" "Family 15h (Models 30h-3fh) Processor Function 1" "" ""
00:18.2 "Host bridge" "Advanced Micro Devices, Inc. [AMD]" "Family 15h (Models 30h-3fh) Processor Function 2" "" ""
00:18.3 "Host bridge" "Advanced Micro Devices, Inc. [AMD]" "Family 15h (Models 30h-3fh) Processor Function 3" "" ""
00:18.4 "Host bridge" "Advanced Micro Devices, Inc. [AMD]" "Family 15h (Models 30h-3fh) Processor Function 4" "" ""
00:18.5 "Host bridge" "Advanced Micro Devices, Inc. [AMD]" "Family 15h (Models 30h-3fh) Processor Function 5" "" ""
01:00.0 "Display controller" "Advanced Micro Devices, Inc. [AMD/ATI]" "Bonaire PRO [Radeon R9 M270X]" -rff -pff "" ""
02:00.0 "Network controller" "Intel Corporation" "Wireless 7260" -rbb "Intel Corporation" "Dual Band Wireless-N 7260"
03:00.0 "Unassigned class [ff00]" "Realtek Semiconductor Co., Ltd." "RTL8411B PCI Express Card Reader" -r01 "ASUSTeK Computer Inc." "RTL8411B PCI Express Card Reader"
03:00.1 "Ethernet controller" "Realtek Semiconductor Co., Ltd." "RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller" -r12 "ASUSTeK Computer Inc." "RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller"


There is no problem on Windows.

My distribution is Xubuntu.
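
The title of this report mentions an R9 M280X, while the lspci -mm output above lists the discrete GPU at 01:00.0 as "Bonaire PRO [Radeon R9 M270X]"; that may be the mismatch being reported. lspci takes these names from its PCI ID database rather than from the driver. A minimal Python sketch for picking the graphics devices out of `lspci -mm` output follows, assuming the usual field order of slot, class, vendor, device; the helper name `parse_lspci_mm` is hypothetical and the sample is trimmed from the listing above.

```python
import shlex

# Three entries copied from the lspci -mm listing above.
LSPCI_MM = '''\
00:01.0 "VGA compatible controller" "Advanced Micro Devices, Inc. [AMD/ATI]" "Kaveri [Radeon R7 Graphics]" "ASUSTeK Computer Inc." "Kaveri [Radeon R7 Graphics]"
01:00.0 "Display controller" "Advanced Micro Devices, Inc. [AMD/ATI]" "Bonaire PRO [Radeon R9 M270X]" -rff -pff "" ""
02:00.0 "Network controller" "Intel Corporation" "Wireless 7260" -rbb "Intel Corporation" "Dual Band Wireless-N 7260"
'''

GPU_CLASSES = ("VGA compatible controller", "Display controller")

def parse_lspci_mm(text):
    """Split each line into quoted fields and keep the first four."""
    devices = []
    for line in text.splitlines():
        fields = shlex.split(line)  # honours the double-quoted fields
        if len(fields) < 4:
            continue
        slot, dev_class, vendor, device = fields[:4]
        devices.append({"slot": slot, "class": dev_class,
                        "vendor": vendor, "device": device})
    return devices

gpus = [d for d in parse_lspci_mm(LSPCI_MM) if d["class"] in GPU_CLASSES]
for gpu in gpus:
    print(gpu["slot"], gpu["device"])
```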

amdkfd needs CONFIG_UNUSED_SYMBOLS=y

Without CONFIG_UNUSED_SYMBOLS=y, amdgpu cannot find amdkfd (with current mainline HEAD at least).
Specifically, symbol_request(kgd2kfd_init) in amdgpu_amdkfd_init returns NULL because it cannot find the symbol. I guess the symbol has to be added somewhere so that it does not get deleted when CONFIG_UNUSED_SYMBOLS=n.
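
To check whether a given kernel was built with the option, the kernel's config file can be inspected. The Python sketch below is a minimal illustration; the helper name `config_value` and the sample config text are made up for the example, and on many distributions the real file is found at /boot/config-<kernel version>, though that location is an assumption about the system.

```python
def config_value(config_text, option):
    """Return the value of a Kconfig option ('y', 'm', ...) or None if unset."""
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as a comment in kernel .config files.
        if line == f"# {option} is not set":
            return None
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None

# Illustrative sample only, not taken from the reporter's system.
SAMPLE_CONFIG = """\
CONFIG_MODULES=y
# CONFIG_UNUSED_SYMBOLS is not set
"""

value = config_value(SAMPLE_CONFIG, "CONFIG_UNUSED_SYMBOLS")
print("CONFIG_UNUSED_SYMBOLS:", value if value is not None else "not set")
```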

OpenCL installation issue on VM

Hello, I'm quite new to the field. I want to install drivers for an Amethyst XT [Radeon R9 M295X Mac Edition / R9 380X] graphics card with PCI passthrough on an Ubuntu 16.04 VM launched from an OpenStack host that also runs Ubuntu 16.04. The final goal is to be able to use OpenCL on the VMs (for Caffe or Veles deep learning). After the system changes for passthrough, the card is visible in the guest OS (before any driver installation):

$ lshw
   *-display:1 UNCLAIMED
         description: VGA compatible controller
         product: Amethyst XT [Radeon R9 M295X Mac Edition / R9 380X]
         vendor: Advanced Micro Devices, Inc. [AMD/ATI]
         physical id: 5
         bus info: pci@0000:00:05.0
         version: f1
         width: 64 bits
         clock: 33MHz
         capabilities: pm pciexpress msi vga_controller cap_list
         configuration: latency=0
         resources: memory:e0000000-efffffff memory:f2000000-f21fffff ioport:c000(size=256) memory:feb80000-febbffff memory:febc0000-febdffff

Then I tried several options for the driver.
With ROCm, I ended up with the same issues as mentioned in issue 307 when using rocm-smi, clinfo, and helloworld.

With amdgpu-pro, clinfo didn't produce any output at all; it hung and even survived kill -s SIGKILL. Here is the dmesg output after installing the driver and rebooting:
dmesg.txt

So my questions are:

  • With the KVM hypervisor, is there a more production-ready solution for an OpenStack deployment than manually editing the PCI config after launching the guest OS with ROCm, as explained in issue 307?
  • Did I do something wrong in my attempts that led to this dmesg output, and/or is the problem I had with amdgpu-pro related to the one I encountered with ROCm?

Compiling on Debian 9 generates an error

I tried to compile and install the driver with DKMS on Debian 9 (kernel 4.9.0), but I get this error:

DKMS make.log for amdgpu-1.8-192 for kernel 4.9.0-7-amd64 (x86_64)
Thu Aug  9 22:54:27 CEST 2018
make: Entering directory '/usr/src/linux-headers-4.9.0-7-amd64'
  LD      /var/lib/dkms/amdgpu/1.8-192/build/built-in.o
  LD      /var/lib/dkms/amdgpu/1.8-192/build/amd/amdkcl/built-in.o    
  LD      /var/lib/dkms/amdgpu/1.8-192/build/amd/lib/built-in.o
  CC [M]  /var/lib/dkms/amdgpu/1.8-192/build/amd/lib/chash.o
  LD      /var/lib/dkms/amdgpu/1.8-192/build/scheduler/built-in.o
  CC [M]  /var/lib/dkms/amdgpu/1.8-192/build/amd/amdkcl/kcl_drm.o
  CC [M]  /var/lib/dkms/amdgpu/1.8-192/build/amd/amdkcl/main.o
  CC [M]  /var/lib/dkms/amdgpu/1.8-192/build/scheduler/gpu_scheduler.o
  CC [M]  /var/lib/dkms/amdgpu/1.8-192/build/amd/amdkcl/symbols.o
  LD      /var/lib/dkms/amdgpu/1.8-192/build/ttm/built-in.o
  CC [M]  /var/lib/dkms/amdgpu/1.8-192/build/ttm/ttm_memory.o
  CC [M]  /var/lib/dkms/amdgpu/1.8-192/build/scheduler/sched_fence.o
  LD      /var/lib/dkms/amdgpu/1.8-192/build/amd/amdkfd/built-in.o
 CC [M]  /var/lib/dkms/amdgpu/1.8-192/build/amd/amdkfd/kfd_module.o
In file included from /var/lib/dkms/amdgpu/1.8-192/build/scheduler/backport/backport.h:5:0,
             from <command-line>:0:
/var/lib/dkms/amdgpu/1.8-192/build/include/kcl/kcl_fence.h: In function ‘kcl_dma_fence_set_error’:
/var/lib/dkms/amdgpu/1.8-192/build/include/kcl/kcl_fence.h:163:7: error: ‘struct fence’ has no member named ‘status’
  fence->status = error;
        ^~
In file included from /var/lib/dkms/amdgpu/1.8-192/build/ttm/backport/backport.h:5:0,
             from <command-line>:0:
/var/lib/dkms/amdgpu/1.8-192/build/include/kcl/kcl_fence.h: In function ‘kcl_dma_fence_set_error’:
/var/lib/dkms/amdgpu/1.8-192/build/include/kcl/kcl_fence.h:163:7: error: ‘struct fence’ has no member named ‘status’
  fence->status = error;
   ^~
  LD      /var/lib/dkms/amdgpu/1.8-192/build/amd/amdgpu/built-in.o
In file included from /var/lib/dkms/amdgpu/1.8-192/build/scheduler/backport/backport.h:5:0,
             from <command-line>:0:

Any suggestions for resolving the issue?

Kernel won't install

So this is most likely a very beginner-ish issue, but I can't get the ROCm kernel to install on my system.
I tried to follow the procedure on Ubuntu 16.04 while on kernels 4.4, 4.11 and 4.14, but I was never able to get the ROCm kernel installed. Am I missing something obvious?

https://github.com/RadeonOpenCompute/ROCm/wiki

I followed all the steps on that site, but the ROCm kernel won't show up when I run dpkg --list | grep linux-image,
and if I change the default line in GRUB to what is described in the wiki, it keeps booting the default kernel.

Polaris GPU support: Radeon RX480

Hi,
I want to buy the AMD Polaris RX480 card coming this month to develop with the Radeon Open Compute framework, but as of now the framework supports Fiji dGPUs only.
As Polaris should be more advanced than Fiji, can we expect even "unofficial" support for this GPU, i.e. not officially supported but working anyway?
Thanks.

kernel fails to compile with gcc 6.3

  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.o
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:408:12: error: "amdgpu_amdkfd_validate" defined but not used [-Werror=unused-function]
 static int amdgpu_amdkfd_validate(void *param, struct amdgpu_bo *bo)
            ^~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

Removing this function leads to the next failure:

drivers/gpu/drm/amd/amdgpu/../dal/dc/i2caux/dce80/i2c_sw_engine_dce80.c:53:23: error: "ddc_hw_status_addr" defined but not used [-Werror=unused-const-variable=]
 static const uint32_t ddc_hw_status_addr[] = {
                       ^~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

and

drivers/gpu/drm/amd/amdgpu/../dal/dc/core/dc_link_ddc.c:44:21: error: "dvi_hdmi_dongle_signature_str" defined but not used [-Werror=unused-const-variable=]
 static const int8_t dvi_hdmi_dongle_signature_str[] = "6140063500G";
                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now things get exciting:

drivers/gpu/drm/amd/amdgpu/../dal/modules/freesync/freesync.c: In function "set_freesync_on_streams":
drivers/gpu/drm/amd/amdgpu/../dal/modules/freesync/freesync.c:503:5: error: this "if" clause does not guard... [-Werror=misleading-indentation]
     if (v_total_nominal >=
     ^~
drivers/gpu/drm/amd/amdgpu/../dal/modules/freesync/freesync.c:513:6: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the "if"
      core_freesync->dc->stream_funcs.
      ^~~~~~~~~~~~~

Lazily patching that gives us:

drivers/gpu/drm/amd/amdgpu/../powerplay/hwmgr/tonga_hwmgr.c:102:22: error: "PP_ClockStretchAmountConversion" defined but not used [-Werror=unused-const-variable=]
 static const uint8_t PP_ClockStretchAmountConversion[2][6] = {
                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../powerplay/hwmgr/tonga_hwmgr.c:97:23: error: "PP_ClockStretcherDDTTable" defined but not used [-Werror=unused-const-variable=]
 static const uint32_t PP_ClockStretcherDDTTable[2][4][4] = {
                       ^~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../powerplay/hwmgr/tonga_hwmgr.c:92:23: error: "PP_ClockStretcherLookupTable" defined but not used [-Werror=unused-const-variable=]
 static const uint16_t PP_ClockStretcherLookupTable[2][4] = {
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../powerplay/hwmgr/polaris10_hwmgr.c:111:22: error: "polaris10_clock_stretch_amount_conversion" defined but not used [-Werror=unused-const-variable=]
 static const uint8_t polaris10_clock_stretch_amount_conversion[2][6] =
                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../powerplay/hwmgr/polaris10_hwmgr.c:106:23: error: "polaris10_clock_stretcher_ddt_table" defined but not used [-Werror=unused-const-variable=]
 static const uint32_t polaris10_clock_stretcher_ddt_table[2][4][4] =
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../powerplay/hwmgr/polaris10_hwmgr.c:102:23: error: "polaris10_clock_stretcher_lookup_table" defined but not used [-Werror=unused-const-variable=]
 static const uint16_t polaris10_clock_stretcher_lookup_table[2][4] =
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
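
All of the errors above are the same diagnostic: a `static const` object that nothing in the translation unit references. As a minimal sketch, assuming the tables are meant to stay in the source, there are two common ways to silence it besides deleting the table. The table names and values below are placeholders, not the real powerplay data. The kernel's `__maybe_unused` macro expands to the GCC attribute used here.

```c
#include <stdint.h>

/*
 * Placeholder tables with made-up values; not the real powerplay data.
 *
 * Something like the following, if never referenced, would be reported
 * under -Wunused-const-variable:
 *
 *	static const uint8_t orphan_table[2] = { 1, 2 };
 */

/* Option 1: keep the table and mark it as intentionally unused.
 * In kernel code this would normally be spelled __maybe_unused. */
static const uint8_t stretch_amount_placeholder[2][6]
	__attribute__((unused)) = {
	{ 0, 1, 2, 3, 4, 5 },
	{ 0, 2, 4, 6, 8, 10 },
};

/* Option 2: make sure the table is actually used by some code path. */
static const uint16_t lookup_placeholder[2][4] = {
	{ 1, 2, 3, 4 },
	{ 5, 6, 7, 8 },
};

/* Hypothetical accessor; masks the indices so they stay in bounds. */
uint16_t lookup_entry(unsigned int row, unsigned int col)
{
	return lookup_placeholder[row & 1u][col & 3u];
}
```

If the tables really are dead code, removing them is the cleaner fix; the attribute is better suited to data that is only used under some build configurations.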

Crossfire via DisplayPort

I am interested in whether multi-GPU CrossFire over an external DisplayPort interconnect can still be used, especially with Vega.
I am looking to build a fully virtualized GPU cluster using SR-IOV, where the CPU is used only for bootstrap.

OpenGL test errors in glmark2, glxinfo, glxgears

Hi,

I booted my Skylake target with ROCm 1.5 (Linux image and headers generated from the 'rocm-1.5.x' branch) and it booted successfully.
I ran OpenCL tests such as clinfo and luxmark-v3.1 successfully.

But when I run OpenGL tests I encounter the errors below.
{{{

glmark2

Segmentation fault

LD_PRELOAD=/lib/x86_64-linux-gnu/libpthread.so.0 glmark2

Error: GLX version >= 1.3 is required
Error: Error: Couldn't get GL visual config!
Error: main: Could not initialize canvas

glxinfo

name of display: :0.0
Error: couldn't find RGB GLX visual or fbconfig

glxgears

Error: couldn't get an RGB, Double-buffered visual
}}}

I installed all the dependencies, but I am still facing the errors above.
Could you please help us resolve them?

flip_done timed out

After about 10 minutes I reliably get X freezes with 4.9.0-kfd, after which I can switch once to VT but not back without hanging the system. I did not observe this with stock kernels, including 4.9.0. TearFree is activated in xorg.conf. dmesg output is

[  491.868816] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:34:crtc-0] flip_done timed out
[  492.380778] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:34:crtc-0] flip_done timed out
[  492.380816] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* dm_dc_surface_commit: acrtc 0, already busy
[  492.383795] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* dm_dc_surface_commit: acrtc 0, already busy
[  492.389201] ------------[ cut here ]------------
[  492.389239] WARNING: CPU: 15 PID: 0 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:193 dm_pflip_high_irq+0xb4/0x1b0 [amdgpu]
[  492.389240] Modules linked in: nls_iso8859_1 xt_tcpudp ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute ip6table_security ip6table_mangle ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_raw iptable_security iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc kvm_amd kvm amdkfd irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel amd_iommu_v2 aesni_intel amdgpu aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd snd_hda_codec_generic snd_hda_codec_hdmi ttm snd_hda_intel snd_hda_codec drm_kms_helper snd_hda_core snd_hwdep drm snd_seq snd_seq_device snd_pcm eeepc_wmi asus_wmi
[  492.389264]  joydev sparse_keymap input_leds snd_timer fb_sys_fops snd syscopyarea mxm_wmi sysfillrect sysimgblt sp5100_tco soundcore i2c_piix4 i2c_designware_platform gpio_amdpt gpio_generic wmi i2c_designware_core sch_fq_codel ip_tables x_tables hid_generic hid_cherry usbkbd usbmouse igb ptp uas usb_storage usbhid pps_core serio_raw hid i2c_algo_bit dca autofs4
[  492.389279] CPU: 15 PID: 0 Comm: swapper/15 Not tainted 4.9.0-kfd #3
[  492.389280] Hardware name: System manufacturer System Product Name/CROSSHAIR VI HERO, BIOS 9945 05/19/2017
[  492.389281]  ffff9cba7e9c3d50 ffffffff92399592 0000000000000000 0000000000000000
[  492.389284]  ffff9cba7e9c3d90 ffffffff9206736b 000000c1706a9fa0 ffff9cba7960d000
[  492.389285]  ffff9cba60990000 0000000000000086 000000000000001a ffff9cba70f91d00
[  492.389287] Call Trace:
[  492.389289]  <IRQ> 
[  492.389292]  [<ffffffff92399592>] dump_stack+0x63/0x81
[  492.389294]  [<ffffffff9206736b>] __warn+0xcb/0xf0
[  492.389296]  [<ffffffff9206745d>] warn_slowpath_null+0x1d/0x20
[  492.389326]  [<ffffffffc07627c4>] dm_pflip_high_irq+0xb4/0x1b0 [amdgpu]
[  492.389355]  [<ffffffffc0763c8b>] amdgpu_dm_irq_handler+0x7b/0x110 [amdgpu]
[  492.389379]  [<ffffffffc06dc93c>] amdgpu_irq_dispatch+0x7c/0x1a0 [amdgpu]
[  492.389403]  [<ffffffffc06dd186>] amdgpu_ih_process+0xd6/0x140 [amdgpu]
[  492.389405]  [<ffffffff923a27e9>] ? timerqueue_add+0x59/0xb0
[  492.389428]  [<ffffffffc06dc596>] amdgpu_irq_handler+0x16/0x30 [amdgpu]
[  492.389430]  [<ffffffff920c2541>] __handle_irq_event_percpu+0x81/0x1a0
[  492.389432]  [<ffffffff920c2683>] handle_irq_event_percpu+0x23/0x60
[  492.389433]  [<ffffffff920c26fe>] handle_irq_event+0x3e/0x60
[  492.389435]  [<ffffffff920c5c6d>] handle_edge_irq+0x7d/0x150
[  492.389437]  [<ffffffff9201f01a>] handle_irq+0x1a/0x30
[  492.389439]  [<ffffffff9284992b>] do_IRQ+0x4b/0xd0
[  492.389442]  [<ffffffff92847bc2>] common_interrupt+0x82/0x82
[  492.389442]  <EOI> 
[  492.389444]  [<ffffffff92846b86>] ? native_safe_halt+0x6/0x10
[  492.389446]  [<ffffffff920d75cd>] ? enqueue_hrtimer+0x3d/0x80
[  492.389449]  [<ffffffff92458428>] arch_safe_halt+0x9/0xd
[  492.389450]  [<ffffffff92846cfe>] acpi_safe_halt+0x1e/0x27
[  492.389452]  [<ffffffff92846d27>] acpi_idle_do_entry+0x20/0x39
[  492.389454]  [<ffffffff92459567>] acpi_idle_enter+0x1c4/0x1ec
[  492.389456]  [<ffffffff92669212>] cpuidle_enter_state+0xf2/0x2c0
[  492.389458]  [<ffffffff92669417>] cpuidle_enter+0x17/0x20
[  492.389460]  [<ffffffff920aa7f3>] call_cpuidle+0x23/0x40
[  492.389462]  [<ffffffff920aaa5c>] cpu_startup_entry+0x14c/0x230
[  492.389464]  [<ffffffff9203bc12>] start_secondary+0x142/0x170
[  492.389465] ---[ end trace 5749764e73ca4208 ]---

Supported hardware update...

Hi,

Do you have additional information about supported hardware?
Are there specific features required (in the motherboard or CPU)?
Ideally I would like a system with an Intel Skylake CPU and a dGPU (Nano). Do you have any recommendations for a supported motherboard?

The Z97-PRO motherboard, which is said to work, is not compatible with newer CPUs.

Thanks for your help

1.8-192 module works fine with 4.15.0-29 but fails with 4.15.0-33

The rock-dkms Ubuntu package works fine with kernel 4.15.0-29 but not with the latest one (-33). I believe the cause is a change in vga_switcheroo.h that generates an error in amdgpu_atpx_handler.c:

/var/lib/dkms/amdgpu/1.8-192/build/amd/amdgpu/amdgpu_atpx_handler.c:577:19: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
  .get_client_id = amdgpu_atpx_get_client_id,
                   ^

I have no problem using the previous kernel version on my system but thought this might help. Thanks.
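
An "initialization from incompatible pointer type" error on a designated initializer usually means the callback's signature no longer matches the function-pointer member in the header. The sketch below is one plausible shape of the problem, assuming the member's return type changed from `int` to an enum; that assumption has not been checked against the -33 headers. All types and names here are simplified stand-ins, not the kernel's.

```c
/* Simplified stand-ins for the kernel types; all names are hypothetical. */
struct pci_dev_stub {
	int is_integrated;
};

enum client_id_stub {
	CLIENT_ID_IGD = 0,
	CLIENT_ID_DIS = 1,
};

/* The "header": the callback member is declared to return the enum. */
struct handler_stub {
	enum client_id_stub (*get_client_id)(struct pci_dev_stub *pdev);
};

/*
 * The "driver" callback. If this were still declared as
 *	static int stub_get_client_id(struct pci_dev_stub *pdev)
 * the initializer below would be diagnosed as an incompatible pointer
 * type. Matching the return type to the member resolves it.
 */
static enum client_id_stub stub_get_client_id(struct pci_dev_stub *pdev)
{
	return pdev->is_integrated ? CLIENT_ID_IGD : CLIENT_ID_DIS;
}

static const struct handler_stub stub_handler = {
	.get_client_id = stub_get_client_id,
};

/* Calls through the handler table, as the core code would. */
enum client_id_stub query_client_id(struct pci_dev_stub *pdev)
{
	return stub_handler.get_client_id(pdev);
}
```

In an out-of-tree module that has to build against both header variants, the callback's return type would typically be selected with a compile-time check rather than hard-coded.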
