
Platform Abstraction Library

License: MIT License


Platform Abstraction Library (PAL)

The Platform Abstraction Library (PAL) provides hardware and OS abstractions for Radeon™ (GCN+) user-mode 3D graphics drivers. The level of abstraction is chosen to support performant driver implementations of several APIs while hiding hardware and operating system details from the client.

PAL client drivers will have no HW-specific code; their responsibility is to translate API/DDI commands into PAL commands as efficiently as possible. This means that the client should be unaware of hardware registers, PM4 commands, etc. However, PAL is an abstraction of AMD hardware only, so many things in the PAL interface have an obvious correlation to hardware features. PAL does not provide a shader compiler: clients are expected to use an external compiler library that targets PAL's Pipeline ABI to produce compatible shader binaries.

PAL client drivers should have little OS-specific code. PAL and its companion utility collection provide OS abstractions for almost everything a client might need, but there are some cases where OS-specific code is unavoidable:

- Handling dynamic library infrastructure. I.e., the client has to implement DllMain() on Windows, etc.
- OS-specific APIs or extensions. DX may have Windows-specific functionality in the core API, and Vulkan may export certain OS-specific features as extensions (such as those for presenting contents to the screen).
- Single OS clients (e.g., DX) may choose to make OS-specific calls directly, simply out of convenience, with no downside.

PAL is a source deliverable. Clients will periodically promote PAL's source into their own tree and build a static pal.lib as part of their build process.

The following diagram illustrates the typical software stack when running a 3D application with a PAL-based UMD:

PAL is a relatively thick abstraction layer, typically accounting for the majority of code in any particular UMD built on PAL, excluding the shader compiler backend. The level of abstraction tends to be higher in areas where client APIs are similar, and lower (closer to hardware) in areas where client APIs diverge significantly. The overall philosophy is to share as much code as possible without impacting client driver performance.

PAL uses a C++ interface. The public interface is defined in .../pal/inc, and clients must only include headers from that directory. The interface is spread over many header files - typically one per class - in order to clarify dependencies and reduce build times. There are three sub-directories in .../pal/inc:

inc/core - Defines the PAL Core
inc/util - Defines the PAL Utility Collection
inc/gpuUtil - Defines the PAL GPU Utility Collection

PAL Core

PAL's core interface is defined in the Pal namespace. It defines an object-oriented model for interacting with the GPU and OS. The interface closely resembles the Mantle, Vulkan, and DX12 APIs. Some common features of these APIs that are central to the PAL interface:

All shader stages, and some additional "shader adjacent" state, are glommed together into a monolithic pipeline object.
Explicit, free-threaded command buffer generation.
Support for multiple, asynchronous engines for executing GPU work (graphics, compute, DMA).
Explicit system and GPU memory management.
Flexible shader resource binding model.
Explicit management of stalls, cache flushes, and compression state changes.

However, as a common component supporting multiple APIs, the PAL interface tends to be lower level in places where client APIs diverge.

System Memory Allocation

Clients have a lot of control over PAL's system memory allocations. Most PAL objects require the client to provide system memory; the client first calls a GetSize() method and then passes a pointer to PAL on the actual create call. Further, when PAL needs to make an internal allocation, it will optionally call a client callback, which can be specified on platform creation. This callback will specify a category for the allocation, which may imply an expected lifetime.
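The two-step pattern described above (query a size, then create the object in client-provided memory) can be sketched as follows. This is a minimal illustration only: Widget, GetWidgetSize(), CreateWidget(), and CreateUseDestroy() are hypothetical names invented for this sketch and are not part of the PAL interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for a PAL-style object; not a real PAL class.
class Widget
{
public:
    explicit Widget(int value) : m_value(value) {}
    int Value() const { return m_value; }

    // Releases the object's internal state only. The client still owns
    // (and must free) the system memory that backs the object.
    void Destroy() { this->~Widget(); }

private:
    int m_value;
};

// Step 1: report how much system memory the object needs.
size_t GetWidgetSize()
{
    return sizeof(Widget);
}

// Step 2: construct the object in memory supplied by the client.
// Returns false if either pointer is null.
bool CreateWidget(int value, void* pPlacementAddr, Widget** ppWidget)
{
    if ((pPlacementAddr == nullptr) || (ppWidget == nullptr))
    {
        return false;
    }
    *ppWidget = new (pPlacementAddr) Widget(value);
    return true;
}

// Client-side usage: allocate, create, use, destroy, then free.
// Returns the stored value, or -1 if allocation or creation failed.
int CreateUseDestroy(int value)
{
    void*   pMem    = std::malloc(GetWidgetSize());
    Widget* pWidget = nullptr;
    int     result  = -1;

    if ((pMem != nullptr) && CreateWidget(value, pMem, &pWidget))
    {
        assert(static_cast<void*>(pWidget) == pMem);
        result = pWidget->Value();
        pWidget->Destroy();
    }
    std::free(pMem);
    return result;
}
```

Because the client owns the backing memory, it can place several objects in one allocation or draw from its own pools, which is the kind of control the paragraph above describes.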

Interface Classes

The following diagram illustrates the relationship of some key PAL interfaces and how they interact to render a typical frame in a modern game. Below that is a listing of most of PAL's interface classes, and a very brief description of their purpose.

OS Abstractions
- IPlatform: Root-level object created by clients that interact with PAL. Mostly responsible for enumerating devices and screens attached to the system and returning any system-wide properties.
- IDevice: Configurable context for querying properties of a particular GPU and interacting with it. Acts as a factory for almost all other PAL objects.
- IQueue: A device has one or more engines which are able to issue certain types of work. Tahiti, for example, has 1 universal engine (supports graphics, compute, or copy commands), 2 compute engines (support compute or copy commands), and 2 DMA engines (support only copy commands). An IQueue object is a context for submitting work on a particular engine. This mainly takes the form of submitting command buffers and presenting images to the screen. Work performed in a queue will be started in order, but work executed on different queues (even if the queues reference the same engine) is not guaranteed to be ordered without explicit synchronization.
- IQueueSemaphore: Queue semaphores can be signaled and waited on from an IQueue in order to control execution order between queues.
- IFence: Used for coarse-grain CPU/GPU synchronization. Fences can be signaled from the GPU as part of a command buffer submission on a queue, then waited on from the CPU.
- IGpuMemory: Represents a GPU-accessible memory allocation. Can either be virtual (only VA allocation which must be explicitly mapped via an IQueue operation) or physical. Residency of physical allocations must be managed by the client either globally for a device (IDevice::AddGpuMemoryReferences) or by specifying allocations referenced by command buffers at submit.
- ICmdAllocator: GPU memory allocation pool used for backing an ICmdBuffer. The client is free to create one allocator per device, or one per thread to remove thread contention.
- IScreen: Represents a display attached to the system. Mostly used for managing full-screen flip presents.
- IPrivateScreen: Represents a display that is not otherwise visible to the OS, typically a VR head mounted display.
Hardware IP Abstractions
- All IP
  - ICmdBuffer: Clients build command buffers to execute the desired work on the GPU, and submit them on a corresponding queue. Different types of work can be executed depending on the queueType of the command buffer (graphics work, compute work, DMA work).
  - IImage: Images are a 1D, 2D, or 3D collection of pixels (i.e., texture) that can be accessed by the GPU in various ways: texture sampling, BLT source/destination, UAV, etc.
- GFXIP-only
  - IPipeline: Comprised of all shader stages (CS for compute, VS/HS/DS/GS/PS for graphics), resource mappings describing how user data entries are to be used by the shaders, and some other fixed-function state like depth/color formats, blend enable, MSAA enable, etc.
  - IColorTargetView: IImage view allowing the image to be bound as a color target (i.e., RTV).
  - IDepthStencilView: IImage view allowing the image to be bound as a depth/stencil target (i.e., DSV).
  - IGpuEvent: Used for fine-grained (intra-command buffer) synchronization between the CPU and GPU. GPU events can be set/reset from either the CPU or GPU and waited on from either.
  - IQueryPool: Collection of query slots for tracking occlusion or pipeline stats query results.
  - Dynamic State Objects: IColorBlendState, IDepthStencilState, and IMsaaState define logical collections of related fixed function graphics state, similar to DX11.
  - IPerfExperiment: Used for gathering performance counter and thread trace data.
  - IBorderColorPalette: Provides a collection of indexable colors for use by samplers that clamp to an arbitrary border color.
Common Base Classes
- IDestroyable: Defines a Destroy() method for the PAL interface. Calling Destroy() will release any internally allocated resources for the object, but the client is still responsible for freeing the system memory provided for the object.
- IGpuMemoryBindable: Defines a set of methods for binding GPU memory to the object. Interfaces that inherit IGpuMemoryBindable require GPU memory in order to be used by the GPU. The client must query the requirements (e.g., alignment, size, heaps) and allocate/bind GPU memory for the object. IGpuMemoryBindable inherits from IDestroyable.

Format Info

Several helper methods are available for dealing with image formats in the Formats namespace.

Utility Collection

In addition to its GPU-specific core functionality, PAL provides a lot of generic, OS-abstracted software utilities in the Util namespace. The PAL core relies on these utilities, but they are also available for use by its clients. The features available in Util include memory management, debug prints and asserts, generic containers, multithreading and synchronization primitives, file system access, and cryptographic algorithm implementations.

GPU Utility Collection

In addition to the generic, OS-abstracted software utilities, PAL provides GPU-specific utilities in the GpuUtil namespace. These utilities provide common, useful functionality that builds on top of the core Pal interfaces. Some examples include an interface for writing text with the GPU, an MLAA implementation, and a wrapper on top of Pal::IPerfExperiment to simplify performance data gathering.

Third Party Software

PAL contains code written by third parties:

libuuid, see license in files under src/util/imported/libuuid
md5, see license in src/util/imported/md5.hpp
TinySHA1, see license in src/util/imported/TinySHA1.hpp
gtest, see license in shared/devdriver/shared/legacy/third_party/gtest/LICENSE
lz4, see license in shared/devdriver/shared/legacy/third_party/lz4/LICENSE and src/util/imported/pal_lz4/LICENSE
whereami, see license in shared/devdriver/shared/legacy/third_party/whereami/LICENSE.MIT
cwalk is distributed under the terms of MIT License, see shared/devdriver/shared/legacy/third_party/cwalk/LICENSE.md
flatbuffers is distributed under the terms of Apache License 2.0, see shared/devdriver/shared/legacy/third_party/flatbuffers/LICENSE.txt
mpack is distributed under the terms of MIT License, see shared/devdriver/shared/legacy/third_party/mpack/LICENSE
rapidjson is distributed under the terms of MIT License, see shared/devdriver/shared/legacy/third_party/rapidjson/license.txt
tiny_printf is distributed under the terms of MIT License, see shared/devdriver/shared/legacy/third_party/tiny_printf/LICENSE
dds is distributed under the terms of MIT License, see src/util/imported/dds/dds-license
hsa.h and amd_hsa_*.h are distributed under the terms of the University of Illinois/NCSA Open Source License, see the license embedded into src/core/imported/hsa/hsa.h
AMDHSAKernelDescriptor.h is distributed under the terms of Apache License 2.0 with LLVM Exceptions, see https://llvm.org/LICENSE.txt


pal's Issues

Can Mantle be open sourced?

Can Mantle to PAL layer be open sourced?

Thanks!

Allowing msvc compilation

pal won't allow msvc as compiler but dependency cwpack requires msvc, thus preventing compilation.

pal assumes the processor is an x86

Parts of pal use x86 specific headers and instructions (e.g. CpuId stuff in palSysUtil.h, sysUtil.cpp, lnxSysUtil.cpp).
AMD GPUs are used in other machines (aarch64, RISC-V) as well; works fine with Mesa drivers. AMDVLK currently can't be compiled for those machines.
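One common way to keep such code portable is to guard the x86-only path behind architecture macros and provide a fallback. The sketch below assumes GCC or Clang (it uses their cpuid.h header and the __get_cpuid helper); QueryCpuVendor() is a hypothetical name and does not reflect how PAL's CpuId code is actually structured.

```cpp
#include <cassert>
#include <cstring>
#include <string>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  // GCC/Clang header, only available when targeting x86
#endif

// Hypothetical helper: returns the CPU vendor string on x86 builds and
// "unknown" on every other architecture (aarch64, RISC-V, ...).
std::string QueryCpuVendor()
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0;
    unsigned int ebx = 0;
    unsigned int ecx = 0;
    unsigned int edx = 0;

    // Leaf 0 returns the vendor string in EBX, EDX, ECX (in that order).
    if (__get_cpuid(0, &eax, &ebx, &ecx, &edx) != 0)
    {
        char vendor[13] = {};
        std::memcpy(vendor + 0, &ebx, 4);
        std::memcpy(vendor + 4, &edx, 4);
        std::memcpy(vendor + 8, &ecx, 4);
        assert(std::strlen(vendor) <= 12);
        return std::string(vendor);
    }
#endif
    return "unknown";
}
```

With this shape, non-x86 builds compile the fallback only and never see the x86 header or instruction.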

Unbounded memory usage increase with implicit command buffer reset

Sometimes, when command buffers are reused with implicit command buffer resets through begin calls, memory usage keeps increasing. After an application has run for hours, this causes crashes due to running out of memory. Adding either type of command buffer reset prevents the issue from happening.

The issue can be resolved by expanding the implicit reset code path in BeginCommandStream of GfxCmdBuffer to also reset the internal structures as seen in the Reset() function.

Errors in lnxUuid.cpp fail the build of amdvlk-2022.Q3.2

Tried to make a package for openSUSE and got this for both gcc-12 and clang-14:

[ 2332s] /home/abuild/rpmbuild/BUILD/amdvlk-2022.Q3.2/pal/src/util/lnx/lnxUuid.cpp:68:8: error: variable has incomplete type 'tm'
[ 2332s]     tm FixedPoint = {0, 0, 0, 1, 2, 2021};
[ 2332s]        ^
[ 2332s] /usr/include/wchar.h:83:8: note: forward declaration of 'tm'
[ 2332s] struct tm;
[ 2332s]        ^
[ 2332s] /home/abuild/rpmbuild/BUILD/amdvlk-2022.Q3.2/pal/src/util/lnx/lnxUuid.cpp:80:47: error: use of undeclared identifier 'CLOCK_MONOTONIC'
[ 2332s]     bool clockGettimeSuccess = (clock_gettime(CLOCK_MONOTONIC, &time) == 0);
[ 2332s]                                               ^
[ 2332s] /home/abuild/rpmbuild/BUILD/amdvlk-2022.Q3.2/pal/src/util/lnx/lnxUuid.cpp:90:47: error: use of undeclared identifier 'CLOCK_REALTIME'
[ 2332s]     bool clockGettimeSuccess = (clock_gettime(CLOCK_REALTIME, &time) == 0);

vkCreate*Pipeline slows down for larger pipelines

vkCreate*Pipeline slows down when the size of the pipeline is bigger than 128KB. Pipelines larger than 128KB cause a separate allocation of the default pool size of 256KB. Also, for each pipeline creation the pool's GPU memory is mapped and unmapped.

This can be resolved by increasing the default size of the pool and keeping the memory mapped after usage. The mapped memory can be reused by the new vkCreate*Pipeline.

Usage of interprocess BOs causes synchronization of GPU execution

Usage of interprocess BOs causes synchronization of GPU execution even if the usage happens on independent queues.

Currently all buffers created that don't have the "VM always valid" flag are added as memory references with all submissions. This causes the KMD to do implicit synchronization.

In order to allow efficient processing of data from multiple processes explicit synchronization in the applications must be used.

Windows compilation fails with unicode support

Compiling pal on Windows with unicode support enabled fails due to some Windows APIs used by gpuopen passing a regular string (LPCSTR) to functions that expect wide strings (LPCWSTR) when compiled with unicode support.

wrong register used for SPI_SHADER_REQ_CTRL on gfx10

Just noticed this on navi, but for SHADER_REQ_CNTL for vertex shaders it's using the wrong thing

I'm sure you've fixed it internally already but for the lols.

diff --git a/src/core/hw/gfxip/gfx9/gfx9PipelineChunkVsPs.cpp b/src/core/hw/gfxip/gfx9/gfx9PipelineChunkVsPs.cpp
index cab8947..6837818 100644
--- a/src/core/hw/gfxip/gfx9/gfx9PipelineChunkVsPs.cpp
+++ b/src/core/hw/gfxip/gfx9/gfx9PipelineChunkVsPs.cpp
@@ -440,7 +440,7 @@ void PipelineChunkVsPs::LateInit(

         if (IsGfx10(chipProps.gfxLevel))
         {
-            pUploader->AddShReg(Gfx10::mmSPI_SHADER_REQ_CTRL_PS, m_commands.sh.vs.shaderReqCtrlVs);
+            pUploader->AddShReg(Gfx10::mmSPI_SHADER_REQ_CTRL_VS, m_commands.sh.vs.shaderReqCtrlVs);
         }
     } // if enableNgg == false

Use of select() may exhaust available file descriptors

If an application uses the maximum number of file descriptors, using one more for select() will fail. Use poll() instead.
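A minimal sketch of waiting on a single descriptor with poll() is shown below. Unlike select(), poll() takes an array of pollfd entries, so it is not limited by the size of an fd_set. WaitReadable() and PipeDemo() are hypothetical helpers written for this illustration; they are not taken from the PAL or gpuopen sources.

```cpp
#include <cassert>
#include <poll.h>
#include <unistd.h>

// Hypothetical helper: waits up to timeoutMs for fd to become readable.
// Returns 1 if readable, 0 on timeout, and -1 on error.
int WaitReadable(int fd, int timeoutMs)
{
    pollfd pfd = {};
    pfd.fd     = fd;
    pfd.events = POLLIN;

    const int ret = poll(&pfd, 1, timeoutMs);
    if (ret < 0)
    {
        return -1;
    }
    if (ret == 0)
    {
        return 0;
    }
    return ((pfd.revents & POLLIN) != 0) ? 1 : -1;
}

// Small self-check using a pipe: an empty pipe times out, and a pipe
// with one byte written to it reports as readable.
bool PipeDemo()
{
    int fds[2];
    if (pipe(fds) != 0)
    {
        return false;
    }

    const bool emptyTimesOut = (WaitReadable(fds[0], 0) == 0);
    const char byte          = 'x';
    const bool wrote         = (write(fds[1], &byte, 1) == 1);
    const bool readable      = (WaitReadable(fds[0], 100) == 1);

    close(fds[0]);
    close(fds[1]);
    return emptyTimesOut && wrote && readable;
}
```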

Driver failed to create swapchain if modesetting driver is being used.

Any XCB/Xlib Vulkan application fails to create a swapchain if X is launched with the modesetting driver instead of amdgpu's DDX driver.

GCC 7.2 constexpr compilation issues

Hi,

I had some compilation issues while trying to compile this project on Fedora 27 :

/mnt/workspace/vulkandriver/drivers/pal/src/util/math.cpp:73:1: error: 'const Util::Math::NBitFloatInfo{16, 10, 5, 15, 32768, 1023, 31744, 15, 15, -14, 1199562752, 947912704, ((Util::uint32)((((1 << (5 - 1)) - 1) - 127) << 23)), 13}' is not a constant expression
};
^
/mnt/workspace/vulkandriver/drivers/pal/src/util/math.cpp:91:1: error: 'const Util::Math::NBitFloatInfo{11, 6, 5, 0, 0, 63, 1984, 15, 15, -14, 1191698432, 947912704, ((Util::uint32)((((1 << (5 - 1)) - 1) - 127) << 23)), 17}' is not a constant expression
};
^
/mnt/workspace/vulkandriver/drivers/pal/src/util/math.cpp:109:1: error: 'const Util::Math::NBitFloatInfo{10, 5, 5, 0, 0, 31, 992, 15, 15, -14, 1191690240, 947912704, ((Util::uint32)((((1 << (5 - 1)) - 1) - 127) << 23)), 18}' is not a constant expression
};

I compiled using the instructions provided in the main repository (https://github.com/GPUOpen-Drivers/AMDVLK).

Thanks

Compiler version :

c++ (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2)
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

CMake version :

cmake version 3.10.0

CMake suite maintained and supported by Kitware (kitware.com/cmake).
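One possible reading of these errors, which the report itself does not confirm, is that the quoted initializer `((((1 << (5 - 1)) - 1) - 127) << 23)` left-shifts a negative signed value. Before C++20 that is undefined behaviour, and undefined behaviour is not permitted in a constant expression. The sketch below shows one way to keep the same bit pattern while staying well defined: convert to unsigned before shifting. BiasDiffShifted() is a hypothetical name, not a function from math.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: difference between the exponent bias of a small
// float format and the 32-bit float bias (127), moved to bit 23.
constexpr uint32_t BiasDiffShifted(int exponentBits)
{
    // The bias difference is negative for formats with fewer than 8
    // exponent bits. Converting to uint32_t first makes the conversion
    // and the shift well defined (both wrap modulo 2^32).
    return static_cast<uint32_t>(((1 << (exponentBits - 1)) - 1) - 127) << 23;
}

// 5 exponent bits: (15 - 127) = -112, which wraps to 0xFFFFFF90 and
// becomes 0xC8000000 after the shift.
static_assert(BiasDiffShifted(5) == 0xC8000000u, "unexpected bias bits");
static_assert(BiasDiffShifted(8) == 0u, "32-bit float bias difference is zero");
```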

Compilation error on PAL_ASSERT_MSG

On gcc 8.1.1 the current build in dev throws
AMDVLK/vulkandriver/drivers/pal/inc/util/palAssert.h:96:20: error: ISO C++ forbids comparison between pointer and integer [-fpermissive]

In PAL_ASSERT_MSG()
changing if ((_expr) == false)

to either if (!(_expr)) or if ((_expr) == 0) should fix this
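The suggested change can be illustrated with a small stand-alone macro. Comparing a pointer against `false` is ill-formed in C++11 and later, while `!(_expr)` relies on the contextual conversion to bool, which is defined for pointers as well as for arithmetic and bool expressions. DEMO_ASSERT_MSG, g_failedAsserts, and CountFailures() are hypothetical names for this sketch; the real PAL_ASSERT_MSG does more than this.

```cpp
#include <cassert>
#include <cstdio>

static int g_failedAsserts = 0;  // Hypothetical counter used by the demo.

// Hypothetical macro modelled on the suggestion above: test with
// !(_expr) instead of ((_expr) == false).
#define DEMO_ASSERT_MSG(_expr, _msg)                                  \
    do                                                                \
    {                                                                 \
        if (!(_expr))                                                 \
        {                                                             \
            ++g_failedAsserts;                                        \
            std::fprintf(stderr, "Assertion failed: %s | %s\n",       \
                         #_expr, (_msg));                             \
        }                                                             \
    } while (false)

// Exercises the macro with pointer and bool expressions and returns
// how many of the assertions failed.
int CountFailures()
{
    g_failedAsserts = 0;

    const char* pName = "pal";
    const char* pNull = nullptr;

    DEMO_ASSERT_MSG(pName, "non-null pointer passes");
    DEMO_ASSERT_MSG(pNull, "null pointer fails");
    DEMO_ASSERT_MSG(1 + 1 == 2, "bool expression passes");

    return g_failedAsserts;
}
```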

vkGetShaderInfoAMD fails to get disassembly on Linux

I tried using vkGetShaderInfoAMD to get a disassembly of a pipeline, but this apparently is not supported properly on AMDVLK. Is there any reason why this is supported on Windows, but not Linux?

vkDestroySwapchain deadlocks on Xorg

On VK_KHR_xcb_surface, tearing down swapchain triggers a deadlock. FWIW, there's a similar bug in Mesa.

#0  0x00007ffff747ecd6 in do_futex_wait.constprop () from /usr/lib/libpthread.so.0
#1  0x00007ffff747edc8 in __new_sem_wait_slow.constprop.0 () from /usr/lib/libpthread.so.0
#2  0x00007fffe517b836 in Pal::SwapChain::WaitIdle() () from /usr/lib/amdvlk64.so
#3  0x00007fffe505f8d7 in vk::entry::vkDestroySwapchainKHR(VkDevice_T*, VkSwapchainKHR_T*, VkAllocationCallbacks const*) () from /usr/lib/amdvlk64.so
#4  0x0000555555605bfe in Vulkan::WSI::deinit_external (this=0x555556589058) at /home/maister/git/granite/vulkan/wsi/wsi.cpp:346
#5  0x0000555555605ef2 in Vulkan::WSI::~WSI (this=0x555556589058, __in_chrg=<optimized out>) at /home/maister/git/granite/vulkan/wsi/wsi.cpp:500
#6  0x00005555555caed9 in Granite::SceneViewerApplication::~SceneViewerApplication (this=0x555556589050, __in_chrg=<optimized out>)
    at /home/maister/git/granite/application/application.cpp:323
#7  0x00005555555aa8f7 in std::default_delete<Granite::Application>::operator() (this=<optimized out>, __ptr=0x555556589050)
    at /usr/include/c++/7.2.1/bits/unique_ptr.h:78
#8  std::unique_ptr<Granite::Application, std::default_delete<Granite::Application> >::~unique_ptr (this=<synthetic pointer>, __in_chrg=<optimized out>)
    at /usr/include/c++/7.2.1/bits/unique_ptr.h:268
#9  main (argc=<optimized out>, argv=<optimized out>) at /home/maister/git/granite/application/platforms/application_glfw.cpp:335

Communication through gpuopen can be unreliable for non-lan usage

Doing RGP captures or using other features through gpuopen protocols over UDP when connecting to cloud servers can be unreliable.

Question for UseMipInSrd

If useMipInSrd is true in rsrcProcMgr.cpp, wouldn't it be more natural to select the CopyImage2dShaderMipLevel pipeline?

else if (useMipInSrd)
{
    // GFX10+: The types declared in the IL source are encoded into the DIM field of the instructions.
    //    DIM determines the max number of texture parameters [S,R,T,Q] to allocate.
    //    TA ignores unused parameters for a resource if the image view defines them as size 1.
    //    [S,R,T] can be generalized (3D, 2D array) for non-sampler operations like copies.
    //        [Q] TA's interpretation of Q depends on DIM. MIP unless DIM is MSAA
    //    Image Copies with a Q component need their own copy shaders.
    //    Simpler copies (non-msaa, non-mip) can all share a single 3-dimensional (2d array) copy shader.
    pipeline = RpmComputePipeline::CopyImage2d;
}
else
{
    pipeline = RpmComputePipeline::CopyImage2dShaderMipLevel;
}

Cleanup of CreateMemoryPoolAndSubAllocate does not handle OOM properly

The Vulkan CTS has a number of OOM tests that caused double-free errors in InternalMemMgr. CreateMemoryPoolAndSubAllocate tries to access uninitialized pointers and then tries to free them.

The following snippet is executed when pool creation fails; at that point pInternalMemory is uninitialized.

    else
    {
        auto it = pOwnerList->Begin();
        bool needEraseFromOwnerList = pOwnerList->NumElements() > 0 ?
            (it.Get()->groupMemory.PalMemory(DefaultDeviceIndex) ==
             pInternalMemory->groupMemory.PalMemory(DefaultDeviceIndex)) : false;

        // Unmap any persistently mapped memory
        pInternalMemory->groupMemory.Unmap();

        // Destroy the buddy allocator
        if (pInternalMemory->pBuddyAllocator != nullptr)
        {
            PAL_DELETE(pInternalMemory->pBuddyAllocator, m_pSysMemAllocator);
        }

        // Release this pool's base allocation
        FreeBaseGpuMem(pInternalMemory);

        // Remove this memory pool from the list if we added it
        if (needEraseFromOwnerList)
        {
            pOwnerList->Erase(&it);
        }
    }
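A generic way to make such a failure path safe is to initialize the pointer to nullptr and release only what was actually created. The sketch below uses hypothetical stand-in types (Pool, BuddyAllocator, FailAt) and a simulated failure switch; it is not the driver's code and only illustrates the guard.

```cpp
#include <cassert>

// Hypothetical stand-ins; these are not the driver's real types.
struct BuddyAllocator {};

struct Pool
{
    BuddyAllocator* pBuddyAllocator = nullptr;
};

// Selects which step of the demo should report a simulated failure.
enum class FailAt { Nowhere, PoolCreation, SubAllocation };

static int g_liveAllocations = 0;  // Outstanding allocations in the demo.

// Frees a pool and its allocator; safe to call with nullptr.
void DestroyPool(Pool* pPool)
{
    if (pPool != nullptr)
    {
        if (pPool->pBuddyAllocator != nullptr)
        {
            delete pPool->pBuddyAllocator;
            --g_liveAllocations;
        }
        delete pPool;
        --g_liveAllocations;
    }
}

// Returns the new pool, or nullptr if any step failed.
Pool* CreatePoolAndSubAllocate(FailAt failAt)
{
    Pool* pPool = nullptr;  // Initialized so the failure path can test it.
    bool  ok    = (failAt != FailAt::PoolCreation);

    if (ok)
    {
        pPool = new Pool;
        ++g_liveAllocations;
        pPool->pBuddyAllocator = new BuddyAllocator;
        ++g_liveAllocations;
        ok = (failAt != FailAt::SubAllocation);
    }

    if (ok == false)
    {
        // Cleanup touches only what exists; a failed pool creation
        // leaves pPool null, so nothing is dereferenced or freed.
        DestroyPool(pPool);
        pPool = nullptr;
    }
    return pPool;
}
```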

Possibly erroneous PAL_ASSERT

Hi,

When running the AMDVLK driver in debug mode, I get the following message:

AMD-PAL: Error: Assertion failed: pInfo->interpolatorCount >= 1 | Reason: Unknown (/some/path/drivers/pal/src/core/hw/gfxip/gfx6/gfx6PipelineChunkVsPs.cpp:124:EarlyInit)

I get this message on pipelines that are mainly used in shadow passes. Those pipelines are not expected to have any varyings (I guess that's what you call interpolators). Maybe the assertion is incorrect.

Shadowmap visual quality issues by a reduction of depth buffer precision

In commit 2483d46 the depth buffer precision was lowered, which reduces visual quality in scenes with shadow maps.

This change was made before but was reverted in 02e0b6b.

See the conversation on the commit and the previous issue #49 for more details.

UNORM Depth buffer precision reduced to 22 or 15 bits

Precision was reduced to 22 bits for 24 bit UNORM depth buffers and to 15 for 16 bit UNORM depth buffers in 6fa4f8a (see changes in src/core/hw/gfxip/gfx9/gfx9DepthStencilView.cpp or src/core/hw/gfxip/gfx6/gfx6DepthStencilView.cpp). The change was introduced to correct rounding issues when depth bias is 1.0f, but reduces image quality overall. A solution that fixes the rounding issue while maintaining precision would be ideal.
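To give a rough sense of scale, the sketch below computes the smallest step of an N-bit UNORM value, assuming "precision" here means the number of effective bits used to quantize the [0, 1] range. Under that assumption, going from 24 to 22 bits makes each step about four times larger, and going from 16 to 15 bits about two times larger. UnormStep() is a hypothetical helper, not a PAL function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: smallest representable step of an N-bit UNORM
// value, which maps the integers 0 .. 2^N - 1 onto the range [0, 1].
constexpr double UnormStep(uint32_t bits)
{
    return 1.0 / static_cast<double>((uint64_t{1} << bits) - 1);
}

// Ratio of step sizes between two bit depths (coarse / fine).
constexpr double StepRatio(uint32_t coarseBits, uint32_t fineBits)
{
    return UnormStep(coarseBits) / UnormStep(fineBits);
}
```

StepRatio(22, 24) evaluates to roughly 4 and StepRatio(15, 16) to roughly 2, which is the loss in depth resolution the issue describes under the stated assumption.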

How to build on Windows using VS2019?

Hello, how do I compile with VS2019 on Windows? I used CMake to generate the VS2019 project, and it reported the following error:

CMake Error at cmake/PalVersionHelper.cmake:91 (message):
PAL_CLIENT_INTERFACE_MAJOR_VERSION not set. Defaulting to -1.
Call Stack (most recent call first):
cmake/PalBuildParameters.cmake:34 (pal_bp)
CMakeLists.txt:30 (include)

How should I deal with it?
Thanks

build error with PAL_BUILD_GFX9=OFF

Build fails when setting -DPAL_BUILD_GFX9=OFF with:

/home/zenyd/Downloads/amdvlk/drivers/pal/src/core/device.cpp: In static member function ‘static bool Pal::Device::DetermineGpuIpLevels(Pal::uint32, Pal::uint32, Pal::uint32, Pal::HwIpLevels*)’:
/home/zenyd/Downloads/amdvlk/drivers/pal/src/core/device.cpp:176:10: error: ‘FAMILY_AI’ was not declared in this scope
     case FAMILY_AI:
          ^~~~~~~~~
/home/zenyd/Downloads/amdvlk/drivers/pal/src/core/device.cpp:176:10: note: suggested alternative: ‘FAMILY_CI’
     case FAMILY_AI:
          ^~~~~~~~~
          FAMILY_CI
make[2]: *** [pal/src/CMakeFiles/pal.dir/build.make:180: pal/src/CMakeFiles/pal.dir/core/device.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:17635: pal/src/CMakeFiles/pal.dir/all] Error 2

I think this is missing

+++ b/src/core/device.cpp
@@ -173,7 +173,9 @@ bool Device::DetermineGpuIpLevels(
         break;
 #endif
 #if PAL_BUILD_OSS4
+#if PAL_BUILD_GFX9
     case FAMILY_AI:
+#endif
     case FAMILY_RV:
         pIpLevels->oss = Oss4::DetermineIpLevel(familyId, eRevId);
         break;

because FAMILY_AI only gets defined when PAL_BUILD_GFX9=ON (see here)

PAL code violates One Definition Rule during link phase.

When building with gcc 7.2.1, I get the following warnings at the end of the build as it's linking amdvlk64.so:

/home/clee/src/vulkandriver/drivers/pal/inc/core/palLib.h:90:12: note: type ‘NullGpuId’ itself violates the C++ One Definition Rule
enum class NullGpuId : uint32
^
/home/clee/src/vulkandriver/drivers/pal/inc/core/palLib.h:90:12: note: type ‘NullGpuId’ itself violates the C++ One Definition Rule
/home/clee/src/vulkandriver/drivers/pal/inc/core/palDevice.h:155:12: note: type ‘AsicRevision’ itself violates the C++ One Definition Rule
enum class AsicRevision : uint32
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:95400:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:56816:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:34950:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:22985:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:81606:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:46883:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:10568:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:6147:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:94865:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:56258:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:11922:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:7712:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:4271:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:1696:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:10721:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:6351:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:10672:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:6289:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:10652:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:6270:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:20563:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:14367:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:20026:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:13726:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:95291:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:56696:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:80869:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:46146:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:20597:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:14430:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:82188:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:47531:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:82034:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:47417:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:82208:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:47552:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:82044:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:47426:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:82129:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:47489:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:81994:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:47381:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:82024:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:47408:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:82149:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:47510:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:95827:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:57227:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:5319:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:3003:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:95947:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:57370:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:95132:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:56539:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:95687:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:57083:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:28072:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:16415:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:80110:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:45334:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx6/chip/si_ci_vi_merged_registers.h:60266:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/gfxip/gfx9/chip/gfx9_plus_merged_registers.h:33331:9: note: the incompatible type is defined here
struct {
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/ossip/oss2/sdma20_pkt_struct.h:735:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
{
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/ossip/oss4/sdma40_pkt_struct.h:917:9: note: the incompatible type is defined here
{
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/ossip/oss2/sdma20_pkt_struct.h:733:5: note: type ‘union ’ itself violates the C++ One Definition Rule
{
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/ossip/oss4/sdma40_pkt_struct.h:915:5: note: the incompatible type is defined here
{
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/ossip/oss2/sdma20_pkt_struct.h:930:9: note: type ‘struct ’ itself violates the C++ One Definition Rule
{
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/ossip/oss4/sdma40_pkt_struct.h:1100:9: note: the incompatible type is defined here
{
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/ossip/oss2/sdma20_pkt_struct.h:928:5: note: type ‘union ’ itself violates the C++ One Definition Rule
{
^
/home/clee/src/vulkandriver/drivers/pal/src/core/hw/ossip/oss4/sdma40_pkt_struct.h:1098:5: note: the incompatible type is defined here
{
^
/home/clee/src/vulkandriver/drivers/pal/inc/core/palDevice.h:569:8: note: type ‘struct DeviceProperties’ itself violates the C++ One Definition Rule
struct DeviceProperties
^

Add zwp_linux_dmabuf_v1 support to Wayland WSI

I noticed that the Wayland WSI currently uses the old wl_drm. Most compositor implementations better support the newer zwp_linux_dmabuf_v1 protocol. wl_drm might not necessarily be available if the compositor uses Vulkan for rendering.

Support for Musl Libc (`seed48_r`, `mrand48_r`)

AMDVLK mostly runs fine on Linux systems with musl libc instead of glibc. There are only two small issues in pal: one is #25, and the other is the use of the drand48_r function family.
I have already created a small patch, but I wanted to ask beforehand whether these functions should be added to this repo or whether I should let pal link against them as an external library.
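
For illustration, an in-tree replacement for the two functions named in the title could look roughly like the sketch below. It assumes the standard 48-bit linear congruential recurrence of the drand48 family (multiplier 0x5DEECE66D, addend 0xB). The names Rand48State, Seed48 and MRand48 are hypothetical and are not part of PAL, glibc or musl.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for glibc's drand48_data: only the 48-bit state is kept.
struct Rand48State
{
    uint64_t x;
};

// Mirrors seed48_r: seed16v[0] holds the low 16 bits, seed16v[2] the high 16 bits.
inline void Seed48(const uint16_t seed16v[3], Rand48State* pState)
{
    pState->x = (static_cast<uint64_t>(seed16v[2]) << 32) |
                (static_cast<uint64_t>(seed16v[1]) << 16) |
                 static_cast<uint64_t>(seed16v[0]);
}

// Mirrors mrand48_r: advance the state, then return its high 32 bits as a signed value.
inline int32_t MRand48(Rand48State* pState)
{
    constexpr uint64_t Multiplier = 0x5DEECE66DULL;
    constexpr uint64_t Addend     = 0xBULL;
    constexpr uint64_t Mask48     = (1ULL << 48) - 1;

    pState->x = ((pState->x * Multiplier) + Addend) & Mask48;
    return static_cast<int32_t>(static_cast<uint32_t>(pState->x >> 16));
}
```

Because the state lives in a caller-owned struct, the sketch is reentrant in the same sense as the `_r` variants, and it would avoid an external dependency for such a small amount of code.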

UNORM Depth buffer precision reduced to 22 or 15 bits

Precision was reduced to 22 bits for 24-bit depth buffers and to 15 bits for 16-bit depth buffers in 6fa4f8a (see the changes in src/core/hw/gfxip/gfx9/gfx9DepthStencilView.cpp or src/core/hw/gfxip/gfx6/gfx6DepthStencilView.cpp). The change was introduced to correct rounding issues when depth bias is 1.0f, but it reduces image quality overall. A solution that fixes the rounding issue while maintaining precision would be ideal.

GCC 7.2 lnxGpuMemory.cpp:134:17 compile error: ISO C++ forbids comparison between pointer and integer

/home/constantine/AMDVLK/pal/inc/util/palAssert.h:96:20: error: ISO C++ forbids comparison between pointer and integer [-fpermissive]
     if ((_expr) == false)                                                                         \
                    ^
/home/constantine/AMDVLK/pal/inc/util/palAssert.h:104:27: note: in expansion of macro ‘PAL_ASSERT_MSG’
 #define PAL_ASSERT(_expr) PAL_ASSERT_MSG(_expr, "%s", "Unknown")
                           ^~~~~~~~~~~~~~
/home/constantine/AMDVLK/pal/src/core/os/lnx/lnxGpuMemory.cpp:134:17: note: in expansion of macro ‘PAL_ASSERT’
                 PAL_ASSERT(m_pPinnedMemory);
                 ^~~~~~~~~~
make[2]: *** [pal/src/CMakeFiles/pal.dir/core/os/lnx/lnxGpuMemory.cpp.o] Error 1
make[1]: *** [pal/src/CMakeFiles/pal.dir/all] Error 2
make: *** [all] Error 2

constantine@linux:~/AMDVLK$ gcc --version
gcc (Ubuntu 7.2.0-18ubuntu2) 7.2.0
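
The log shows GCC 7.2 rejecting `(_expr) == false` when `_expr` is a pointer such as m_pPinnedMemory. One way to avoid the comparison is to negate the expression instead, as in the sketch below. EXAMPLE_ASSERT, g_failedChecks and CountFailures are hypothetical names for illustration and are not PAL's actual macro or fix.

```cpp
#include <cassert>

// Hypothetical stand-in for the failure path of an assert macro.
static int g_failedChecks = 0;

// `if ((_expr) == false)` does not compile under GCC 7.2 when _expr is a pointer.
// Negating the expression relies on the contextual conversion to bool,
// which is valid for pointers, integers and bools alike.
#define EXAMPLE_ASSERT(_expr)     \
    do                            \
    {                             \
        if (!(_expr))             \
        {                         \
            ++g_failedChecks;     \
        }                         \
    } while (false)

inline int CountFailures(const void* pMemory)
{
    g_failedChecks = 0;
    EXAMPLE_ASSERT(pMemory);            // pointer operand
    EXAMPLE_ASSERT(pMemory != nullptr); // bool operand
    return g_failedChecks;
}
```

Alternatively, the call site could pass an explicit comparison such as `m_pPinnedMemory != nullptr`, which keeps the macro unchanged.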

vkGetRandROutputDisplayEXT doesn't return VK_INCOMPLETE when it should

@kleinerm reported it in #37:

Additionally there is a bug in the XGL component which prevents vkGetRandROutputDisplayEXT() from returning VK_INCOMPLETE as it should if it encounters this pal error.

Here:

https://github.com/GPUOpen-Drivers/xgl/blob/2c44bbc7b58efba12c501f71f5b268f83c1bdad5/icd/api/vk_physical_device.cpp#L3808

Inside the error handling clause of the if statement, instead of
result = VK_INCOMPLETE;
we have
VkResult result = VK_INCOMPLETE;

So in case of error, result is assigned inside the local scope, and then goes out of scope when leaving that branch, and return result actually returns VK_SUCCESS.
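
The effect can be reproduced in isolation. The sketch below uses a hypothetical ExampleResult enum and hypothetical function names rather than the real Vulkan or XGL types; it only demonstrates the shadowing pattern described above.

```cpp
#include <cassert>

// Hypothetical result codes standing in for VK_SUCCESS and VK_INCOMPLETE.
enum ExampleResult
{
    ExampleSuccess    = 0,
    ExampleIncomplete = 5,
};

// Buggy pattern: the inner declaration creates a second variable that
// shadows the outer one and disappears at the closing brace.
inline ExampleResult QueryShadowed(bool lowerLayerFailed)
{
    ExampleResult result = ExampleSuccess;
    if (lowerLayerFailed)
    {
        ExampleResult result = ExampleIncomplete; // shadows the outer result
        (void)result;                             // silences the unused-variable warning in this sketch
    }
    return result; // always ExampleSuccess
}

// Fixed pattern: assign to the existing variable.
inline ExampleResult QueryFixed(bool lowerLayerFailed)
{
    ExampleResult result = ExampleSuccess;
    if (lowerLayerFailed)
    {
        result = ExampleIncomplete;
    }
    return result;
}
```

GCC and Clang can report this class of mistake when building with -Wshadow.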

However, I actually stumbled over these bugs while I was trying to figure out why vkGetRandROutputDisplayEXT() works unreliably in the officially released amdvlk drivers.

When testing the released drivers on Ubuntu 19.10, the result depends on the driver release: vkGetRandROutputDisplayEXT() either works as expected, or it fails by returning VK_SUCCESS together with a NULL VkDisplayKHR handle. I'm not sure if this is because the driver is only officially supported for Ubuntu 18.04 LTS, or if something is wrong in your release process. I can't reproduce the failure with my own self-built driver checked out from your Git repos, so I was unable to track down the cause.

E.g., these releases worked on Ubuntu 19.10:

amdvlk PRO 1.1.129 from the amdgpu-pro driver package.
amdvlk 2019.Q4-3
And one of the 2020.Q1-x drivers also worked, can't remember which one atm.

These fail:

2019-Q4-5
2020-Q1-1

And the latest 2020-Q2-1 driver fails again.

Disable memset and change packet initialization order to optimize write combine

gfx9CmdUtil.cpp uses memset to zero the PM4 packet structs and then fills in individual fields of each packet. Zeroing with memset and then initializing the struct field by field works against write combining.
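
A minimal sketch of the two initialization orders follows. ExamplePacket is a hypothetical three-dword layout, not a real PM4 packet, and both function names are made up for illustration. Whether the second form actually improves write combining depends on where the destination memory is mapped and on how the compiler schedules the stores.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical three-dword packet; real PM4 layouts are not reproduced here.
struct ExamplePacket
{
    uint32_t header;
    uint32_t addrLo;
    uint32_t addrHi;
};

// Pattern described above: zero the destination, then patch individual fields.
// Every dword is written twice, and the second pass is not in address order.
inline void BuildWithMemset(ExamplePacket* pDst, uint32_t header, uint64_t addr)
{
    std::memset(pDst, 0, sizeof(*pDst));
    pDst->addrHi = static_cast<uint32_t>(addr >> 32);
    pDst->header = header;
    pDst->addrLo = static_cast<uint32_t>(addr & 0xFFFFFFFFu);
}

// Alternative: assemble the packet in a local, then store it to the destination once.
inline void BuildSequential(ExamplePacket* pDst, uint32_t header, uint64_t addr)
{
    const ExamplePacket packet =
    {
        header,
        static_cast<uint32_t>(addr & 0xFFFFFFFFu),
        static_cast<uint32_t>(addr >> 32),
    };
    *pDst = packet;
}
```

Both functions produce identical packet contents; only the store pattern to the destination differs.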

VK_KHR_wayland_surface wanted

Currently, the VK_KHR_xcb_surface implementation does not work on Wayland (although RADV's does), triggering a segfault in vkCreateSwapchainKHR.

Like RADV, it would be great to have a Wayland implementation as well, considering amdgpu-dri is being used by both.

CalcMaxLateAllocLimit() minor sgpr calculation bug

In gfx9GraphicsPipeline.cpp, under the CalcMaxLateAllocLimit() function, you have the following:

const uint32 vsNumSgpr = (numSgprs * 8);
const uint32 vsNumVgpr = (numVgprs * 4);

My understanding is that it should be this instead for gfx9:

const uint32 vsNumSgpr = (numSgprs * 16);

Actually, I'm wondering if you have to do a +1 as well before multiplying. The code does check whether vsNumSgpr is > 0 later on, but then uses it like so:

const uint32 maxSgprVsWaves = (chipProps.gfx9.numPhysicalSgprs / vsNumSgpr) * simdPerSh;

... which, unless numPhysicalSgprs is 0-based as well, seems to suggest the +1 should be there ...
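
To make the arithmetic concrete, the sketch below compares the two readings. It assumes, as the question does, that the field might be a zero-based granule count. The granule sizes and the physical SGPR count of 800 used in the usage note are illustrative inputs only, not values confirmed by the PAL source, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Reading used by the quoted code: the encoded value times a fixed granule.
constexpr uint32_t SgprsDirect(uint32_t encoded, uint32_t granule)
{
    return encoded * granule;
}

// Reading proposed in the question: the encoded value is zero-based.
constexpr uint32_t SgprsZeroBased(uint32_t encoded, uint32_t granule)
{
    return (encoded + 1) * granule;
}

// Waves that fit in the SGPR file; returns 0 when the divisor would be 0.
constexpr uint32_t MaxWaves(uint32_t numPhysicalSgprs, uint32_t sgprsPerWave)
{
    return (sgprsPerWave == 0) ? 0 : (numPhysicalSgprs / sgprsPerWave);
}
```

With an encoded value of 1 and 800 physical SGPRs, the direct reading with a granule of 8 gives 8 SGPRs per wave and 100 waves, while the zero-based reading with a granule of 16 gives 32 SGPRs per wave and 25 waves, so the two interpretations lead to quite different limits.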

Precompiled Shader Sources

I’d like to package amdvlk (and thus pal) for a FSDG-compatible distribution with strict requirements to build everything from source (GNU Guix). However I see that

contain pre-compiled shaders. Are they available somewhere in source code form?

HDR and all display modes > 8 bpc broken with current 2020-Q3-4 "release" again!

So I was just fetching and building the current state of the dev branch for the 2020-Q3-4 release to test how well it works. The depressing result is that it does not work at all in direct display mode for any swapchain format other than the standard 8 bpc VK_FORMAT_B8G8R8A8_UNORM, i.e. it fails for VK_FORMAT_A2R10G10B10_UNORM_PACK32 etc.

Ofc. it doesn't matter so much, because whoever made the release also screwed up the release process, as no release announcement or actual packages for Ubuntu or RedHat were created :/.

Running an application using one of the higher precision formats leads to a hang in vkAcquireNextImageKHR() and game over.

The reason seems to be a missing modeset that seems to be required when switching framebuffer format.

The following diff, applied to the current pal/dev branch will fix the bug. It adds a ModeSet(pImage) call back into
DisplayWindowSystem::CreatePresentableImage, which was removed for the 2020-Q3-4 driver release.

I don't know if this is the correct solution, because somebody evidently thought that the ModeSet was not needed. But it should give you an idea of what's wrong.

diff --git a/src/core/os/amdgpu/display/displayWindowSystem.cpp b/src/core/os/amdgpu/display/displayWindowSystem.cpp
index 9994acd..7b15b21 100644
--- a/src/core/os/amdgpu/display/displayWindowSystem.cpp
+++ b/src/core/os/amdgpu/display/displayWindowSystem.cpp
@@ -285,6 +285,7 @@ Result DisplayWindowSystem::CreatePresentableImage(
         pImage->SetPresentImageHandle(imageHandle);
 
         FindCrtc();
 
+        ModeSet(pImage);
     }
 }

Swizzle questions

First, is this a good place for general questions about the PAL repo code?

On Gfx10, what is PipeBankXor exactly, and how important is it? I know it alters the swizzle pattern, but I'm wondering what would happen if I always used an ADDR2_COMPUTE_PIPEBANKXOR_INPUT::surfIndex of zero, for example. Would parts of the hardware be underutilized? How much of a penalty might I be looking at?

For Gfx10Lib::HwlGetPreferredSurfaceSetting in the non-pow2 case, it looks like the swizzle precedence order for BC textures is D -> S -> R, and R -> D -> S for most other surfaces. What's the rationale there? How much worse might it be to use standard swizzle for BC textures when I could have used display?

Use of select() file descriptor can exceed FD_SETSIZE

The issue we ran into with select() was that the file descriptor used in PAL was >= 1024 (FD_SETSIZE), which is the limit that an fd_set can handle in glibc's implementation. poll() does not have this constraint, so an application that already has many open file descriptors can work correctly with PAL.
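
A minimal sketch of a poll()-based wait is shown below. WaitReadable is a hypothetical helper, not a PAL function; it only illustrates that poll() takes the descriptor as a plain integer and is therefore not bounded by FD_SETSIZE.

```cpp
#include <cassert>
#include <poll.h>
#include <unistd.h>

// Waits until fd is readable or the timeout (in milliseconds) expires.
// Unlike select(), poll() receives the descriptor inside a pollfd entry,
// so descriptor numbers at or above FD_SETSIZE are handled the same way.
inline bool WaitReadable(int fd, int timeoutMs)
{
    pollfd pfd = {};
    pfd.fd     = fd;
    pfd.events = POLLIN;

    const int ready = poll(&pfd, 1, timeoutMs);
    return (ready > 0) && ((pfd.revents & POLLIN) != 0);
}
```

The sketch treats errors and timeouts alike by returning false; production code would likely want to distinguish them and retry on EINTR.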

The presence of AMDVLK hogs DRM_NODE_PRIMARY even without the VK_KHR_display instance extension

The mere presence of the AMDVLK driver hogs DRM_NODE_PRIMARY even when VK_KHR_display is not enabled.

I spent way too long debugging why Gamescope was getting -EBUSY when trying to start with RADV.
It was because AMDVLK was installed and hogging DRM master for itself at instance creation time, when it should not be. We do not even use KHR_display.

It should only open a master node (DRM_NODE_PRIMARY) when a Vulkan device is created and the instance has the VK_KHR_display extension enabled.

https://github.com/GPUOpen-Drivers/pal/blob/dev/shared/devdriver/shared/ddGpuInfo/src/ddLinuxAmdGpuInfo.cpp#L217

Build failures with gcc

The dev checkout of AMDVLK fails to build with gcc. This was caught by the llpc/xgl CI in a newly opened PR GPUOpen-Drivers/xgl#109.

/usr/bin/g++ -DADDR_CI_BUILD -DADDR_GFX10_BUILD -DADDR_GFX9_BUILD -DADDR_NAVI12_BUILD -DADDR_NAVI14_BUILD -DADDR_NAVI21_BUILD -DADDR_NAVI22_BUILD -DADDR_RAVEN1_BUILD -DADDR_RAVEN2_BUILD -DADDR_RENOIR_BUILD -DADDR_SI_BUILD -DADDR_VEGA12_BUILD -DADDR_VEGA20_BUILD -DLITTLEENDIAN_CPU -D_DEBUG -I/vulkandriver/drivers/pal/src/core/imported/addrlib/inc -I/vulkandriver/drivers/pal/src/core/imported/addrlib/src -I/vulkandriver/drivers/pal/src/core/imported/addrlib/src/core -I/vulkandriver/drivers/pal/src/core/imported/addrlib/src/chip/r800 -I/vulkandriver/drivers/pal/src/core/imported/addrlib/src/chip/gfx9 -I/vulkandriver/drivers/pal/src/core/imported/addrlib/src/chip/gfx10 -O3  -fPIC   -UNDEBUG -flto -fuse-linker-plugin -Wno-odr -fPIC -Werror -fno-strict-aliasing -fno-exceptions -fno-rtti -fcheck-new -fno-math-errno -Wall -Wextra -Wno-unused -Wno-unused-parameter -Wno-unused-command-line-argument -Wno-ignored-qualifiers -Wno-missing-field-initializers -Wno-self-assign -Wno-implicit-fallthrough -std=c++11 -MD -MT pal/addrlib/CMakeFiles/addrlib.dir/src/gfx10/gfx10addrlib.cpp.o -MF pal/addrlib/CMakeFiles/addrlib.dir/src/gfx10/gfx10addrlib.cpp.o.d -o pal/addrlib/CMakeFiles/addrlib.dir/src/gfx10/gfx10addrlib.cpp.o -c /vulkandriver/drivers/pal/src/core/imported/addrlib/src/gfx10/gfx10addrlib.cpp
/vulkandriver/drivers/pal/src/core/imported/addrlib/src/gfx10/gfx10addrlib.cpp: In member function ‘virtual ADDR_E_RETURNCODE Addr::V2::Gfx10Lib::HwlGetPreferredSurfaceSetting(const ADDR2_GET_PREFERRED_SURF_SETTING_INPUT*, ADDR2_GET_PREFERRED_SURF_SETTING_OUTPUT*) const’:
/vulkandriver/drivers/pal/src/core/imported/addrlib/src/gfx10/gfx10addrlib.cpp:3110:80: error: enumeral and non-enumeral type in conditional expression [-Werror=extra]
 3110 |                             minSizeBlk = (minSizeBlk == AddrBlockMaxTiledType) ? AddrBlockLinear : minSizeBlk;
      |                                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/vulkandriver/drivers/pal/src/core/imported/addrlib/src/gfx10/gfx10addrlib.cpp: At top level:
cc1plus: error: unrecognized command line option ‘-Wno-self-assign’ [-Werror]
cc1plus: error: unrecognized command line option ‘-Wno-unused-command-line-argument’ [-Werror]
cc1plus: all warnings being treated as errors

I think this may come from the addrlib cmake file:

➜ /ssd/opensource/amdvlk/vulkandriver/drivers  
$ ag self-assign -Gtxt
spvgen/external/SPIRV-tools/CMakeLists.txt
92:    set(SPIRV_WARNINGS ${SPIRV_WARNINGS} -Wno-self-assign)

pal/src/core/imported/addrlib/CMakeLists.txt
223:        -Wno-self-assign

$ ag unused-command-line-argument -Gtxt
llvm-project/libcxx/benchmarks/CMakeLists.txt
11:    -Wno-unused-command-line-argument

llvm-project/libcxx/CMakeLists.txt
683:    target_add_compile_flags_if_supported(${target} PUBLIC -Wno-unused-command-line-argument)

llvm-project/flang/CMakeLists.txt
347:    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unused-command-line-argument -Wstring-conversion \

llvm-project/compiler-rt/lib/tsan/CMakeLists.txt
116:if("${CMAKE_C_FLAGS}" MATCHES "-Wno-(error=)?unused-command-line-argument")
117:  set(EXTRA_CFLAGS "-Wno-error=unused-command-line-argument ${EXTRA_CFLAGS}")

llvm-project/libcxxabi/CMakeLists.txt
442:  add_compile_flags_if_supported(-Wno-unused-command-line-argument)

pal/src/core/imported/addrlib/CMakeLists.txt
220:        -Wno-unused-command-line-argument

pal/src/core/imported/vam/CMakeLists.txt
110:        -Wno-unused-command-line-argument
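
For the first error in the log, the message suggests that one arm of the conditional is an enumerator while the other is a plain integer. Assuming that reading is right, an explicit cast gives both arms the same type, as in the sketch below. ExampleBlockType, its enumerator values and ClampToLinear are hypothetical stand-ins, not the addrlib definitions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the addrlib block-type enumerators.
enum ExampleBlockType
{
    ExampleBlockLinear       = 0,
    ExampleBlockMaxTiledType = 7,
};

// Mixing an enumerator and an integer in ?: is what GCC reports under -Wextra.
// Casting the enumerator makes both arms uint32_t.
inline uint32_t ClampToLinear(uint32_t minSizeBlk)
{
    return (minSizeBlk == static_cast<uint32_t>(ExampleBlockMaxTiledType))
               ? static_cast<uint32_t>(ExampleBlockLinear)
               : minSizeBlk;
}
```

The two unrecognized `-Wno-...` options are a separate matter: they appear to be Clang-specific flags that GCC only complains about once another diagnostic has been emitted, so fixing the conditional may hide them without removing them from the CMake files.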

Is it possible to submit update patches individually rather than as one giant patch that includes many patches?

It is very hard to read added features when they are not isolated.

PAL Code Object to D3D12 Pipeline State

Hello, this is not an issue but a question. If I have a PAL-compatible code object, is there an interface to create an ID3D12PipelineState object from it?

Thank You

Justin