GithubHelp home page GithubHelp logo

Transition to physical addressing about clvk HOT 11 OPEN

kpet avatar kpet commented on August 23, 2024
Transition to physical addressing

from clvk.

Comments (11)

rjodinchr avatar rjodinchr commented on August 23, 2024 1

Here are the regressions I got running on an Nvidia device:

DATA ERROR

  • api/set_kernel_arg_struct_array
  • basic/async_copy_local_to_global
  • basic/async_copy_global_to_local
  • basic/async_strided_copy_global_to_local
  • basic/async_strided_copy_local_to_global
  • basic/vload_local
  • basic/vstore_local
  • basic/work_item_functions
  • compiler/compile_and_link_status_options_log
  • compiler/execute_after_serialize_reload_library
  • compiler/execute_after_simple_library_with_link
  • compiler/execute_after_two_file_link
  • compiler/multi_file_libraries
  • compiler/multiple_files
  • compiler/multiple_files_multiple_libraries
  • compiler/multiple_libraries
  • compiler/program_binary_type
  • compiler/execute_after_included_header_link
  • images/kernel_image_methods/1D-CL_SIGNED_INT32
  • images/kernel_image_methods/1D-CL_SIGNED_INT8
  • images/kernel_image_methods/1D-CL_SNORM_INT16
  • images/kernel_image_methods/1D-CL_SNORM_INT8
  • images/kernel_image_methods/1D-CL_UNORM_INT16
  • images/kernel_image_methods/1D-CL_UNORM_INT8
  • images/kernel_image_methods/1D-CL_UNORM_SHORT_565
  • images/kernel_image_methods/1D-CL_UNSIGNED_INT16
  • images/kernel_image_methods/1D-CL_UNSIGNED_INT32
  • images/kernel_image_methods/1D-CL_UNSIGNED_INT8
  • images/kernel_image_methods/1Darray-CL_FLOAT
  • images/kernel_image_methods/1Darray-CL_HALF_FLOAT
  • images/kernel_image_methods/1Darray-CL_SIGNED_INT16
  • images/kernel_image_methods/1Darray-CL_SIGNED_INT32
  • images/kernel_image_methods/1Darray-CL_SIGNED_INT8
  • images/kernel_image_methods/1Darray-CL_SNORM_INT16
  • images/kernel_image_methods/1Darray-CL_SNORM_INT8
  • images/kernel_image_methods/1Darray-CL_UNORM_SHORT_565
  • images/kernel_image_methods/2D-CL_FLOAT
  • images/kernel_image_methods/2D-CL_HALF_FLOAT
  • images/kernel_image_methods/2D-CL_SIGNED_INT16
  • images/kernel_image_methods/2D-CL_SIGNED_INT32
  • images/kernel_image_methods/2D-CL_SIGNED_INT8
  • images/kernel_image_methods/2D-CL_SNORM_INT16
  • images/kernel_image_methods/2D-CL_SNORM_INT8
  • images/kernel_image_methods/2D-CL_UNORM_INT16
  • images/kernel_image_methods/2D-CL_UNSIGNED_INT16
  • images/kernel_image_methods/2Darray-CL_FLOAT
  • images/kernel_image_methods/2Darray-CL_SIGNED_INT16
  • images/kernel_image_methods/2Darray-CL_SIGNED_INT32
  • images/kernel_image_methods/2Darray-CL_SIGNED_INT8
  • images/kernel_image_methods/2Darray-CL_SNORM_INT8
  • images/kernel_image_methods/2Darray-CL_UNORM_INT8
  • images/kernel_image_methods/2Darray-CL_UNORM_SHORT_565
  • images/kernel_image_methods/2Darray-CL_UNSIGNED_INT16
  • images/kernel_image_methods/2Darray-CL_UNSIGNED_INT8
  • images/kernel_image_methods/3D-CL_FLOAT
  • images/kernel_image_methods/3D-CL_SIGNED_INT16
  • images/kernel_image_methods/3D-CL_SIGNED_INT32
  • images/kernel_image_methods/3D-CL_SIGNED_INT8
  • images/kernel_image_methods/3D-CL_SNORM_INT8
  • images/kernel_image_methods/3D-CL_UNORM_INT16
  • images/kernel_image_methods/3D-CL_UNORM_INT8
  • images/kernel_image_methods/3D-CL_UNSIGNED_INT16
  • images/kernel_image_methods/3D-CL_UNSIGNED_INT32
  • images/kernel_image_methods/3D-CL_UNSIGNED_INT8
  • images/kernel_image_methods/1D-CL_FLOAT
  • images/kernel_image_methods/1D-CL_HALF_FLOAT
  • images/kernel_image_methods/1D-CL_SIGNED_INT16
  • images/kernel_image_methods/1Darray-CL_UNORM_INT16
  • images/kernel_image_methods/1Darray-CL_UNORM_INT8
  • images/kernel_image_methods/1Darray-CL_UNSIGNED_INT16
  • images/kernel_image_methods/1Darray-CL_UNSIGNED_INT32
  • images/kernel_image_methods/1Darray-CL_UNSIGNED_INT8
  • images/kernel_image_methods/2D-CL_UNORM_INT8
  • images/kernel_image_methods/2D-CL_UNORM_SHORT_565
  • images/kernel_image_methods/2D-CL_UNSIGNED_INT32
  • images/kernel_image_methods/2D-CL_UNSIGNED_INT8
  • images/kernel_image_methods/2Darray-CL_HALF_FLOAT
  • images/kernel_image_methods/2Darray-CL_SNORM_INT16
  • images/kernel_image_methods/2Darray-CL_UNORM_INT16
  • images/kernel_image_methods/2Darray-CL_UNSIGNED_INT32
  • images/kernel_image_methods/3D-CL_HALF_FLOAT
  • images/kernel_image_methods/3D-CL_SNORM_INT16
  • images/kernel_image_methods/3D-CL_UNORM_SHORT_565
  • vectors/vec_align_packed_struct_arr
  • profiling/read_array_struct

CRASH

  • atomics/atomic_add
  • atomics/atomic_and
  • atomics/atomic_cmpxchg
  • atomics/atomic_dec
  • atomics/atomic_max
  • atomics/atomic_min
  • atomics/atomic_or
  • atomics/atomic_sub
  • atomics/atomic_xchg
  • atomics/atomic_inc
  • atomics/atomic_xor
  • c11_atomics/atomic_compare_exchange_strong
  • c11_atomics/atomic_exchange
  • c11_atomics/atomic_fetch_add
  • c11_atomics/atomic_fetch_and
  • c11_atomics/atomic_fetch_min
  • c11_atomics/atomic_fetch_orand
  • c11_atomics/atomic_fetch_sub
  • c11_atomics/atomic_fetch_xor
  • c11_atomics/atomic_fetch_xor2
  • c11_atomics/atomic_compare_exchange_weak
  • c11_atomics/atomic_fence
  • c11_atomics/atomic_fetch_max
  • c11_atomics/atomic_fetch_or
  • half/vstore_half_rtn-wimpy
  • half/vstore_half_rtp-wimpy
  • half/vstore_half_rte-wimpy
  • half/vstore_half_rtz-wimpy
  • half/vstorea_half_rte-wimpy
  • non_uniform_work_group/non_uniform_1d_atomics
  • non_uniform_work_group/non_uniform_1d_barriers
  • non_uniform_work_group/non_uniform_2d_atomics
  • non_uniform_work_group/non_uniform_2d_barriers
  • non_uniform_work_group/non_uniform_3d_atomics
  • non_uniform_work_group/non_uniform_3d_barriers
  • non_uniform_work_group/non_uniform_3d_basic
  • non_uniform_work_group/non_uniform_other_basic
  • non_uniform_work_group/non_uniform_1d_basic
  • non_uniform_work_group/non_uniform_2d_basic
  • non_uniform_work_group/non_uniform_other_atomics

FAILED

  • buffers/buffer_read_struct
  • buffers/buffer_map_read_struct

from clvk.

kpet avatar kpet commented on August 23, 2024 1

Agree the shader looks correct after a quick scan. As tempting as it is to blame drivers, I am seeing failures across two Vulkan stacks and HW from 3 vendors (NVIDIA, Mesa/AMD, Mesa/Intel). clvk is definitely not allocating the memory properly, the code in main does not have some of the flags used in my original prototype (VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT usage flag on the buffer and corresponding VK_MEMORY_ALLOCATE_DEVICE_ADDRESS_BIT memory allocation flag). After fixing this the validation layers are happy but it does not affect what I'm reading back from memory. On an Intel A750, I'm seeing all zeros. On AMD HW, I'm seeing one of the two values (the float with the test unmodified, the int if I comment out the line storing the float) but 3 bytes off where it should be. I've tried to strengthen memory barriers after kernels, to no avail. This is going to be a fun one ...

from clvk.

alan-baker avatar alan-baker commented on August 23, 2024 1

I'll take a look, I would have thought this was handled already.

from clvk.

rjodinchr avatar rjodinchr commented on August 23, 2024 1

I am working on the DATA ERROR issue.
I have made some progress, but I need more time to have it fixed.

from clvk.

kpet avatar kpet commented on August 23, 2024

Thanks for posting this! Looks about the same as what I'm seeing at my end.

from clvk.

rjodinchr avatar rjodinchr commented on August 23, 2024

I am working on those right now.

from clvk.

rjodinchr avatar rjodinchr commented on August 23, 2024

I have started with buffer/buffer_read_struct which is in fact as those in DATA ERROR.
It feels like a simple one to start with, but I have trouble understanding what is wrong.

Here is the kernel and the compiled version: https://godbolt.org/z/1Ejas96Yb

The output is:

  • dst[tid].a = 3.40282346638528860e+38
  • dst[tid].b = 0

I have tried to inverse the two stores: dst[tid].a and dst[tid].b. I ended up with:

  • dst[tid].a = ((1<<16)+1)
  • dst[tid].b = 0.0

It is like only the second storing is taken into account, and the last index of the OpPtrAccessChain used to compute the storing address is not taken into account.

I have also tried to force clspv to use ulong for the index of the two OpPtrAccessChain used to compute the storing address, but it had the same behavior.

@alan-baker , @kpet , any ideas?

from clvk.

alan-baker avatar alan-baker commented on August 23, 2024

The shader looks ok to me. If you make the stores have thread dependent values (e.g. store tid) is the writing occurring in the right threads? I would guess driver bug, but not sure beyond that.

from clvk.

rjodinchr avatar rjodinchr commented on August 23, 2024

The shader looks ok to me. If you make the stores have thread dependent values (e.g. store tid) is the writing occurring in the right threads? I would guess driver bug, but not sure beyond that.

Yes I have tried that, the writing occurs in the right thread.

from clvk.

kpet avatar kpet commented on August 23, 2024

I've got the test passing, we're not laying out the structure but all structures in the PhysicalStorageBuffer storage class must be explicitly laid out. I've hacked clspv to test but a proper solution will require more work. There might be cases where the same struct type is used to declare objects in storages classes where they must be explicitly laid out and in others where they must not but the layout decorations are applied to the type so we need multiple types. I vaguely remember that there's logic to deal with this in clspv already. Happy to hand over to @alan-baker or @rjodinchr for a full clspv solution.

from clvk.

kpet avatar kpet commented on August 23, 2024

Here's the clspv hack I used for reference:

diff --git a/lib/SPIRVProducerPass.cpp b/lib/SPIRVProducerPass.cpp
index d39555e5..f1eb95bc 100644
--- a/lib/SPIRVProducerPass.cpp
+++ b/lib/SPIRVProducerPass.cpp
@@ -1918,9 +1918,9 @@ SPIRVID SPIRVProducerPassImpl::getSPIRVType(Type *Ty, bool needs_layout) {
     StructType *canonical = cast<StructType>(CanonicalType(STy));
     bool use_layout =
         (Option::SpvVersion() < SPIRVVersion::SPIRV_1_4) || needs_layout;
-    if (TypesNeedingLayout.idFor(STy) &&
+    if (/*TypesNeedingLayout.idFor(STy) &&
         (canonical == STy || !TypesNeedingLayout.idFor(canonical)) &&
-        use_layout) {
+        use_layout*/true) {
       for (unsigned MemberIdx = 0; MemberIdx < STy->getNumElements();
            MemberIdx++) {
         // Ops[0] = Structure Type ID

from clvk.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.