Comments (14)

piotrAMD commented on August 11, 2024

> Are there other descriptor-table-like things that we should add a similar assumption for?

Not descriptor-table-like, but I did experiment with adding assume on some built-in variables like WorkgroupId (< 65536) and LocalInvocationId (< workGroupSize). That allowed optimizers to make some useful simplifications. I haven't pursued that just yet, because on the shader I looked at that change made the code worse, but I think that would be a good change in general.
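
To illustrate why such range assumptions can help, here is a toy C++ sketch of two simplifications an optimizer could make once it knows a value's upper bound. This is not LLVM or LLPC code; `KnownRange`, `foldULessThan`, and `maskIsRedundant` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model of the fact "x < upperBoundExclusive",
// e.g. 65536 for WorkgroupId as mentioned above.
struct KnownRange {
  uint32_t upperBoundExclusive;
};

// Try to decide the unsigned comparison `x < limit` for every x in the range.
std::optional<bool> foldULessThan(KnownRange range, uint32_t limit) {
  if (range.upperBoundExclusive <= limit)
    return true;         // every possible x is below the limit
  if (limit == 0)
    return false;        // no unsigned value is below zero
  return std::nullopt;   // depends on the runtime value
}

// `x & mask` can be replaced by `x` when mask is a low-bit mask (2^k - 1)
// that covers the largest possible x.
bool maskIsRedundant(KnownRange range, uint32_t mask) {
  assert(range.upperBoundExclusive != 0 && "range must be non-empty");
  bool isLowBitMask = (mask & (mask + 1)) == 0;
  return isLowBitMask && (range.upperBoundExclusive - 1) <= mask;
}
```

With a bound of 65536, a comparison such as `x < 65536` folds to true and a mask such as `x & 0xFFFF` becomes a no-op, while `x < 100` still depends on the runtime value.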

from llpc.

ruiling commented on August 11, 2024

> For the alignment, I'd say assuming 16 is actually fairly reasonable? We'd need to double-check with VK_EXT_descriptor_buffer though.

I checked VK_EXT_descriptor_buffer: the alignment is the same as the descriptor size, and the minimum descriptor size is 4 dwords. So I think assuming 16-byte alignment is reasonable here.

As for the dereferenceable size, I think we can simply use UINT_MAX; I don't think we need to care much about the exact number. We just want LLVM to know that all accesses to the descriptor table are always dereferenceable. Any concerns here?

jayfoad commented on August 11, 2024

I came up with a quick patch to add the assumption:

diff --git a/lgc/patch/PatchEntryPointMutate.cpp b/lgc/patch/PatchEntryPointMutate.cpp
index c3d6519c0..c01e0ed1a 100644
--- a/lgc/patch/PatchEntryPointMutate.cpp
+++ b/lgc/patch/PatchEntryPointMutate.cpp
@@ -586,6 +586,12 @@ void PatchEntryPointMutate::fixupUserDataUses(Module &module) {
             // high half.
             descTableVal = addressExtender.extend(descTableVal, highHalf, ptrType, builder);
             addrExtMap[isDescTableSpilled].insert({highHalf, descTableVal});
+
+            // Add an assumption to tell LLVM the alignment and size of the descriptor table.
+            OperandBundleDefT<Value *> alignOpB("align", std::vector<Value *>{descTableVal, builder.getInt32(16)});
+            OperandBundleDefT<Value *> dereferenceableOpB("dereferenceable",
+                                                          std::vector<Value *>{descTableVal, builder.getInt32(1000)});
+            builder.CreateAssumption(builder.getTrue(), {alignOpB, dereferenceableOpB});
           }
 
           // Replace uses of the call and erase it.

However I don't know this code very well and I need some help:

  • What should the alignment be? Hardcoding 16 is not very nice.
  • What should the size be? Hardcoding 1000 is ridiculous.
  • Are there other descriptor-table-like things that we should add a similar assumption for?

nhaehnle commented on August 11, 2024

I haven't given it much thought, but look at the code that "parses" the lgcName::DescriptorTableAddr. There, the resource node mapping is referenced from which the "correct" ranges should be derivable.

For the alignment, I'd say assuming 16 is actually fairly reasonable? We'd need to double-check with VK_EXT_descriptor_buffer though.

jayfoad commented on August 11, 2024

> > Are there other descriptor-table-like things that we should add a similar assumption for?
>
> Not descriptor-table-like

I meant either:

  • pointers passed into a shader, where LLPC knows about their size and alignment but LLVM does not, and/or
  • pointers generated by AddressExtender which LLVM cannot analyze because they are created by an inttoptr.

ruiling commented on August 11, 2024

> I checked the VK_EXT_descriptor_buffer, the alignment is the same as the descriptor size, the minimum descriptor size is 4-dword. So I think using 16 alignment is reasonable here.

After looking at the xgl implementation (https://github.com/GPUOpen-Drivers/xgl/blob/dev/icd/api/vk_device.cpp#L1097), it sounds like we can only assume 2-dword alignment?

ruiling commented on August 11, 2024

> I have noticed that LLVM is unable to hoist descriptor table loads out of loops because it does not know the size and alignment of the descriptor table, when the pointer to the table is created with IR like this:

btw, I don't understand why the pointer alignment would affect hoisting, do you see the reason?

jayfoad commented on August 11, 2024

> > I have noticed that LLVM is unable to hoist descriptor table loads out of loops because it does not know the size and alignment of the descriptor table, when the pointer to the table is created with IR like this:
>
> btw, I don't understand why the pointer alignment would affect hoisting, do you see the reason?

Before LICM hoists an instruction, it calls ValueTracking's isSafeToSpeculativelyExecute, which in turn calls isDereferenceableAndAlignedPointer.
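
As a rough sketch of what that check amounts to for a load, here is a toy model in plain C++. It is not the real LLVM implementation; `PointerFacts`, `LoadInfo`, and `canSpeculateLoad` are made-up names, and the real analysis considers more than these two facts.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for what the compiler can prove about a pointer.
// Hypothetical type, not part of LLVM.
struct PointerFacts {
  uint64_t knownAlign;           // proven alignment in bytes (power of two)
  uint64_t dereferenceableBytes; // bytes proven readable from the pointer
};

// Toy description of a load instruction.
struct LoadInfo {
  uint64_t sizeInBytes;   // bytes read by the load
  uint64_t requiredAlign; // alignment the load needs, in bytes (power of two)
};

// A load may be executed speculatively only if it cannot fault: the whole
// access must lie in memory known to be dereferenceable, and the pointer
// must be known to be sufficiently aligned.
bool canSpeculateLoad(const PointerFacts &ptr, const LoadInfo &load) {
  assert(load.requiredAlign != 0 && "alignment must be non-zero");
  return ptr.dereferenceableBytes >= load.sizeInBytes &&
         ptr.knownAlign >= load.requiredAlign;
}
```

In this model a pointer with no known dereferenceable bytes fails the check, whereas a pointer carrying the align 16 / dereferenceable 1000 assumption from the patch passes it for a 16-byte load that needs 16-byte alignment.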

jayfoad commented on August 11, 2024

> > I checked the VK_EXT_descriptor_buffer, the alignment is the same as the descriptor size, the minimum descriptor size is 4-dword. So I think using 16 alignment is reasonable here.
>
> After looking at the xgl implementation(https://github.com/GPUOpen-Drivers/xgl/blob/dev/icd/api/vk_device.cpp#L1097), sounds like we can only assume 2dword alignment?

That is a problem. According to DataLayout, the natural alignment for a <4 x i32> load is 16 bytes, so LLVM will not speculate descriptor loads unless it knows that the pointer has 16 byte alignment.

jayfoad commented on August 11, 2024

For a test case I was using Vulkan CTS test dEQP-VK.reconvergence.subgroup_uniform_control_flow_elect.compute.nesting4.6.6, but that relied on #2112. Now that #2112 has been reverted (#2381), descriptor loads are inserted in the entry block, so there is nothing for LICM to hoist. You would have to un-revert #2112 to get a useful test case.

ruiling commented on August 11, 2024

> That is a problem. According to DataLayout, the natural alignment for a <4 x i32> load is 16 bytes, so LLVM will not speculate descriptor loads unless it knows that the pointer has 16 byte alignment.

I am not sure what effect it would have if we claimed a larger alignment such as 16, 32, or even more. From the hardware perspective we only need dword alignment, so I think codegen would still be correct. All the s_load_dword_xN instructions require only dword alignment, so this is a divergence between the hardware and our ABI data layout.

ruiling commented on August 11, 2024

> For a test case I was using Vulkan CTS test dEQP-VK.reconvergence.subgroup_uniform_control_flow_elect.compute.nesting4.6.6, but that relied on #2112. Now #2112 has been reverted (#2381), descriptor loads are inserted in the entry block, so there is nothing for LICM to hoist. So you would have to un-revert #2112 to get a useful test case.

Thanks for that information. If that is the case, I would suggest we do some additional work when inserting the descriptor load instruction. We first find the dominator block; if the dominator block is in a loop, we insert the descriptor load in the loop preheader (or, if there is no preheader, in the dominator of the loop header). Then we don't need to rely on LICM to do the optimization.
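
The placement rule could be sketched roughly as follows. This uses a toy CFG in plain C++ rather than LLVM's DominatorTree and LoopInfo; `Block`, `Loop`, and `chooseInsertBlock` are hypothetical names, and repeating the step for nested loops is an assumption beyond what is described above.

```cpp
#include <cassert>
#include <vector>

// Toy CFG description; blocks and loops are referred to by index.
struct Block {
  int idom; // immediate dominator, -1 for the entry block
  int loop; // innermost loop containing this block, -1 if none
};

struct Loop {
  int header;    // loop header block
  int preheader; // dedicated preheader block, -1 if the loop has none
};

// Starting from the block that dominates all uses of the descriptor, move the
// insertion point out of any enclosing loop: prefer the loop preheader, and
// fall back to the immediate dominator of the loop header.
int chooseInsertBlock(const std::vector<Block> &blocks,
                      const std::vector<Loop> &loops, int domBlock) {
  int block = domBlock;
  while (blocks[block].loop != -1) {
    const Loop &loop = loops[blocks[block].loop];
    int outside = loop.preheader != -1 ? loop.preheader
                                       : blocks[loop.header].idom;
    if (outside == -1)
      break; // no block outside the loop to move to; stay where we are
    block = outside;
  }
  return block;
}
```

For a CFG of the form entry -> preheader -> header <-> body, a descriptor whose dominating block is the loop body would be loaded in the preheader, so the load is already outside the loop before LICM runs.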

jayfoad commented on August 11, 2024

> If that is the case, I would suggest we do some additional work when inserting the descriptor load instruction.

Yes, I plan to do something like this, but I still think we should add assumptions so that the LLVM middle/back end has the freedom to move descriptor loads around.

ruiling commented on August 11, 2024

> > If that is the case, I would suggest we do some additional work when inserting the descriptor load instruction.
>
> Yes, I plan to do something like this, but I still think we should add assumptions so that the LLVM middle/back end has the freedom to move descriptor loads around.

The current situation is that descriptor table accesses are not aligned to their size. It is not only the base offset that matters, but also the offset into the descriptor table. We have descriptor sizes of <2 x i32>, <4 x i32> and <8 x i32>, but the alignment is hard-coded as 2 dwords. I am not sure that changing the alignment to strictly match the data size is the right way to go.
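
The mismatch can be made concrete with a little arithmetic. The sketch below is plain C++ with hypothetical helper names, and it assumes that the natural alignment of an <N x i32> load is its size rounded up to a power of two, which is consistent with the 16-byte figure quoted earlier for <4 x i32>.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kDwordBytes = 4;
// The hard-coded 2-dword alignment mentioned above, in bytes.
constexpr uint64_t kTableAlignBytes = 2 * kDwordBytes;

// Assumption: natural alignment of <numDwords x i32> is its byte size
// rounded up to the next power of two.
uint64_t naturalAlignBytes(unsigned numDwords) {
  assert(numDwords != 0 && "descriptor must not be empty");
  uint64_t size = numDwords * kDwordBytes;
  uint64_t align = 1;
  while (align < size)
    align <<= 1;
  return align;
}

// True if the 2-dword guarantee is enough for a naturally aligned load.
bool meetsNaturalAlign(unsigned numDwords) {
  return kTableAlignBytes >= naturalAlignBytes(numDwords);
}
```

Under these assumptions only the <2 x i32> descriptors (8 bytes) are covered by the 2-dword guarantee; <4 x i32> would need 16 bytes and <8 x i32> would need 32 bytes.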

Making pointers dereferenceable and properly aligned only helps LLVM with speculative transformations. We are still free to sink the descriptor load instructions without them.

On the other hand, if we still want to speculate such instructions, I am not sure whether we could teach LLVM that certain load instructions are always safe to speculate, since we also have cases where the descriptor table is dynamically indexed and the offset is not a compile-time constant.
