Add LLVM assumptions for descriptor table size and alignment about llpc HOT 14 OPEN

jayfoad commented on August 11, 2024

Add LLVM assumptions for descriptor table size and alignment

from llpc.

Comments (14)

piotrAMD commented on August 11, 2024 1

Are there other descriptor-table-like things that we should add a similar assumption for?

Not descriptor-table-like, but I did experiment with adding assume on some built-in variables like WorkgroupId (< 65536) and LocalInvocationId (< workGroupSize). That allowed optimizers to make some useful simplifications. I haven't pursued that just yet, because on the shader I looked at that change made the code worse, but I think that would be a good change in general.

from llpc.

ruiling commented on August 11, 2024 1

For the alignment, I'd say assuming 16 is actually fairly reasonable? We'd need to double-check with VK_EXT_descriptor_buffer though.

I checked the VK_EXT_descriptor_buffer, the alignment is the same as the descriptor size, the minimum descriptor size is 4-dword. So I think using 16 alignment is reasonable here.

About the dereferenceable size, I think we can simply use UINT_MAX; I don't think we need to care much about this number. We just let LLVM know that all accesses to the descriptor table are always dereferenceable. Any concerns here?

from llpc.

jayfoad commented on August 11, 2024

I came up with a quick patch to add the assumption:

diff --git a/lgc/patch/PatchEntryPointMutate.cpp b/lgc/patch/PatchEntryPointMutate.cpp
index c3d6519c0..c01e0ed1a 100644
--- a/lgc/patch/PatchEntryPointMutate.cpp
+++ b/lgc/patch/PatchEntryPointMutate.cpp
@@ -586,6 +586,12 @@ void PatchEntryPointMutate::fixupUserDataUses(Module &module) {
             // high half.
             descTableVal = addressExtender.extend(descTableVal, highHalf, ptrType, builder);
             addrExtMap[isDescTableSpilled].insert({highHalf, descTableVal});
+
+            // Add an assumption to tell LLVM the alignment and size of the descriptor table.
+            OperandBundleDefT<Value *> alignOpB("align", std::vector<Value *>{descTableVal, builder.getInt32(16)});
+            OperandBundleDefT<Value *> dereferenceableOpB("dereferenceable",
+                                                          std::vector<Value *>{descTableVal, builder.getInt32(1000)});
+            builder.CreateAssumption(builder.getTrue(), {alignOpB, dereferenceableOpB});
           }
 
           // Replace uses of the call and erase it.

However I don't know this code very well and I need some help:

What should the alignment be? Hardcoding 16 is not very nice.
What should the size be? Hardcoding 1000 is ridiculous.
Are there other descriptor-table-like things that we should add a similar assumption for?

from llpc.

nhaehnle commented on August 11, 2024

I haven't given it much thought, but look at the code that "parses" the lgcName::DescriptorTableAddr. There, the resource node mapping is referenced from which the "correct" ranges should be derivable.

For the alignment, I'd say assuming 16 is actually fairly reasonable? We'd need to double-check with VK_EXT_descriptor_buffer though.

from llpc.

jayfoad commented on August 11, 2024

Are there other descriptor-table-like things that we should add a similar assumption for?

Not descriptor-table-like

I meant either:

pointers passed into a shader, where LLPC knows about their size and alignment but LLVM does not, and/or
pointers generated by AddressExtender which LLVM cannot analyze because they are created by an inttoptr.

from llpc.

ruiling commented on August 11, 2024

I checked the VK_EXT_descriptor_buffer, the alignment is the same as the descriptor size, the minimum descriptor size is 4-dword. So I think using 16 alignment is reasonable here.

After looking at the xgl implementation (https://github.com/GPUOpen-Drivers/xgl/blob/dev/icd/api/vk_device.cpp#L1097), it sounds like we can only assume 2-dword alignment?

from llpc.

ruiling commented on August 11, 2024

I have noticed that LLVM is unable to hoist descriptor table loads out of loops because it does not know the size and alignment of the descriptor table, when the pointer to the table is created with IR like this:

btw, I don't understand why the pointer alignment would affect hoisting, do you see the reason?

from llpc.

jayfoad commented on August 11, 2024

I have noticed that LLVM is unable to hoist descriptor table loads out of loops because it does not know the size and alignment of the descriptor table, when the pointer to the table is created with IR like this:

btw, I don't understand why the pointer alignment would affect hoisting, do you see the reason?

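Before LICM hoists an instruction, it calls ValueTracking's isSafeToSpeculativelyExecute, which in turn calls isDereferenceableAndAlignedPointer. So a load is only hoisted if the pointer is known to be both dereferenceable and sufficiently aligned.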
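To make the check above concrete, here is a minimal toy model of the legality question, not LLVM's actual implementation. The names `PointerFacts`, `LoadInfo` and `canSpeculateLoad` are hypothetical and exist only for illustration; the real logic lives in LLVM's ValueTracking and considers much more than this.

```cpp
#include <cassert>
#include <cstdint>

// Toy summary of what the optimizer knows about a pointer.
// Hypothetical type for illustration; not an LLVM data structure.
struct PointerFacts {
  uint64_t knownAlign;           // bytes, power of two; 1 means "nothing known"
  uint64_t knownDereferenceable; // bytes known to be readable; 0 means "nothing known"
};

// Toy description of a load.
struct LoadInfo {
  uint64_t sizeInBytes;   // number of bytes read
  uint64_t requiredAlign; // alignment the load is expected to have (power of two)
};

static bool isPowerOfTwo(uint64_t v) { return v != 0 && (v & (v - 1)) == 0; }

// A load may be executed speculatively only if it can never fault:
// the pointer must be known aligned enough and known to cover the whole access.
bool canSpeculateLoad(const PointerFacts &ptr, const LoadInfo &load) {
  assert(isPowerOfTwo(ptr.knownAlign) && isPowerOfTwo(load.requiredAlign));
  // For powers of two, "at least as large" implies "is a multiple of".
  bool alignedEnough = ptr.knownAlign >= load.requiredAlign;
  bool coversAccess = ptr.knownDereferenceable >= load.sizeInBytes;
  return alignedEnough && coversAccess;
}
```

Under this model, a pointer produced by inttoptr with no assumptions (`{1, 0}`) fails the check for a 16-byte `<4 x i32>` load, a pointer carrying the patch's `align 16` and `dereferenceable 1000` assumptions passes, and a pointer known only to be 8-byte aligned still fails.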

from llpc.

jayfoad commented on August 11, 2024

I checked the VK_EXT_descriptor_buffer, the alignment is the same as the descriptor size, the minimum descriptor size is 4-dword. So I think using 16 alignment is reasonable here.

After looking at the xgl implementation (https://github.com/GPUOpen-Drivers/xgl/blob/dev/icd/api/vk_device.cpp#L1097), it sounds like we can only assume 2-dword alignment?

That is a problem. According to DataLayout, the natural alignment for a <4 x i32> load is 16 bytes, so LLVM will not speculate descriptor loads unless it knows that the pointer has 16 byte alignment.

from llpc.

jayfoad commented on August 11, 2024

For a test case I was using Vulkan CTS test dEQP-VK.reconvergence.subgroup_uniform_control_flow_elect.compute.nesting4.6.6, but that relied on #2112. Now #2112 has been reverted (#2381), descriptor loads are inserted in the entry block, so there is nothing for LICM to hoist. So you would have to un-revert #2112 to get a useful test case.

from llpc.

ruiling commented on August 11, 2024

That is a problem. According to DataLayout, the natural alignment for a <4 x i32> load is 16 bytes, so LLVM will not speculate descriptor loads unless it knows that the pointer has 16 byte alignment.

I am not sure what effect it would have if we claimed a larger alignment such as 16 or 32 bytes, or even more. From a hardware perspective, we only need dword alignment, so I think codegen would still be correct. All the s_load_dword_xN instructions require only dword alignment, so this is a divergence between the hardware and our ABI data layout.

from llpc.

ruiling commented on August 11, 2024

For a test case I was using Vulkan CTS test dEQP-VK.reconvergence.subgroup_uniform_control_flow_elect.compute.nesting4.6.6, but that relied on #2112. Now #2112 has been reverted (#2381), descriptor loads are inserted in the entry block, so there is nothing for LICM to hoist. So you would have to un-revert #2112 to get a useful test case.

Thanks for that information. If that is the case, I would suggest we do some additional work when inserting the descriptor load instruction. We first find the dominator block; if the dominator block is in a loop, we insert the descriptor load in the loop pre-header (if there is no pre-header, we insert into the dominator of the loop header). Then we don't need to rely on LICM to do the optimization work.
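A rough sketch of that placement rule, on a toy CFG rather than LLVM's real classes, might look like the following. `Block`, `Loop` and `chooseInsertBlock` are hypothetical names invented for this example, and repeating the step for enclosing loops is an assumption on my side; the suggestion above only describes a single loop.

```cpp
#include <cassert>

struct Loop;

// Toy basic block. Hypothetical type; not llvm::BasicBlock.
struct Block {
  const char *name = "";
  Block *idom = nullptr; // immediate dominator, null for the entry block
  Loop *loop = nullptr;  // innermost loop containing this block, null if none
};

// Toy natural loop. Hypothetical type; not llvm::Loop.
struct Loop {
  Block *header = nullptr;
  Block *preheader = nullptr; // may be null if the loop has no pre-header
};

// Given the block that dominates all uses of a descriptor, pick a block
// outside of any loop in which to insert the descriptor load.
Block *chooseInsertBlock(Block *dom) {
  assert(dom && "need a dominating block to start from");
  Block *cur = dom;
  while (cur->loop) {
    Loop *l = cur->loop;
    // Prefer the pre-header; otherwise fall back to the header's dominator,
    // which lies outside this loop because the header dominates the loop.
    Block *outside = l->preheader ? l->preheader : l->header->idom;
    assert(outside && "a loop header must have a dominator");
    cur = outside; // assumption: keep going if this block is in an outer loop
  }
  return cur;
}
```

For a CFG `entry -> pre -> header -> body` where `header` and `body` form a loop, starting from `body` yields `pre`, both when `pre` is recorded as the pre-header and when the fallback through the header's dominator is used.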

from llpc.

jayfoad commented on August 11, 2024

If that is the case, I would suggest we do some additional work when inserting the descriptor load instruction.

Yes, I plan to do something like this, but I still think we should add assumptions so that the LLVM middle/back end has the freedom to move descriptor loads around.

from llpc.

ruiling commented on August 11, 2024

If that is the case, I would suggest we do some additional work when inserting the descriptor load instruction.

Yes, I plan to do something like this, but I still think we should add assumptions so that the LLVM middle/back end has the freedom to move descriptor loads around.

The current situation is that descriptor table accesses are not aligned to their size. It's not only the base address that matters, but also the offset into the descriptor table. We have descriptor sizes of <2 x i32>, <4 x i32> and <8 x i32>, but the alignment is hard-coded as 2 dwords. I am not sure that changing the alignment to strictly match the data size is the right way to go.
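As a small illustration of why the offset matters as much as the base, the alignment that can be proven for `base + offset` is limited by both terms. The helper below is a hypothetical sketch, not LLPC or LLVM code, and it assumes the offset is a compile-time constant.

```cpp
#include <cassert>
#include <cstdint>

// Alignment (in bytes) that can be proven for `base + offset`, given only the
// known alignment of `base` and a constant byte offset.
// Hypothetical helper for illustration.
uint64_t knownAlignment(uint64_t baseAlign, uint64_t offset) {
  assert(baseAlign != 0 && (baseAlign & (baseAlign - 1)) == 0 &&
         "base alignment must be a power of two");
  if (offset == 0)
    return baseAlign; // a zero offset preserves the base alignment
  // Largest power of two dividing the offset (its lowest set bit).
  uint64_t offsetAlign = offset & (~offset + 1);
  return offsetAlign < baseAlign ? offsetAlign : baseAlign;
}
```

With a 16-byte aligned table, a descriptor at byte offset 8 is only provably 8-byte aligned, so a `<4 x i32>` load from it would still not meet a 16-byte requirement; with an 8-byte aligned table, no offset can do better than 8.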

Making pointers dereferenceable and properly aligned only helps LLVM perform speculative transformations. We are still free to sink the descriptor load instructions without them.

On the other hand, if we still want such instructions to be speculated, I am not sure whether we can teach LLVM that certain load instructions are always safe to speculate. We also have cases where the descriptor table is dynamically indexed, so the offset is not a compile-time constant.

from llpc.
