Motivation
With the announcement of HLSL 2021 (https://devblogs.microsoft.com/directx/announcing-hlsl-2021/), templates, operator overloading, and bitfields have been introduced. Following such impactful language evolutions in AZSLc would come at substantial cost and time inertia. The question is: is there a way to remove the logic-heavy part of AZSLc (semantic analysis) so that AZSL is transparently HLSL? That way, future evolutions of HLSL as a language would become immediately available, ideally simply through a package release of DXC.
Suggestion
One option that seems to require the least effort would be to split the authoring process in two parts: the AZSL part that holds the resources, and the HLSL part that holds the code.
Concept prototype idea 1
If from that input:
```hlsl
ShaderResourceGroupSemantic slot1
{
    FrequencyId = 1;
};

ShaderResourceGroup SRG : slot1
{
    struct CB
    {
        float4 color;
    };
    ConstantBuffer<CB> m_uniforms;
};
```
We get that output:
```hlsl
struct SRG_CB
{
    float4 color;
};
ConstantBuffer<::SRG_CB> SRG_m_uniforms : register(b0, space0);
```
Then we can save it to `inputs.hlsl` and extend it with a follow-up file:
```hlsl
#include "inputs.hlsl" // auto generated from azsl

float4 MainPS(float2 uv : TEXCOORD0) : SV_Target0
{
    // edit here
    return SRG_m_uniforms.color; // your resource names have mutated, refer to inputs.hlsl to identify their flattened names
}
```
We note that the resource variables have changed names because of the mutations undergone in the process of SRG-erasure (transpilation from AZSL to HLSL). It therefore requires programmers to be aware of the mutation scheme, and to consult `inputs.hlsl` to know what they have to work with.
Advantages
Drawbacks
- Poor look-and-feel for the user, since the discoverability of "secret variables" is not clear from the original AZSL source.
- Loss of code mutators (Zpc/Zpr matrix qualifiers, --no-ms or --cb-body modes).
- The 2-step authoring affects the Asset Processor build steps: one build for the .azsl, and another for the .hlsl, which includes the generated part and the user-authored parts.
Evolution idea 1.1
The problem is that the mutation can be platform specific and AZSLc-version dependent. It can also be unpredictable because of name collision avoidance: e.g. `SRG::m_uniforms` may become `SRG_m_uniforms` or `SRG_m_uniforms1`. In other words, there is no specification guarantee on the rename scheme.
To ease that issue, we can imagine an `__asm__`-block scheme, with what historically was called "clobber" declarations to make the link between the host language and the DSL.
Example of what it could look like:
```hlsl
ShaderResourceGroupSemantic slot1
{
    FrequencyId = 1;
    ShaderVariantFallback = 128;
};

option bool reflections = false;

ShaderResourceGroup SRG : slot1
{
    struct CB
    {
        float3 sceneBounds;
    };
    ConstantBuffer<CB> uniforms;
    float4 iblAvg;
    float4 ambient;
    Texture2DMS<float4, 8> fresnel;
    enum Composite { Spec, Diff };
    Composite Get(bool forceOff) { return !forceOff && reflections ? Spec : Diff; }
    CB Get() { return uniforms; }
    float4 Get(Composite c) { return c == Spec ? iblAvg : ambient; }
};

struct PSInput
{
    float4 position : SV_Position;
    float4 color : COLOR0;
};

typealias CB = SRG::CB; // this location is stable so it can be referred to

__hlsl__
@{
    // declarative zone where lookup happens once from the global scope and gets
    // cached into an alias that becomes available in the HLSL block.
    using Get = SRG::Get;              // alias the overload set
    using Spec = SRG::Composite::Spec; // enumerators mutate
    using fresnel = SRG::fresnel;      // variables also mutate
    // from here, code is like a comment for AZSLc
    template<typename UV_t>
    float4 PSMainT(PSInput input, UV_t uv, int si)
    {
        CB cb = Get();
        float4 spec = fresnel.sample[si](uv); // we lose the ability to mutate --no-ms
        if (input.position.xyz < cb.sceneBounds) // field names don't mutate AFAIR
            return Get(Get(false));
        else
            return 0;
    }
    // templates can't be entry points in HLSL; declare a concrete version
    float4 PSMainF2(PSInput i, float2 f : TEXCOORD0, int si : SV_SampleIndex) : SV_Target0
    { return PSMainT(i, f, si); }
}@ // we need a "raw string literal"-way of ending the block
```
Bear in mind that the program is nonsensical; the point is to illustrate what we lose and what we gain.
We win language involutivity, but we lose perfect integration with the azsl-declared resources. They need to be bridged in some way (somewhat akin to lambda capture), so that accesses to the mutated symbols in the HLSL block can bind to their intended symbols.
Advantages
- No more magic names as in idea 1, the links become explicit.
- Possibility of preserving an integrated asset build (no 2 steps with the auto-generated include).
Drawbacks
- Does not open the possibility of a codebase diet (symbol lookup must still work for `using` directives).
- Not the best UX, because of the need to identify symbols that are external to the `__hlsl__` block and repeat a short declaration for each.
- Like idea 1, loss of code mutators (matrix qualifiers, --no-ms or --cb-body modes).
Concept idea 2
Strongly reduce the invasiveness of AZSL-specific syntax constructs, tending instead toward a decorated HLSL.
The compiler would still need to exist to do reflection and resource register assignment in each platform's way. It would also still generate `option` and `rootconstant` variable getters, and would still need to accept non-HLSL blocks such as static samplers with in-situ state declarations, or SRG frequencies and the option fallback key.
But the names would be expected to be stable, since no flattening or scope mutation would happen. Client-site usage (later in the code) would remain naturally compatible with the declaration.
As per @santorac proposal:
```hlsl
ShaderResourceGroupSemantic slot1
{
    FrequencyId = 1;
};

[ShaderResourcesGroup(slot1)]
namespace
{
    struct CB
    {
        float4 color;
    };
    ConstantBuffer<CB> uniforms;
}
```
Using an annotation, AZSLc2 would have to recognize that attribute to register resources, instead of the `ShaderResourceGroup` block of today.
Advantages
- Opens the possibility of a codebase diet.
- On paper, it renders azsl files syntactically compatible with shader explorers like Godbolt or Tim Jones' playground, OpenGPU analyzers, etc.
Drawbacks
- Like ideas 1 and 1.1, loss of code mutators (matrix qualifiers, --no-ms or --cb-body modes).
- Necessity of one large-sweep intervention in the current shaders to adapt them. Though we can imagine shipping both AZSLc versions until a potential deprecation at an undefined date.
Prototype
I (@siliconvoodoo) am forking the main repository to try this evolution here: https://github.com/SiliconStudio/o3de-azslc-evo
Findings
I see 3 pathways of implementation to the target:
Further
We can also decide to delete the `ShaderResourceGroupSemantic` syntax and integrate it into attributes as well:
```hlsl
[[azsl::ResourceGroupSemantic]]
namespace slot1
{
    static const int slot = 1;
    static const int frequencyId = 128;
};

[[azsl::ResourceGroup(slot1)]]
namespace SRG
{
    struct Data { float4 f; };
    ConstantBuffer<Data> glob : register(b0);
}
```
We'll note that those attributes are, oddly, still not compatible with DXC, even with -HV 2021:
```
error: an attribute list cannot appear here
```
But that's reasonable as long as AZSLc2 swallows those attributes.
Desirable diet features
seenat
Refer to https://github.com/o3de/o3de-azslc/wiki/Features#seenats
This is a necessity for mathematically infallible symbol renaming and migration (the migration from SRG scopes to the global scope, and of some typealiases/structs from function scopes to an outer scope, which was a bonus of AZSL).
Maintaining this is the most costly part, because of its dependency on reliable lookup. Lookup depends on semantic contexts and requires an understanding of scopes, type deduction, inheritance, function overloads, and overrides.
The introduction of templates is hindered by the weight of updating all these mechanisms.
Impacts:
srg-constants reference mutations
In the original AZSL source:
```hlsl
// ...
ShaderResourceGroup S : slot1
{
    float3 sunDir;
    float3 Get() { return sunDir; }
};

float4 psmain() : SV_Target0
{
    return float4(S::sunDir, 1);
}
```
Accesses to `sunDir` get mutated to their actual materialization in a generated constant buffer, as such:
```hlsl
struct S_SRGConstantsStruct
{
    float3 S_sunDir;
};
ConstantBuffer<::S_SRGConstantsStruct> S_SRGConstantBuffer : register(b0, space0);

float3 S_Get()
{
    return ::S_SRGConstantBuffer.S_sunDir;
}

float4 psmain() : SV_Target0
{
    return float4(::S_SRGConstantBuffer.S_sunDir, 1);
}
```
Finding the points of mutation requires the seenat system.
One way to do away with that problem is to adopt the `option` strategy, which is to declare a static variable that is fetched from a function call. Refer to the wiki Features paragraph for an illustration.
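Applied to the `sunDir` example above, that strategy could look like the following sketch (the getter name and layout are hypothetical, not the actual emitter output):

```hlsl
struct S_SRGConstantsStruct
{
    float3 S_sunDir;
};
ConstantBuffer<::S_SRGConstantsStruct> S_SRGConstantBuffer : register(b0, space0);

// Generated getter: the only place aware of the constant buffer materialization.
float3 GetShaderConstant_S_sunDir() { return ::S_SRGConstantBuffer.S_sunDir; }
// Self-initializing static: all reference sites bind here and never need rewriting.
static const float3 S_sunDir = GetShaderConstant_S_sunDir();

float4 psmain() : SV_Target0
{
    return float4(S_sunDir, 1); // no per-reference mutation needed
}
```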
rootconstant mutations
AZSL example source:
```hlsl
rootconstant bool fog;
static const float3 fogClr = float3(0.5, 0.5, 0.5);

float4 psmain(float3 clr : COLOR0, float d : DEPTH) : SV_Target0
{
    return float4(fog ? lerp(clr, fogClr, pow(1.8, d)) : clr, 1);
}
```
results in mutated references to:
```hlsl
bool GetShaderRootConst_Root_Constants_fog();
static const bool _g_Root_Constants_fog = GetShaderRootConst_Root_Constants_fog();
static const float3 fogClr = float3(0.5, 0.5, 0.5);

float4 psmain(float3 clr : COLOR0, float d : DEPTH) : SV_Target0
{
    return float4(_g_Root_Constants_fog ? lerp(clr, fogClr, pow(1.8, d)) : clr, 1);
}

struct Root_Constants
{
    bool fog;
};
ConstantBuffer<::Root_Constants> rootconstantsCB : register(b0, space0);

bool GetShaderRootConst_Root_Constants_fog()
{
    return ::rootconstantsCB.fog;
}
```
Suggestion for a solution: the variable is mutated at its definition site into a self-initializing static, so we could imagine that the name needn't change. The current behavior looks like a conservative choice, but it doesn't seem necessary: the original symbol name could probably be preserved. That would be consistent with the behavior for `option`s, and would lift us from the need to iterate over the references.
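If the original name were preserved, the generated output could look like this sketch (hypothetical; the current emitter instead renames to `_g_Root_Constants_fog`):

```hlsl
static const float3 fogClr = float3(0.5, 0.5, 0.5);

struct Root_Constants
{
    bool fog;
};
ConstantBuffer<::Root_Constants> rootconstantsCB : register(b0, space0);

bool GetShaderRootConst_Root_Constants_fog()
{
    return ::rootconstantsCB.fog;
}
// Keeping the original name: every reference to `fog` binds here unchanged.
static const bool fog = GetShaderRootConst_Root_Constants_fog();

float4 psmain(float3 clr : COLOR0, float d : DEPTH) : SV_Target0
{
    return float4(fog ? lerp(clr, fogClr, pow(1.8, d)) : clr, 1);
}
```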
--no-ms, --strip-unused-srg, --cb-body, --bindingdep
Mentioned in later paragraphs.
Packing
Feature to reflect "constantByteOffset", "constantByteSize", "typeDimensions" (document).
It would be desirable to delete all the alignment-computation code that serves as support for the RHI's buffer-as-bytes understanding of where (at what offset) variables actually sit in the CB. This code is heavy, costs many tests, has required long investigations of reverse-engineering DXC, all multiplied by the combination of options for Vulkan/DX/GLSL rules.
Dependencies: the pack computer relies on the type system, because it accesses the type class (user-defined or fundamental, matrix or vector), typeinfo (array or matrix dimensions), and sizeof. The type system can't work without the symbol lookup system, because types can be combined (UDT members, inheritance...) and typedef'd.
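As an illustration of what this code computes: DX constant buffers pack members into 16-byte registers and forbid a member from straddling a register boundary, so each offset depends on its neighbors (offsets below follow the DX packing rules; Vulkan/GLSL rules differ):

```hlsl
cbuffer Example : register(b0)
{
    float3 a; // offset 0,  size 12
    float  b; // offset 12: fits in the last 4 bytes of a's 16-byte register
    float2 c; // offset 16, size 8
    float3 d; // offset 32: starting at 24 would straddle the boundary, so it is pushed
};
```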
Alternative: rely on a sort of reflective DXC API?
Constant folding
Deleting it would save us only a small amount of code, but it is always a nice addition to the diet. Unfortunately, at this point it's important for the array-dimension reflection of SRG resources, as well as for the [[pad_to(N)]] feature, option ranges, the thread-count reflector (for Metal), and static-sampler reflection.
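For example, reflecting the dimension of an SRG resource array can require evaluating a constant expression (a sketch; the names are illustrative):

```hlsl
static const uint ShadowCascades = 4;
static const uint TexturesPerCascade = 2;

ShaderResourceGroup SRG : slot1
{
    // Reflecting this array's size requires folding 4 * 2 down to 8.
    Texture2D m_shadowMaps[ShadowCascades * TexturesPerCascade];
};
```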
Difficult features
--no-ms
Any route will at least diminish the robustness of the current approach: it will no longer be possible to check that X in X.Load() is a symbol referring to a Texture2DMS type, since that check relies on the lookup facility and the typeof facility.
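As a reminder of the transformation at stake, a sketch of what the mutator does (illustrative, not the exact emitter output):

```hlsl
// Original source:
Texture2DMS<float4, 8> t_color;
float4 f(int2 px, int si) { return t_color.Load(px, si); }

// Under --no-ms, the resource and its Load call sites are rewritten to the
// non-multisampled form, e.g.:
Texture2D<float4> t_color;
float4 f(int2 px, int si) { return t_color.Load(int3(px, 0)); }
```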
Yellow route: totally unsupported. We won't have enough grammar power to work at the AST level anymore.
Alternatives:
- dxil/spirv evolution for flag-configurable texture resources?
- a separate human-assisted tool that works with regexes (or tree-sitter?) to generate the supervariant version after an edit of the main file, if that supervariant is present in the .shader?
--cb-body
This CLI behavior switch activates a mode where the generation of constant buffers takes a different form, as such:
input:
```hlsl
ShaderResourceGroup SRG : slot1
{
    float4 color;
    struct CB { bool fog; };
    ConstantBuffer<CB> uniforms;
};

float4 MainPS(float2 uv : TEXCOORD0) : SV_Target0
{
    return SRG::color + SRG::uniforms.fog;
}
```
output:
```hlsl
struct SRG_CB { bool fog; };
ConstantBuffer SRG_CBContainer : register(b0)
{
    CBVArrayView uniforms;
    float4 SRG_color;
};
static const RegularBuffer<::SRG_CB> SRG_uniforms = RegularBuffer<::SRG_CB>(uniforms);

float4 MainPS(float2 uv : TEXCOORD0) : SV_Target0
{
    return ::SRG_color + ::SRG_uniforms[0].fog;
}
```
We note that, again, in `MainPS` the references to external resources are mutated to different symbols: `SRG::color` becomes `::SRG_color` and, more complicated, the access to `uniforms` has been enriched with a subscript.
Solution: it seems we could once again go with the `option` strategy: declare a static accessor variable with an initializer calling a generated getter function that fetches the member from the corresponding generated ConstantBuffer.
As a matter of fact, since these srg-constant variables become immediately visible in the outer scope, they could even be declared as-is, with their original names. That will cause declaration-vs-access order problems though, since their declaration sites will all be migrated together into one location. We already have this problem for srg-constants; the static variable with an initializer seems like a simple enough counter-strategy for that.
The [0] subscript can maybe be solved in exactly the same way: let the references refer to a generated static variable, initialized by a fetcher function whose body carries the [0] subscript. This way, the reference sites across the program are freed from any awareness of this specificity.
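Applied to the example above, the fetcher-based approach could look like this sketch (the fetcher name is hypothetical; `CBVArrayView` and `RegularBuffer` are the helper types from the current --cb-body output):

```hlsl
struct SRG_CB { bool fog; };
ConstantBuffer SRG_CBContainer : register(b0)
{
    CBVArrayView uniforms;
    float4 SRG_color;
};

// The fetcher hides both the RegularBuffer construction and the [0] subscript.
SRG_CB FetchSRG_uniforms()
{
    return RegularBuffer<::SRG_CB>(uniforms)[0];
}
static const SRG_CB SRG_uniforms = FetchSRG_uniforms();

float4 MainPS(float2 uv : TEXCOORD0) : SV_Target0
{
    // Reference sites use plain member access; no subscript awareness needed.
    return ::SRG_color + SRG_uniforms.fog;
}
```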
--bindingdep
This system reflects the "participantsConstants" (in JSON) per "dependentFunctions" (entry points). (Documentation in the Features page.) It relies on the reference tracker to iterate over the appearances of external resources throughout the program. It is the same core system as `--strip-unused-srgs`, as described in this picture.
The code for the feature can be found here.
We can either forgo that facility, if we re-evaluate its necessity on sensitive platforms like Vulkan or Metal; or we will need to find an alternative using DXC's internal API, maybe by analyzing the remaining resources post-optimization. Though I seem to recall optimization was not a factor; on the contrary, we needed to know the variables that should be there by contract, irrespective of potential dead-variable optimizations.
Maybe by leveraging clang (or the clang inside DXC) if push comes to shove.
--strip-unused-srgs
Reference doc: https://github.com/o3de/o3de-azslc/wiki/Features#strip-unused-srgs
This relies on the homonym visitor system to visit the seenats of each resource inside an SRG.
It's the same problem as --bindingdep, since it relies on the same system.
Any route will rid us of the ability to iterate over seenats, since the point of this evolution RFC is to remove the complexity involved in the seenat system.
Alternatives: drop that feature? That seems undesirable in raytracing contexts. Maybe we can hack an artificial resource-toucher to force DXC not to optimize out resources (or propose a flag)? Or, if we build the clang explorer for --bindingdep, it will be factorized for this feature too.
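The resource-toucher hack could be sketched as follows (purely illustrative; the resource names and the guard condition are hypothetical): reference every SRG resource behind a condition the optimizer cannot fold away, so their bindings survive dead-code elimination:

```hlsl
struct SRG_CB { float4 color; };
Texture2D SRG_albedo : register(t0, space0);
SamplerState SRG_sampler : register(s0, space0);
ConstantBuffer<SRG_CB> SRG_uniforms : register(b0, space0);

// Generated toucher: one expression referencing every resource of the SRG.
float4 TouchSRGResources(float2 uv)
{
    return SRG_albedo.SampleLevel(SRG_sampler, uv, 0) + SRG_uniforms.color;
}

float4 MainPS(float2 uv : TEXCOORD0) : SV_Target0
{
    float4 result = float4(0, 0, 0, 1); // stand-in for the real shader code
    // Guard on a runtime value so DXC cannot prove the call dead and strip
    // the touched resources; assumed never true in practice.
    if (SRG_uniforms.color.a < 0)
        result += TouchSRGResources(uv);
    return result;
}
```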