cg-tuwien / auto-vk
Afterburner for Vulkan development; Auto-Vk is a modern C++ low-level convenience and productivity layer atop Vulkan-Hpp.
License: MIT License
With the official `VK_KHR_ray_tracing` extension, additional buffer usage flags have been introduced (and `vk::BufferUsageFlagBits::eRayTracingKHR` has been removed).
Add new meta data for the new buffer usages:
avk.cpp#L6309
avk.hpp#L256
Can't compile with the -Wall flag. There are a lot of warnings and some errors.
Which one to use: `eShaderDeviceAddressKHR` or `eShaderDeviceAddress`? Why is there a `KHR` version anyway? Can't we just use the non-extension version?
Currently, `eShaderDeviceAddressKHR` is (probably) used everywhere. => Choose one and unify usage.
Instead of evaluating the header version, i.e. e.g.
#if VK_HEADER_VERSION >= 135
the following would be more appropriate for such cases:
#if defined(VK_VERSION_1_2)
Definition of done:
Every check in `#if VK_HEADER_VERSION`-fashion has been investigated to determine whether it guards a feature that was added to the standard with SDK version 1.2. If so, it has been replaced with `#if defined(VK_VERSION_1_2)`.
Modern GPUs typically have a (relatively small) memory region which is both `vk::MemoryPropertyFlagBits::eDeviceLocal | vk::MemoryPropertyFlagBits::eHostCoherent`. See the following output of `context().print_available_memory_types();`:
INFO: ========== MEMORY PROPERTIES OF DEVICE 'NVIDIA GeForce RTX 3070'
INFO: -------------------------------------------------------------
INFO: HEAP TYPES:
INFO: heap-idx | bytes | heap flags
INFO: -------------------------------------------------------------
INFO: 0 | 8,433,696,768 | { DeviceLocal }
INFO: 1 | 17,141,145,600 | {}
INFO: 2 | 224,395,264 | { DeviceLocal }
INFO: =============================================================
INFO: MEMORY TYPES:
INFO: mem-idx | heap-idx | memory property flags
INFO: -------------------------------------------------------------
INFO: 0 | 1 | {}
INFO: 1 | 0 | { DeviceLocal }
INFO: 2 | 1 | { HostVisible | HostCoherent }
INFO: 3 | 1 | { HostVisible | HostCoherent | HostCached }
INFO: 4 | 2 | { DeviceLocal | HostVisible | HostCoherent }
INFO: =============================================================
The problem is just that `enum struct memory_usage` only supports `device` OR `host_coherent`, but not the combination of both.
Furthermore, the framework probably disregards which memory region exactly a `memory_usage::device` allocation is made from (in the example above, the options would be the memory regions at heap indices 0 or 2; the framework probably does not really care which one it chooses -- but it should!).
`memory_usage::device` should allocate from the memory region that has the flag `vk::MemoryPropertyFlagBits::eDeviceLocal`, but does not have `vk::MemoryPropertyFlagBits::eHostCoherent`! Furthermore, this flag should probably be renamed to `memory_usage::device_local`!
`memory_usage::host_coherent` should allocate from the memory region that has the flag `vk::MemoryPropertyFlagBits::eHostCoherent`, but does not have `vk::MemoryPropertyFlagBits::eDeviceLocal`!
And a new `memory_usage::device_local_and_host_coherent` should allocate from the memory region that has both flags: `vk::MemoryPropertyFlagBits::eDeviceLocal | vk::MemoryPropertyFlagBits::eHostCoherent`. The question is just whether there is a better name for this. (I think there was a dedicated name for this memory region, but I can't remember.)
...to enable an image view with a different format than the image
When using `image_view_t::as_storage_image` or similar (intended to be temporary) objects of the "as_something" pattern, the following situation can occur:
An `image_view_as_storage_image` stores the `vk::ImageView` handle internally, and its lifetime might exceed the lifetime of the `image_view` it has taken the handle from. That's not good. Optimally, the framework would prevent such usages.
One option would be that the `image_view_as_storage_image` co-owns the `image_view`, which would imply (for most cases, probably) that `enable_shared_ownership()` is applied to the `image_view`. That is a potentially huge overhead for an operation which produces an object that shall only be used as a temporary anyway. So, what to do?
The best option would probably be to modify `image_view_as_storage_image` in a way so that it cannot be stored somewhere but can only be used as a temporary object. Maybe all functions/methods that take a parameter of type `image_view_as_storage_image` shall take it as rvalue only (i.e. `image_view_as_storage_image&&`). If, furthermore, its move constructor and move assignment operator were deleted, that might prevent the user from `std::move`-ing it into those functions/methods that take an `image_view_as_storage_image&&` parameter.
This shall be implemented for the following types:
class image_view_as_input_attachment
class image_view_as_storage_image
class buffer_descriptor
class buffer_view_descriptor
shader_binding_table_ref
Pattern for disabling all kinds of stuff for type `T`:
T() = delete;
T(T&&) noexcept = delete;
T(const T&) = delete;
T& operator=(T&&) noexcept = delete;
T& operator=(const T&) = delete;
~T() = delete;
Update:
With the introduction of the functions `as_uniform_buffers`, `as_uniform_texel_buffer_views`, `as_storage_buffers`, `as_storage_texel_buffer_views`, `as_storage_images`, and `as_input_attachments` in bindings.hpp, the request to make those helper classes non-storable has become a bit more challenging, because they are stored in those functions.
Idea: Can constructors be declared as being `friend` of something, so that only those functions may use them, but not "ordinary" users of the framework? I'm not a big fan of restricted usage either, but we need to ensure correctness, and if those classes do not own the resources they are referencing, we cannot ensure correctness.
Hi,
I hope this request is not too outrageous, I greatly appreciate the work you are doing!
Might it be possible to integrate MoltenVK into Auto-Vk, or to use the two together?
I'm an absolutely clueless beginner, and I'm trying to write a cross-platform audio plug-in that displays a GLSL fragment shader in its window, like on shadertoy.com.
If you want to use pipeline barriers within a renderpass or subpass, a self-dependency on the subpass the barrier is used in needs to be specified during renderpass creation. Details here.
This can currently be circumvented by using the alterConfigBeforeCreation feedback, but it needs to be done with native Vulkan configs. A suitable abstraction (maybe in the renderpass_sync?) would be great for that.
In a few months' time, just deprecate all previous ray tracing code (i.e. the original NVIDIA version and also the `VK_KHR_ray_tracing` beta version from header version 135).
Only support the official `VK_KHR_ray_tracing` extension, which was introduced with header version 162.
=> Search for all
#if VK_HEADER_VERSION >= 162
#else
#endif
patterns and delete all the `#else` branches!
The shader binding table (SBT) is created right after the ray tracing pipeline is created by means of `vk::Device::createRayTracingPipelineKHR` and put into the following buffer:
result.mShaderBindingTable = create_buffer(
    memory_usage::host_coherent,
    vk::BufferUsageFlagBits::eRayTracingKHR,
    generic_buffer_meta::create_from_size(shaderBindingTableSize)
);
This has been done because then no synchronization is required. However, `host_coherent` is suboptimal => it should be `device` memory. In order to store the SBT entries in `device` memory, synchronization has to be added.
I would like to ask or suggest adding support for memory export to OpenGL: namely, built-in support for `GL_EXT_external_objects`, `EXT_external_objects_win32`, `EXT_external_objects_fd`… in short, everything needed to support interoperability with OpenGL. Of course, I can try to implement it myself, but it depends on whether such a pull request would be accepted, whether I can do it, and how much it would change the code…
As pointed out in some C++ talk that I'll have to find once again, returning by reference or even by const reference can be very dangerous: if the object the reference refers to dies, the reference is left dangling. Therefore, one should always return by value.
There are some exceptions to this rule where it is okay to return by reference, namely when it can be ensured with certainty that the referenced object outlives anything that could be done with the returned reference. Specifically, that applies to:
root::physical_device
root::device
root::dynamic_dispatch
root::memory_allocator
In many cases (e.g. returning a handle), return by value probably has exactly the same performance cost as returning a reference, i.e. in many cases you don't gain anything from returning a reference. In some cases, return by reference can be cheaper, e.g. when returning a vector of something. However, correctness is always more important than performance.
Actually, there is also a potential "security" risk: by `const_cast`-ing away the `const` of a `const T&`, one could modify the internal state through the returned reference.
Definition of Done:
Hi,
Would you mind adding a .gitignore?
I get `modified: gears_vk (untracked content)` from `git status` when I build gears_vk, because auto_vk produces an out/ directory.
As auto_vk is a submodule, I do not wish to change anything inside it, and an external .gitignore has no effect on folders/files inside submodules.
Thanks so much!
In most places, `const auto&` is returned, but handles are just (64-bit, I think) integers, so returning `auto` would also be a good option. What's better?
Example: `vk::Image handle()` instead of `const vk::Image& handle()`
...so that one can build fewer instances than the maximum number.
Also ensure that `numInstances` is less than the maximum.
`std::thread::id` is used in descriptor_cache.hpp, but `<thread>` is not included in avk.hpp.
The current `avk::image_usage` is a somewhat inflexible and suboptimal design. Furthermore, it contains several flags which are no longer required since image layout transitions are now explicitly handled by users (therefore, `image_usage::read_only`, `image_usage::presentable`, and `image_usage::shared_presentable` can be removed).
Better: Refactor the existing `enum struct image_usage` into the following format:
namespace image_usage
{
struct image_usage_data
{
vk::ImageUsageFlags mImageUsageFlags;
vk::ImageTiling mImageTiling;
vk::ImageCreateFlags mImageCreateFlags;
};
}
and let such a struct be created by users in a convenient manner in the style of `avk::stage` or `avk::access`, using `operator|` or `operator+` to conveniently put together such a configuration.
Furthermore (and this is probably the big part of the work for this issue): refactor all the places where `enum struct image_usage` is currently used accordingly!
Hint: See `extern std::tuple<vk::ImageUsageFlags, vk::ImageTiling, vk::ImageCreateFlagBits> to_vk_image_properties(avk::image_usage aImageUsage)` for further details!
Implement a method similar to `command_buffer_t::draw_vertices` and `command_buffer_t::draw_indexed` that performs an indirect draw call. The indirect draw call data is stored in a buffer that must be passed to `vk::CommandBuffer::drawIndexedIndirect`.
Furthermore, an additional buffer meta data type will have to be added (=> buffer_meta.hpp) which stores all the usual offsets, sizes, and counts, and which demands the `vk::BufferUsageFlagBits::eIndirectBuffer` usage flag for the buffer.
Static friend declarations (e.g. the equality operator in descriptor_set) are not valid in Clang, and thus Auto-Vk does not compile:
"error: 'static' is invalid in friend declarations"
The image_t currentLayout and targetLayout are not always set to the actual current layouts, which makes it hard to perform pipeline barriers (image memory barriers) inside of a renderpass, for example, as these layouts are likely undefined in that case (unless some previous step sets the layout correctly).
Layout tracking probably needs to be reworked to reflect the correct layout at the time of calling the getter.
The current workaround is to set the layout manually before making a call to the barrier or other functions that rely on the layout. Or you can always perform the direct Vulkan calls manually.
...which actually should be named "descriptor_bindings.hpp".
Add comments explaining what all the code in that file can be used for and how. Add usage examples!
During the cg_base -> Auto-Vk + Gears-Vk refactoring, descriptor sets are now generally passed by copy. Concretely, they are passed around via `std::vector<descriptor_set>`.
`descriptor_set` stores several members of the kind `std::vector<...>`. Copying them might introduce substantial and probably unnecessary overhead. It would most likely be beneficial to convert all of them into `std::shared_ptr<std::vector<...>>`.
The `avk::descriptor_cache` shall be usable from parallel threads. This is partly already prepared by means of different descriptor_pools being used from different threads (see `std::unordered_map<std::thread::id, std::vector<std::weak_ptr<descriptor_pool>>> mDescriptorPools;` and `avk::descriptor_cache::get_descriptor_pool_for_layouts`).
What needs to be added is a `std::mutex` that synchronizes access: add a `std::mutex` to class descriptor_cache and lock it where necessary (e.g. via a `std::scoped_lock<std::mutex>`).
To test this issue, cg-tuwien/Gears-Vk#41 would have to be implemented.
Definition of done:
`std::mutex`-based synchronization has been added to `avk::descriptor_cache::get_descriptor_pool_for_layouts`. It has been investigated whether synchronization is also required elsewhere in `avk::descriptor_cache`, and if so, the necessary measures have been taken. Usage of `descriptor_cache` from parallel threads has been tested (e.g. by using a parallel_invoker)
)...instead, sync::not_required()
is appropriate!
Definition of done:
buffer_t::fill
Use `#if VK_HEADER_VERSION > 141` or similar preprocessor statements to make Auto-Vk compatible with SDK version 1.1. Some features (e.g. the `VK_KHR_ray_tracing` extension) have been added with Vulkan 1.2 and were not available before.
Concepts are a new feature introduced with C++20. Make yourself familiar with the topic. You can get an overview of it in the following video:
At many places in the framework, SFINAE constructs are used which basically serve the same purpose, but are more complicated to use. The SFINAE classes are the following:
class is_dereferenceable
class has_resize
class has_size_and_iterators
class has_nested_value_types
Definition of done:
All usages of `std::enable_if` have been replaced with concepts => i.e. the newly introduced C++20 `requires` keyword is used.
...and replace with `command`/`commands`/`work`.
nuff said
For `vkCmdBindVertexBuffers`, which is invoked from the following methods:
command_buffer_t::draw_indexed
command_buffer_t::draw_vertices
an offset of 0 is set for each vertex buffer passed to those methods. Other offsets should be possible. But where to define those offsets? Meta data does not seem right. => Probably in the `input_description`.
Some functions/methods take an arbitrary number of arguments that should all be of the same type. An example of such a method is `command_buffer_t::draw_indexed`. It accepts any type, but optimally, it would only accept any number of arguments, each of the SAME type, namely `const buffer_t&`.
Evaluate if it is possible (with C++20?!) to specify that all variadic arguments shall have the same type!
Article that describes this issue in general: Fluent C++: How to Define a Variadic Number of Arguments of the Same Type
C++ proposal: Homogeneous variadic function parameters
In the case of `command_buffer_t::draw_indexed`, the big advantage would be that -- for instance in the model_loader example -- the buffers could not only be passed like follows:
cmdbfr->draw_indexed(
*drawCall.mIndexBuffer,
*drawCall.mPositionsBuffer, *drawCall.mTexCoordsBuffer, *drawCall.mNormalsBuffer
);
but also without explicitly invoking `ak::owning_resource::operator*`, i.e. like follows:
cmdbfr->draw_indexed(
drawCall.mIndexBuffer,
drawCall.mPositionsBuffer, drawCall.mTexCoordsBuffer, drawCall.mNormalsBuffer
);
AND even a mixture would be possible, because the implicit cast operator `ak::owning_resource::operator const T&` would be invoked automatically by the compiler. The following invocation would then also be viable:
cmdbfr->draw_indexed(
drawCall.mIndexBuffer,
*drawCall.mPositionsBuffer, drawCall.mTexCoordsBuffer, *drawCall.mNormalsBuffer
);
...or any other combination of passing `const owning_resource<buffer_t>&` or `const buffer_t&` for that matter, since all would be cast to `const buffer_t&` because the method declaration states the type explicitly.
Initializer Lists?
An alternative would maybe be to look into using `std::initializer_list`, but it would have to capture references and pass them on. Not sure if that would lead to nice syntax. In the optimal case, with a `std::initializer_list`, the following might be possible:
cmdbfr->draw_indexed(
drawCall.mIndexBuffer,
{ *drawCall.mPositionsBuffer, drawCall.mTexCoordsBuffer, *drawCall.mNormalsBuffer }
);
...and `avk::read` does not perform such a barrier. Not sure yet if this issue is a bug or an enhancement, though -- i.e., whether it is the responsibility of `avk::read` to perform the barrier or the user's. But this should be investigated.
Anyways, the Vulkan synchronization examples include the following example:
CPU read back of data written by a compute shader
This example shows the steps required to get data, written to a buffer by a compute shader, back to the CPU.
vkCmdDispatch(...);
VkMemoryBarrier2KHR memoryBarrier = {
...
.srcStageMask = VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT_KHR,
.srcAccessMask = VK_ACCESS_2_SHADER_WRITE_BIT_KHR,
.dstStageMask = VK_PIPELINE_STAGE_2_HOST_BIT_KHR,
.dstAccessMask = VK_ACCESS_2_HOST_READ_BIT_KHR};
VkDependencyInfoKHR dependencyInfo = {
...
1, // memoryBarrierCount
&memoryBarrier, // pMemoryBarriers
...
}
vkCmdPipelineBarrier2KHR(commandBuffer, &dependencyInfo);
vkEndCommandBuffer(...);
vkQueueSubmit2KHR(..., fence); // Submit the command buffer with a fence
Currently, the `build`/`update` methods of both `class bottom_level_acceleration_structure_t` and `class top_level_acceleration_structure_t` support only "pointer to an array"-style input.
Add support for:
The relevant code that needs to be updated/extended in order to support these different build/update types is contained in the methods:
top_level_acceleration_structure_t::build_or_update
bottom_level_acceleration_structure_t::build_or_update
More information about the "array of pointers" input format can be found here: VkAccelerationStructureBuildGeometryInfoKHR => the description of `VkBool32 geometryArrayOfPointers;`
More information about host builds of acceleration structures can be found here: 36.5. Host Acceleration Structure Operations
Definition of done:
It has been checked whether `assert(sizeof(VkAabbPositionsKHR) == result.mSizeOfOneElement);` in buffer_meta.hpp#L833 is really appropriate.
It has been checked whether `assert(sizeof(VkAccelerationStructureInstanceKHR) == result.mSizeOfOneElement);` in buffer_meta.hpp#L912 is really appropriate.
So I have encountered difficulties when using this library. To be specific, I find it difficult to integrate it into my project. I think the lack of examples and documentation probably pushes people away from Auto-Vk, even though it looks like a really nice library.
CPPLINQ must be removed due to an incompatible license. Sad story.
However, there is a perfect replacement for it: C++20 ranges.
A quick overview of C++20 ranges is given in the following video:
Definition of done:
All places where `operator>>` has been used have been ported (probably better to search for usages of CPPLINQ's `from`, or just compile after removing it and fix the errors): avk.cpp#L3952, avk.cpp#L4100, avk.cpp#L4122, avk.cpp#L4161, avk.cpp#L4181, include/
`std::basic_string<CharT,Traits,Allocator>::ends_with` (used in avk.cpp) is a C++20 feature.
There has been some refactoring w.r.t. the binding of buffer_views:
class buffer_view_descriptor was introduced
`as_uniform_texel_buffer` and `as_storage_texel_buffer` have been removed from buffer
`as_uniform_texel_buffer_view` and `as_storage_texel_buffer_view` have been added to buffer_view
`as_uniform_texel_buffer_views` and `as_storage_texel_buffer_views` have been adapted accordingly, and so have `struct binding_data` and all associated functions/methods.
It is a bit unclear whether the structure is flawless. Double-check the actual descriptor infos that are written, where the data is stored, how the data is gathered, and whether the `vk::BufferView` bindings are created properly.
This can best be investigated in the "ray_tracing_triangle_meshes" example, because it uses buffer_views. See if everything still works, then bind explicitly using `as_uniform_texel_buffer_views`, and also try to bind `as_storage_texel_buffer_views` (that will need some adaptation in shaders).
VMA is included in the most recent Vulkan SDKs! We should (probably?!) use that version instead of the one which comes bundled with this repository: see file vk_mem_alloc.h.
Make the necessary configuration changes!
Something seems to be confusing about
struct input_binding_to_location_mapping
{
vertex_input_buffer_binding mGeneralData;
buffer_element_member_meta mMemberMetaData;
};
Why are the parameters distributed across different structs: stride on the one hand, and offset and format on the other?
But maybe it's okay, because that distribution is also present when compiling the configuration for the graphics pipeline in `graphics_pipeline root::create_graphics_pipeline` => steps 1 and 2, when the `vk::VertexInputBindingDescription` and `vk::VertexInputAttributeDescription` elements are compiled.
Everything is working, so it's probably okay.
But just double-check everything once again, especially whether the data structures in `input_binding_to_location_mapping` are okay as they are or could maybe be optimized somehow.