General-purpose GPU compute framework built on Vulkan to support thousands of cross-vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous, and optimized for advanced GPU data processing use cases. Backed by the Linux Foundation.
Currently, allocating a tensor requires passing a std::vector<> to initialize it. This ticket is to add the possibility of creating a tensor by passing only its size.
If I understand correctly, this puts a barrier between a memory-transfer operation (host to GPU) and shader read operations.
Two things are uncertain about this approach:
When chaining two operations, where the second consumes the tensor(s) produced by the first, the barrier's access and stage flags should probably describe a shader-write to shader-read dependency on the compute stage.
The current implementation inserts a barrier for every tensor of the algorithm, regardless of whether the kernel reads it or only writes to it. I suppose that for write-only tensors the barrier should not be needed.
The original implementation of Algorithm::destroy() does not free the push-constant and specialization-constant data; the relevant code is commented out. This, however, leads to a memory leak, as the data does not actually seem to be freed as described in the comment.