GithubHelp home page GithubHelp logo

Implement VulkanRenderer about fna3d HOT 19 CLOSED

fna-xna avatar fna-xna commented on May 28, 2024 2
Implement VulkanRenderer

from fna3d.

Comments (19)

flibitijibibo avatar flibitijibibo commented on May 28, 2024 2

As of the latest commit, the VulkanRenderer is now in Beta. All the critical features are done and function, but there are still some FIXMEs and a handful of places where performance improvements are possible, but at this point the renderer should Just Work with XNA games now. Instead of having a testing period and locking down a release date, we'll try and get it to where it's indisputably faster than OpenGL, then lock down a Release Candidate and final testing period before making it the default renderer.

from fna3d.

flibitijibibo avatar flibitijibibo commented on May 28, 2024 2

At this point we now have Vulkan running pretty fast with all the uninitialized memory issues handled by Valgrind - there's really only one thing left for us to assess: Moving Vulkan command recording and queue submissions to a thread. This is relatively straightforward on paper, as it should just be a matter of moving VULKAN_INTERNAL_RecordCommands, Flush, and FlushAndPresent on to a dedicated thread which runs on a single semaphore. Adding threads is always harder than that, but as far as existing structure is concerned we've already set this up really really well. Once this works, it is expected that most games will be at least as fast when running with Vulkan, vs running with OpenGL.

SpriteBatchTest shows the issue pretty clearly when you crank up the sprite count, as it's definitely faster to submit commands, but when it's time to present the frame it chokes pretty hard and it doesn't show up in the frame timing at all since it's all squished into Present() right now. Games like FEZ show very stable framerates but still fall behind GL by a pretty sizeable percentage. Games are easiest to compare by using GALLIUM_HUD=fps and VK_INSTANCE_LAYERS=VK_LAYER_MESA_overlay.

from fna3d.

flibitijibibo avatar flibitijibibo commented on May 28, 2024 2

The latest VulkanRenderer now supports proper command buffer generation and submission and asynchronous presentation. As a result, performance is pretty damn good and beats OpenGL in a whole lot of cases now.

The downside is that threading support had to get rolled back, but we expect this to be easier to fix than last time. Once we have our stress tests working, that'll be when we start running through the catalog to make sure everything works, and provided nothing goes wrong (ha) we'll mark this as a release candidate and eventually move it to the top of the driver list. Most of the TODOs/FIXMEs are minor, but they may be important for optimization later on.

One last thing we need that's not critical for testing but is critical for release is PrepareWindowAttributes - we have it falling back when we don't have a loader, but we also need to fall back when no hardware supports the extensions we need. I'm hoping we can mostly rip off device creation here without creating a device, but we'll see...

from fna3d.

flibitijibibo avatar flibitijibibo commented on May 28, 2024

Currently being done in #13

from fna3d.

thatcosmonaut avatar thatcosmonaut commented on May 28, 2024

Remaining tasks are being tracked here:
thatcosmonaut#9

from fna3d.

flibitijibibo avatar flibitijibibo commented on May 28, 2024

TODO list, since we're coming up to the first testable version...

SNIP Moved to the OP!

from fna3d.

flibitijibibo avatar flibitijibibo commented on May 28, 2024

Some notes for point rendering... in glspirv we have this patching setup when linking the shader program:

https://github.com/FNA-XNA/MojoShader/blob/upstream/mojoshader_opengl.c#L477

It's not too complicated but it does mean we have to hold off on building the shader module until we know that point sprite rendering is in use. This may end up being a part of the shader linker that solves the problem of linking vertex/pixel shader semantics together, instead of this disaster I came up with:

https://github.com/FNA-XNA/MojoShader/blob/upstream/profiles/mojoshader_profile_spirv.c#L1513

from fna3d.

flibitijibibo avatar flibitijibibo commented on May 28, 2024

Some notes for threading support... we know two major limitations to Vulkan threading:

  • Command buffers/pools require a mutex when accessed from multiple threads
  • Certain commands cannot be executed during a render pass

To me, this says "we need a mutex that locks on both a render pass and on vkCmd execution." I'm thinking we have a mutex called "asyncCommandLock" that not only wraps around vkCmd calls, but also blocks for the entire duration of a render pass and for the entire duration that the command buffer is not recording. This sounds really broad, but this isn't as bad as it might seem:

  • For single-threaded games it means nothing, as the mutex will never be fought for
  • For threaded loading it means it will only ever be rendering or loading, never both at once
  • For threaded drawing, where loading is done on the main thread, point 2 is still true

This makes it about as deferred as D3D11, where we have a big lock on the DeviceContext.

Where it gets a little weird is the function dependency tree... these are all the functions that record a command, and the functions that use them:

INTERNAL_BufferMemoryBarrier
	INTERNAL_GetTextureData
	SetTextureData*
INTERNAL_ImageMemoryBarrier
	INTERNAL_GetTextureData
	INTERNAL_CreateFauxBackbuffer
	INTERNAL_BeginRenderPass
	INTERNAL_OutsideRenderPassClear
	SwapBuffers
	VerifySampler
	ResolveTarget
	SetTextureData*
	GenColorRenderbuffer
	GenDepthStencilRenderbuffer
	CreateDevice
INTERNAL_GetTextureData
	GetTextureData2D
		ReadBackbuffer
	GetTextureDataCube
INTERNAL_SetViewportCommand
	INTERNAL_BeginRenderPass
	SetViewport
INTERNAL_SetScissorRectCommand
	INTERNAL_BeginRenderPass
	SetScissorRect
	ApplyRasterizerState
INTERNAL_SetStencilReferenceValueCommand
	INTERNAL_BeginRenderPass
	SetReferenceStencil
INTERNAL_SetDepthBiasCommand
	ApplyRasterizerState
INTERNAL_BindPipeline
	Draw*
INTERNAL_BeginRenderPass
	INTERNAL_UpdateRenderPass
		Clear
		Draw*
INTERNAL_EndPass
	INTERNAL_UpdateRenderPass
		Clear
		Draw*
INTERNAL_RenderPassClear
	Clear
INTERNAL_OutsideRenderPassClear
	Clear
INTERNAL_BindResources
	Draw*
SwapBuffers
Draw*
SetBlendFactor
	SetBlendState
ApplyVertexBufferBindings
ResolveTarget
SetTextureData*
QueryBegin
QueryEnd
SetStringMarker

We have to make sure to lock for the duration that a single function submits commands, in addition to locking when a pass is active and a buffer isn't recording. This will prevent commands from getting mixed up when a thread tries to poke a command in during another function's execution (if that matters, it may not?).

EDIT: vkCmd limitations:

Clear/Draw, INSIDE ONLY:
vkCmdBindDescriptorSets
vkCmdBindIndexBuffer
vkCmdBindPipeline
vkCmdBindVertexBuffers
vkCmdClearAttachments
vkCmdClearColorImage
vkCmdClearDepthStencilImage
vkCmdDraw
vkCmdDrawIndexed
vkCmdSetBlendConstants
vkCmdSetDepthBias

ResolveTarget, SwapBuffers:
vkCmdBlitImage

SetTextureData, OUTSIDE ONLY:
vkCmdCopyBufferToImage

GetTextureData, OUTSIDE ONLY:
vkCmdCopyImageToBuffer

Wherever, really:
vkCmdPipelineBarrier
vkCmdSetScissor
vkCmdSetViewport
vkCmdSetStencilReference
vkCmdInsertDebugUtilsLabelEXT

BeginQuery, restarts pass:
vkCmdResetQueryPool
vkCmdBeginQuery

EndQuery, wherever:
vkCmdEndQuery

LOCKS THE PASS:
vkCmdBeginRenderPass

UNLOCKS THE PASS:
vkCmdEndRenderPass

from fna3d.

flibitijibibo avatar flibitijibibo commented on May 28, 2024

Some more threading notes:

Previously we tried decoupling render and data command buffers, but we didn't quite get it working because there's a bit more to it than just having separate buffers to avoid render pass errors. It may be worth trying this again, but with a better idea of how to synchronize the buffers:

Render Command Buffer

  • Locks for the following commands:
    • vkCmdBindDescriptorSets
    • vkCmdBindPipeline
    • vkCmdClearAttachments
    • vkCmdClearColorImage
    • vkCmdClearDepthStencilImage
    • vkCmdDraw
    • vkCmdDrawIndexed
    • vkCmdSetBlendConstants
    • vkCmdSetDepthBias
    • vkCmdBlitImage
    • vkCmdSetScissor
    • vkCmdSetViewport
    • vkCmdSetStencilReference
    • vkCmdInsertDebugUtilsLabelEXT
    • vkCmdResetQueryPool
    • vkCmdBeginQuery
    • vkCmdEndQuery
    • vkCmdBeginRenderPass
    • vkCmdEndRenderPass
  • Locks and submits the data command buffer for the following commands:
    • vkCmdBindIndexBuffer
    • vkCmdBindVertexBuffers
    • VULKAN_INTERNAL_ImageMemoryBarrier

Data Command Buffer

  • Locks for the following commands:
    • vkCmdCopyBufferToImage
    • VULKAN_INTERNAL_AllocateSubBuffer
    • VULKAN_INTERNAL_BufferMemoryBarrier
  • Locks and submits the render command buffer for the following commands:
    • vkCmdCopyImageToBuffer
    • VULKAN_INTERNAL_Stall

When to Lock Command Buffers

It's not enough to lock just for vkCmd operations - we need to lock for whole blocks of commands, as several commands may need to exist together for the queue to work (for example, render passes need to have both a Begin and End, and SetData needs one lock/unlock pair for both the buffer memory barriers and the vkCmdCopyImageToBuffer in between). Because of this, we may need to refactor the internal functions in such a way that it's clear what is a data command and what is a render command, while also making it clear when we're locking the command buffer to submit commands and when we're locking to submit to the queue. This is particularly bad for image memory barriers and pipeline stalls, where we typically have render and data operations happening all at once.

from fna3d.

flibitijibibo avatar flibitijibibo commented on May 28, 2024

Yup, even more threading notes:

Vulkan commands called by every driver function:

  • VULKAN_SwapBuffers
    • vkCmdPipelineBarrier
    • vkCmdEndRenderPass
    • vkCmdBlitImage
  • VULKAN_Draw*
    • vkCmdEndRenderPass
    • vkCmdPipelineBarrier
    • vkCmdBeginRenderPass
    • vkCmdSetViewport
    • vkCmdSetScissor
    • vkCmdSetStencilReferenceValue
    • vkCmdSetBlendConstants
    • vkCmdSetDepthBias
    • vkCmdBindPipeline
    • vkCmdBindDescriptorSets
    • vkCmdBindIndexBuffer
    • vkCmdDrawIndexed
    • vkCmdDraw
  • VULKAN_SetViewport
    • vkCmdSetViewport
  • VULKAN_SetScissorRect
    • vkCmdSetScissor
  • VULKAN_SetBlendFactor
    • vkCmdSetBlendConstants
  • VULKAN_SetReferenceStencil
    • vkCmdSetStencilReference
  • VULKAN_SetBlendState
    • vkCmdSetBlendConstants
  • VULKAN_SetDepthStencilState
    • vkCmdSetStencilReference
  • VULKAN_ApplyRasterizerState
    • vkCmdSetScissor
    • vkCmdSetDepthBias
  • VULKAN_VerifySampler
    • vkCmdPipelineBarrier
    • vkCmdEndRenderPass
  • VULKAN_VerifyVertexSampler
    • vkCmdPipelineBarrier
    • vkCmdEndRenderPass
  • VULKAN_ApplyVertexBufferBindings
    • vkCmdBindVertexBuffers
  • VULKAN_ResolveTarget
    • vkCmdEndRenderPass
    • vkCmdPipelineBarrier
    • vkCmdBlitImage
  • VULKAN_ResetBackbuffer
    • vkCmdEndRenderPass
    • vkCmdPipelineBarrier
  • VULKAN_ReadBackbuffer
    • vkCmdEndRenderPass
    • vkCmdPipelineBarrier
    • vkCmdCopyImageToBuffer
  • VULKAN_SetTextureData
    • vkCmdEndRenderPass
    • vkCmdPipelineBarrier
    • vkCmdCopyBufferToImage
  • VULKAN_GetTextureData
    • vkCmdEndRenderPass
    • vkCmdPipelineBarrier
    • vkCmdCopyImageToBuffer
  • VULKAN_GenColorRenderbuffer
    • vkCmdEndRenderPass
    • vkCmdPipelineBarrier
  • VULKAN_GenDepthStencilRenderbuffer
    • vkCmdEndRenderPass
    • vkCmdPipelineBarrier
  • VULKAN_SetVertex/IndexBufferData
    • vkCmdEndRenderPass
    • vkCmdPipelineBarrier
  • VULKAN_GetVertex/IndexBufferData
    • vkCmdEndRenderPass
    • vkCmdPipelineBarrier
  • VULKAN_QueryBegin
    • vkCmdEndRenderPass
    • vkCmdResetQueryPool
    • vkCmdBeginQuery
  • VULKAN_QueryEnd
    • vkCmdEndQuery
  • VULKAN_SetStringMarker
    • vkCmdInsertDebugUtilsLabelEXT

from fna3d.

TheSpydog avatar TheSpydog commented on May 28, 2024

Mostly just leaving this as a note for my future self. When we get to testing games, there are a few cases that will need special attention, beyond the standard "does it look ok and not crash" testing. In particular...

  1. Little Racers Street: We'll need to do a soak test in the garage. That scene of the game leaks memory like crazy, so it's a good stress test for both MojoShader uniform buffers and FNA3D-side buffer allocation.
  2. FEZ: We need to check the ending cutscene to make sure lines are being appropriately rendered, since we've run into some issues with that in the past.
  3. FEZ: On MoltenVK there's a weird blip immediately after the initial loading screen, where the screen goes black except for a small sprite in the middle. I'm not sure if this is an MVK bug or we're accidentally tripping some undefined behavior that's just happened to work correctly on the other devices we've tested.

I'm sure there are other special cases, these are just the ones at the forefront of my memory.

from fna3d.

flibitijibibo avatar flibitijibibo commented on May 28, 2024

Thankfully 2 won't happen, as it was a spec violation regarding GL_POINT_SPRITE, a state that's not variable in Vulkan and can't have the same issues like radeonsi/iris did.

3 I remember apitracing but I forget what the end result was... a missing clear maybe? It's probably in the Discord logs somewhere.

from fna3d.

flibitijibibo avatar flibitijibibo commented on May 28, 2024

The latest commit should have Vulkan threading in a working state. But, as always, you should run this against any threaded work you have just in case, as behavior tends to be highly unpredictable for XNA games.

In my own testing this works well (Streets of Rage 4 is running fine with it), though I'm now realizing that we're having to look at memory management again as it uses well over twice the memory that GL does, which is fine if you have a lot of VRAM but not so fine if you don't (SOR4 bails at loading a single level on a 2GB card). This could possibly be influenced by the threads (maybe some contention with the memory allocator?) but we'll need to investigate that either way.

from fna3d.

flibitijibibo avatar flibitijibibo commented on May 28, 2024

We just pushed the commits to FNA and FNA3D that resolve all the known issues we have - the remaining FIXMEs/TODOs in the source aren't critical for release.

At this point we can finally start doing mass testing of games. I happen to be releasing Streets of Rage 4 really really soon, so I'll be monitoring that...

from fna3d.

flibitijibibo avatar flibitijibibo commented on May 28, 2024

Testing is moving along slowly but surely - one major detail though: Has anyone checked MoltenVK on iOS? Even if we just test Rogue Legacy that's fine, I just want to be sure we're not losing anything by fully deprecating Metal when the Vulkan work is done.

from fna3d.

thatcosmonaut avatar thatcosmonaut commented on May 28, 2024

Oh I meant to ask, on Mono 4 games like Sequence and Blueberry Garden I get the following error when I drop in a new FNA.dll and fnalibs:

❯ FNA3D_FORCE_DRIVER=Vulkan ./Sequence.bin.x86_64 
Stacktrace:

  at SDL2.SDL.SDL_SetHintWithPriority (string,string,SDL2.SDL/SDL_HintPriority) <0x0008b>
  at Microsoft.Xna.Framework.SDL2_FNAPlatform.ProgramInit (Microsoft.Xna.Framework.LaunchParameters) <0x00593>
  at Microsoft.Xna.Framework.FNAPlatform..cctor () <0x00c23>
  at (wrapper runtime-invoke) object.runtime_invoke_void (object,intptr,intptr,intptr) <0xffffffff>
  at Microsoft.Xna.Framework.Game..ctor () <0x0020f>
  at Sequence.Game1..ctor () <0x015e7>
  at Sequence.Program.Main (string[]) <0x0004b>
  at (wrapper runtime-invoke) <Module>.runtime_invoke_void_object (object,intptr,intptr,intptr) <0xffffffff>

Any idea how I can work around this?

from fna3d.

flibitijibibo avatar flibitijibibo commented on May 28, 2024

I forget what it is but I think it has to do with the compiler vs runtime? I fixed it locally either by using an older compiler or updating the runtime, and Blueberry needs MONO_FORCE_COMPAT=1.

from fna3d.

flibitijibibo avatar flibitijibibo commented on May 28, 2024

We just got the last major issue fixed in the Vulkan renderer - I'll be doing a full run of SOR4 on a 2GB card and if all is well we'll be in release candidate for 21.02. If I don't even run into memory issues I'll also take out the VULKAN_MEMORY_REQUIREMENT variable.

from fna3d.

flibitijibibo avatar flibitijibibo commented on May 28, 2024

We're at the point where this is actively shipping in stuff and works well enough, so at this point I'm going to mark this as "stable, but not default". For making this default, see #72.

from fna3d.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.