Comments (19)
As of the latest commit, the VulkanRenderer is now in Beta. All the critical features are done and function, but there are still some FIXMEs and a handful of places where performance improvements are possible, but at this point the renderer should Just Work with XNA games now. Instead of having a testing period and locking down a release date, we'll try and get it to where it's indisputably faster than OpenGL, then lock down a Release Candidate and final testing period before making it the default renderer.
from fna3d.
At this point we now have Vulkan running pretty fast with all the uninitialized memory issues handled by Valgrind - there's really only one thing left for us to assess: Moving Vulkan command recording and queue submissions to a thread. This is relatively straightforward on paper, as it should just be a matter of moving VULKAN_INTERNAL_RecordCommands, Flush, and FlushAndPresent on to a dedicated thread which runs on a single semaphore. Adding threads is always harder than that, but as far as existing structure is concerned we've already set this up really really well. Once this works, it is expected that most games will be at least as fast when running with Vulkan, vs running with OpenGL.
SpriteBatchTest shows the issue pretty clearly when you crank up the sprite count, as it's definitely faster to submit commands, but when it's time to present the frame it chokes pretty hard and it doesn't show up in the frame timing at all since it's all squished into Present() right now. Games like FEZ show very stable framerates but still fall behind GL by a pretty sizeable percentage. Games are easiest to compare by using GALLIUM_HUD=fps
and VK_INSTANCE_LAYERS=VK_LAYER_MESA_overlay
.
from fna3d.
The latest VulkanRenderer now supports proper command buffer generation and submission and asynchronous presentation. As a result, performance is pretty damn good and beats OpenGL in a whole lot of cases now.
The downside is that threading support had to get rolled back, but we expect this to be easier to fix than last time. Once we have our stress tests working, that'll be when we start running through the catalog to make sure everything works, and provided nothing goes wrong (ha) we'll mark this as a release candidate and eventually move it to the top of the driver list. Most of the TODOs/FIXMEs are minor, but they may be important for optimization later on.
One last thing we need that's not critical for testing but is critical for release is PrepareWindowAttributes - we have it falling back when we don't have a loader, but we also need to fall back when no hardware supports the extensions we need. I'm hoping we can mostly rip off device creation here without creating a device, but we'll see...
from fna3d.
Currently being done in #13
from fna3d.
Remaining tasks are being tracked here:
thatcosmonaut#9
from fna3d.
TODO list, since we're coming up to the first testable version...
SNIP Moved to the OP!
from fna3d.
Some notes for point rendering... in glspirv
we have this patching setup when linking the shader program:
https://github.com/FNA-XNA/MojoShader/blob/upstream/mojoshader_opengl.c#L477
It's not too complicated but it does mean we have to hold off on building the shader module until we know that point sprite rendering is in use. This may end up being a part of the shader linker that solves the problem of linking vertex/pixel shader semantics together, instead of this disaster I came up with:
https://github.com/FNA-XNA/MojoShader/blob/upstream/profiles/mojoshader_profile_spirv.c#L1513
from fna3d.
Some notes for threading support... we know two major limitations to Vulkan threading:
- Command buffers/pools require a mutex when accessed from multiple threads
- Certain commands cannot be executed during a render pass
To me, this says "we need a mutex that locks on both a render pass and on vkCmd execution." I'm thinking we have a mutex called "asyncCommandLock" that not only wraps around vkCmd calls, but also blocks for the entire duration of a render pass and for the entire duration that the command buffer is not recording. This sounds really broad, but this isn't as bad as it might seem:
- For single-threaded games it means nothing, as the mutex will never be fought for
- For threaded loading it means it will only ever be rendering or loading, never both at once
- For threaded drawing, where loading is done on the main thread, point 2 is still true
This makes it about as deferred as D3D11, where we have a big lock on the DeviceContext.
Where it gets a little weird is the function dependency tree... these are all the functions that record a command, and the functions that use them:
INTERNAL_BufferMemoryBarrier
INTERNAL_GetTextureData
SetTextureData*
INTERNAL_ImageMemoryBarrier
INTERNAL_GetTextureData
INTERNAL_CreateFauxBackbuffer
INTERNAL_BeginRenderPass
INTERNAL_OutsideRenderPassClear
SwapBuffers
VerifySampler
ResolveTarget
SetTextureData*
GenColorRenderbuffer
GenDepthStencilRenderbuffer
CreateDevice
INTERNAL_GetTextureData
GetTextureData2D
ReadBackbuffer
GetTextureDataCube
INTERNAL_SetViewportCommand
INTERNAL_BeginRenderPass
SetViewport
INTERNAL_SetScissorRectCommand
INTERNAL_BeginRenderPass
SetScissorRect
ApplyRasterizerState
INTERNAL_SetStencilReferenceValueCommand
INTERNAL_BeginRenderPass
SetReferenceStencil
INTERNAL_SetDepthBiasCommand
ApplyRasterizerState
INTERNAL_BindPipeline
Draw*
INTERNAL_BeginRenderPass
INTERNAL_UpdateRenderPass
Clear
Draw*
INTERNAL_EndPass
INTERNAL_UpdateRenderPass
Clear
Draw*
INTERNAL_RenderPassClear
Clear
INTERNAL_OutsideRenderPassClear
Clear
INTERNAL_BindResources
Draw*
SwapBuffers
Draw*
SetBlendFactor
SetBlendState
ApplyVertexBufferBindings
ResolveTarget
SetTextureData*
QueryBegin
QueryEnd
SetStringMarker
We have to make sure to lock for the duration that a single function submits commands, in addition to locking when a pass is active and a buffer isn't recording. This will prevent commands from getting mixed up when a thread tries to poke a command in during another function's execution (if that matters, it may not?).
EDIT: vkCmd limitations:
Clear/Draw, INSIDE ONLY:
vkCmdBindDescriptorSets
vkCmdBindIndexBuffer
vkCmdBindPipeline
vkCmdBindVertexBuffers
vkCmdClearAttachments
vkCmdClearColorImage
vkCmdClearDepthStencilImage
vkCmdDraw
vkCmdDrawIndexed
vkCmdSetBlendConstants
vkCmdSetDepthBias
ResolveTarget, SwapBuffers:
vkCmdBlitImage
SetTextureData, OUTSIDE ONLY:
vkCmdCopyBufferToImage
GetTextureData, OUTSIDE ONLY:
vkCmdCopyImageToBuffer
Wherever, really:
vkCmdPipelineBarrier
vkCmdSetScissor
vkCmdSetViewport
vkCmdSetStencilReference
vkCmdInsertDebugUtilsLabelEXT
BeginQuery, restarts pass:
vkCmdResetQueryPool
vkCmdBeginQuery
EndQuery, wherever:
vkCmdEndQuery
LOCKS THE PASS:
vkCmdBeginRenderPass
UNLOCKS THE PASS:
vkCmdEndRenderPass
from fna3d.
Some more threading notes:
Previously we tried decoupling render and data command buffers, but we didn't quite get it working because there's a bit more to it than just having separate buffers to avoid render pass errors. It may be worth trying this again, but with a better idea of how to synchronize the buffers:
Render Command Buffer
- Locks for the following commands:
- vkCmdBindDescriptorSets
- vkCmdBindPipeline
- vkCmdClearAttachments
- vkCmdClearColorImage
- vkCmdClearDepthStencilImage
- vkCmdDraw
- vkCmdDrawIndexed
- vkCmdSetBlendConstants
- vkCmdSetDepthBias
- vkCmdBlitImage
- vkCmdSetScissor
- vkCmdSetViewport
- vkCmdSetStencilReference
- vkCmdInsertDebugUtilsLabelEXT
- vkCmdResetQueryPool
- vkCmdBeginQuery
- vkCmdEndQuery
- vkCmdBeginRenderPass
- vkCmdEndRenderPass
- Locks and submits the data command buffer for the following commands:
- vkCmdBindIndexBuffer
- vkCmdBindVertexBuffers
- VULKAN_INTERNAL_ImageMemoryBarrier
Data Command Buffer
- Locks for the following commands:
- vkCmdCopyBufferToImage
- VULKAN_INTERNAL_AllocateSubBuffer
- VULKAN_INTERNAL_BufferMemoryBarrier
- Locks and submits the render command buffer for the following commands:
- vkCmdCopyImageToBuffer
- VULKAN_INTERNAL_Stall
When to Lock Command Buffers
It's not enough to lock just for vkCmd operations - we need to lock for whole blocks of commands, as several commands may need to exist together for the queue to work (for example, render passes need to have both a Begin and End, and SetData needs one lock/unlock pair for both the buffer memory barriers and the vkCmdCopyImageToBuffer in between). Because of this, we may need to refactor the internal functions in such a way that it's clear what is a data command and what is a render command, while also making it clear when we're locking the command buffer to submit commands and when we're locking to submit to the queue. This is particularly bad for image memory barriers and pipeline stalls, where we typically have render and data operations happening all at once.
from fna3d.
Yup, even more threading notes:
Vulkan commands called by every driver function:
VULKAN_SwapBuffers
vkCmdPipelineBarrier
vkCmdEndRenderPass
vkCmdBlitImage
VULKAN_Draw*
vkCmdEndRenderPass
vkCmdPipelineBarrier
vkCmdBeginRenderPass
vkCmdSetViewport
vkCmdSetScissor
vkCmdSetStencilReferenceValue
vkCmdSetBlendConstants
vkCmdSetDepthBias
vkCmdBindPipeline
vkCmdBindDescriptorSets
vkCmdBindIndexBuffer
vkCmdDrawIndexed
vkCmdDraw
VULKAN_SetViewport
vkCmdSetViewport
VULKAN_SetScissorRect
vkCmdSetScissor
VULKAN_SetBlendFactor
vkCmdSetBlendConstants
VULKAN_SetReferenceStencil
vkCmdSetStencilReference
VULKAN_SetBlendState
vkCmdSetBlendConstants
VULKAN_SetDepthStencilState
vkCmdSetStencilReference
VULKAN_ApplyRasterizerState
vkCmdSetScissor
vkCmdSetDepthBias
VULKAN_VerifySampler
vkCmdPipelineBarrier
vkCmdEndRenderPass
VULKAN_VerifyVertexSampler
vkCmdPipelineBarrier
vkCmdEndRenderPass
VULKAN_ApplyVertexBufferBindings
vkCmdBindVertexBuffers
VULKAN_ResolveTarget
vkCmdEndRenderPass
vkCmdPipelineBarrier
vkCmdBlitImage
VULKAN_ResetBackbuffer
vkCmdEndRenderPass
vkCmdPipelineBarrier
VULKAN_ReadBackbuffer
vkCmdEndRenderPass
vkCmdPipelineBarrier
vkCmdCopyImageToBuffer
VULKAN_SetTextureData
vkCmdEndRenderPass
vkCmdPipelineBarrier
vkCmdCopyBufferToImage
VULKAN_GetTextureData
vkCmdEndRenderPass
vkCmdPipelineBarrier
vkCmdCopyImageToBuffer
VULKAN_GenColorRenderbuffer
vkCmdEndRenderPass
vkCmdPipelineBarrier
VULKAN_GenDepthStencilRenderbuffer
vkCmdEndRenderPass
vkCmdPipelineBarrier
VULKAN_SetVertex/IndexBufferData
vkCmdEndRenderPass
vkCmdPipelineBarrier
VULKAN_GetVertex/IndexBufferData
vkCmdEndRenderPass
vkCmdPipelineBarrier
VULKAN_QueryBegin
vkCmdEndRenderPass
vkCmdResetQueryPool
vkCmdBeginQuery
VULKAN_QueryEnd
vkCmdEndQuery
VULKAN_SetStringMarker
vkCmdInsertDebugUtilsLabelEXT
from fna3d.
Mostly just leaving this as a note for my future self. When we get to testing games, there are a few cases that will need special attention, beyond the standard "does it look ok and not crash" testing. In particular...
- Little Racers Street: We'll need to do a soak test in the garage. That scene of the game leaks memory like crazy, so it's a good stress test for both MojoShader uniform buffers and FNA3D-side buffer allocation.
- FEZ: We need to check the ending cutscene to make sure lines are being appropriately rendered, since we've run into some issues with that in the past.
- FEZ: On MoltenVK there's a weird blip immediately after the initial loading screen, where the screen goes black except for a small sprite in the middle. I'm not sure if this is an MVK bug or we're accidentally tripping some undefined behavior that's just happened to work correctly on the other devices we've tested.
I'm sure there are other special cases, these are just the ones at the forefront of my memory.
from fna3d.
Thankfully 2 won't happen, as it was a spec violation regarding GL_POINT_SPRITE, a state that's not variable in Vulkan and can't have the same issues like radeonsi/iris did.
3 I remember apitracing but I forget what the end result was... a missing clear maybe? It's probably in the Discord logs somewhere.
from fna3d.
The latest commit should have Vulkan threading in a working state. But, as always, you should run this against any threaded work you have just in case, as behavior tends to be highly unpredictable for XNA games.
In my own testing this works well (Streets of Rage 4 is running fine with it), though I'm now realizing that we're having to look at memory management again as it uses well over twice the memory that GL does, which is fine if you have a lot of VRAM but not so fine if you don't (SOR4 bails at loading a single level on a 2GB card). This could possibly be influenced by the threads (maybe some contention with the memory allocator?) but we'll need to investigate that either way.
from fna3d.
We just pushed the commits to FNA and FNA3D that resolve all the known issues we have - the remaining FIXMEs/TODOs in the source aren't critical for release.
At this point we can finally start doing mass testing of games. I happen to be releasing Streets of Rage 4 really really soon, so I'll be monitoring that...
from fna3d.
Testing is moving along slowly but surely - one major detail though: Has anyone checked MoltenVK on iOS? Even if we just test Rogue Legacy that's fine, I just want to be sure we're not losing anything by fully deprecating Metal when the Vulkan work is done.
from fna3d.
Oh I meant to ask, on Mono 4 games like Sequence and Blueberry Garden I get the following error when I drop in a new FNA.dll and fnalibs:
❯ FNA3D_FORCE_DRIVER=Vulkan ./Sequence.bin.x86_64
Stacktrace:
at SDL2.SDL.SDL_SetHintWithPriority (string,string,SDL2.SDL/SDL_HintPriority) <0x0008b>
at Microsoft.Xna.Framework.SDL2_FNAPlatform.ProgramInit (Microsoft.Xna.Framework.LaunchParameters) <0x00593>
at Microsoft.Xna.Framework.FNAPlatform..cctor () <0x00c23>
at (wrapper runtime-invoke) object.runtime_invoke_void (object,intptr,intptr,intptr) <0xffffffff>
at Microsoft.Xna.Framework.Game..ctor () <0x0020f>
at Sequence.Game1..ctor () <0x015e7>
at Sequence.Program.Main (string[]) <0x0004b>
at (wrapper runtime-invoke) <Module>.runtime_invoke_void_object (object,intptr,intptr,intptr) <0xffffffff>
Any idea how I can work around this?
from fna3d.
I forget what it is but I think it has to do with the compiler vs runtime? I fixed it locally either by using an older compiler or updating the runtime, and Blueberry needs MONO_FORCE_COMPAT=1.
from fna3d.
We just got the last major issue fixed in the Vulkan renderer - I'll be doing a full run of SOR4 on a 2GB card and if all is well we'll be in release candidate for 21.02. If I don't even run into memory issues I'll also take out the VULKAN_MEMORY_REQUIREMENT
variable.
from fna3d.
We're at the point where this is actively shipping in stuff and works well enough, so at this point I'm going to mark this as "stable, but not default". For making this default, see #72.
from fna3d.
Related Issues (20)
- standalone example? HOT 1
- Update Xcode-iOS to remove Metal/OpenGL, add Vulkan HOT 1
- Remove Wayland alpha hack after Mesa 21.3 release HOT 1
- Add support for multiple windows (for real) HOT 4
- Latest FNA3D dll crashes on graphics device creation and destruction HOT 12
- FNA3D support for Retina not working for Vulkan/Metal HOT 4
- Remove ES3 option, restrict use to WebGL2
- Add support for dxvk-native HOT 5
- Update MojoShader to fix build issue with *BSD HOT 1
- Add GitHub Actions HOT 2
- Add visualc-gdk HOT 1
- [GDK] Add replay tool to visualc-gdk HOT 1
- Implement SDLRenderer with SDL_gpu
- Excessive render pass restarts for each draw HOT 1
- Move Vulkan defrag state into FNA3D_Defragmenter HOT 1
- Swapped ternary in D3D11 causes SW renderer to be used by default HOT 1
- Switching from windowed to full screen results in incorrect viewport on macs with retina displays HOT 4
- Vulkan texture uploads for mipped compressed textures produce validation errors HOT 2
- Vulkan: image layouts incorrect when using mipped render targets
- Add support for HDR swapchains HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from fna3d.