Comments (7)
this was the year!!! thanks @benvanik
from iree.
1+ years on and progress is being made on this. New threading system will have statistics built-in and help me define the API we want to expose for this (likely something similar to feedback buffers).
from iree.
Maybe this year? :P
I think the first step is to get tracy recording Vulkan times and to have the task system appear as a dedicated execution context in tracy as well. That will unblock performance investigations. I'm still not sure we have solid use cases for programmatic fine-grained profiling yet, though it may be useful in parameter searches (and, with all the usual caveats about the applicability of timings, it may not be).
from iree.
Do you have any pointers on what would be needed to get tracy recording Vulkan times, and what the result would look like? On Android we have pretty much no useful tools for profiling within a command buffer. I hacked up some timestamp queries to get a breakdown for MobileBERT, but that is obviously not a sustainable solution. So this is probably our best bet for at least some basic profiling on phones, and I can help out with the implementation.
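For context, the arithmetic behind a timestamp-query breakdown can be sketched as below. This is a minimal illustration, not IREE code: it assumes the raw begin/end tick values have already been read back (for example with `vkGetQueryPoolResults`), that `timestamp_period_ns` comes from `VkPhysicalDeviceLimits::timestampPeriod`, and that `valid_bits` comes from the queue family's `timestampValidBits`. The function name and the dispatch names are hypothetical.

```python
def dispatch_durations_ns(tick_pairs, timestamp_period_ns, valid_bits=64):
    """Convert (name, begin_tick, end_tick) tuples into (name, duration_ns).

    Assumption: ticks are raw GPU timestamp values that may wrap around
    once they exceed `valid_bits` bits, so the delta is masked.
    """
    mask = (1 << valid_bits) - 1
    durations = []
    for name, begin_tick, end_tick in tick_pairs:
        # Masking the difference handles a single counter wraparound.
        delta_ticks = (end_tick - begin_tick) & mask
        durations.append((name, delta_ticks * timestamp_period_ns))
    return durations


if __name__ == "__main__":
    # Hypothetical dispatch names and tick values, for illustration only.
    pairs = [("matmul_0", 100, 350), ("softmax_0", 400, 420)]
    for name, ns in dispatch_durations_ns(pairs, timestamp_period_ns=52.08):
        print(f"{name}: {ns:.1f} ns")
```

These per-dispatch numbers are only meaningful as a breakdown when the dispatches do not overlap on the GPU.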
from iree.
I've got an old set of changes that enables Vulkan in tracy that I'll revive and get working. The issue I ran into last time (and what prevented me from committing it) was that tracy could not at the time render disjoint or overlapping zones, meaning that if there was any asynchronous or overlapping execution it would pad every zone out such that they were perfectly nested. I remember seeing that fiber support was getting added to tracy (in some form), and if it has landed then we can use that to allow the out-of-order zones. Otherwise, the tracy support only produces useful results if there's a single dispatch between global barriers, such that no two dispatches ever overlap, and that's not very useful for anything but microbenchmarks (which could still be useful, but not general-purpose, and with all the caveats about applicability that microbenchmarks on GPUs have).
from iree.
Fiber support doesn't seem to have landed, but what we really want is wolfpld/tracy#149 - that's how I accomplished this in wtf and it worked really well. Unfortunately it looks like it's not planned work so I'm not sure what to do there.
With the new feature allowing multiple GPU context tracks I can at least split up queues such that queues can overlap, but within each queue the numbers you'll be getting will not account for overlap :(
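To make the per-queue limitation concrete, here is a small sketch that groups recorded zones by queue and reports which queues contain zones that overlap each other. It does not use any tracy API; the tuple layout, the function name, and the queue names are assumptions made for illustration. Overlap across different queues is treated as fine, since each queue would get its own track.

```python
from collections import defaultdict


def queues_with_overlap(zones):
    """Return the set of queue names whose own zones overlap in time.

    `zones` is an iterable of (queue_name, begin_ns, end_ns) tuples
    (a hypothetical layout, not a tracy data structure).
    """
    by_queue = defaultdict(list)
    for queue, begin_ns, end_ns in zones:
        by_queue[queue].append((begin_ns, end_ns))

    overlapping = set()
    for queue, spans in by_queue.items():
        spans.sort()  # order by begin time
        latest_end = spans[0][1]
        for begin_ns, end_ns in spans[1:]:
            if begin_ns < latest_end:
                # A zone starts before an earlier one on this queue ends.
                overlapping.add(queue)
                break
            latest_end = max(latest_end, end_ns)
    return overlapping


if __name__ == "__main__":
    zones = [
        ("compute", 0, 10),
        ("transfer", 5, 15),   # overlaps "compute", but on another queue
        ("compute", 10, 20),
        ("transfer", 12, 18),  # overlaps the earlier "transfer" zone
    ]
    print(queues_with_overlap(zones))
```

Any queue reported here is one where a nested-only timeline would give misleading per-zone timings.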
from iree.
I'm going to avoid doing any HAL work here and instead just add tracy support directly to the Vulkan HAL. When we want programmatic queries, we'll need to add explicit APIs to the HAL, but for just seeing timing in tracy we can avoid that.
from iree.