Comments (3)
Sure, we are exploring CuTe, and we believe it's the best way to use TMA.
The main reason we are still sticking with a custom implementation is that we haven't figured out how to use TMA for sparse memory loading (e.g. in paged attention prefill). We also have an ongoing effort to support AMD GPUs, and I suppose porting CUTLASS 3.0 code to ROCm might be hard (please correct me if I'm wrong).
I'll gradually replace some of the existing code with higher-level abstractions over the next few months, and yes, we welcome your contributions.
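For readers unfamiliar with the sparse-loading problem mentioned above, here is a minimal sketch in plain Python of page-table indirection in a paged KV cache. The page size, the page table contents, and the helper names are all hypothetical and are not FlashInfer's actual data structures. The point is only that consecutive tokens of one request can land in non-adjacent rows of the page pool, so a bulk copy that expects a single dense tile would presumably need to be issued per page rather than once per sequence.

```python
# Hypothetical paged KV layout: the cache is a pool of fixed-size pages,
# and each request owns a list of page indices (its page table).
PAGE_SIZE = 4  # assumed page size, for illustration only


def token_to_pool_row(page_table, token_idx, page_size=PAGE_SIZE):
    """Map a logical token position to its physical row in the page pool."""
    page = page_table[token_idx // page_size]
    return page * page_size + token_idx % page_size


def is_contiguous(rows):
    """Return True if the rows form one dense run (a single rectangular tile)."""
    return all(b - a == 1 for a, b in zip(rows, rows[1:]))


# A request whose 8 tokens live in pages 7 and 2 of the pool (page 5 unused so far).
page_table = [7, 2, 5]
rows = [token_to_pool_row(page_table, t) for t in range(8)]

print(rows)                     # physical rows touched by tokens 0..7
print(is_contiguous(rows[:4]))  # rows within one page are dense
print(is_contiguous(rows))      # rows across pages are not
```

Within a page the rows are dense, which is why per-page tile loads remain possible in principle; the difficulty is that the page indices are only known at run time.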
from flashinfer.
@yzh119 Thanks for the response!
Is there an easy way to tweak the source install so that only a few of the kernels are compiled (e.g., prefill + decode only, or some subset thereof)? I realize that certain environment variables can be set to limit the template instantiations, but I haven't found a coarser-grained way of building only selected kernel categories.
@yzh119 Never mind. I ended up stripping out certain kernels, then using DEFINE flags and torch.utils.cpp_extension.load to JIT-compile specific kernel instantiations.
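A rough sketch of what that approach could look like, assuming the kernel sources guard each category behind a preprocessor macro. The macro names, source paths, and helper functions below are hypothetical and not part of FlashInfer; only torch.utils.cpp_extension.load and its name, sources, extra_cflags, extra_cuda_cflags, extra_include_paths, and verbose parameters come from PyTorch.

```python
from typing import Dict, List, Optional, Sequence


def define_flags(defines: Dict[str, Optional[str]]) -> List[str]:
    """Turn {"NAME": value} into compiler flags: -DNAME or -DNAME=value."""
    flags = []
    for name, value in defines.items():
        flags.append(f"-D{name}" if value is None else f"-D{name}={value}")
    return flags


def jit_load_kernels(
    name: str,
    sources: Sequence[str],
    defines: Dict[str, Optional[str]],
    include_dirs: Sequence[str] = (),
):
    """JIT-compile only the code paths enabled by the given macros."""
    # Imported lazily so the flag helper above is usable without PyTorch.
    from torch.utils.cpp_extension import load

    flags = define_flags(defines)
    return load(
        name=name,
        sources=list(sources),
        extra_cflags=flags,        # passed to the host C++ compiler
        extra_cuda_cflags=flags,   # passed to nvcc
        extra_include_paths=list(include_dirs),
        verbose=True,
    )


# Hypothetical macro names; the real guards depend on how the sources were stripped.
example_defines = {"MY_ENABLE_DECODE": None, "MY_HEAD_DIM": "128"}
print(define_flags(example_defines))
```

Calling jit_load_kernels("decode_only", ["csrc/my_decode.cu"], example_defines, ["include"]) would then build and import just that extension; the paths here are placeholders.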