Comments (8)
I think the majority of PyTorch GPU kernels could be rewritten much more concisely in Triton. This would also reduce PyTorch's heavy dependence on cuBLAS, cuDNN, etc., and eventually make the kernels more portable across different hardware architectures. But one thing the PyTorch team would probably want first is an offline Triton compiler, so that DNN models can be properly deployed; right now, Triton is JIT-compiled only.
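To make the JIT-vs-offline distinction above concrete, here is a minimal sketch in plain Python. It uses `compile()`/`exec()` as a stand-in for Triton's actual GPU code generation, so the `Jitted` class and `aot_compile` helper are purely illustrative, not any real Triton API: the JIT path pays the compile cost on the first call, while the ahead-of-time path compiles everything before any call is made.

```python
# Illustrative sketch only: mimics the JIT-vs-AOT distinction in plain Python,
# using compile()/exec() as a stand-in for Triton's GPU code generation.
# This is NOT the Triton API.


class Jitted:
    """Compile the kernel source lazily, on the first call (JIT style)."""

    def __init__(self, source, name):
        self.source, self.name, self._fn = source, name, None

    def __call__(self, *args):
        if self._fn is None:  # first call pays the compile cost
            namespace = {}
            exec(compile(self.source, self.name, "exec"), namespace)
            self._fn = namespace[self.name]
        return self._fn(*args)


def aot_compile(sources):
    """Compile every kernel up front (AOT style), e.g. at package build time."""
    cache = {}
    for name, src in sources.items():
        namespace = {}
        exec(compile(src, name, "exec"), namespace)
        cache[name] = namespace[name]
    return cache


ADD_SRC = "def add(x, y):\n    return x + y\n"

jit_add = Jitted(ADD_SRC, "add")            # nothing compiled yet
aot_cache = aot_compile({"add": ADD_SRC})   # compiled before any call

print(jit_add(2, 3))            # compiled on this first call
print(aot_cache["add"](2, 3))   # already compiled
```

The deployment argument in the comment is exactly the difference between these two paths: with only the JIT path, every deployment target must carry the compiler; with the AOT path, shipped binaries suffice.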
@chsasank AMD has actually been looking at Triton for a while, and I think they have started working on adding support. It sounds like a pretty hard project for someone unfamiliar with LLVM.
I think an offline compiler project is probably more realistic. There have been a few efforts already, but none aimed at being production-ready. I think there is value in a proper offline compiler that can generate binaries which can somehow be called from Python like a standard jit'd function :)
@S3Y3D I looked into it a while back, and it seems exceedingly difficult to me. There are some papers that have added autodiff support to CUDA-like code, but honestly I'm not confident it can be done without a performance hit. I think a better place for autodiff would be a JAX/TorchScript-like JIT compiler built on top of Triton.
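The tracing idea behind JAX-style autodiff can be sketched in a few lines of plain Python with dual numbers (forward mode). This is illustrative only: a real system over Triton would differentiate the kernel IR rather than Python objects, and the `Dual` class and `grad` helper below are made up for this sketch.

```python
# Minimal sketch of trace-time automatic differentiation using dual numbers
# (forward mode). Each value carries its derivative alongside it, and the
# arithmetic operators propagate both. Illustrative only.
from dataclasses import dataclass


@dataclass
class Dual:
    val: float   # primal value
    dot: float   # derivative carried alongside it

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def grad(f):
    """Return df/dx for a scalar function built from + and *."""
    return lambda x: f(Dual(x, 1.0)).dot


f = lambda x: 3 * x * x + 2 * x + 1   # f'(x) = 6x + 2
print(grad(f)(4.0))                   # -> 26.0
```

This works by evaluating the function once with instrumented values, which is why a JIT layer (where all operations are already being traced) is a natural place to put it, as the comment above suggests.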
I'll close this for now, as at the moment it seems to be more an issue for PyTorch than for Triton :)
That's great to hear, and rewriting some of the backend would be an interesting project. Moving to a Triton-based backend would make adding support for new hardware much easier.
I am interested in AMD GPU support (#46). I see the issue was raised about six months ago. Has there been any progress already? If not, I could possibly contribute, but it would be nice to know how much effort adding a new backend might take.
LLVM noob here, but I could also contribute to an offline Triton compiler if it's not too hard.
Hi,
What about automatic differentiation in Triton, like in PyTorch and JAX?
> I think there is value in a proper offline compiler that can be used to generate binaries
Are you envisioning Python AST -> a systems programming language like C++/Rust?
I work on https://github.com/adsharma/py2many, which is currently focused on producing executables. It works in the same way as this code:
https://github.com/openai/triton/blob/master/python/triton/code_gen.py
but its AST visitor emits C++, Rust, or six other languages.
There is also some experimental support for generating Python extensions from a Python AST via PyO3.
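The AST-visitor approach described above can be sketched with the standard-library `ast` module: walk a Python expression tree and emit source text for another target. The `CExprEmitter` class below is a made-up toy (it is not py2many's or Triton's actual code), but it shows the shape both projects share.

```python
# Tiny sketch of the AST-visitor approach: walk a Python AST and emit a
# C-like expression string. Illustrative only.
import ast


class CExprEmitter(ast.NodeVisitor):
    # map Python operator node types to C operator tokens
    OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

    def visit_BinOp(self, node):
        # recurse into both operands, parenthesize to preserve precedence
        return "(%s %s %s)" % (self.visit(node.left),
                               self.OPS[type(node.op)],
                               self.visit(node.right))

    def visit_Name(self, node):
        return node.id

    def visit_Constant(self, node):
        return repr(node.value)


def to_c_expr(src):
    """Parse a Python expression and emit it as a C-like string."""
    tree = ast.parse(src, mode="eval")
    return CExprEmitter().visit(tree.body)


print(to_c_expr("a * x + b"))   # -> ((a * x) + b)
```

Swapping the emitter methods changes the target language, which is why one visitor skeleton can back C++, Rust, or, in Triton's case, Triton-IR.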
@adsharma I think we want to stay focused on generating Triton-IR only. However, we'd love to have more frontends. A Rust or C++ equivalent of the Python @triton.jit decorator would be really valuable.