
Comments (3)

yzh119 commented on June 28, 2024

Sure, we are exploring CuTe, and we believe it's the best way to use TMA.

The main reason we are still sticking to a custom implementation is that we haven't figured out how to use TMA for sparse memory loading (e.g., in paged attention prefill). We also have an ongoing effort to support AMD GPUs, and I suppose porting CUTLASS 3.0 code to ROCm might be hard (please correct me if I'm wrong).

I'll gradually replace some of the existing code with higher-level abstractions over the next few months, and yes, we welcome your contributions.
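To illustrate the sparse-loading concern raised in the comment above, here is a minimal sketch of how token rows are addressed in a paged KV cache. The layout, function names, and parameters (`paged_token_offsets`, `page_size`, `row_elems`) are assumptions made for illustration and are not FlashInfer's actual data structures. The point is only that logically consecutive tokens can sit at non-uniformly strided offsets, which is awkward to describe with a single regular tile copy.

```python
def paged_token_offsets(page_indices, page_size, num_tokens, row_elems):
    """Element offset of each token's KV row inside a paged cache pool.

    page_indices: physical page id for each logical page of one sequence
                  (hypothetical page table, for illustration only).
    """
    offsets = []
    for t in range(num_tokens):
        page = page_indices[t // page_size]  # physical page holding token t
        slot = t % page_size                 # position of token t inside that page
        offsets.append((page * page_size + slot) * row_elems)
    return offsets


def has_uniform_stride(offsets):
    """True if consecutive offsets differ by one constant stride."""
    strides = {b - a for a, b in zip(offsets, offsets[1:])}
    return len(strides) <= 1


# Pages allocated back to back: the rows form one regular strided block.
contiguous = paged_token_offsets([0, 1, 2], page_size=4, num_tokens=12, row_elems=128)

# Pages allocated out of order: the stride jumps at every page boundary.
scattered = paged_token_offsets([7, 2, 5], page_size=4, num_tokens=12, row_elems=128)

print(has_uniform_stride(contiguous))
print(has_uniform_stride(scattered))
```

In the scattered case each page is still a small regular block, so one possible direction (an assumption, not something confirmed in this thread) is to issue one copy per page rather than one per tile.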

from flashinfer.

jeromeku commented on June 28, 2024

@yzh119 Thanks for the response!

Is there an easy way to tweak the source install such that only a few of the kernels are compiled (e.g., prefill + decode only, or some subset thereof)? I realize that there are certain environment variables that can be set to limit the template instantiations but haven't found a coarser-grained way of building only select kernel categories.


jeromeku commented on June 28, 2024

@yzh119 never mind -- I ended up stripping out certain kernels, then using `-D` define flags and `torch.utils.cpp_extension.load` to JIT-compile specific kernel instantiations.
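A rough sketch of what that workaround could look like is shown below. It is not the code used in this thread: the helper names (`build_define_flags`, `load_selected_kernels`), the source file names, and the macro names are hypothetical placeholders. Only `torch.utils.cpp_extension.load` and its `name`, `sources`, `extra_cflags`, `extra_cuda_cflags`, `extra_include_paths`, and `verbose` parameters are the real PyTorch API.

```python
def build_define_flags(defines):
    """Turn {"NAME": value} into compiler flags such as -DNAME=value.

    A value of None produces a bare -DNAME flag.
    """
    flags = []
    for name, value in defines.items():
        flags.append(f"-D{name}" if value is None else f"-D{name}={value}")
    return flags


def load_selected_kernels(name, sources, defines, include_paths=()):
    """JIT-compile only the given sources with the given preprocessor defines."""
    # Imported lazily so the flag helper above can be used without PyTorch.
    from torch.utils.cpp_extension import load

    flags = build_define_flags(defines)
    return load(
        name=name,
        sources=list(sources),
        extra_cflags=flags,        # passed to the host C++ compiler
        extra_cuda_cflags=flags,   # passed to nvcc
        extra_include_paths=list(include_paths),
        verbose=True,
    )


# Hypothetical macro names selecting a single instantiation.
example_flags = build_define_flags({"HEAD_DIM": 128, "ENABLE_DECODE": None})
print(example_flags)

# Hypothetical usage; requires PyTorch, a CUDA toolchain, and real source files:
# module = load_selected_kernels(
#     name="decode_only",
#     sources=["decode_binding.cpp", "decode_kernel.cu"],
#     defines={"HEAD_DIM": 128, "ENABLE_DECODE": None},
# )
```

Keeping the define handling in a separate helper makes it easy to check which instantiations will be requested before paying for a full compile.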

