
rwkv-cuda's Introduction

BlinkDL

A minimalist deep learning library in JavaScript using WebGL + asm.js. Runs in your browser.

Currently it is a proof-of-concept (inference only). Note: Convolution is buggy when memories overlap.

The WebGL backend is powered by weblas: https://github.com/waylonflinn/weblas.

Example

https://withablink.coding.me/goPolicyNet/ : a weiqi (baduk, go) policy network in AlphaGo style:

[board image]

const N = 19;
const NN = N * N;
const nFeaturePlane = 8;
const nFilter = 128;

const x = new BlinkArray();
x.Init('weblas');
x.nChannel = nFeaturePlane;
x.data = new Float32Array(nFeaturePlane * NN);
for (var i = 0; i < NN; i++)
    x.data[5 * NN + i] = 1; // set feature plane for empty board

// pre-act residual network with 6 residual blocks
const bak = new Float32Array(nFilter * NN);
x.Convolution(nFilter, 3);
x.CopyTo(bak);
x.BatchNorm().ReLU().Convolution(nFilter, 3);
x.BatchNorm().ReLU().Convolution(nFilter, 3);
x.Add(bak).CopyTo(bak);
x.BatchNorm().ReLU().Convolution(nFilter, 3);
x.BatchNorm().ReLU().Convolution(nFilter, 3);
x.Add(bak).CopyTo(bak);
x.BatchNorm().ReLU().Convolution(nFilter, 3);
x.BatchNorm().ReLU().Convolution(nFilter, 3);
x.Add(bak).CopyTo(bak);
x.BatchNorm().ReLU().Convolution(nFilter, 3);
x.BatchNorm().ReLU().Convolution(nFilter, 3);
x.Add(bak).CopyTo(bak);
x.BatchNorm().ReLU().Convolution(nFilter, 3);
x.BatchNorm().ReLU().Convolution(nFilter, 3);
x.Add(bak).CopyTo(bak);
x.BatchNorm().ReLU().Convolution(nFilter, 3);
x.BatchNorm().ReLU().Convolution(nFilter, 3);
x.Add(bak);
x.BatchNorm().ReLU().Convolution(1, 1).Softmax();

[performance image]

Usage

<script src='weblas.js' type='text/javascript'></script>
<script src='BlinkDL.js' type='text/javascript'></script>

Todo

  • Convolution (3x3_pad_1 and 1x1), BatchNorm, ReLU, Softmax
  • Pooling layer
  • FC layer
  • Strided convolution
  • Transposed convolution
  • Webworker and async
  • Faster inference with weblas pipeline, WebGPU, WebAssembly
  • Memory manager
  • Training

rwkv-cuda's People

Contributors

bbuf, blealtan, blinkdl, mrsteyk, www


rwkv-cuda's Issues

Multi-GPU support seems to have a problem

I am using the RWKV_Role_Playing project.
When running with --strategy='cuda:1 fp32 *26 -> cuda:0 fp32' --jit_on=1 --cuda_on=1,
it crashes with the following error during generation:

Traceback (most recent call last):
  File "/srv/RWKV/venv/lib/python3.10/site-packages/gradio/routes.py", line 422, in run_predict
    output = await app.get_blocks().process_api(
  File "/srv/RWKV/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1323, in process_api
    result = await self.call_function(
  File "/srv/RWKV/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1051, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/srv/RWKV/venv/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/srv/RWKV/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/srv/RWKV/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/srv/RWKV/RWKV_Role_Playing/modules/ui.py", line 75, in __load_char
    chatbot = self.chat_model.load_init_prompt(char['user'], char['bot'], char['action_start'],
  File "/srv/RWKV/RWKV_Role_Playing/modules/chat.py", line 48, in load_init_prompt
    out, model_tokens, model_state = self.model_utils.run_rnn(model_tokens, model_state, self.model_utils.fix_tokens(self.model_utils.pipeline.encode(init_prompt)))
  File "/srv/RWKV/RWKV_Role_Playing/modules/model_utils.py", line 32, in run_rnn
    out, model_state = self.model.forward(tokens[:self.CHUNK_LEN], model_state)
  File "/srv/RWKV/RWKV_Role_Playing/rwkv/model.py", line 607, in forward
    x, state[i*5+0], state[i*5+1], state[i*5+2], state[i*5+3] = ATT(
torch.jit.Error: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "/srv/RWKV/RWKV_Role_Playing/rwkv/model.py", line 39, in cuda_att_seq
    def cuda_wkv(T: int, C: int, w, u, k, v, aa, bb, pp):
        assert 1 * C % min(C, 32) == 0
        assert k.dtype == torch.float16
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
        w = w.contiguous()
        u = u.contiguous()
RuntimeError: AssertionError:

^CKeyboard interruption in main thread... closing server.

Removing --cuda_on=1 (disabling RWKV-CUDA) makes it work normally.
Keeping --cuda_on=1 but switching to an fp16 strategy (--strategy='cuda:1 fp16 ...') also works.
Using --strategy='cuda:1 fp32 *26 -> cpu fp32' fails again.

So far, reproducing by elimination, the conclusion appears to be:
when RWKV-CUDA is enabled, no strategy that splits the model across multiple devices can be used.
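
The traceback above shows the custom CUDA path in rwkv/model.py asserting k.dtype == torch.float16, so any layer kept in fp32 that is routed through the CUDA kernel will trip that assert. Below is a minimal sketch of a configuration that stays on the fp16 CUDA path, assuming the rwkv pip package API that RWKV_Role_Playing wraps; the model path and token ids are placeholders, not taken from the issue.

import os
# Set before importing rwkv.model; mirrors --jit_on=1 / --cuda_on=1 above.
os.environ["RWKV_JIT_ON"] = "1"
os.environ["RWKV_CUDA_ON"] = "1"

from rwkv.model import RWKV

# fp16 on the CUDA-kernel layers satisfies the k.dtype == torch.float16 assert.
model = RWKV(model="path/to/RWKV-4-model", strategy="cuda:1 fp16")  # placeholder path
out, state = model.forward([187, 510, 1563], None)  # placeholder token ids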

python3 run.py failed

(gh_baize-chatbot) ub2004@ub2004-B85M-A0:~/llm_dev/RWKV-CUDA/wkv$ python3 run.py
Using /home/ub2004/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
Creating extension directory /home/ub2004/.cache/torch_extensions/py38_cu117/wkv...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ub2004/.cache/torch_extensions/py38_cu117/wkv/build.ninja...
Building extension module wkv...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] c++ -MMD -MF wkv_op.o.d -DTORCH_EXTENSION_NAME=wkv -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/ub2004/.local/lib/python3.8/site-packages/torch/include -isystem /home/ub2004/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/ub2004/.local/lib/python3.8/site-packages/torch/include/TH -isystem /home/ub2004/.local/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 /wd4624 -c /home/ub2004/llm_dev/RWKV-CUDA/wkv/cuda/wkv_op.cpp -o wkv_op.o
FAILED: wkv_op.o
c++ -MMD -MF wkv_op.o.d -DTORCH_EXTENSION_NAME=wkv -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/ub2004/.local/lib/python3.8/site-packages/torch/include -isystem /home/ub2004/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/ub2004/.local/lib/python3.8/site-packages/torch/include/TH -isystem /home/ub2004/.local/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 /wd4624 -c /home/ub2004/llm_dev/RWKV-CUDA/wkv/cuda/wkv_op.cpp -o wkv_op.o
c++: error: /wd4624: No such file or directory
[2/3] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=wkv -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/ub2004/.local/lib/python3.8/site-packages/torch/include -isystem /home/ub2004/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/ub2004/.local/lib/python3.8/site-packages/torch/include/TH -isystem /home/ub2004/.local/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' --use_fast_math --extra-device-vectorization -std=c++17 -c /home/ub2004/llm_dev/RWKV-CUDA/wkv/cuda/wkv_cuda_v2.cu -o wkv_cuda_v2.cuda.o
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/home/ub2004/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
subprocess.run(
File "/usr/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "run.py", line 86, in
wkv_cuda = load(name="wkv", sources=["cuda/wkv_op.cpp", f"cuda/wkv_cuda_v{CUDA_KERNEL_VERSION}.cu"],
File "/home/ub2004/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
File "/home/ub2004/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1509, in _jit_compile
_write_ninja_file_and_build_library(
File "/home/ub2004/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1624, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/home/ub2004/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'wkv'
(gh_baize-chatbot) ub2004@ub2004-B85M-A0:~/llm_dev/RWKV-CUDA/wkv$
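
For context, the flag that fails here is /wd4624, an MSVC-style "disable warning C4624" switch; on Linux, c++ (g++) treats it as an input file, hence "No such file or directory". Below is a hedged sketch of guarding that flag by platform in the torch.utils.cpp_extension.load call that the traceback points at; the CUDA flag list and kernel version are assumptions based on the log above, not the repository's exact run.py.

import os
from torch.utils.cpp_extension import load

CUDA_KERNEL_VERSION = 2  # placeholder; matches wkv_cuda_v2.cu in the log above

wkv_cuda = load(
    name="wkv",
    sources=["cuda/wkv_op.cpp", f"cuda/wkv_cuda_v{CUDA_KERNEL_VERSION}.cu"],
    verbose=True,
    # /wd4624 is MSVC-only; g++ rejects it, so pass it on Windows alone.
    extra_cflags=["/wd4624"] if os.name == "nt" else [],
    extra_cuda_cflags=["--use_fast_math", "--extra-device-vectorization"],
)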

Hello author, a question

Hello author, we reproduced your code and found that using CUDA can indeed provide a large speedup. We are currently researching how to use CUDA to accelerate a UNet network to solve some problems, and would like to ask you some questions about CUDA. If, after learning more, you are interested in us, we would like to collaborate with you on some research. If that works for you, please reply. @BlinkDL

Error when running in Colab

Running python run.py produces the following error:

Using /root/.cache/torch_extensions/py37_cu113 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py37_cu113/timex/build.ninja...
Building extension module timex...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] c++ -MMD -MF timex_op.o.d -DTORCH_EXTENSION_NAME=timex -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /usr/local/lib/python3.7/dist-packages/torch/include -isystem /usr/local/lib/python3.7/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.7/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.7/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 /wd4624 -c /content/RWKV-CUDA/depthwise_conv1d/cuda/timex_op.cpp -o timex_op.o 
FAILED: timex_op.o 
c++ -MMD -MF timex_op.o.d -DTORCH_EXTENSION_NAME=timex -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /usr/local/lib/python3.7/dist-packages/torch/include -isystem /usr/local/lib/python3.7/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.7/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.7/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 /wd4624 -c /content/RWKV-CUDA/depthwise_conv1d/cuda/timex_op.cpp -o timex_op.o 
c++: error: /wd4624: No such file or directory
[2/3] /usr/local/cuda/bin/nvcc  -DTORCH_EXTENSION_NAME=timex -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /usr/local/lib/python3.7/dist-packages/torch/include -isystem /usr/local/lib/python3.7/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.7/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.7/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' --use_fast_math --extra-device-vectorization -DTmax=768 -DBF=8 -DBB=2 -std=c++14 -c /content/RWKV-CUDA/depthwise_conv1d/cuda/timex_cuda_v3.cu -o timex_cuda_v3.cuda.o 
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1746, in _run_ninja_build
    env=env)
  File "/usr/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "run.py", line 72, in <module>
    verbose=True, extra_cuda_cflags=['--use_fast_math', '--extra-device-vectorization', f'-DTmax={T_MAX}', f'-DBF={B_GROUP_FORWARD}', f'-DBB={B_GROUP_BACKWARD}'], extra_cflags=['/wd4624'])
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1156, in load
    keep_intermediates=keep_intermediates)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1367, in _jit_compile
    is_standalone=is_standalone)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1472, in _write_ninja_file_and_build_library
    error_prefix=f"Error building extension '{name}'")
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1756, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'timex'
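
This is the same root cause as the wkv build failure above: the load call at run.py line 72 (quoted in the traceback) passes extra_cflags=['/wd4624'], an MSVC-only switch that Linux g++ treats as a file path. A hedged variant of that call for the timex extension, with the flag guarded by platform; the Tmax/BF/BB values are taken from the nvcc line in the log, and the variable names mirror those shown in the traceback.

import os
from torch.utils.cpp_extension import load

T_MAX, B_GROUP_FORWARD, B_GROUP_BACKWARD = 768, 8, 2  # values from the nvcc line above

timex_cuda = load(
    name="timex",
    sources=["cuda/timex_op.cpp", "cuda/timex_cuda_v3.cu"],
    verbose=True,
    extra_cuda_cflags=["--use_fast_math", "--extra-device-vectorization",
                       f"-DTmax={T_MAX}", f"-DBF={B_GROUP_FORWARD}", f"-DBB={B_GROUP_BACKWARD}"],
    extra_cflags=["/wd4624"] if os.name == "nt" else [],  # MSVC-only warning switch
)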

Is the 1-D depthwise conv still critical for RWKV?

It seems RWKV-v4 doesn't use the 1-D depthwise kernel you developed earlier. Is the 1-D depthwise CUDA kernel in this repo still a critical operator for RWKV?

I just want to check: if I intend to contribute to this project, which CUDA kernel should I work on? Should I work on the code in the 1-D depthwise folder or the code in the WKV folders of this repo?

Considering contributing it to PyTorch?

Hey @BlinkDL, I already posted under your issue in the PyTorch main repo. Would you consider contributing the 1-D conv code to PyTorch? This is very relevant to the project I am currently working on, and I would also be glad to help you with it.

Best,
Julien
