Comments (4)
Will do further investigation later :)
As for the IO issue, I remember reading somewhere that a thread block implicitly loads nearby memory whether it is used or not. Have you ever tried using an (N, U, T, V) layout instead of (N, T, U, V)? With the former (and especially when gather=True), a warp (also a thread block) can load a chunk of consecutive memory and reuse it.
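The layout idea above can be sketched on the host side with numpy strides (sizes are hypothetical): in an (N, T, U, V) tensor, stepping along the T axis jumps U*V elements, while in (N, U, T, V) it jumps only V elements, so threads scanning t for a fixed (n, u) touch near-consecutive memory.

```python
import numpy as np

# Hypothetical sizes: batch N, time T, label U, vocab V.
N, T, U, V = 2, 8, 4, 5

# (N, T, U, V) layout: stepping along the T axis jumps U*V elements.
x_ntuv = np.zeros((N, T, U, V), dtype=np.float32)
print(x_ntuv.strides[1] // x_ntuv.itemsize)  # 20 == U * V

# (N, U, T, V) layout: stepping along T jumps only V elements,
# so consecutive t values sit in adjacent chunks of memory.
x_nutv = np.ascontiguousarray(x_ntuv.transpose(0, 2, 1, 3))
print(x_nutv.strides[2] // x_nutv.itemsize)  # 5 == V
```

The same reasoning applies to the CUDA kernel's global-memory reads: coalesced access within a warp favors the layout whose fastest-varying loop index has the smallest stride.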
Indeed, I've been using the compact version of the loss function in our speech recognition tasks for a while. It should be technically correct (it's in my dev branch now; the main branch hasn't been updated for some time). I'll finish merging from my dev branch into the main branch, and once that's done, I'll reopen the MR.
from warp-rnnt.
I've been following the fast_rnnt work for a while, but haven't managed a successful pruned RNN-T training run yet.
They also have a paper about the implementation. https://arxiv.org/pdf/2206.13236.pdf
Hello Huahuan Zheng, interesting theory! But I don't think it will be useful in practice; optimising the forward pass alone doesn't help much. You can check the CUDA profiler logs: the big issue is memory IO. I really like your previous MR with the compact memory version, and I hope to finish reviewing it and reopen your MR in the near future.
I'm not familiar with the memory manager for CUDA threads. But you're right, the T×U matrix is the main bottleneck. Fortunately, there is a solution for this: fast_rnnt. It looks really promising.
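A rough back-of-the-envelope calculation (with hypothetical sizes, not taken from the thread) shows why pruning the T×U lattice attacks exactly this bottleneck: instead of materialising joint logits over all U labels at every frame, a pruned loss keeps only a small band of s symbols per frame.

```python
# Hypothetical sizes: T frames, U labels, vocab V, pruning band s.
T, U, V, s = 500, 100, 5000, 5

full = T * U * V    # joint logits for the full T x U lattice
pruned = T * s * V  # only a band of s symbols per frame survives pruning
print(full // pruned)  # 20x less memory for the joint tensor
```

With these numbers the joint tensor shrinks by the ratio U/s, which is where the memory-IO savings of the pruned approach come from.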
Related Issues (20)
- WARNING: sample 0 [42, 26] has a forward/backward mismatch -52.543503 / 0.000000 HOT 3
- ImportError: libcudart.so.10.2: cannot open shared object file: No such file or directory HOT 3
- Transducer loss leads to memory leak HOT 4
- RuntimeError: rnnt_loss status 1 HOT 5
- warning that forward/backward mismatch HOT 3
- question about the gather arguments HOT 1
- Question about average_frames and reduction parmas HOT 1
- Not support for pytorch 1.7 HOT 1
- ninja: build stopped: subcommand failed. HOT 1
- Normalize the RNN-T Loss with input seq length HOT 1
- PyTorch 1.9 Support HOT 5
- Strange behavior using PyTorch DDP HOT 7
- undefined symbol: _ZNSt19basic_ostringstreamIcSt11char_traitsIcESaIcEEC1Ev HOT 2
- rnnt_loss status 1 HOT 2
- THC/THC.h: No such file or directory HOT 7
- can't install warp-rnnt HOT 2
- Exception: CPU version is not implemented HOT 2
- ImportError: libcudart.so.10.1: cannot open shared object file: No such file or directory HOT 2
- __version__ assignment breaks local build.