Why T>=S constraint? (fast_rnnt, closed)

BuaaAlban commented on June 19, 2024
Why T>=S constraint?

from fast_rnnt.

Comments (15)

arkadyark commented on June 19, 2024

I would love to contribute those back, but unfortunately there's a fairly involved open-source contribution process at my organization that would take a while, so it'd probably be best to find someone else to do so.

However, I did test this out locally and re-ran the benchmark at https://github.com/csukuangfj/transducer-loss-benchmarking. The results look really good: peak memory usage went from 3820 all the way down to 1182 (!), and from 2647 to 835 when sorting utterances. Step time (on my hardware) went from 343k to 280k us.

Pretty cool! Always gotta be careful with those torch.gathers.

pkufool commented on June 19, 2024

@danpovey Yifan has already made PRs for this, #26 and #24; you can merge them.

csukuangfj commented on June 19, 2024

> In a regular rnnt

As you have mentioned, that is for regular RNN-T.


The version we are using is not regular. It has the same condition as CTC training, i.e., S <= T.

csukuangfj commented on June 19, 2024

Here is the paper about fast_rnnt:

https://arxiv.org/pdf/2206.13236.pdf

csukuangfj commented on June 19, 2024

Here is the code to filter data that don't satisfy S<=T in icefall:
https://github.com/k2-fsa/icefall/blob/f13cf61b05432a989e6a42c95b843a56639bcbde/egs/librispeech/ASR/pruned_transducer_stateless2/train.py#L958

        # In ./conformer.py, the conv module uses the following expression
        # for subsampling
        T = ((c.num_frames - 1) // 2 - 1) // 2
        tokens = sp.encode(c.supervisions[0].text, out_type=str)

        if T < len(tokens):
            logging.warning(
                f"Exclude cut with ID {c.id} from training. "
                f"Number of frames (before subsampling): {c.num_frames}. "
                f"Number of frames (after subsampling): {T}. "
                f"Text: {c.supervisions[0].text}. "
                f"Tokens: {tokens}. "
                f"Number of tokens: {len(tokens)}"
            )
            return False
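For anyone who wants to try the same check without lhotse or sentencepiece, a self-contained sketch might look like the following. The function names `frames_after_subsampling` and `keep_utterance` are hypothetical, and the subsampling expression is simply copied from the excerpt above, so it presumably only matches that recipe's conformer front end.

```python
def frames_after_subsampling(num_frames: int) -> int:
    """Frame count T after the conv subsampling used in the excerpt above."""
    return ((num_frames - 1) // 2 - 1) // 2


def keep_utterance(num_frames: int, num_tokens: int) -> bool:
    """Return True if the utterance satisfies S <= T and can be kept.

    num_frames: number of feature frames before subsampling.
    num_tokens: number of tokens S in the transcript.
    (Hypothetical helper; it mirrors the `T < len(tokens)` check above.)
    """
    return frames_after_subsampling(num_frames) >= num_tokens


if __name__ == "__main__":
    # 100 input frames shrink to 24 frames after subsampling.
    print(frames_after_subsampling(100))
    print(keep_utterance(100, 24))  # S == T, kept
    print(keep_utterance(100, 25))  # S > T, filtered out
```

An utterance with 100 input frames ends up with T = 24, so it is kept with up to 24 tokens and dropped with 25 or more.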

BuaaAlban commented on June 19, 2024

Thanks for your fast reply.
I have tried to modify my code based on this example; I think it's a normal transducer. I can filter the data as you said to make it work. I just wonder why we have this limitation (is it for optimization? I actually read your paper yesterday but didn't notice this condition; I will double-check it). Could I just comment out this assert to make the pruned loss work just like rnnt_loss (as in torchaudio or warp-transducer)?

desh2608 commented on June 19, 2024

@BuaaAlban as you noted, this constraint is indeed not required for the "regular" RNNT topology. Only if you train with the "modified" topology, where you are constrained to emit exactly 1 symbol per time frame, will this constraint be required. We have a PR here (k2-fsa/k2#1149) to remove this constraint from k2. I will also make a similar PR for fast_rnnt.
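To make the difference concrete, here is a toy count of alignment paths under the two topologies. It is not how fast_rnnt or k2 compute the loss. The transition rules are an assumption based on the description above (regular: emitting a symbol stays on the same frame; modified: every step, blank or symbol, advances one frame), and `count_alignments` is a made-up name.

```python
from functools import lru_cache


def count_alignments(S: int, T: int, modified: bool) -> int:
    """Count lattice paths that emit S symbols over T frames.

    Assumed transitions (illustrative only):
      blank:  (s, t) -> (s, t + 1)
      symbol: (s, t) -> (s + 1, t)      in the regular topology
              (s, t) -> (s + 1, t + 1)  in the modified topology
    """

    @lru_cache(maxsize=None)
    def paths(s: int, t: int) -> int:
        if t == T:
            # All frames consumed: valid only if every symbol was emitted.
            return 1 if s == S else 0
        total = paths(s, t + 1)  # emit blank, move to the next frame
        if s < S:
            next_t = t + 1 if modified else t
            total += paths(s + 1, next_t)  # emit the next symbol
        return total

    return paths(0, 0)


if __name__ == "__main__":
    for S, T in [(2, 3), (3, 2)]:
        print(S, T, count_alignments(S, T, False), count_alignments(S, T, True))
```

With S = 3 and T = 2, the regular lattice still has 4 valid paths, while the modified one has none, which is why S <= T is only needed in the modified case.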

arkadyark commented on June 19, 2024

@desh2608 are you still planning to make this PR? This would be very useful for my work!

desh2608 commented on June 19, 2024

@arkadyark sorry, I forgot to actually push the changes. BTW, I believe Dan fixed some OOM issues in the pruned transducer loss in k2, and those fixes haven't yet been merged into fast_rnnt. So you may want to make those changes yourself.

arkadyark commented on June 19, 2024

Thanks! Which changes are you referring to? Looking through recent changes to rnnt_loss.py I don't see anything there.

desh2608 commented on June 19, 2024

> Thanks! Which changes are you referring to? Looking through recent changes to rnnt_loss.py I don't see anything there.

Check k2-fsa/k2#1177 and k2-fsa/k2#1183

danpovey commented on June 19, 2024

Ah yes. Arkady, it would be great if you could make a PR to fast_rnnt with those changes; I had forgotten about that. If not, LMK and I'll ask someone here.

arkadyark commented on June 19, 2024

Hey @danpovey , just wanted to follow up - is anybody able to make those changes here?

danpovey commented on June 19, 2024

@pkufool could you please have a look at this?

pkufool commented on June 19, 2024

closed by #29
