
usefulsensors / useful-transformers


Efficient Inference of Transformer models

License: GNU General Public License v3.0

CMake 1.11% C++ 58.51% C 39.91% Python 0.47%
cpp neural-networks npu openai-whisper rockchip transformer-models

useful-transformers's Introduction

Useful Transformers

Useful Transformers is a library for efficient inference of Transformer models. The focus is on low-cost, low-energy processors running inference at the edge. The initial implementation is aimed at running OpenAI's Whisper speech-to-text model efficiently on single-board computers based on the RK3588 processor. The tiny.en Whisper model transcribes speech at 30x real-time speed, 2x faster than the best known implementation.

Getting started

The easiest way to try out Whisper transcription is to install the release wheel package.

# Preferably inside a virtual environment
$ python -m pip install https://github.com/usefulsensors/useful-transformers/releases/download/0.1_rk3588/useful_transformers-0.1-cp310-cp310-linux_aarch64.whl

Try transcribing a wav file.

$ taskset -c 4-7 python -m useful_transformers.transcribe_wav <wav_file>

If you don't have a wav file handy, running the above command will transcribe an example provided in the package.

$ taskset -c 4-7 python -m useful_transformers.transcribe_wav
Ever tried, ever failed. No matter, try again. Fail again. Fail better.

Performance

Performance comparison

The plot shows the inference times of useful-transformers' Whisper tiny.en model across examples of varying durations. useful-transformers is 2x faster than faster-whisper's int8 implementation. useful-transformers uses FP16 matrix multiplication on the NPU available in the RK3588 processor. The majority of the benefit comes from the large matrix multiplications (of size 1500x384x384 for the tiny.en model) in the encoder.
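As a back-of-the-envelope check on why these encoder matrix multiplications dominate, the cost of a single 1500x384x384 multiplication can be tallied (sizes from the paragraph above; counting each multiply-accumulate as 2 operations is a convention of this sketch):

```python
# Cost of one encoder matmul in Whisper tiny.en (sizes from the text above).
M, K, N = 1500, 384, 384      # audio positions x d_model x d_model
flops = 2 * M * K * N         # one multiply + one add per accumulation
print(flops)                  # 442368000, i.e. ~0.44 GFLOP per matmul
```

At roughly 0.44 GFLOP each, repeated across attention and MLP projections in every encoder layer, these multiplications are where an FP16 NPU kernel pays off most.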

TODO

  • Whisper tiny.en
  • Whisper base.en
  • Larger Whisper models
  • Use int8 matmuls from the librknnrt
  • Use int4 matmuls (request Rockchip for int4 matmul kernels)
  • Use asynchronous kernel launches (request Rockchip for better APIs in general)
  • Decode with timestamps
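For context on the int8 matmul item, here is a minimal NumPy sketch of symmetric int8 quantized matrix multiplication; the quantization scheme and names are illustrative, not the librknnrt kernels themselves:

```python
import numpy as np

def quantize_int8(x):
    # Symmetric per-tensor quantization: map the largest |value| to 127.
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 8)).astype(np.float32)
b = rng.standard_normal((8, 4)).astype(np.float32)

qa, sa = quantize_int8(a)
qb, sb = quantize_int8(b)
# Accumulate in int32 (as int8 hardware does), then rescale to float.
approx = qa.astype(np.int32) @ qb.astype(np.int32) * (sa * sb)
exact = a @ b
print(np.max(np.abs(approx - exact)))  # small quantization error
```

The appeal for the NPU is that the inner loop runs entirely in int8/int32, roughly halving memory traffic versus FP16 at a modest accuracy cost.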

Contributors

  • Nat Jeffries (@njeffrie)
  • Manjunath Kudlur (@keveman)
  • Guy Nicholson (@guynich)
  • James Wang (@JamesUseful)
  • Pete Warden (@petewarden)
  • Ali Zartash (@aliz64)


useful-transformers's Issues

Support shorter n_audio_ctx

Taking inspiration from huggingface/transformers#25744, I tried to decrease n_audio_ctx from the default 1500 to 750 (corresponding to 15 seconds of audio vs 30 seconds). I modified torch_state_dict_to_npz.py to slice [:750, :] for 'encoder.positional_embedding', set 'n_audio_ctx' in 'dims' to 750, and modified the '480000' constants in whisper.py to '240000' (from 15 * 16k).

transcribe_wav gives an error like this however:

E RKNN: [...] failed to submit!, op id: 0, op name: MatMul, flags: 0x5, task start: 0, task number: 9, run task counter: 0, int status: 0, please try updating to the latest version of the toolkit2 and runtime from: https://console.zbox.filez.com/l/I00fc3 (PWD: rknn)

Does this have something to do with hard-coded parameters in the codebase, such as
self.N_FFT = 400
self.HOP_LENGTH = 160
in whisper.py, which need to be modified accordingly?
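For reference, the relationship between audio length, the framing constants above, and n_audio_ctx can be sketched as follows (the factor-of-2 reduction models Whisper's strided encoder convolution; the constants are taken from the issue):

```python
# Whisper's framing constants (from the issue above).
SAMPLE_RATE = 16000
N_FFT = 400
HOP_LENGTH = 160

def n_audio_ctx(n_samples):
    # Mel frames = samples / hop; the encoder's conv stack halves that count.
    return n_samples // HOP_LENGTH // 2

print(n_audio_ctx(30 * SAMPLE_RATE))  # 1500 (default, 480000 samples)
print(n_audio_ctx(15 * SAMPLE_RATE))  # 750  (240000 samples)
```

So the 750 value in the issue is arithmetically consistent; N_FFT only sets the analysis window width, not the frame count, which suggests the failure is in the NPU matmul task setup rather than the framing constants.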

How to run the transcribe_wav.py code directly instead of using the release wheel package?

I need to make certain modifications to the code, such as resampling the WAV file before reading it and then transcribing the speech. However, if I run transcribe_wav.py directly, it throws an error:
Traceback (most recent call last):

  File "E:\useful-transformers-main\examples\whisper\transcribe_wav.py", line 4, in <module>
    from .whisper import decode_wav_file
ImportError: attempted relative import with no known parent package

How can I run the transcribe_wav.py code directly instead of using the release wheel package?
Thank you so much!
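On the resampling part of the question: a minimal, dependency-light sketch of converting audio to Whisper's 16 kHz input rate. Linear interpolation is an illustrative choice here, not what the library does; a polyphase filter (e.g. scipy.signal.resample_poly) is the cleaner option in practice:

```python
import numpy as np

def resample_linear(samples, src_rate, dst_rate):
    # Resample a 1-D float signal by linear interpolation between samples.
    n_out = int(round(len(samples) * dst_rate / src_rate))
    src_t = np.arange(len(samples)) / src_rate
    dst_t = np.arange(n_out) / dst_rate
    return np.interp(dst_t, src_t, samples)

# One second of a 440 Hz tone at 44.1 kHz, resampled to Whisper's 16 kHz.
audio_44k = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)
audio_16k = resample_linear(audio_44k, 44100, 16000)
print(len(audio_16k))  # 16000
```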

Question

Why does it transcribe a Chinese wav file as English?

Share information

This is exactly what I was looking for / working on for the last couple of months. So great to see a whisper port coming to the Rk3588 NPU!

However, I was wondering where to find information on how the model is adapted? Perhaps you could share information on how to modify the whisper models ourselves (e.g. How the npz weight files are produced) and experiment with the process.

I'm looking forward to having a multi lingual model setup on my device and expand upon the integration with other services.

Thanks for the good work so far!
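On how the npz weight files might be produced: a hypothetical sketch of the kind of conversion a script like torch_state_dict_to_npz.py performs, using NumPy only. The tensor names and shapes below are illustrative (the positional-embedding shape matches the tiny.en sizes discussed elsewhere in this page), not the script's actual contents:

```python
import os
import tempfile
import numpy as np

# A state-dict-like mapping of parameter name -> array, as a stand-in for
# tensors pulled out of a PyTorch checkpoint.
state_dict = {
    "encoder.positional_embedding": np.zeros((1500, 384), dtype=np.float32),
    "decoder.token_embedding": np.zeros((51864, 384), dtype=np.float32),
}

# np.savez stores each keyword argument as a named array in one .npz archive.
path = os.path.join(tempfile.mkdtemp(), "tiny.en.npz")
np.savez(path, **state_dict)

loaded = np.load(path)
print(sorted(loaded.files))
```

Loading on the inference side is then just np.load plus indexing by name, which keeps the runtime free of any PyTorch dependency.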

RK3566 support?

Trying to get this to work on my Orange Pi 3B and running into taskset: failed to set pid 2139's affinity: Invalid argument

How to build for development?

I'm interested in the project, but I don't understand how to install the repo for development. I tried pip install -e . to install from the repo, but running the provided sample shows it can't find pybind_whisper.

❯ cp /path/to/rknpu2/.../librknnrt.so example/whisper/
❯ python -m useful_transformers.transcribe_wav
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/marty/Documents/useful-transformers/examples/whisper/transcribe_wav.py", line 4, in <module>
    from .whisper import decode_wav_file
  File "/home/marty/Documents/useful-transformers/examples/whisper/whisper.py", line 13, in <module>
    from .pybind_whisper import WhisperModel as CWhisperModel
ModuleNotFoundError: No module named 'useful_transformers.pybind_whisper'

How can I run in a development setting?

Providing help and FLOSS stack

Hello,

Your project looks cool, as I was rather sad seeing that Rockchip's NN framework failed to load any useful model.

I've done some reverse engineering ( + reading the datasheet) of RK3588's NPU ( https://github.com/phhusson/rknpu-reverse-engineering/), and I think that maybe I can help.

Reading your TODO, you're using RKNN exclusively to do matrix (not higher-order tensor?) multiplications. Is that intended? (The NPU can do ReLU, max/min/average pooling, convolutions.)

I see you're waiting on Rockchip for int4 matmul. Hoping there is no hardware bug preventing it, I should be able to provide one if that's the most useful thing you need?

Either way, seeing your usage I'll try to write a FLOSS reimplementation of rockchip's matmul, to get rid of that proprietary blob.

help

Hello! I'm using the tiny model. When transcribing Chinese, the output is unstable and sometimes mixed with English characters. How can I fix this?

Get confidence values?

I'm getting a substantial number of low-confidence results on background noise and am looking to filter them out. How can I grab segment/translation temperatures or confidence scores?
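One common way to derive a per-token confidence, sketched here with NumPy: take the softmax over the decoder's output logits and read off the chosen token's probability. This is a generic approach, not an API the released wheel exposes:

```python
import numpy as np

def token_confidence(logits):
    # Numerically stable softmax over the vocabulary dimension; the
    # probability of the argmax token serves as a per-token confidence.
    z = logits - logits.max()
    p = np.exp(z) / np.exp(z).sum()
    return float(p.max())

# Toy 3-token vocabulary: a clearly preferred token scores high confidence.
print(token_confidence(np.array([4.0, 1.0, 0.5])))  # ~0.93
```

Averaging (or taking the minimum of) these per-token values over a segment gives a segment-level score that can be thresholded to drop background-noise transcriptions.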

Understanding the use of taskset

line 139 of whisper.py includes the assertion assert os.sched_getaffinity(os.getpid()) == set([4, 5, 6, 7]). What is the motivation for limiting operations to select CPUs?
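A short sketch of what that assertion inspects. The RK3588 pairs four Cortex-A55 cores with four Cortex-A76 cores, and on this SoC cores 4-7 are conventionally the fast A76 cluster (verify against your board's device tree), so pinning via taskset -c 4-7 keeps the compute-heavy threads off the slow cores:

```python
import os

# The set of CPUs the current process is allowed to run on. Launching the
# script with `taskset -c 4-7 python ...` restricts this set before Python
# starts, which is what the assertion in whisper.py verifies.
allowed = os.sched_getaffinity(0)  # 0 means "this process"
print(allowed)
# On an RK3588 launched via `taskset -c 4-7`, this prints {4, 5, 6, 7}.
```

Without the pin, the scheduler may place worker threads on the A55 cores, and the slowest thread then gates the whole matmul pipeline.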

please help me

The current 0.1 wheel does not support translate_wav; could you release a 0.2 version?

Home Assistant addon support?

Hi
I have the Whisper add-on for Home Assistant, but I want to run it on the NPU.
Is that possible with your program?

Regards.

react native

Is it possible to run this model in React Native projects on Android?
