
crimeisdown / trunk-transcribe


Transcription of calls from trunk-recorder using OpenAI Whisper

Shell 5.38% Dockerfile 0.36% Python 93.70% Batchfile 0.25% Makefile 0.32%
celery meilisearch openai-whisper telegram-bot trunk-recorder whisper

trunk-transcribe's Issues

Improve detection of non-voice audio and gibberish transcripts

Often with analog audio, FCC station identification in Morse code gets transcribed when it shouldn't be. Additionally, some noise gets transcribed as gibberish due to hallucination.

The detection of such issues should be improved so that the amount of transcript "noise" can be reduced.
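One possible starting point, sketched below, is to post-filter segments using the per-segment statistics Whisper already returns (`no_speech_prob`, `avg_logprob`, `compression_ratio`). The thresholds mirror the defaults of Whisper's own `transcribe()` options; applying them as a post-filter, and the function names, are assumptions of this sketch rather than anything the project currently does.

```python
from typing import Any

# Thresholds mirror the defaults of Whisper's transcribe() options.
# Using them as a post-filter is an assumption of this sketch.
NO_SPEECH_THRESHOLD = 0.6
LOGPROB_THRESHOLD = -1.0
COMPRESSION_RATIO_THRESHOLD = 2.4


def is_probable_noise(segment: dict[str, Any]) -> bool:
    """Return True if a Whisper segment looks like non-speech or gibberish."""
    # A high no-speech probability together with low confidence suggests
    # tones, static, or silence rather than a voice.
    if (
        segment.get("no_speech_prob", 0.0) > NO_SPEECH_THRESHOLD
        and segment.get("avg_logprob", 0.0) < LOGPROB_THRESHOLD
    ):
        return True
    # Very repetitive text compresses well, which is typical of
    # hallucination loops.
    if segment.get("compression_ratio", 0.0) > COMPRESSION_RATIO_THRESHOLD:
        return True
    return False


def filter_segments(segments: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only the segments that do not look like noise (hypothetical helper)."""
    return [s for s in segments if not is_probable_noise(s)]
```

The thresholds would likely need tuning against real radio traffic, since Morse code identification may not trip the same statistics as static.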

Refactor worker logic into separate Celery tasks

Right now, the transcription job does the following:

  1. Downloads the audio
  2. Parses metadata and gets the audio ready to be fed to Whisper
  3. Feeds a chunk of audio to Whisper for each speaker (digital only; analog just feeds the whole recording)
  4. Compiles the resulting transcript
  5. Geocodes any addresses
  6. Sends the transcript to the search API to be indexed
  7. Determines which notifications should be made, and sends the appropriate ones

Only step 3 really relies on the GPU; the other steps are CPU-only. To make the architecture more scalable, split the logic that requires a GPU (and its immediate prerequisites) from the logic that doesn't into separate Celery tasks, so that different workers can pick them up for improved performance.
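As a rough sketch of how the split could be routed, the mapping below sends GPU and CPU tasks to different queues. The task names and queue names are hypothetical; the dict has the shape Celery accepts for its `task_routes` setting (glob patterns as keys), and `queue_for` is only a small stand-in so the idea can be checked without a broker.

```python
from fnmatch import fnmatch

# Hypothetical task names; the project currently has one transcription task.
TASK_ROUTES = {
    "transcribe.gpu.*": {"queue": "gpu"},
    "transcribe.cpu.*": {"queue": "cpu"},
}

# One possible ordering of the split tasks, mapped to the steps listed above.
PIPELINE = [
    "transcribe.cpu.fetch_and_prepare_audio",  # steps 1-2
    "transcribe.gpu.run_whisper",              # step 3
    "transcribe.cpu.post_process",             # steps 4-7
]


def queue_for(task_name: str, default: str = "celery") -> str:
    """Resolve the queue for a task name using glob-style route keys."""
    for pattern, route in TASK_ROUTES.items():
        if fnmatch(task_name, pattern):
            return route["queue"]
    # "celery" is Celery's default queue name.
    return default
```

In a Celery app, a dict like this could be assigned to `app.conf.task_routes`, with GPU workers started using `-Q gpu` and CPU workers using `-Q cpu`.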

Transcribe fails if S3 is not enabled

If the S3 environment variables are not defined, the audio is successfully sent as base64, but it appears that the worker doesn't know how to process it. It fails with:

trunk-transcribe-worker-1       | [2023-03-28 19:23:54,679: INFO/ForkPoolWorker-2] Task transcribe[9ef51669-f8ea-4d1b-b74c-9c804702305c] retry: Retry in 2s: InvalidSchema("No connection adapters were found for 'data:audio/mpeg;base64,SUQzBAAA<...>'")

I worked around this by using S3 instead, but it would probably be good to either support base64 audio or document that it doesn't work.
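The `InvalidSchema` error suggests the worker hands the `data:` URI to an HTTP client that only understands `http(s)` URLs. A minimal sketch of decoding such a URI directly is shown below; `decode_data_uri` is a hypothetical helper, not an existing function in the project, and where it would be called from in the worker is an assumption.

```python
import base64
from urllib.parse import unquote_to_bytes


def decode_data_uri(uri: str) -> tuple[str, bytes]:
    """Decode a data: URI into (mime_type, raw_bytes)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    # Everything before the first comma is the header, the rest is the payload.
    header, sep, payload = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("malformed data URI: missing comma")
    parts = header.split(";")
    mime_type = parts[0] or "text/plain"
    if "base64" in parts[1:]:
        return mime_type, base64.b64decode(payload)
    # Without the base64 marker, the payload is percent-encoded.
    return mime_type, unquote_to_bytes(payload)
```

A worker could check `url.startswith("data:")` before downloading and fall back to the normal HTTP fetch otherwise.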

OpenAI Whisper docker image does not actually use CUDA even when enabled

The docker image that is publicly published does not actually use CUDA:

trunk-transcribe-worker-1       | [2023-03-28 18:56:16,152: WARNING/ForkPoolWorker-2] /usr/local/lib/python3.10/dist-packages/whisper/transcribe.py:114: UserWarning: FP16 is not supported on CPU; using FP32 instead
trunk-transcribe-worker-1       |   warnings.warn("FP16 is not supported on CPU; using FP32 instead")
trunk-transcribe-worker-1       |
trunk-transcribe-worker-1       | [2023-03-28 18:56:23,461: WARNING/ForkPoolWorker-2] /usr/local/lib/python3.10/dist-packages/whisper/transcribe.py:114: UserWarning: FP16 is not supported on CPU; using FP32 instead
trunk-transcribe-worker-1       |   warnings.warn("FP16 is not supported on CPU; using FP32 instead")
trunk-transcribe-worker-1       |
trunk-transcribe-worker-1       | [2023-03-28 18:56:24,606: WARNING/ForkPoolWorker-2] /usr/local/lib/python3.10/dist-packages/whisper/transcribe.py:114: UserWarning: FP16 is not supported on CPU; using FP32 instead
trunk-transcribe-worker-1       |   warnings.warn("FP16 is not supported on CPU; using FP32 instead")
trunk-transcribe-worker-1       |
trunk-transcribe-meilisearch-1  | [2023-03-28T18:56:27Z INFO  actix_web::middleware::logger] 127.0.0.1 "GET /health HTTP/1.1" 200 22 "-" "Wget" 0.000401
trunk-transcribe-worker-1       | [2023-03-28 18:56:27,331: WARNING/ForkPoolWorker-2] /usr/local/lib/python3.10/dist-packages/whisper/transcribe.py:114: UserWarning: FP16 is not supported on CPU; using FP32 instead
trunk-transcribe-worker-1       |   warnings.warn("FP16 is not supported on CPU; using FP32 instead")
trunk-transcribe-worker-1       |
trunk-transcribe-worker-1       | [2023-03-28 18:56:28,558: WARNING/ForkPoolWorker-2] /usr/local/lib/python3.10/dist-packages/whisper/transcribe.py:114: UserWarning: FP16 is not supported on CPU; using FP32 instead
trunk-transcribe-worker-1       |   warnings.warn("FP16 is not supported on CPU; using FP32 instead")
trunk-transcribe-worker-1       |

It appears that 'nvidia-smi' is not installed on the system image, and the CUDA libraries are also not installed.

I fixed this by changing Dockerfile.whisper to build off Nvidia's CUDA image instead:

diff --git a/Dockerfile.whisper b/Dockerfile.whisper
index 8f9d690..3b1c265 100644
--- a/Dockerfile.whisper
+++ b/Dockerfile.whisper
@@ -3,7 +3,7 @@
 #
 # PLEASE DO NOT EDIT IT DIRECTLY.
 #
-FROM ubuntu:22.04
+FROM nvidia/cuda:11.7.1-base-ubuntu22.04

 RUN apt-get update && \
     apt-get -y upgrade && \

I then rebuilt the image and confirmed that it works. This might not be the ideal fix, but it at least confirms where the issue is happening.
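To confirm from inside the container whether the GPU is visible, a small diagnostic like the following could be run. It is only a sketch: `gpu_diagnostics` is a hypothetical helper, and it assumes PyTorch is the backend Whisper uses in the image (it degrades gracefully if `torch` is not importable).

```python
import shutil


def gpu_diagnostics() -> dict:
    """Report whether nvidia-smi and a CUDA-capable PyTorch are visible."""
    report = {
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "torch_installed": False,
        "cuda_available": False,
    }
    try:
        import torch  # imported lazily so the check works without PyTorch
    except ImportError:
        return report
    report["torch_installed"] = True
    report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in gpu_diagnostics().items():
        print(f"{key}: {value}")
```

If `cuda_available` is false, Whisper falls back to the CPU, which matches the FP16 warning in the logs above.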

Start collecting start and end timestamps of transcript segments

Whisper provides a start and end time for each transcript segment, for the purpose of making accurate subtitles. However, this data can also be used to fine-tune a Whisper model, in conjunction with corrected transcripts.

In order to make this happen, we need to first start collecting start and end timestamps for all transcript segments, and ensure the raw transcript data stores these segments. This involves modifying how we use the result dicts we get back from Whisper, and updating the transcript data structure in various places.
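As an illustration, the sketch below pulls `start`, `end`, and `text` out of the `segments` list in a Whisper result dict. The `TimedSegment` type and `extract_timed_segments` function are hypothetical names, and how the segments would be merged into the project's existing transcript structure is left open.

```python
from dataclasses import asdict, dataclass


@dataclass
class TimedSegment:
    """One transcript segment with its timing, in seconds from audio start."""
    start: float
    end: float
    text: str


def extract_timed_segments(result: dict) -> list[TimedSegment]:
    """Collect start/end/text from the 'segments' list of a Whisper result."""
    return [
        TimedSegment(
            start=float(seg["start"]),
            end=float(seg["end"]),
            text=seg["text"].strip(),
        )
        for seg in result.get("segments", [])
    ]


def to_raw_transcript(result: dict) -> list[dict]:
    """Serialize segments to plain dicts, e.g. for JSON storage."""
    return [asdict(seg) for seg in extract_timed_segments(result)]
```

Storing plain dicts keeps the raw transcript JSON-serializable, which should make it easier to pair with corrected transcripts later.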
