crimeisdown / trunk-transcribe
Transcription of calls from trunk-recorder using OpenAI Whisper
Often with analog audio, the FCC station-identification Morse code gets transcribed when it shouldn't be. Additionally, some noise gets transcribed as gibberish due to hallucination.
The detection of such issues should be improved so that the amount of transcript "noise" can be reduced.
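One possible approach, sketched below: Whisper's result already includes per-segment confidence fields (`no_speech_prob` and `avg_logprob`), so segments that are likely hallucinated noise could be dropped in post-processing. The threshold values here are illustrative assumptions, not tuned numbers.

```python
# Sketch: drop likely-hallucinated Whisper segments using the
# per-segment confidence fields Whisper returns. Thresholds are
# illustrative assumptions and would need tuning against real calls.

def filter_segments(segments, no_speech_threshold=0.6, logprob_threshold=-1.0):
    """Keep only segments that Whisper itself considers likely speech."""
    kept = []
    for seg in segments:
        if seg.get("no_speech_prob", 0.0) > no_speech_threshold:
            continue  # model thinks this window is probably not speech
        if seg.get("avg_logprob", 0.0) < logprob_threshold:
            continue  # low-confidence decode, often gibberish
        kept.append(seg)
    return kept

# Example result dict with fabricated values for demonstration:
result = {
    "segments": [
        {"text": "Engine 5 responding", "no_speech_prob": 0.1, "avg_logprob": -0.3},
        {"text": "dah dit dah dit", "no_speech_prob": 0.9, "avg_logprob": -0.2},
        {"text": "asdf qwer", "no_speech_prob": 0.2, "avg_logprob": -1.5},
    ]
}
print([s["text"] for s in filter_segments(result["segments"])])
# -> ['Engine 5 responding']
```

This wouldn't catch Morse code that Whisper confidently transcribes as words; that case would likely need a separate heuristic (e.g. tone detection on the audio itself).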
Right now, the transcription job does the following:
Only step 3 really relies on the GPU; the other steps are CPU-only. To make the architecture more scalable, split any logic that requires a GPU (along with its immediate prerequisites) from the logic that doesn't into separate Celery tasks, so they can be picked up by different workers for improved performance.
If the S3 env variables are not defined, the audio is successfully sent as base64, but the worker doesn't know how to process it. It fails with:
trunk-transcribe-worker-1 | [2023-03-28 19:23:54,679: INFO/ForkPoolWorker-2] Task transcribe[9ef51669-f8ea-4d1b-b74c-9c804702305c] retry: Retry in 2s: InvalidSchema("No connection adapters were found for 'data:audio/mpeg;base64,SUQzBAAA<...>'")
This was fixed by using S3 instead, but it would probably be good to either support base64 payloads or document that they don't work.
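The error happens because `requests` has no connection adapter for the `data:` scheme. If base64 payloads were to be supported, the worker could special-case data URIs before falling back to a normal HTTP fetch. A sketch, with a hypothetical `fetch_audio` helper that is not part of the actual codebase:

```python
# Sketch: decode "data:" audio URIs locally instead of handing them to
# an HTTP client (which raises InvalidSchema for the data: scheme).
import base64
import urllib.request

def fetch_audio(url: str) -> bytes:
    if url.startswith("data:"):
        header, _, payload = url.partition(",")
        if ";base64" not in header:
            raise ValueError("expected a base64-encoded data URI")
        return base64.b64decode(payload)
    # Fall back to a normal HTTP fetch for real URLs.
    with urllib.request.urlopen(url) as resp:
        return resp.read()

# An MP3 with ID3 tags starts with the bytes b"ID3", matching the
# "SUQz..." base64 prefix seen in the error message above.
audio = fetch_audio("data:audio/mpeg;base64," + base64.b64encode(b"ID3 demo").decode())
print(audio[:3])  # -> b'ID3'
```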
The docker image that is publicly published does not actually use CUDA:
trunk-transcribe-worker-1 | [2023-03-28 18:56:16,152: WARNING/ForkPoolWorker-2] /usr/local/lib/python3.10/dist-packages/whisper/transcribe.py:114: UserWarning: FP16 is not supported on CPU; using FP32 instead
trunk-transcribe-worker-1 | warnings.warn("FP16 is not supported on CPU; using FP32 instead")
trunk-transcribe-worker-1 |
It appears that `nvidia-smi` is not present in the system image, and the CUDA libraries are not installed either.
I fixed this by changing Dockerfile.whisper to build from Nvidia's CUDA image instead:
diff --git a/Dockerfile.whisper b/Dockerfile.whisper
index 8f9d690..3b1c265 100644
--- a/Dockerfile.whisper
+++ b/Dockerfile.whisper
@@ -3,7 +3,7 @@
#
# PLEASE DO NOT EDIT IT DIRECTLY.
#
-FROM ubuntu:22.04
+FROM nvidia/cuda:11.7.1-base-ubuntu22.04
RUN apt-get update && \
apt-get -y upgrade && \
...and then rebuilt the image and confirmed that it works. This might not be the ideal fix, but it at least confirms where the issue is happening.
Whisper provides a start and end time for each transcript segment, for the purpose of making accurate subtitles. However, this data can also be used to fine-tune a Whisper model, in conjunction with corrected transcripts.
To make this happen, we first need to start collecting start and end timestamps for all transcript segments and ensure the raw transcript data stores these segments. This involves changing how we use the result dicts we get back from Whisper and updating the transcript data structure in various places.
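The change could start from something like the sketch below. The `segments` list with per-segment `start`, `end`, and `text` fields matches what openai-whisper's `transcribe()` returns; the reduced storage shape is an assumption, not the project's actual schema.

```python
# Sketch: reduce a Whisper result dict to the per-segment fields worth
# persisting for later fine-tuning. Field names match openai-whisper's
# output; the stored shape here is a hypothetical choice.

def extract_segments(result):
    """Pull (start, end, text) out of each Whisper segment."""
    return [
        {"start": seg["start"], "end": seg["end"], "text": seg["text"].strip()}
        for seg in result.get("segments", [])
    ]

# Example result dict with fabricated values for demonstration:
result = {
    "text": " Engine 5 responding to the alarm.",
    "segments": [
        {"id": 0, "start": 0.0, "end": 2.4, "text": " Engine 5 responding"},
        {"id": 1, "start": 2.4, "end": 4.1, "text": " to the alarm."},
    ],
}
print(extract_segments(result))
```

Storing segments this way (rather than only the flattened `text`) keeps the alignment data needed to pair corrected transcripts with their audio spans during fine-tuning.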