shashikg / whispers2t

An Optimized Speech-to-Text Pipeline for the Whisper Model, Supporting Multiple Inference Engines

License: MIT License

Shell 0.86% Python 48.60% Jupyter Notebook 50.26% Dockerfile 0.27%
asr deep-learning speech-recognition speech-to-text whisper tensorrt-llm tensorrt vad voice-activity-detection

whispers2t's Introduction

WhisperS2T ⚡

An Optimized Speech-to-Text Pipeline for the Whisper Model, Supporting Multiple Inference Engines!




WhisperS2T is an optimized, lightning-fast, open-source Speech-to-Text (ASR) pipeline tailored for the Whisper model to provide faster transcription. It is designed to be exceptionally faster than other implementations, boasting a 2.3X speed improvement over WhisperX and a 3X speed boost compared to the HuggingFace Pipeline with FlashAttention 2 (Insanely Fast Whisper). Moreover, it includes several heuristics to enhance transcription accuracy.

Whisper is a general-purpose speech recognition model developed by OpenAI (not by me). It is trained on a large dataset of diverse audio and is a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.

Release Notes

  • [Feb 25, 2024]: Added prebuilt docker images and a transcript exporter to txt, json, tsv, srt, and vtt. (Check the complete release notes)
  • [Jan 28, 2024]: Added support for TensorRT-LLM backend.
  • [Dec 23, 2023]: Added support for word alignment for CTranslate2 backend (check benchmark).
  • [Dec 19, 2023]: Added support for Whisper-Large-V3 and Distil-Whisper-Large-V2 (check benchmark).
  • [Dec 17, 2023]: Released WhisperS2T!

Quickstart

Check out the Google Colab notebooks provided here: notebooks

Future Roadmap

  • Ready-to-use Docker container.
  • WhisperS2T-Server: Optimized end-to-end, deployment-ready server codebase.
  • In-depth documentation, hosted using GitHub Pages.
  • Explore the possibility of integrating Meta's SeamlessM4T model.
  • Add more datasets for WER benchmarking.

Benchmark and Technical Report

Stay tuned for a technical report comparing WhisperS2T against other Whisper pipelines. Meanwhile, check some quick benchmarks on an A30 GPU. See the scripts/ directory for the benchmarking scripts that I used.

A30 Benchmark

NOTE: I conducted all the benchmarks with the without_timestamps parameter set to True. Setting this parameter to False may improve the Word Error Rate (WER) of the HuggingFace pipeline, but at the expense of increased inference time. Notably, the improvements in inference speed were achieved solely through a superior pipeline design, without any specific optimization made to the backend inference engines (such as CTranslate2, FlashAttention2, etc.). For instance, WhisperS2T (utilizing FlashAttention2) demonstrates significantly superior inference speed compared to the HuggingFace pipeline (also using FlashAttention2), despite both leveraging the same inference engine—the HuggingFace whisper model with FlashAttention2. Additionally, there is a noticeable difference in WER as well.

Features

  • 🔄 Multi-Backend Support: Support for various Whisper model backends including Original OpenAI Model, HuggingFace Model with FlashAttention2, and CTranslate2 Model.
  • 🎙️ Easy Integration of Custom VAD Models: Seamlessly add custom Voice Activity Detection (VAD) models to enhance control and accuracy in speech recognition.
  • 🎧 Effortless Handling of Small or Large Audio Files: Intelligently batch smaller speech segments from various files, ensuring optimal performance.
  • Streamlined Processing for Large Audio Files: Asynchronously loads large audio files in the background while transcribing segmented batches, notably reducing loading times.
  • 🌐 Batching Support with Multiple Language/Task Decoding: Decode multiple languages or perform both transcription and translation in a single batch for improved versatility and transcription time. (Best supported with the CTranslate2 backend; see the example after this list.)
  • 🧠 Reduction in Hallucination: Optimized parameters and heuristics to decrease repeated text output or hallucinations. (Some heuristics work only with the CTranslate2 backend.)
  • ⏱️ Dynamic Time Length Support (Experimental): Process variable-length inputs in a given input batch instead of a fixed 30 seconds, providing flexibility and saving computation time during transcription. (Only with the CTranslate2 backend.)
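
For example, here is a minimal sketch of mixed language/task decoding in a single batch with the CTranslate2 backend (the audio file names below are placeholders):

import whisper_s2t

# Load once; a single batch can mix languages and tasks (best supported by the CTranslate2 backend).
model = whisper_s2t.load_model(model_identifier="large-v2", backend='CTranslate2')

files = ['english_interview.wav', 'french_news.wav']  # placeholder paths
lang_codes = ['en', 'fr']                             # one language code per file
tasks = ['transcribe', 'translate']                   # transcribe the first file, translate the second to English
initial_prompts = [None, None]

out = model.transcribe_with_vad(files,
                                lang_codes=lang_codes,
                                tasks=tasks,
                                initial_prompts=initial_prompts,
                                batch_size=16)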

Getting Started

From Docker Container

Prebuilt containers

docker pull shashikg/whisper_s2t:dev-trtllm

Dockerhub repo: https://hub.docker.com/r/shashikg/whisper_s2t/tags

Building your own container

Build from main branch.

docker build --build-arg WHISPER_S2T_VER=main --build-arg SKIP_TENSORRT_LLM=1 -t whisper_s2t:main .

Build from specific release v1.3.0.

git checkout v1.3.0
docker build --build-arg WHISPER_S2T_VER=v1.3.0 --build-arg SKIP_TENSORRT_LLM=1 -t whisper_s2t:1.3.0 .

To build the container with TensorRT-LLM support:

docker build --build-arg WHISPER_S2T_VER=main -t whisper_s2t:main-trtllm .

Local Installation

Install audio packages required for resampling and loading audio files.

For Ubuntu

apt-get install -y libsndfile1 ffmpeg

For macOS

brew install ffmpeg

For Ubuntu/macOS/Windows/any other system, with Conda for Python

conda install conda-forge::ffmpeg

To install or update to the latest released version of WhisperS2T use the following command:

pip install -U whisper-s2t

Or to install from latest commit in this repo:

pip install -U git+https://github.com/shashikg/WhisperS2T.git

NOTE: If your cuDNN and cuBLAS installation was done using pip wheels, you can run the following to add the cuDNN and cuBLAS paths to LD_LIBRARY_PATH:

export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:`python3 -c 'import os; import nvidia.cublas.lib; import nvidia.cudnn.lib; print(os.path.dirname(nvidia.cublas.lib.__file__) + ":" + os.path.dirname(nvidia.cudnn.lib.__file__))'`

To use TensorRT-LLM Backend

For the TensorRT-LLM backend, you will need to install TensorRT and TensorRT-LLM.

bash <repo_dir>/install_tensorrt.sh

For most Debian-based systems, the given bash script should work. If it doesn't, or you are on another system, please follow the official TensorRT-LLM instructions here.

Usage

CTranslate2 Backend

import whisper_s2t

model = whisper_s2t.load_model(model_identifier="large-v2", backend='CTranslate2')

files = ['data/KINCAID46/audio/1.wav']
lang_codes = ['en']
tasks = ['transcribe']
initial_prompts = [None]

out = model.transcribe_with_vad(files,
                                lang_codes=lang_codes,
                                tasks=tasks,
                                initial_prompts=initial_prompts,
                                batch_size=32)

print(out[0][0]) # Print first utterance for first file
"""
[Console Output]

{'text': "Let's bring in Phil Mackie who is there at the palace. We're looking at Teresa and Philip May. Philip, can you see how he's being transferred from the helicopters? It looks like, as you said, the beast. It's got its headlights on because the sun is beginning to set now, certainly sinking behind some clouds. It's about a quarter of a mile away down the Grand Drive",
 'avg_logprob': -0.25426941679184695,
 'no_speech_prob': 8.147954940795898e-05,
 'start_time': 0.0,
 'end_time': 24.8}
"""

To use word alignment, load the model like this:

model = whisper_s2t.load_model("large-v2", asr_options={'word_timestamps': True})

TensorRT-LLM Backend

import whisper_s2t

model = whisper_s2t.load_model(model_identifier="large-v2", backend='TensorRT-LLM')

files = ['data/KINCAID46/audio/1.wav']
lang_codes = ['en']
tasks = ['transcribe']
initial_prompts = [None]

out = model.transcribe_with_vad(files,
                                lang_codes=lang_codes,
                                tasks=tasks,
                                initial_prompts=initial_prompts,
                                batch_size=24)

print(out[0][0]) # Print first utterance for first file
"""
[Console Output]

{'text': "Let's bring in Phil Mackie who is there at the palace. We're looking at Teresa and Philip May. Philip, can you see how he's being transferred from the helicopters? It looks like, as you said, the beast. It's got its headlights on because the sun is beginning to set now, certainly sinking behind some clouds. It's about a quarter of a mile away down the Grand Drive", 
 'start_time': 0.0, 
 'end_time': 24.8}
"""

Check this Documentation for more details.

NOTE: On the first run, the model may give slightly slower inference speed. After 1-2 runs it will reach its steady-state speed. This is due to the JIT tracing of the VAD model.
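
If you benchmark the pipeline yourself, one simple option (a sketch, not part of the library, reusing the model/files/lang_codes/tasks/initial_prompts variables from the examples above) is to warm it up with a throwaway call before timing:

import time

# Throwaway warm-up call: the first call pays the one-time JIT tracing cost of the VAD model.
_ = model.transcribe_with_vad(files, lang_codes=lang_codes, tasks=tasks,
                              initial_prompts=initial_prompts, batch_size=24)

# Subsequent calls reflect steady-state speed.
start = time.time()
out = model.transcribe_with_vad(files, lang_codes=lang_codes, tasks=tasks,
                                initial_prompts=initial_prompts, batch_size=24)
print(f"Steady-state transcription time: {time.time() - start:.2f}s")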

Acknowledgements

  • OpenAI Whisper Team: Thanks to the OpenAI Whisper Team for open-sourcing the whisper model.
  • HuggingFace Team: Thanks to the HuggingFace Team for their integration of FlashAttention2 and the Whisper model in the transformers library.
  • CTranslate2 Team: Thanks to the CTranslate2 Team for providing a faster inference engine for Transformers architecture.
  • NVIDIA NeMo Team: Thanks to the NVIDIA NeMo Team for their contribution of the open-source VAD model used in this pipeline.
  • NVIDIA TensorRT-LLM Team: Thanks to the NVIDIA TensorRT-LLM Team for their awesome LLM inference optimizations.

License

This project is licensed under MIT License - see the LICENSE file for details.

whispers2t's People

Contributors

mahmoudashraf97, shashikg


whispers2t's Issues

[`large-v3`] Error during transcription: Invalid input features shape: expected an input with shape (3, 80, 3000), but got an input with shape (3, 128, 3000) instead

Cell 1

!apt install ffmpeg
!pip install whisper-s2t yt-dlp gradio pydantic ffmpeg-python

Cell 2

import logging
from pathlib import Path
import whisper_s2t

from google.colab import drive

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Configuration
class Config:
    model_identifier = "large-v3" # This causes a problem
    backend = "CTranslate2"
    output_format = "vtt"
    max_workers = 16
    path_root = "/content/drive"
    cwd = Path(path_root, "MyDrive/Colab Notebooks/YouTube Videos")


drive.mount(Config.path_root)

Cell 3

whisper_s2t_model = whisper_s2t.load_model(
    model_identifier=Config.model_identifier,
    backend=Config.backend,
    asr_options={"word_timestamps": True},
    # n_mels=128 # This doesn't matter
)

Cell 4

import asyncio
import os
import shutil
from concurrent.futures import ThreadPoolExecutor

import ffmpeg
import yt_dlp
from pydantic import BaseModel


# Pydantic model for VideoToTranscribe
class VideoToTranscribe(BaseModel):
    video_path: Path
    audio_path: Path
    metadata: dict | None = None
    lang_code: str = "en"
    initial_prompt: str | None = None
    vtt_path: Path


class VideoTranscriptor:
    def __init__(self, cwd: Path, whisper_s2t_model):
        self.cwd = cwd
        self.input_videos_dir = cwd / "input_videos"
        self.input_audios_dir = cwd / "input_audios"
        self.transcribed_dir = cwd / "transcribed"

        # Create directories if they don't exist
        self.input_videos_dir.mkdir(parents=True, exist_ok=True)
        self.input_audios_dir.mkdir(parents=True, exist_ok=True)
        self.transcribed_dir.mkdir(parents=True, exist_ok=True)
        self.whisper_s2t_model = whisper_s2t_model

    async def download_youtube_videos(self, url: str):
        ydl_opts = {
            "format": "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best",
            "outtmpl": str(self.input_videos_dir / "%(id)s.%(ext)s"),
        }

        with yt_dlp.YoutubeDL(ydl_opts) as ydl:
            ydl.download([url])

    async def transcribe_audio(
        self, audio_paths: list[Path], lang_codes: list[str], tasks: list[str]
    ):
        vtt_paths = [
            self.transcribed_dir / f"{audio_path.stem}.{Config.output_format}"
            for audio_path in audio_paths
        ]

        out = self.whisper_s2t_model.transcribe_with_vad(
            [str(audio_path) for audio_path in audio_paths],
            lang_codes=lang_codes,
            tasks=tasks,
            initial_prompts=[None] * len(audio_paths),
            batch_size=Config.max_workers,
        )

        whisper_s2t.write_outputs(
            out,
            format=Config.output_format,
            op_files=[str(vtt_path) for vtt_path in vtt_paths],
        )

        return vtt_paths

    async def process_videos(self, lang_code: str, output_lang_code: str):
        video_paths = list(self.input_videos_dir.glob("*.mp4"))

        def extract_audio(video_path: Path):
            audio_path = self.input_audios_dir / f"{video_path.stem}.wav"

            try:
                (
                    ffmpeg.input(str(video_path))
                    .output(str(audio_path), acodec="pcm_s16le", ar=16000, ac=1)
                    .overwrite_output()
                    .run(capture_stdout=True, capture_stderr=True)
                )
            except ffmpeg.Error as e:
                logger.error(
                    f"Error while extracting audio from {video_path}: {e.stderr.decode()}"
                )
                raise e

            return audio_path

        with ThreadPoolExecutor(max_workers=Config.max_workers) as executor:
            audio_extraction_tasks = [
                asyncio.get_event_loop().run_in_executor(
                    executor, extract_audio, video_path
                )
                for video_path in video_paths
            ]
            audio_paths = await asyncio.gather(*audio_extraction_tasks)

            task = "transcribe" if lang_code == output_lang_code else "translate"
            tasks = [task] * len(audio_paths)
            lang_codes = [output_lang_code] * len(audio_paths)

            vtt_paths = await self.transcribe_audio(audio_paths, lang_codes, tasks)

        videos_to_transcribe = [
            VideoToTranscribe(
                video_path=video_path, audio_path=audio_path, vtt_path=vtt_path
            )
            for video_path, audio_path, vtt_path in zip(
                video_paths, audio_paths, vtt_paths
            )
        ]

        return videos_to_transcribe

    async def cleanup(self, videos_to_transcribe: list[VideoToTranscribe]):
        for video in videos_to_transcribe:
            if video.vtt_path.exists():
                if video.video_path.exists():
                    shutil.move(str(video.video_path), str(self.transcribed_dir))
                    os.remove(str(video.audio_path))
                else:
                    shutil.move(str(video.audio_path), str(self.transcribed_dir))

    async def transcribe(self, youtube_url: str, lang_code: str, output_lang_code: str):
        if youtube_url:
            logger.info(f"Downloading YouTube video(s) from: {youtube_url}")
            await self.download_youtube_videos(youtube_url)

        logger.info("Processing videos...")
        videos_to_transcribe = await self.process_videos(lang_code, output_lang_code)

        logger.info("Cleaning up temporary files...")
        await self.cleanup(videos_to_transcribe)

        return f"Transcription completed. Files saved in {self.transcribed_dir}"

Cell 5

import gradio as gr

# Gradio UI


def launch_ui():
    cwd = Path(Config.cwd)
    transcriptor = VideoTranscriptor(cwd, whisper_s2t_model)

    async def transcribe_wrapper(
        youtube_url: str, lang_code: str, output_lang_code: str
    ):
        try:
            result = await transcriptor.transcribe(
                youtube_url, lang_code, output_lang_code
            )
            return result
        except Exception as e:
            logger.error(f"Error during transcription: {str(e)}")
            return f"An error occurred during transcription: {str(e)}"

    input_components = [
        gr.Textbox(
            label="YouTube URL (optional)",
            placeholder="Enter a YouTube video or playlist URL",
        ),
        gr.Textbox(label="Source Language Code", value="en"),
        gr.Textbox(label="Target Language Code", value="en"),
    ]

    iface = gr.Interface(
        fn=transcribe_wrapper,
        inputs=input_components,
        outputs="text",
        title="YouTube Video Transcriptor",
        description="Transcribe YouTube videos or local video files using Whisper",
        allow_flagging="never",
    )

    iface.launch(debug=False, share=True)


if __name__ == "__main__":
    launch_ui()

When trying to run this code with large-v3 model identifier, I keep getting:

ERROR:__main__:Error during transcription: Invalid input features shape: expected an input with shape (3, 80, 3000), but got an input with shape (3, 128, 3000) instead

With large-v2, it works fine.

problems with using huggingface flash attention 2 backend on windows

I'm having multiple problems testing out the huggingface backend. Here's one example error:

  File "C:\PATH\Scripts\WhisperS2T-batch-process\Lib\site-packages\transformers\tokenization_utils.py", line 391, in added_tokens_encoder
    return {k.content: v for v, k in sorted(self._added_tokens_decoder.items(), key=lambda item: item[0])}
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\PATH\Scripts\WhisperS2T-batch-process\Lib\site-packages\transformers\tokenization_utils.py", line 391, in <dictcomp>
    return {k.content: v for v, k in sorted(self._added_tokens_decoder.items(), key=lambda item: item[0])}
                             ^^^^
TypeError: 'tokenizers.AddedToken' object does not support the context manager protocol

And here's the script that produced it:

import whisper_s2t

model = whisper_s2t.load_model(model_identifier="large-v2", backend='HuggingFace')

files = ['test_audio_flac.flac']
lang_codes = ['en']
tasks = ['transcribe']
initial_prompts = [None]

out = model.transcribe_with_vad(files,
                                lang_codes=lang_codes,
                                tasks=tasks,
                                initial_prompts=initial_prompts,
                                batch_size=8)

op_files = ["transcription.txt"]

whisper_s2t.write_outputs(out, format='txt', op_files=op_files)


Here's another kind of error that I got:

  File "C:\PATH\Scripts\WhisperS2T-batch-process\Lib\site-packages\transformers\tokenization_utils.py", line 391, in added_tokens_encoder
    return {k.content: v for v, k in sorted(self._added_tokens_decoder.items(), key=lambda item: item[0])}
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\PATH\Scripts\WhisperS2T-batch-process\Lib\site-packages\transformers\tokenization_utils.py", line 391, in <dictcomp>
    return {k.content: v for v, k in sorted(self._added_tokens_decoder.items(), key=lambda item: item[0])}
            ^^^^^^^^^
AttributeError: 'list_iterator' object has no attribute 'content'

It doesn't make sense to me: the error is different even though I only changed batch_size from 8 to 16 in the above script.

I'm on Windows and am using a custom flash attention 2 wheel from here:

https://github.com/bdashore3/flash-attention/releases/

I've struggled for hours to get flash attention 2 to work on Windows and with whispers2t specifically, using multiple scripts, not just the one above.

Any help would be much appreciated. I'd love to get the huggingface backend working along with ctranslate2's...Here is my pip freeze if it helps...

accelerate==0.25.0
aiohttp==3.9.3
aiosignal==1.3.1
attrs==23.2.0
certifi==2024.2.2
cffi==1.16.0
charset-normalizer==3.3.2
colorama==0.4.6
coloredlogs==15.0.1
ctranslate2==4.0.0
datasets==2.18.0
dill==0.3.8
einops==0.7.0
filelock==3.13.1
flash_attn @ https://github.com/bdashore3/flash-attention/releases/download/v2.5.2/flash_attn-2.5.2+cu122torch2.2.0cxx11abiFALSE-cp311-cp311-win_amd64.whl#sha256=9a6a9bd30861a988b95e64402adb4fa15f84b1fdcae31251ab5fc0e7f691c0f2
frozenlist==1.4.1
fsspec==2024.2.0
huggingface-hub==0.21.3
humanfriendly==10.0
idna==3.6
Jinja2==3.1.3
llvmlite==0.42.0
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
more-itertools==10.2.0
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.16
networkx==3.2.1
ninja==1.11.1.1
numba==0.59.0
numpy==1.26.4
nvidia-ml-py==12.535.133
openai-whisper==20231117
optimum==1.15.0
packaging==23.2
pandas==2.2.1
platformdirs==4.2.0
protobuf==4.25.3
psutil==5.9.8
pyarrow==15.0.0
pyarrow-hotfix==0.6
pycparser==2.21
Pygments==2.17.2
pyreadline3==3.4.1
PySide6==6.6.1
PySide6-Addons==6.6.1
PySide6-Essentials==6.6.1
python-dateutil==2.9.0.post0
pytz==2024.1
PyYAML==6.0.1
regex==2023.12.25
requests==2.31.0
rich==13.7.0
safetensors==0.4.2
sentencepiece==0.2.0
shiboken6==6.6.1
six==1.16.0
sounddevice==0.4.6
soundfile==0.12.1
sympy==1.12
tiktoken==0.6.0
tokenizers==0.15.2
torch @ https://download.pytorch.org/whl/cu121/torch-2.2.0%2Bcu121-cp311-cp311-win_amd64.whl#sha256=d79324159c622243429ec214a86b8613c1d7d46fc4821374d324800f1df6ade1
tqdm==4.66.2
transformers==4.37.2
typing_extensions==4.10.0
tzdata==2024.1
urllib3==2.2.1
whisper_s2t @ git+https://github.com/shashikg/WhisperS2T.git@33e305fd447004c18fbc73848fa1e3385f42c93c
xxhash==3.4.1
yarl==1.9.4

Has this repository been abandoned?

This repository was active a lot initially and then nothing for over a month. Anyone have any idea if the repository owner has abandoned it or something? I know that Huggingface basically tried to recruit the owner after they didn't like the fact that his program is better and/or faster than their product...maybe they got to him somehow? lol

SPEED TESTING; add speed tests here folks!

In my program I used faster-whisper to transcribe an audio file. The large-v2 model running in float16 took 10 minutes to process the Sam Altman audio file.

After implementing this library I got the following:

Large-v2 running on float16 with batch size of 50 = 54 seconds
Medium.en, float16, batch size of 75 = 32 seconds
Small.en, float16, batch size of 100 = 15 seconds!

Amazing!

Tests run on RTX 4090 with CUDA 12 and pytorch 2.2.0. Just thought you'd like to know.

Also, that's using the higher quality ASR parameters:

            'asr_options': {
                "beam_size": 5,
                "best_of": 1,
                "patience": 2,
                "length_penalty": 1,
                "repetition_penalty": 1.01,
                "no_repeat_ngram_size": 0,
                "compression_ratio_threshold": 2.4,
                "log_prob_threshold": -1.0,
                "no_speech_threshold": 0.5,
                "prefix": None,
                "suppress_blank": True,
                "suppress_tokens": [-1],
                "without_timestamps": True,
                "max_initial_timestamp": 1.0,
                "word_timestamps": False,
                "sampling_temperature": 1.0,
                "return_scores": True,
                "return_no_speech_prob": True,
                "word_aligner_model": 'tiny',
            },
            'model_identifier': model_identifier,
            'backend': 'CTranslate2',
        }
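
For reference, a minimal sketch of how a kwargs dict along these lines might be passed to the loader (the model_kwargs wrapper below is my own naming, not taken from the post above):

import whisper_s2t

model_kwargs = {
    'asr_options': {
        "beam_size": 5,
        "word_timestamps": False,
        # ... remaining options as listed above
    },
}

model = whisper_s2t.load_model(model_identifier="large-v2", backend='CTranslate2', **model_kwargs)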

If you increase the batch size (regardless of the size of the whisper model) to where it exceeds available VRAM, the speed drops significantly, but this is expected behavior.


https://github.com/BBC-Esq/ChromaDB-Plugin-for-LM-Studio/releases/tag/v4.0.0

Batch Transcribing

In your example:

import whisper_s2t

model = whisper_s2t.load_model(model_identifier="large-v2", backend='CTranslate2')

files = ['data/KINCAID46/audio/1.wav']
lang_codes = ['en']
tasks = ['transcribe']
initial_prompts = [None]

out = model.transcribe_with_vad(files,
                                lang_codes=lang_codes,
                                tasks=tasks,
                                initial_prompts=initial_prompts,
                                batch_size=32)

print(out[0][0])
"""
[Console Output]

{'text': "Let's bring in Phil Mackie who is there at the palace. We're looking at Teresa and Philip May. Philip, can you see how he's being transferred from the helicopters? It looks like, as you said, the beast. It's got its headlights on because the sun is beginning to set now, certainly sinking behind some clouds. It's about a quarter of a mile away down the Grand Drive",
 'avg_logprob': -0.25426941679184695,
 'no_speech_prob': 8.147954940795898e-05,
 'start_time': 0.0,
 'end_time': 24.8}
"""

The method "transcribe_with_vad" accepts a list of files, but when I try to use a list, only the first audio is transcribed. Does this implementation not support batch transcription?

Randomly getting error while generating word timestamps

Code:

model = whisper_s2t.load_model(model_identifier="large-v2", asr_options={'word_timestamps': True}, backend='TensorRT-LLM')

files = ['output.wav']
lang_codes = ['en']
tasks = ['transcribe']
initial_prompts = [None]

out = model.transcribe_with_vad(files,
                                lang_codes=lang_codes,
                                tasks=tasks,
                                initial_prompts=initial_prompts,
                                batch_size=16)

For the above code, it sometimes throws the error below for the same file. Is there any explanation for it?
RuntimeError Traceback (most recent call last)
Cell In[15], line 10
8 initial_prompts = [None]
9 start =time.time()
---> 10 out = model.transcribe_with_vad(files,
11 lang_codes=lang_codes,
12 tasks=tasks,
13 initial_prompts=initial_prompts,
14 batch_size=16)
15 end =time.time()
16 print(f"batch :: {16} time:: {end-start}")

File ~/temp_triton/triton_env/lib/python3.10/site-packages/torch/utils/_contextlib.py:115, in context_decorator..decorate_context(*args, **kwargs)
112 @functools.wraps(func)
113 def decorate_context(*args, **kwargs):
114 with ctx_factory():
--> 115 return func(*args, **kwargs)

File ~/temp_triton/triton_env/lib/python3.10/site-packages/whisper_s2t/backends/init.py:171, in WhisperModel.transcribe_with_vad(self, audio_files, lang_codes, tasks, initial_prompts, batch_size)
169 for signals, prompts, seq_len, seg_metadata, pbar_update in self.data_loader(audio_files, lang_codes, tasks, initial_prompts, batch_size=batch_size):
170 mels, seq_len = self.preprocessor(signals, seq_len)
--> 171 res = self.generate_segment_batched(mels.to(self.device), prompts, seq_len, seg_metadata)
173 for res_idx, _seg_metadata in enumerate(seg_metadata):
174 responses[_seg_metadata['file_id']].append({**res[res_idx],
175 'start_time': round(_seg_metadata['start_time'], 3),
176 'end_time': round(_seg_metadata['end_time'], 3)})

File ~/temp_triton/triton_env/lib/python3.10/site-packages/whisper_s2t/backends/tensorrt/model.py:248, in WhisperModelTRT.generate_segment_batched(self, features, prompts, seq_lens, seg_metadata)
246 text_tokens = [[_t for _t in x[0] if t < self.tokenizer.eot]+[self.tokenizer.eot] for x in result]
247 sot_seqs = [tuple(_[-4:]) for _ in prompts]
--> 248 word_timings = self.align_words(features, texts, text_tokens, sot_seqs, seq_lens, seg_metadata)
250 for _response, _word_timings in zip(response, word_timings):
251 _response['word_timestamps'] = _word_timings

File ~/temp_triton/triton_env/lib/python3.10/site-packages/whisper_s2t/backends/tensorrt/model.py:200, in WhisperModelTRT.align_words(self, features, texts, text_tokens, sot_seqs, seq_lens, seg_metadata)
198 token_alignments = [[] for _ in seg_metadata]
199 for start_seq, req_idx in start_seq_wise_req.items():
--> 200 res = self.aligner_model.align(ctranslate2.StorageView.from_array(features[req_idx]),
201 start_sequence=list(start_seq),
202 text_tokens=[text_tokens[_] for _ in req_idx],
203 num_frames=list(seq_lens[req_idx].detach().cpu().numpy()),
204 median_filter_width=7)
206 for _res, _req_idx in zip(res, req_idx):
207 token_alignments[_req_idx] = _res

RuntimeError: No position encodings are defined for positions >= 448, but got position 454

temp directory absolutely necessary?

Is the variable for temp directory here absolutely necessary?

tmpdir

The reason I ask is because on my Windows system my "User" environment variables have TEMP and TMP. My "System" variables have TEMP, TMP, and TMPDIR.

My TMPDIR is specific to a PDF-related program (Wondershare) for some reason whereas the other two are generally where all temp files go...which makes me wonder why you didn't use the "standard" or more widely used variable for the temp directory? Moreover, TMPDIR isn't even listed as an environment variable for "User" but it is for "System"...


word error rate mystique?

I'm curious how you obtained such a low word error rate for CTranslate2 with beam size 5; the faster-whisper repository shows about 10, slightly higher. Moreover, you harnessed the batch processing capabilities of the "generate" method of the Whisper model's implementation in CTranslate2, in contrast to faster-whisper, if I understand correctly. Also, faster-whisper's "transcribe" script apparently uses a "fallback" mechanism whereby it processes the same segment multiple times using different "temperatures" if/when the quality isn't high enough...which should lead to higher quality overall...

I didn't see anything similar with WhisperS2T, thus I'm curious how the WER is so good...must have taken some creative thinking? ;-)

support for word timestamps

I cannot figure out how to enable word-level timestamps.
I have tried the following with CTranslate2

import whisper_s2t

model = whisper_s2t.load_model(model_identifier="large-v2", backend='CTranslate2', asr_options={"word_timestamps":True})

files = ['test.wav']
lang_codes = ['en']
tasks = ['transcribe']
initial_prompts = [None]
out = model.transcribe_with_vad(files, lang_codes=lang_codes, tasks=tasks, initial_prompts=initial_prompts, batch_size=32)

suppress or remove annoying print statement

Can we please have a way to remove this message? Every time I run the program from my Python script it checks for ffmpeg, which is fine, but I wish there were a way to remove or temporarily suppress the output. I have important messages printed to the command prompt when my program runs and this clutters it up...

Also, is there a way to remove the ffmpeg requirement entirely? For example, the PyAV library bundles it when you pip install that library.

https://pypi.org/project/av/

This is why the faster-whisper library uses it. See here:

https://github.com/SYSTRAN/faster-whisper

Anyways, here is the print that's annoying me:

ffmpeg version 6.1.1-full_build-www.gyan.dev Copyright (c) 2000-2023 the FFmpeg developers
built with gcc 12.2.0 (Rev10, Built by MSYS2 project)
configuration: --enable-gpl --enable-version3 --enable-static --pkg-config=pkgconf --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libaribb24 --enable-libaribcaption --enable-libdav1d --enable-libdavs2 --enable-libuavs3d --enable-libzvbi --enable-librav1e --enable-libsvtav1 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxvid --enable-libaom --enable-libjxl --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi --enable-libharfbuzz --enable-liblensfun --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-dxva2 --enable-d3d11va --enable-libvpl --enable-libshaderc --enable-vulkan --enable-libplacebo --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libcodec2 --enable-libilbc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa --enable-librubberband --enable-libsoxr --enable-chromaprint
libavutil      58. 29.100 / 58. 29.100
libavcodec     60. 31.102 / 60. 31.102
libavformat    60. 16.100 / 60. 16.100
libavdevice    60.  3.100 / 60.  3.100
libavfilter     9. 12.100 /  9. 12.100
libswscale      7.  5.100 /  7.  5.100
libswresample   4. 12.100 /  4. 12.100
libpostproc    57.  3.100 / 57.  3.100

I couldn't get it to transcribe the entire audio file

I tried transcribing the Sam Altman audio file and couldn't get it to transcribe the entire file... I used this script:

import whisper_s2t
from whisper_s2t.backends.ctranslate2.model import BEST_ASR_CONFIG

model_kwargs = {
    'compute_type': 'int8', # Note int8 is only supported for CTranslate2 backend, for others only float16 is supported for lower precision.
    'asr_options': BEST_ASR_CONFIG
}

model = whisper_s2t.load_model(model_identifier="large-v2", backend='CTranslate2', **model_kwargs)

files = ['test_audio_flac.flac']
lang_codes = ['en']
tasks = ['transcribe']
initial_prompts = [None]

out = model.transcribe_with_vad(files,
                                lang_codes=lang_codes,
                                tasks=tasks,
                                initial_prompts=initial_prompts,
                                batch_size=24)

transcription = out[0][0]['text'] if 'text' in out[0][0] else "Transcription not available"  # note: out[0][0] is only the first utterance of the first file

with open('transcription.txt', 'w') as f:
    f.write(transcription)

The timestamp mode is not working

Thanks again for this very brilliant piece of work.

I've realized that the mode where without_timestamps=False is actually not working (Whisper doesn't detect anything).

I suspect that your prompt definition in the timestamp case is wrong (I am not 100% sure). For reference, please see the faster_whisper implementation:

        if self.without_timestamps:
            prompt.append(self.tokenizer.no_timestamps)
        else:
            prompt.append(self.tokenizer.timestamp_begin)

Can you give me an idea of how to fix that?

Heuristics

"Reduction in Hallucination: Optimized parameters and heuristics to decrease repeated text output or hallucinations."

What's the heuristics you've tested?

Is it possible support real-time transcription with websockets?

I would like to know if it is possible to use the CTranslate2-hosted model pipeline with a websocket service like Twilio to receive audio streams, like https://github.com/ufal/whisper_streaming or https://github.com/collabora/WhisperLive, which use faster-whisper. Is it possible now, or how could it be implemented if I need to dive into the repository code?

I want to code and test this scenario to build a multi-client server that transcribes multiple audio streams at the same time using the GPU.

setting cpu threads at runtime made easier/given as example?

Might it be possible to have an example of setting cpu_threads in the examples .md, or perhaps even to set it dynamically instead of hardcoding it to 4?

Here's what I've tested as a way to set it dynamically, leaving a certain number of threads available for other things on the user's system in the meantime:

max_threads = os.cpu_count()
cpu_threads = max(max_threads - 8, 2) if max_threads is not None else 2
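
As a rough sketch of what that could look like end to end (assuming, and this is an assumption on my part, that a cpu_threads keyword passed to load_model is forwarded to the CTranslate2 model constructor):

import os
import whisper_s2t

max_threads = os.cpu_count()
cpu_threads = max(max_threads - 8, 2) if max_threads is not None else 2

# Assumption: cpu_threads is forwarded to the underlying CTranslate2 model.
model = whisper_s2t.load_model(model_identifier="large-v2",
                               backend='CTranslate2',
                               cpu_threads=cpu_threads)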

Language Auto Detection

I noticed that it would fall back to English if no language is specified. Is there a way to automatically predict the language?

huggingface repository for ctranslate2 models or provide as alternative source

Again, I am extremely impressed by this program. I've been pining for a long time for something that can do batch processing based on CTranslate2, especially since HuggingFace claimed to have the fastest implementation.

What are your thoughts about a pull request to change the hf_utils.py script to list my repository for all various quantizations of the whisper models?

https://huggingface.co/ctranslate2-4you

I realize that CTranslate2 can quantize at runtime, but it does take some additional time and I figured users might want the option. Alternatively, would you be willing to accept a pull request on your repository linking my huggingface repository as an alternative source to the default "systran" models?

The difference is that I've quantized every Whisper model size to every available CTranslate2 quantization (except int16, of course), with the exception of large-v3 since I've noticed regressions with it. Either way, great job and I'm excited to include it in my programs!

If you're a visual person, here's an example of just my conversions of the large-v2 model:

(screenshot: listing of the large-v2 model conversions)

15s of silence causes exception

import whisper_s2t

model = whisper_s2t.load_model(model_identifier="large-v2", backend='CTranslate2')

files = ['silence.wav']
lang_codes = ['en']
tasks = ['transcribe']
initial_prompts = [None]

out = model.transcribe_with_vad(files,
                                lang_codes=lang_codes,
                                tasks=tasks,
                                initial_prompts=initial_prompts,
                                batch_size=32)

print(out[0][0])
Transcribing:   0%|                                                                                                                                               | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/myapp/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/myapp/lib/python3.11/site-packages/whisper_s2t/backends/__init__.py", line 138, in transcribe_with_vad
    for signals, prompts, seq_len, seg_metadata, pbar_update in self.data_loader(audio_files, lang_codes, tasks, initial_prompts, batch_size=batch_size):
  File "/opt/myapp/lib/python3.11/site-packages/whisper_s2t/data.py", line 193, in get_data_loader_with_vad
    new_segmented_audio_signal = self.get_segmented_audio_signal(audio_signal, file_id, lang, task, initial_prompt)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/myapp/lib/python3.11/site-packages/whisper_s2t/data.py", line 146, in get_segmented_audio_signal
    start_ends, audio_signal = self.speech_segmenter(audio_signal=audio_signal)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/myapp/lib/python3.11/site-packages/whisper_s2t/speech_segmenter/__init__.py", line 127, in __call__
    start_ends[0][0] = max(0.0, start_ends[0][0]) # fix edges
                                ~~~~~~~~~~^^^
IndexError: list index out of range

silence.wav.zip

I know this sounds like an odd use-case, but when processing dual-channel audio (splitting, then transcribing), a single channel can often be left with only silence. Is there any way to handle this exception by returning an empty result set if no speech is detected?
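
Until this is handled inside the library, a workaround sketch on the caller side (catching the specific failure and substituting an empty result per file) could look like this:

def safe_transcribe(model, files, lang_codes, tasks, initial_prompts, batch_size=32):
    """Return transcription output, or empty per-file results if VAD finds no speech."""
    try:
        return model.transcribe_with_vad(files,
                                         lang_codes=lang_codes,
                                         tasks=tasks,
                                         initial_prompts=initial_prompts,
                                         batch_size=batch_size)
    except IndexError:
        # Assumption: an IndexError from the speech segmenter means no speech segments were found.
        return [[] for _ in files]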

speaker diarization

Thanks for putting so much work into this, it's so polished already!

Just want to understand if speaker diarization is something planned in the future?

Thanks!

Possible to run WhisperS2T without GPU? (Issue with CUDA)

I have received this error when trying to run with CTranslate2:

Fetching 4 files: 100%|██████████| 4/4 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\xxx\PycharmProjects\LifeLogs\LifeLogsHandler\scripts\audio\whisper_setup.py", line 3, in <module>
    model = whisper_s2t.load_model(model_identifier="large-v2", backend='CTranslate2')
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxx\PycharmProjects\LifeLogs\LifeLogsHandler\venv\Lib\site-packages\whisper_s2t\__init__.py", line 44, in load_model
    return WhisperModel(model_identifier, **model_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxx\PycharmProjects\LifeLogs\LifeLogsHandler\venv\Lib\site-packages\whisper_s2t\backends\ctranslate2\model.py", line 80, in __init__
    self.model = ctranslate2.models.Whisper(self.model_path,
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA failed with error CUDA driver version is insufficient for CUDA runtime version

Process finished with exit code 1

I see that there is a way to configure CTranslate2 to not require CUDA (as I am running from a simple laptop with CPU, for testing, before running this code on some cloud processing system), but I do not know if there is a way to access that without rewriting the CTranslate2 source code?

I apologize if I am misunderstanding the problem. Thank you for your help.

Best,
tlc

'RuntimeError: stft input and window must be on the same device but got self on cuda:1 and window on cuda:0' when specify "device_index = 1" of "whisper_s2t.load_model"

I have 4 Tesla V100s. When I specify device_index=1, the "transcribe_with_vad" method fails.

Here is my code:

model = whisper_s2t.load_model(model_identifier="large-v2", backend="CTranslate2", compute_type="int8", device_index=1)

Here is the error log:

File "/home/asr/code/src/asr_server/modules/transcribe/test.py", line 34, in test
out, infos = model.transcribe_with_vad(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/virtualenvs/asr-venv-py3.11/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/asr/code/src/asr_server/modules/transcribe/whisper_s2t/backends/init.py", line 215, in transcribe_with_vad
for (
File "/home/asr/code/src/asr_server/modules/transcribe/whisper_s2t/data.py", line 264, in get_data_loader_with_vad
start_ends, audio_signal, audio_duration = self.speech_segmenter(
^^^^^^^^^^^^^^^^^^^^^^
File "/home/asr/code/src/asr_server/modules/transcribe/whisper_s2t/speech_segmenter/init.py", line 148, in call
speech_probs = self.vad_model(audio_signal)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/asr/code/src/asr_server/modules/transcribe/whisper_s2t/speech_segmenter/frame_vad.py", line 128, in call
speech_probs = self.forward(input_signal, input_signal_length)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/virtualenvs/asr-venv-py3.11/lib/python3.11/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/virtualenvs/asr-venv-py3.11/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/asr/code/src/asr_server/modules/transcribe/whisper_s2t/speech_segmenter/frame_vad.py", line 104, in forward
x, x_len = self.vad_pp(input_signal_pt, input_signal_length_pt)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/virtualenvs/asr-venv-py3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/virtualenvs/asr-venv-py3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/torch/preprocessor/___torch_mangle_6.py", line 23, in forward
_9 = [torch.size(input0, 1), torch.size(input0, 2)]
input1 = torch.view(input0, _9)
x = torch.stft(input1, 512, 160, 400, CONSTANTS.c4, False, None, True)
~~~~~~~~~~ <--- HERE
x0 = torch.view_as_real(x)
x1 = torch.sqrt(torch.sum(torch.pow(x0, 2), [-1]))

Traceback of TorchScript, original code (most recent call last):
/usr/local/lib/python3.10/dist-packages/torch/functional.py(650): stft
/content/preprocessor.py(79): forward
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1508): _slow_forward
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1527): _call_impl
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
/content/preprocessor.py(252): forward
/usr/local/lib/python3.10/dist-packages/torch/amp/autocast_mode.py(16): decorate_autocast
/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py(115): decorate_context
/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py(115): decorate_context
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1508): _slow_forward
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1527): _call_impl
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
/content/preprocessor.py(448): forward
/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py(115): decorate_context
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1508): _slow_forward
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1527): _call_impl
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
/usr/local/lib/python3.10/dist-packages/torch/jit/_trace.py(1065): trace_module
/content/preprocessor.py(463): export
/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py(115): decorate_context
(17): <cell line: 17>
/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py(3553): run_code
/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py(3473): run_ast_nodes
/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py(3257): run_cell_async
/usr/local/lib/python3.10/dist-packages/IPython/core/async_helpers.py(78): _pseudo_sync_runner
/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py(3030): _run_cell
/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py(2975): run_cell
/usr/local/lib/python3.10/dist-packages/ipykernel/zmqshell.py(539): run_cell
/usr/local/lib/python3.10/dist-packages/ipykernel/ipkernel.py(302): do_execute
/usr/local/lib/python3.10/dist-packages/tornado/gen.py(234): wrapper
/usr/local/lib/python3.10/dist-packages/ipykernel/kernelbase.py(539): execute_request
/usr/local/lib/python3.10/dist-packages/tornado/gen.py(234): wrapper
/usr/local/lib/python3.10/dist-packages/ipykernel/kernelbase.py(261): dispatch_shell
/usr/local/lib/python3.10/dist-packages/tornado/gen.py(234): wrapper
/usr/local/lib/python3.10/dist-packages/ipykernel/kernelbase.py(361): process_one
/usr/local/lib/python3.10/dist-packages/tornado/gen.py(786): run
/usr/local/lib/python3.10/dist-packages/tornado/gen.py(825): inner
/usr/local/lib/python3.10/dist-packages/tornado/ioloop.py(738): _run_callback
/usr/local/lib/python3.10/dist-packages/tornado/ioloop.py(685):
/usr/lib/python3.10/asyncio/events.py(80): _run
/usr/lib/python3.10/asyncio/base_events.py(1909): _run_once
/usr/lib/python3.10/asyncio/base_events.py(603): run_forever
/usr/local/lib/python3.10/dist-packages/tornado/platform/asyncio.py(195): start
/usr/local/lib/python3.10/dist-packages/ipykernel/kernelapp.py(619): start
/usr/local/lib/python3.10/dist-packages/traitlets/config/application.py(992): launch_instance
/usr/local/lib/python3.10/dist-packages/colab_kernel_launcher.py(37):
/usr/lib/python3.10/runpy.py(86): _run_code
/usr/lib/python3.10/runpy.py(196): _run_module_as_main
RuntimeError: stft input and window must be on the same device but got self on cuda:1 and window on cuda:0


Non latin characters cannot get exported to files

When exporting a transcript in Japanese I got:

File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/whisper_s2t/utils.py:95, in ExportVTT(transcript, file, single_sentence_in_one_utterance, end_punct_marks)
     93 f.write("WEBVTT\n\n")
     94 for _utt in transcript:
---> 95     f.write(f"{format_timestamp(_utt['start_time'])} --> {format_timestamp(_utt['end_time'])}\n{_utt['text']}\n\n")

UnicodeEncodeError: 'ascii' codec can't encode characters in position 24-25: ordinal not in range(128)

Proposing a fix for exporting results as .srt, .txt, etc. files #52
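
The failure appears to come from the output file being opened with the environment's default encoding (ASCII here); a sketch of the kind of fix proposed there, using the names from the traceback above, is simply to open the file with an explicit UTF-8 encoding:

# Sketch: open the transcript file with an explicit encoding before writing.
with open(file, 'w', encoding='utf-8') as f:
    f.write("WEBVTT\n\n")
    for _utt in transcript:
        f.write(f"{format_timestamp(_utt['start_time'])} --> {format_timestamp(_utt['end_time'])}\n{_utt['text']}\n\n")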

running on macos without cuda

Hello,

Is it possible to run the project on macOS, with the CTranslate2 backend, without CUDA support, for testing in a local environment?

Thx

initial_prompt for tensorrt backend

Hi,

Can someone point to the parts of the code that need to be updated so initial_prompts can be used with the TensorRT backend, and say whether there are limitations that prevent it from being done?

Thanks.

TensorRT-LLM Backend Exported Model

Hey everyone!

WhisperS2T now supports the TensorRT-LLM backend, achieving double the inference speed compared to the CTranslate2 backend! The current optimal configuration on an A30 GPU achieves transcription of 1-hour files in approximately 18 seconds.

After TensorRT-LLM optimization, the exported model only works on NVIDIA GPUs with the same cuda_compute_capability. This means a model exported on a T4 GPU won't work on an A100, and vice versa.

Help Needed: Model export takes about 3-6 minutes. I need volunteers out there to export the model for a specific GPU and share it. It would be a huge help to the community! I have access to A30, A100, and T4 GPUs for which I will add the exported models.

PS: I will update this discussion in a few weeks on how to contribute your exported model.

Thanks,
Shashi

ffmpeg issue - semi-IMPORTANT

When trying to transcribe to a VTT file, for example, I get errors when the file to be transcribed has spaces in its name. I took some time to pinpoint this issue. Apparently it's because audio.py uses os.system, rather than subprocess.run, to run certain ffmpeg commands, and when os.system is used with ffmpeg there's a problem when a file to be transcribed has spaces in its name.

For example:

"audio - recording for client.mp3"

According to GPT, the problematic line is:

ret_code = os.system(f'ffmpeg -hide_banner -loglevel panic -i "{input_file}" -threads 1 -acodec pcm_s16le -ac 1 -af aresample=resampler={RESAMPLING_ENGINE} -ar {sr} "{wav_file}" -y')

Here are the relevant portions of the traceback... I didn't include the full paths for privacy reasons. Hope it helps though. whispers2t_batch.py refers to my script that batch-processes all files in a directory. It works perfectly if I remove all files with spaces in their names...

    with wave.open(input_file, 'rb') as wf:

Python311\Lib\wave.py", line 631, in open
    return Wave_read(f)

Python311\Lib\wave.py", line 283, in __init__
    self.initfp(f)

Python311\Lib\wave.py", line 250, in initfp
    raise Error('file does not start with RIFF id')
wave.Error: file does not start with RIFF id

During handling of the above exception, another exception occurred:

whispers2t_batch.py", line 58, in <module>
    transcribe_audio_files(audio_files_directory)

whispers2t_batch.py", line 46, in transcribe_audio_files
    out = model.transcribe_with_vad([audio_file], lang_codes=lang_codes, tasks=tasks, initial_prompts=initial_prompts, batch_size=70)

site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)

site-packages\whisper_s2t\backends\__init__.py", line 169, in transcribe_with_vad
    for signals, prompts, seq_len, seg_metadata, pbar_update in self.data_loader(audio_files, lang_codes, tasks, initial_prompts, batch_size=batch_size):

site-packages\whisper_s2t\data.py", line 212, in get_data_loader_with_vad
    for file_id, (audio_signal, lang, task, initial_prompt) in enumerate(zip(audio_batch_generator(audio_files), lang_codes, tasks, initial_prompts)):

Python311\Lib\multiprocessing\pool.py", line 873, in next
    raise value

Python311\Lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))

whisper_s2t\audio.py", line 42, in load_audio
    if ret_code != 0: raise RuntimeError("ffmpeg failed to resample the input audio file, make sure ffmpeg is compiled properly!")
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: ffmpeg failed to resample the input audio file, make sure ffmpeg is compiled properly!
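
A sketch of the suggested direction (replacing the os.system call with subprocess.run and an argument list, so paths with spaces need no shell quoting; input_file, wav_file, sr, and RESAMPLING_ENGINE are the names used in the audio.py line quoted above):

import subprocess

cmd = [
    'ffmpeg', '-hide_banner', '-loglevel', 'panic',
    '-i', input_file,
    '-threads', '1',
    '-acodec', 'pcm_s16le', '-ac', '1',
    '-af', f'aresample=resampler={RESAMPLING_ENGINE}',
    '-ar', str(sr),
    wav_file, '-y',
]

# Each argument is passed separately, so spaces in file names are handled safely.
ret = subprocess.run(cmd)
if ret.returncode != 0:
    raise RuntimeError("ffmpeg failed to resample the input audio file, make sure ffmpeg is compiled properly!")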

CUDA 11.8 support please?

I tried running using CUDA 11.8 that's already installed on my computer, but it state that it needed cublas64_12.dll or what not...whereas CUDA 11.8 uses cublas64_11.dll. I had to locate the newer cublas version within a python wheel and put that in my system PATH. Is there a way to allow CUDA 11.8 support as well?
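
For reference, a sketch of that PATH workaround on Windows (assuming the CUDA 12 cuBLAS wheel, nvidia-cublas-cu12, is installed via pip; this mirrors the LD_LIBRARY_PATH note in the README):

import os
import nvidia.cublas.lib  # from the nvidia-cublas-cu12 pip wheel

# Make cublas64_12.dll from the pip wheel discoverable before loading the model.
cublas_dir = os.path.dirname(nvidia.cublas.lib.__file__)
os.environ["PATH"] = cublas_dir + os.pathsep + os.environ.get("PATH", "")
os.add_dll_directory(cublas_dir)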

Reproducing benchmarks

Hey @shashikg! Thanks for your awesome work on this repo - it's a very cool compilation of the various Whisper implementations 🙌

I'm working on the Hugging Face implementation, and keen to understand better how we can reproduce the numbers from your benchmark. In particular, I'm looking at reproducing the numbers from this table.

The benchmark scripts currently use a local version of the Kincaid dataset:

data = pd.read_csv(f'{repo_path}/data/KINCAID46/manifest_mp3.tsv', sep="\t")

Would it be possible to share this dataset, in order to re-run the numbers locally? You could push it as a Hugging Face Audio dataset to the Hugging Face Hub, which should be quite straightforward by following this guide: https://huggingface.co/docs/datasets/audio_dataset
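
For what it's worth, a rough sketch of what pushing such a manifest as an audio dataset could look like (the column name and repo id below are assumptions, not the actual manifest schema):

import pandas as pd
from datasets import Dataset, Audio

data = pd.read_csv('data/KINCAID46/manifest_mp3.tsv', sep="\t")

# Assumption: the manifest has a column of audio file paths plus reference transcripts.
ds = Dataset.from_pandas(data)
ds = ds.cast_column("audio_path", Audio())  # hypothetical column name holding the mp3 paths
ds.push_to_hub("your-username/KINCAID46")   # placeholder dataset repo id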

Once we can reproduce the runs, we'd love to work with you on tuning the Transformers benchmark to squeeze out any extra performance that might be available.

Many thanks!

Suggestion to deal with omission of periods

There is a frequent hallucination in Whisper in which segments of the transcript are stripped of a period or full stop. Example (not a real transcription, just to illustrate the issue):

Meghan Elizabeth Trainor is an American singer-songwriter and television personality She rose to prominence after signing with Epic Records in 2014 and releasing her debut single All About That Bass, which reached number one on the U.S. Billboard Hot 100 chart and sold 11 million copies worldwide Trainor has released five studio albums with the label and has received various accolades, including the 2016 Grammy Award for Best New Artist.

I have found that adding about 5 seconds of whitenoise to the beginning of the affected excerpt and retranscribing it usually corrects the punctuation.

Perhaps this could be incorporated to the code. Or, if there were a way to separate the affected region (e.g. with information from the VAD), a separate function could be written to check for this hallucination, export the WAV for the affected region and retranscribe.
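
A rough sketch of that white-noise idea (not part of the library; assumes a mono WAV excerpt and uses numpy/soundfile with an arbitrary low noise amplitude):

import numpy as np
import soundfile as sf

audio, sr = sf.read("affected_excerpt.wav")  # placeholder file name, mono audio assumed
noise = 0.001 * np.random.randn(5 * sr)      # roughly 5 seconds of low-level white noise

# Prepend the noise and write out the padded excerpt for re-transcription.
padded = np.concatenate([noise.astype(audio.dtype), audio])
sf.write("affected_excerpt_padded.wav", padded, sr)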

tensorrt failures

If I install your pinned version of tensorrt (0.8.0.dev2024012301 - from your 'install_tensorrt.sh' file), then I get an undefined symbol when I load the model

model = whisper_s2t.load_model(model_identifier="large-v2", backend='TensorRT-LLM')

I get

ImportError: /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops9is_pinned4callERKNS_6TensorEN3c108optionalINS5_6DeviceEEE.

If I don't use the pinned version (I just use -pre), then I get a different error:

  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/layers/attention.py", line 1174, in forward
    assert qkv.ndim() == 2
AssertionError

... so just having some trouble getting the tensorrt thing to work. Your idea of providing a dockerfile would probably help a bunch...

TensorRT - avg_logprob

Thanks for your really impressive work.

I was wondering how to extract the token probability with TensorRT (a little bit like what you did in this example with CTranslate2):

import whisper_s2t

model = whisper_s2t.load_model(model_identifier="large-v2", backend='CTranslate2')

files = ['data/KINCAID46/audio/1.wav']
lang_codes = ['en']
tasks = ['transcribe']
initial_prompts = [None]

out = model.transcribe_with_vad(files,
                                lang_codes=lang_codes,
                                tasks=tasks,
                                initial_prompts=initial_prompts,
                                batch_size=32)

print(out[0][0])
"""
[Console Output]

{'text': "Let's bring in Phil Mackie who is there at the palace. We're looking at Teresa and Philip May. Philip, can you see how he's being transferred from the helicopters? It looks like, as you said, the beast. It's got its headlights on because the sun is beginning to set now, certainly sinking behind some clouds. It's about a quarter of a mile away down the Grand Drive",
 'avg_logprob': -0.25426941679184695,
 'no_speech_prob': 8.147954940795898e-05,
 'start_time': 0.0,
 'end_time': 24.8}
"""

way to install just ctranslate2 backend

I use ctranslate2 in other parts of my program and was wondering if there's a way to install only the CTranslate2-related dependencies. That would help avoid errors or version conflicts with the other libraries my program may require.

AWESOME SPEED

I'm attaching this script for people who are interested, since it worked for me. The speed is awesome.

import whisper_s2t
from whisper_s2t.backends.ctranslate2.model import BEST_ASR_CONFIG, FAST_ASR_OPTIONS

model_kwargs = {
    'compute_type': 'float16',
    #'asr_options': BEST_ASR_CONFIG
    'asr_options': FAST_ASR_OPTIONS
}

model = whisper_s2t.load_model(model_identifier="large-v2", backend='CTranslate2', **model_kwargs)

files = ['test_audio_flac.flac']
lang_codes = ['en']
tasks = ['transcribe']
initial_prompts = [None]

out = model.transcribe_with_vad(files,
                                lang_codes=lang_codes,
                                tasks=tasks,
                                initial_prompts=initial_prompts,
                                batch_size=20)

# Concatenate the text from all utterances
transcription = " ".join([_['text'] for _ in out[0]]).strip()

with open('transcription.txt', 'w') as f:
    f.write(transcription)

Error using TensorRT-LLM as backend

Hi, I've encountered an error when using TensorRT-LLM as the backend. Here are my steps.
First I installed the latest WhisperS2T using pip install -U git+https://github.com/shashikg/WhisperS2T.git in an existing Docker container of NVIDIA's TensorRT with tag nvcr.io/nvidia/tensorrt:24.04-py3 and TensorRT-LLM version 0.10.0.dev2024043000.
Then I installed ffmpeg.
Then I tried to transcribe my audio file using:

import whisper_s2t

model = whisper_s2t.load_model(model_identifier="large-v2", backend='TensorRT-LLM',asr_options={'without_timestamp': False})

files = ['/workspace/2cf4b4ae9aaa699b0d3e945b8d1a861f.mp3']
lang_codes = ['en']
tasks = ['transcribe']
initial_prompts = [None]

out = model.transcribe_with_vad(files,
                                lang_codes=lang_codes,
                                tasks=tasks,
                                initial_prompts=initial_prompts,
                                batch_size=24)

print(out) # Print the transcription output

The logs are as follows:

[TensorRT-LLM] TensorRT-LLM version: 0.10.0.dev2024043000
'trt_build_args' not provided in model_kwargs, using default configs.
100%|█████████████████████████████████████| 2.10M/2.10M [00:01<00:00, 1.66MiB/s]
100%|█████████████████████████████████████| 2.87G/2.87G [01:06<00:00, 46.3MiB/s]
⠹ Exporting Model To TensorRT Engine (3-6 mins) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0:00:01Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/tensorrt/engine_builder/builder.py", line 29, in <module>
    from tensorrt_llm.models import quantize_model
ImportError: cannot import name 'quantize_model' from 'tensorrt_llm.models' (/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/__init__.py)
⠹ Exporting Model To TensorRT Engine (3-6 mins) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0:00:02
Traceback (most recent call last):
  File "/workspace/example.py", line 4, in <module>
    model = whisper_s2t.load_model(model_identifier="large-v2", backend='TensorRT-LLM',asr_options={'without_timestamp': False})
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/__init__.py", line 44, in load_model
    return WhisperModel(model_identifier, **model_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/tensorrt/model.py", line 108, in __init__
    self.model = WhisperTRT(self.model_path)
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/tensorrt/trt_model.py", line 168, in __init__
    self.encoder = WhisperEncoding(engine_dir)
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/tensorrt/trt_model.py", line 16, in __init__
    self.session = self.get_session(engine_dir)
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/tensorrt/trt_model.py", line 20, in get_session
    with open(config_path, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/root/.cache/whisper_s2t/models/trt/large-v2/c55664fdf5b447062c4cd7a0b64b72fc/encoder_config.json'

I've tried to figure out the cause; I think something went wrong in the TensorRT-LLM engine-building process, but I am not familiar with it. Could you please help me figure it out? Thanks!

Save output as file

Hey, is there a way to save the output as a file (json/vtt/srt)?

I tried doing that yesterday using f.write, but failed. I have completely zero experience with Python, so this may be easy... or maybe not.

I'm used to Whisper CLI tools, but I wanted to see if WhisperS2T is faster than insanely-fast-whisper.
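
In case it helps anyone else searching for this: a minimal sketch using the exporter mentioned in the release notes, whisper_s2t.write_outputs, which also appears in a later script in this thread. The file names are placeholders.

import whisper_s2t

model = whisper_s2t.load_model(model_identifier="large-v2", backend='CTranslate2')

files = ['audio.mp3']  # placeholder input file
out = model.transcribe_with_vad(files,
                                lang_codes=['en'],
                                tasks=['transcribe'],
                                initial_prompts=[None],
                                batch_size=16)

# Export one output file per input; format can be txt, json, tsv, srt, or vtt.
whisper_s2t.write_outputs(out, format='srt', op_files=['audio.srt'])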

Prompting causes crashes

Not sure how to supply a list of prompts corresponding to a list of filenames. If I pass in two different audio files to transcribe, with initial_prompts=None it works. If I pass initial_prompts=["",""], it works. If I pass in two non-empty strings for initial prompts ["prompt1", "prompt2"], I get:

[TensorRT-LLM] TensorRT-LLM version: 0.8.0.dev2024012301
'trt_build_args' not provided in model_kwargs, using default configs.
Transcribing:   0%|                                                                                                                                                                                                                              | 0/200 [00:00<?, ?it/s][02/22/2024-16:05:09] [TRT] [E] 3: [executionContext.cpp::setInputShape::2309] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2309, condition: satisfyProfile Runtime dimension does not satisfy any optimization profile.)
[02/22/2024-16:05:09] [TRT] [E] 3: [executionContext.cpp::setInputShape::2309] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2309, condition: satisfyProfile Runtime dimension does not satisfy any optimization profile.)
... lots more...

Question: How do I use it without the VAD

I am trying to use this project to translate a real-world, noisy Malayalam audio recording that I have into English. The VAD filter filters away a lot of important information, so I wanted to test things with it disabled. But the transcribe function requires a few extra parameters, and I have no idea how to extract those out of the pipeline that you built.
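
Not an authoritative answer, but here is a sketch of what a VAD-free call might look like, assuming the plain transcribe method accepts the same per-file arguments as transcribe_with_vad (worth verifying against the backend code):

import whisper_s2t

model = whisper_s2t.load_model(model_identifier="large-v2", backend='CTranslate2')

files = ['noisy_malayalam.wav']  # placeholder file name
out = model.transcribe(files,    # assumption: mirrors transcribe_with_vad's signature
                       lang_codes=['ml'],
                       tasks=['translate'],
                       initial_prompts=[None],
                       batch_size=8)

print(out[0][0]['text'])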

Handle batch processing when few files fails in the whole batch

When my script batch processes a bunch of audio files using the approach you gave me (a list of files and their settings), if a single file fails for any reason it prevents the transcription of all the files from being completed. I created a workaround that sends each file to the transcribe_with_vad method separately (each with its own tqdm) and added error handling, which works. I was wondering if there's a way to keep your most efficient approach and still have error handling for a specific audio file. Here is the original script and a comparison with the single-file processing with error handling:

import os
from PySide6.QtCore import QThread, Signal
from pathlib import Path
import whisper_s2t
import time

class Worker(QThread):
    finished = Signal(str)
    progress = Signal(str)

    def __init__(self, directory, recursive, output_format, device, size, quantization, beam_size, batch_size, task):
        super().__init__()
        self.directory = directory
        self.recursive = recursive
        self.output_format = output_format
        self.device = device
        self.size = size
        self.quantization = quantization
        self.beam_size = beam_size
        self.batch_size = batch_size
        self.task = task.lower()

    def run(self):
        directory_path = Path(self.directory)
        patterns = ['*.mp3', '*.wav', '*.flac', '*.wma']
        audio_files = []

        if self.recursive:
            for pattern in patterns:
                audio_files.extend(directory_path.rglob(pattern))
        else:
            for pattern in patterns:
                audio_files.extend(directory_path.glob(pattern))

        max_threads = os.cpu_count()
        cpu_threads = max((2 * max_threads) // 3, 4) if max_threads is not None else 4

        model_identifier = f"ctranslate2-4you/whisper-{self.size}-ct2-{self.quantization}"
        model = whisper_s2t.load_model(model_identifier=model_identifier, backend='CTranslate2', device=self.device, compute_type=self.quantization, asr_options={'beam_size': self.beam_size}, cpu_threads=cpu_threads)

        audio_files_str = [str(file) for file in audio_files]
        output_file_paths = [str(file.with_suffix(f'.{self.output_format}')) for file in audio_files]

        lang_codes = 'en'
        tasks = self.task
        initial_prompts = None

        start_time = time.time()

        if audio_files_str:
            self.progress.emit(f"Processing {len(audio_files_str)} files...")
            out = model.transcribe_with_vad(audio_files_str, lang_codes=lang_codes, tasks=tasks, initial_prompts=initial_prompts, batch_size=self.batch_size)
            whisper_s2t.write_outputs(out, format=self.output_format, op_files=output_file_paths)

            for original_audio_file, output_file_path in zip(audio_files, output_file_paths):
                self.progress.emit(f"{tasks.capitalize()} {original_audio_file} to {output_file_path}")

        processing_time = time.time() - start_time
        self.finished.emit(f"Total processing time: {processing_time:.2f} seconds")

[screenshot: single-file variant with per-file error handling]
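
In case it helps, one possible pattern (my own sketch, not something the library provides): try the whole batch once for speed, and only fall back to per-file calls when the batched call raises, so a single bad file doesn't lose everything. Variable names follow the script above.

def transcribe_with_fallback(model, audio_files_str, batch_size):
    """Try one fast batched call; on failure, retry file by file so a single
    bad input doesn't discard the rest. Returns (outputs, failures)."""
    try:
        out = model.transcribe_with_vad(audio_files_str,
                                        lang_codes='en',
                                        tasks='transcribe',
                                        initial_prompts=None,
                                        batch_size=batch_size)
        return out, []
    except Exception:
        outputs, failures = [], []
        for f in audio_files_str:
            try:
                out = model.transcribe_with_vad([f],
                                                lang_codes=['en'],
                                                tasks=['transcribe'],
                                                initial_prompts=[None],
                                                batch_size=batch_size)
                outputs.append(out[0])
            except Exception as e:
                failures.append((f, str(e)))
        return outputs, failures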

mismatch in compute_type when running on cpu

I noticed that CPUs only support float32, per this page:

https://opennmt.net/CTranslate2/quantization.html

However, your default ASR settings always use float16.

And then all of the models uploaded by Systran are in float16 as well.

It's my understanding that ctranslate2 will switch to the correct compute_type at runtime, even if compute_type is explicitly specified... but on CPU at least, this results in a float16 model being run as float32. My understanding is that this causes some quality loss (how much, I'm not sure) as well as additional conversion time at runtime (again, how much, I'm not sure).

Not sure how you'd want to handle this...Again, my huggingface repository contains all sizes of the Whisper models converted to ctranslate2 format in float32 if you wanted...

I've tested CPU usage (specifying "cpu_threads" as well) in yet another script of mine but it's too long to paste here...let me know if you want the script or if this helps. Thanks!
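
For anyone else hitting this, a small sketch of the explicit workaround (the model identifier below is hypothetical; it assumes a float32 CTranslate2 conversion is available under that name):

import whisper_s2t

# Request float32 weights and compute type explicitly when running on CPU,
# so CTranslate2 does not have to up-convert float16 weights at load time.
model = whisper_s2t.load_model(
    model_identifier="ctranslate2-4you/whisper-small-ct2-float32",  # hypothetical repo id
    backend='CTranslate2',
    device='cpu',
    compute_type='float32',
    cpu_threads=8,
)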

dependency conflicts, please help me use your library!

In an effort to incorporate your awesome library into my program, I'm trying to make sure that all the versions of my dependencies will work with yours. Would you mind specifying which versions your program requires as well as which range? For example, you just state "torch," "accelerate," "transformers" without specifying a release version (or a range).

For example, I'm currently using faster-whisper==0.10.0. This version only supports CUDA 11.8 because the maximum version of ctranslate2 it supports is 3.24.

CTranslate2 4.0 has come out, which supports CUDA 12. Finally, faster-whisper released version 1.0 today, which can use ctranslate2 4.0, but it's not on pypi.org yet...

Rather than wait for it to be uploaded to pypi.org, I'd like to switch to your program instead... it's faster anyway...

Here's my planned dependencies (including your library of course), and if you could please let me know of any conflicts with your library I'd appreciate it:

torch==2.2.0+cu121
torchvision==0.17.0+cu121
torchaudio==2.2.0+cu121

  • I don't plan to support pytorch 2.1.2 anymore...
  • I don't plan to support CUDA 11.8 anymore...

accelerate==0.25.0
optimum==1.15.0
numpy==1.26.4
tokenizers==0.15.2
huggingface_hub==0.20.3
transformers==4.37.2
openai==1.12.0 (not openai-whisper)
nvidia-ml-py==12.535.133

My program of course has other dependencies that are installed (i.e. dependencies of dependencies), but these are all of the ones that are also listed as dependencies in your requirements.txt file.

PLEASE keep in mind that I would solely be using the ctranslate2 backend in your program. Thus, I would not need flash attention 2, for example, since ctranslate2 doesn't use it like I'm assuming your huggingface backend does. Any advice is much appreciated. Thanks.

deprecated flag for flash attention 2 with huggingface backend

Hello, just FYI in case you didn't know: apparently Hugging Face changed the flag/parameter for specifying Flash Attention 2. Here's the message I got:

The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.

And here's the script I am testing:

import whisper_s2t

model_kwargs = {
    'compute_type': 'float16',
    'asr_options': {
    "beam_size": 5,
    "without_timestamps": True,
    "return_scores": False,
    "return_no_speech_prob": False,
    "use_flash_attention": True,
    "use_better_transformer": False,
},
    'model_identifier': "small",
    'backend': 'HuggingFace',
}

model = whisper_s2t.load_model(**model_kwargs)

files = ['test_audio_flac.flac']
lang_codes = ['en']
tasks = ['transcribe']
initial_prompts = [None]

out = model.transcribe_with_vad(files,
                                lang_codes=lang_codes,
                                tasks=tasks,
                                initial_prompts=initial_prompts,
                                batch_size=20)

transcription = " ".join([_['text'] for _ in out[0]]).strip()

with open('transcription.txt', 'w') as f:
    f.write(transcription)

BTW, I tried using the newer attn_implementation="flash_attention_2" with Bark and COULD NOT get it to work... yet with your program, which uses the old use_flash_attention_2=True, it works. I don't know if it was my script or the different flags... but just be aware in case.
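
For reference, a rough sketch of what the new-style flag looks like in plain transformers; this is generic Hugging Face usage rather than whisper_s2t's internal code, and the model id is just an example:

import torch
from transformers import AutoModelForSpeechSeq2Seq

# New-style flag: pass attn_implementation instead of use_flash_attention_2=True.
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "openai/whisper-small",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
).to("cuda")  # FlashAttention2 kernels expect the model on GPU in half precision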

Support for custom audio file

When I try to run the pipeline on the following file: https://huggingface.co/datasets/reach-vb/random-audios/resolve/main/sam_altman_lex_podcast_367.flac

I get the following error:

     27 wav_file = f"{tmpdir}/tmp.wav"
     28 os.system(f'ffmpeg -hide_banner -loglevel panic -i {input_file} -threads 1 -acodec pcm_s16le -ac 1 -af aresample=resampler=soxr -ar {sr} {wav_file} -y')
---> 30 with wave.open(wav_file, 'rb') as wf:
     31     frames = wf.getnframes()
     32     x = wf.readframes(int(frames))

File /nfs/students/rachwan/miniconda3/envs/pruna_pypi/lib/python3.9/wave.py:509, in open(f, mode)
    507         mode = 'rb'
    508 if mode in ('r', 'rb'):
--> 509     return Wave_read(f)
    510 elif mode in ('w', 'wb'):
    511     return Wave_write(f)

File /nfs/students/rachwan/miniconda3/envs/pruna_pypi/lib/python3.9/wave.py:163, in Wave_read.__init__(self, f)
    161 # else, assume it is an open file object already
    162 try:
--> 163     self.initfp(f)
    164 except:
    165     if self._i_opened_the_file:

File /nfs/students/rachwan/miniconda3/envs/pruna_pypi/lib/python3.9/wave.py:128, in Wave_read.initfp(self, file)
    126 self._convert = None
    127 self._soundpos = 0
--> 128 self._file = Chunk(file, bigendian = 0)
    129 if self._file.getname() != b'RIFF':
    130     raise Error('file does not start with RIFF id')

File /nfs/students/rachwan/miniconda3/envs/pruna_pypi/lib/python3.9/chunk.py:63, in Chunk.__init__(self, file, align, bigendian, inclheader)
     61 self.chunkname = file.read(4)
     62 if len(self.chunkname) < 4:
---> 63     raise EOFError
     64 try:
     65     self.chunksize = struct.unpack_from(strflag+'L', file.read(4))[0]

EOFError:

Any idea why this could be happening?
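
A guess, not verified: the EOFError suggests the intermediate WAV was never written properly, i.e. the ffmpeg conversion in the snippet above failed silently (with -loglevel panic and an unchecked os.system return code, any ffmpeg error is hidden). A small sketch of how the real error could be surfaced while debugging (my own illustration, not the library's code):

import subprocess

input_file = "sam_altman_lex_podcast_367.flac"  # local copy of the linked file
wav_file = "tmp.wav"
sr = 16000

# Run the same conversion without suppressing output and fail loudly if
# ffmpeg returns a non-zero exit code, so the underlying error is visible.
subprocess.run(
    ["ffmpeg", "-i", input_file, "-threads", "1",
     "-acodec", "pcm_s16le", "-ac", "1",
     "-af", "aresample=resampler=soxr",
     "-ar", str(sr), wav_file, "-y"],
    check=True,
)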
