An inference service compatible with the OpenAI Audio Speech-to-Text API, built with Whisper-JAX and FastAPI.
An NVIDIA GPU that supports CUDA 12 and has at least 10 GB of VRAM
Docker with the NVIDIA Container Toolkit (nvidia-container-toolkit)
docker build . -t whisper-jax-infer
docker run -d -p 8050:8050 --runtime=nvidia --gpus all whisper-jax-infer
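Once the container is running, a client could send audio to it along the lines of the sketch below. It rests on assumptions this README does not confirm: that the service exposes the same route as OpenAI (`/v1/audio/transcriptions`), that it accepts multipart fields named `file` and `model`, and that it returns JSON with a `text` key. The model name `whisper-1` and the helper names are illustrative only. Only the port (8050) comes from the `docker run` command above.

```python
import json
import sys
import urllib.request
import uuid
from pathlib import Path

# Port 8050 matches the `docker run -p 8050:8050` mapping.
BASE_URL = "http://localhost:8050"
# Assumption: the route mirrors OpenAI's API; check the service's own routes.
TRANSCRIPTIONS_PATH = "/v1/audio/transcriptions"


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body.

    Returns a (body_bytes, content_type_header) tuple.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n".encode()
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(audio_path, model="whisper-1", base_url=BASE_URL):
    """POST an audio file to the (assumed) transcription route and return the text."""
    audio = Path(audio_path)
    body, content_type = build_multipart(
        {"model": model}, "file", audio.name, audio.read_bytes()
    )
    request = urllib.request.Request(
        base_url + TRANSCRIPTIONS_PATH,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # Assumption: the response is JSON shaped like {"text": "..."}.
        return json.loads(response.read())["text"]


if __name__ == "__main__":
    # Usage: python transcribe_client.py path/to/audio.wav
    print(transcribe(sys.argv[1]))
```

The sketch uses only the standard library so it runs without extra packages. If the service really is OpenAI-compatible, pointing an existing OpenAI client at `http://localhost:8050` may work as well, subject to the same route assumption.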