This project provides a Python implementation for generating speaker-diarized dialogue transcripts from audio recordings of conference meetings. It leverages the WhisperX library, built on OpenAI's Whisper, for fast and flexible speech recognition.
- Transcribes audio files using Whisper models.
- Performs speaker diarization to identify and separate speakers.
- Saves transcripts with speaker information in a human-readable format.
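To illustrate the human-readable output format mentioned above, here is a minimal, self-contained sketch. The `Segment` structure and `format_transcript` helper are hypothetical illustrations, not part of this project's API; they show one plausible way diarized segments could be rendered as timestamped speaker lines.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    # Hypothetical structure: one diarized, transcribed span of audio.
    start: float   # start time in seconds
    end: float     # end time in seconds
    speaker: str   # e.g. "SPEAKER_00"
    text: str

def format_transcript(segments: list[Segment]) -> str:
    """Render segments as '[MM:SS - MM:SS] SPEAKER: text' lines,
    merging consecutive segments from the same speaker."""
    def ts(t: float) -> str:
        m, s = divmod(int(t), 60)
        return f"{m:02d}:{s:02d}"

    merged = []
    for seg in segments:
        if merged and merged[-1][0] == seg.speaker:
            # Same speaker continues: extend the previous entry.
            spk, start, _, text = merged[-1]
            merged[-1] = (spk, start, seg.end, text + " " + seg.text)
        else:
            merged.append((seg.speaker, seg.start, seg.end, seg.text))
    return "\n".join(
        f"[{ts(a)} - {ts(b)}] {spk}: {txt}" for spk, a, b, txt in merged
    )
```

For example, two back-to-back segments from the same speaker collapse into a single transcript line, which keeps meeting transcripts compact and easy to scan.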
Ensure you have Python 3.9 or later installed, then install the dependencies:

```bash
pip install -r requirements.txt
pip install git+https://github.com/m-bain/whisperx.git
```

Launch the application with:

```bash
bash run_app.sh
```
- This project is under active development. Refer to the code for the latest features and functionality.
- For more advanced usage or customization, explore the Audio2Dia class implementation and experiment with different configuration options.
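As a starting point for such experimentation, a configuration might look like the sketch below. Every key and value here is an illustrative assumption, not the project's actual settings; check the Audio2Dia implementation for the real option names.

```python
# Illustrative only: these parameter names are assumptions, not the project's API.
config = {
    "model_size": "large-v2",   # Whisper checkpoint to load
    "device": "cuda",           # or "cpu" if no GPU is available
    "compute_type": "float16",  # lower precision for faster inference
    "batch_size": 16,           # transcription batch size
    "min_speakers": 2,          # hints for the diarization step
    "max_speakers": 6,
}
```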
- Consider responsible use of this technology, ensuring you have appropriate permissions and consent for processing audio recordings.
- This project uses modules from the Whisper repository (https://github.com/openai/whisper).
We welcome contributions to this project! Please refer to the CONTRIBUTING.md file for guidelines on how to submit pull requests.
This project is licensed under the MIT License (see LICENSE file for details).
This project is provided for educational and research purposes only. The accuracy and performance may vary depending on audio quality, model configuration, and other factors. Use it responsibly and at your own discretion.