ml_sound_demo

Machine learning demonstration using Whisper for audio transcription and Wav2Vec2 for audio gender classification.

Installation

External Dependencies

ffmpeg

Python Dependencies

  • pytorch
  • transformers
  • datasets
  • soundfile
  • librosa
  • evaluate
  • jiwer
  • scikit-learn
  • matplotlib

Using pip

$ pip install .

Using nix

$ nix develop

Ensure that the experimental nix-command and flakes features are enabled.

Examples

Transcribe the first 100 audio clips of Common Voice 13.0:

$ python tests/transcribe_100.py
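
As a rough illustration of what a transcription run involves, the sketch below transcribes one clip with a Whisper checkpoint through the transformers pipeline and scores the result with word error rate (WER). It is not the project's code. The checkpoint name openai/whisper-small, the function names, and the lowercasing step are all assumptions. The project lists jiwer as a dependency for this kind of scoring; the pure-Python WER here only shows what the metric measures.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: lowercase and split on whitespace before comparing.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def transcribe_clip(path: str, model_name: str = "openai/whisper-small") -> str:
    """Transcribe one audio file; the checkpoint name is an assumption."""
    # Imported lazily so the WER helper works without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_name)
    # Passing a file path relies on ffmpeg being available to decode the audio.
    return asr(path)["text"]
```

For example, word_error_rate("the cat sat on the mat", "the cat sat on mat") counts one deleted word out of six reference words.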

Classify the gender of the first 1000 audio clips of Common Voice 13.0:

$ python tests/gender_1000.py
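
A minimal sketch of classifying a single clip, assuming the trained model in output/gender_classification_model can be loaded directly by the transformers audio-classification pipeline. Neither that assumption nor the function names below come from the project. The pipeline returns a list of label/score dictionaries, and the helper picks the highest-scoring label.

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]


def top_label(predictions: List[Prediction]) -> str:
    """Return the label with the highest score from pipeline-style output."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    return str(max(predictions, key=lambda p: p["score"])["label"])


def classify_clip(
    path: str,
    model_dir: str = "output/gender_classification_model",  # assumed loadable
) -> str:
    """Classify one audio file with a Wav2Vec2-style classification model."""
    # Imported lazily so top_label can be used without transformers installed.
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=model_dir)
    # Passing a file path relies on ffmpeg being available to decode the audio.
    return top_label(classifier(path))
```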

Train the gender classification model using Common Voice 13.0:

$ python training/gender.py
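
Before training, each clip needs an integer class label. The sketch below shows one way that step might look, assuming the dataset's gender field holds strings such as male and female, with unlabelled clips left empty. The two-class label set, the field name, and the function names are assumptions for illustration, not taken from training/gender.py.

```python
from typing import Dict, Iterable, List, Optional

# Assumed label set; clips with any other value are dropped.
LABELS: List[str] = ["female", "male"]
LABEL2ID: Dict[str, int] = {label: index for index, label in enumerate(LABELS)}
ID2LABEL: Dict[int, str] = {index: label for label, index in LABEL2ID.items()}


def encode_gender(example: Dict[str, Optional[str]]) -> Optional[int]:
    """Map the example's gender string to a class id, or None if unusable."""
    raw = example.get("gender") or ""
    return LABEL2ID.get(raw.strip().lower())


def labelled_examples(examples: Iterable[Dict[str, Optional[str]]]) -> List[dict]:
    """Keep only examples with a known label and attach it as 'label'."""
    kept = []
    for example in examples:
        label = encode_gender(example)
        if label is not None:
            kept.append({**example, "label": label})
    return kept
```

Mappings like LABEL2ID and ID2LABEL are typically passed to the model configuration so that predictions come back as readable label names.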

Project Structure

ml_sound_demo/
├── flake.lock                      # nix flake lock file for deterministic builds
├── flake.nix                       # dev environment flake for the nix package manager
├── LICENSE.md
├── ml_sound_demo
│   ├── gender.py                   # dataset gender classification interface
│   ├── __init__.py                 # export public interface
│   └── transcribe.py               # dataset transcription interface
├── output
│   ├── gender_classification_model # trained gender classification model
│   ├── transcription_100.csv       # first 100 transcriptions in csv
│   └── transcription_100.txt       # first 100 transcriptions in txt
├── pyproject.toml                  # project manifest
├── README.md                       # you know...
├── tests
│   ├── gender_1000.py              # gender of 1000 audio clips in common voice 13.0
│   ├── __init__.py
│   └── transcribe_100.py           # transcription of 100 audio clips in common voice 13.0
└── training
    └── gender.py                   # gender classification model training

Outstanding Questions

  • How are projects similar to this usually structured? File structure? API structure?
  • Which libraries are actually needed/recommended?
  • How can I make the code better/more optimal? Which functions to use? Which properties to tweak?
    • Conventions?

Contributors

ok-nick
