GithubHelp home page GithubHelp logo

hstehstehste / lhotse Goto Github PK

View Code? Open in Web Editor NEW

This project forked from lhotse-speech/lhotse

0.0 0.0 0.0 31.06 MB

Tools for handling speech data in machine learning projects.

Home Page: https://lhotse.readthedocs.io/en/latest/

License: Apache License 2.0

Shell 0.03% Python 99.97%

lhotse's Introduction

PyPI Status Python Versions PyPI Status Build Status Documentation Status codecov Code style: black Open In Colab arXiv

Lhotse

Lhotse is a Python library aiming to make speech and audio data preparation flexible and accessible to a wider community. Alongside k2, it is a part of the next generation Kaldi speech processing library.

Tutorial presentations and materials

About

Main goals

  • Attract a wider community to speech processing tasks with a Python-centric design.
  • Accommodate experienced Kaldi users with an expressive command-line interface.
  • Provide standard data preparation recipes for commonly used corpora.
  • Provide PyTorch Dataset classes for speech and audio related tasks.
  • Flexible data preparation for model training with the notion of audio cuts.
  • Efficiency, especially in terms of I/O bandwidth and storage capacity.

Tutorials

We currently have the following tutorials available in examples directory:

  • Basic complete Lhotse workflow Colab
  • Transforming data with Cuts Colab
  • WebDataset integration Colab
  • How to combine multiple datasets Colab
  • Lhotse Shar: storage format optimized for sequential I/O and modularity Colab

Examples of use

Check out the following links to see how Lhotse is being put to use:

  • Icefall recipes: where k2 and Lhotse meet.
  • Minimal ESPnet+Lhotse example: Colab

Main ideas

Like Kaldi, Lhotse provides standard data preparation recipes, but extends that with a seamless PyTorch integration through task-specific Dataset classes. The data and meta-data are represented in human-readable text manifests and exposed to the user through convenient Python classes.

image

Lhotse introduces the notion of audio cuts, designed to ease the training data construction with operations such as mixing, truncation and padding that are performed on-the-fly to minimize the amount of storage required. Data augmentation and feature extraction are supported both in pre-computed mode, with highly-compressed feature matrices stored on disk, and on-the-fly mode that computes the transformations upon request. Additionally, Lhotse introduces feature-space cut mixing to make the best of both worlds.

image

Installation

Lhotse supports Python version 3.7 and later.

Pip

Lhotse is available on PyPI:

pip install lhotse

To install the latest, unreleased version, do:

pip install git+https://github.com/lhotse-speech/lhotse

Development installation

For development installation, you can fork/clone the GitHub repo and install with pip:

git clone https://github.com/lhotse-speech/lhotse
cd lhotse
pip install -e '.[dev]'
pre-commit install  # installs pre-commit hooks with style checks

# Running unit tests
pytest test

# Running linter checks
pre-commit run

This is an editable installation (-e option), meaning that your changes to the source code are automatically reflected when importing lhotse (no re-install needed). The [dev] part means you're installing extra dependencies that are used to run tests, build documentation or launch jupyter notebooks.

Optional dependencies

Other pip packages. You can leverage optional features of Lhotse by installing the relevant supporting package like this: pip install lhotse[package_name]. The supported optional packages include:

  • pip install lhotse[kaldi] for a maximal feature set related to Kaldi compatibility. It includes libraries such as kaldi_native_io (a more efficient variant of kaldi_io) and kaldifeat that port some of Kaldi functionality into Python.
  • pip install lhotse[orjson] for up to 50% faster reading of JSONL manifests.
  • pip install lhotse[webdataset]. We support "compiling" your data into WebDataset tarball format for more effective IO. You can still interact with the data as if it was a regular lazy CutSet. To learn more, check out the following tutorial: Colab
  • pip install h5py if you want to extract speech features and store them as HDF5 arrays.
  • pip install dill. When dill is installed, we'll use it to pickle CutSet that uses a lambda function in calls such as .map or .filter. This is helpful in PyTorch DataLoader with num_jobs>0. Without dill, depending on your environment, you'll see an exception or a hanging script.
  • pip install smart_open to read and write manifests and data in any location supported by smart_open (e.g. cloud, http).
  • pip install opensmile for feature extraction using the OpenSmile toolkit's Python wrapper.

sph2pipe. For reading older LDC SPHERE (.sph) audio files that are compressed with codecs unsupported by ffmpeg and sox, please run:

# CLI
lhotse install-sph2pipe

# Python
from lhotse.tools import install_sph2pipe
install_sph2pipe()

It will download it to ~/.lhotse/tools, compile it, and auto-register in PATH. The program should be automatically detected and used by Lhotse.

Examples

We have example recipes showing how to prepare data and load it in Python as a PyTorch Dataset. They are located in the examples directory.

A short snippet to show how Lhotse can make audio data preparation quick and easy:

from torch.utils.data import DataLoader
from lhotse import CutSet, Fbank
from lhotse.dataset import VadDataset, SimpleCutSampler
from lhotse.recipes import prepare_switchboard

# Prepare data manifests from a raw corpus distribution.
# The RecordingSet describes the metadata about audio recordings;
# the sampling rate, number of channels, duration, etc.
# The SupervisionSet describes metadata about supervision segments:
# the transcript, speaker, language, and so on.
swbd = prepare_switchboard('/export/corpora3/LDC/LDC97S62')

# CutSet is the workhorse of Lhotse, allowing for flexible data manipulation.
# We create 5-second cuts by traversing SWBD recordings in windows.
# No audio data is actually loaded into memory or stored to disk at this point.
cuts = CutSet.from_manifests(
    recordings=swbd['recordings'],
    supervisions=swbd['supervisions']
).cut_into_windows(duration=5)

# We compute the log-Mel filter energies and store them on disk;
# Then, we pad the cuts to 5 seconds to ensure all cuts are of equal length,
# as the last window in each recording might have a shorter duration.
# The padding will be performed once the features are loaded into memory.
cuts = cuts.compute_and_store_features(
    extractor=Fbank(),
    storage_path='feats',
    num_jobs=8
).pad(duration=5.0)

# Construct a Pytorch Dataset class for Voice Activity Detection task:
dataset = VadDataset()
sampler = SimpleCutSampler(cuts, max_duration=300)
dataloader = DataLoader(dataset, sampler=sampler, batch_size=None)
batch = next(iter(dataloader))

The VadDataset will yield a batch with pairs of feature and supervision tensors such as the following - the speech starts roughly at the first second (100 frames):

image

lhotse's People

Contributors

pzelasko avatar desh2608 avatar jimbozhang avatar jtrmal avatar amirhussein96 avatar luomingshuang avatar csukuangfj avatar oplatek avatar pkufool avatar zjwang21 avatar janvainer avatar marcinwitkowski avatar lassewolter avatar yfyeung avatar wgb14 avatar jinzr avatar popcornell avatar teowenshen avatar mikuchar avatar flyingleafe avatar karelvesely84 avatar freewym avatar lifeiteng avatar tomiinek avatar boren-ms avatar m-wiesner avatar s-mousmita avatar shanguanma avatar manbaaaa avatar rouseabout avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.