
FeatureSet - scope about lhotse HOT 10 CLOSED

lhotse-speech avatar lhotse-speech commented on September 7, 2024
FeatureSet - scope

from lhotse.

Comments (10)

entn-at avatar entn-at commented on September 7, 2024 1

> PyTorch audio also uses librosa
> https://github.com/pytorch/audio/blob/master/requirements.txt#L16

torchaudio only uses librosa for running compatibility tests; they wrote their own (compatible) feature extraction routines as PyTorch JIT-able modules (including deltas and sliding CMN). They seem to have implemented support for two backends for reading audio files (sox and libsndfile; the latter also works on Windows) and are working on replacing sox effects with PyTorch versions (see pytorch/audio#260 for a list of what they have already implemented). I guess the point of that effort is to be able to use them on the fly during training.


danpovey avatar danpovey commented on September 7, 2024 1

Makes sense I guess (although we'd have to make sure the defaults were stable when we do the release).

It might make sense to support writing the manifest files compressed, as they could get large and should be highly compressible.
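A minimal sketch of what compressed manifest writing could look like, assuming the manifest is a list of plain dicts. The helper names `write_manifest` and `read_manifest` are hypothetical and not part of lhotse, and JSON is used here instead of YAML only to stay within the standard library:

```python
import gzip
import json


def write_manifest(items, path):
    """Write a list of dicts as JSON; gzip-compress when the path ends in .gz."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "wt", encoding="utf-8") as f:
        json.dump(items, f)


def read_manifest(path):
    """Read a manifest written by write_manifest, compressed or not."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```

Choosing the codec from the file extension keeps the calling code identical for both cases, so compression could be switched on without touching the rest of the pipeline.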


jtrmal avatar jtrmal commented on September 7, 2024


pzelasko avatar pzelasko commented on September 7, 2024


csukuangfj avatar csukuangfj commented on September 7, 2024

PyTorch audio also uses librosa
https://github.com/pytorch/audio/blob/master/requirements.txt#L16


pzelasko avatar pzelasko commented on September 7, 2024

Thanks, that's actually very useful. It seems they put some effort into being compatible with Kaldi since the last time I checked. Given that PyTorch is going to be our core dependency anyway, I'll just use torchaudio.


csukuangfj avatar csukuangfj commented on September 7, 2024

Another useful tool from PyTorch audio is the wrapper around sox, which can perform on-the-fly data augmentation.
https://pytorch.org/audio/sox_effects.html


danpovey avatar danpovey commented on September 7, 2024

That looks useful.


danpovey avatar danpovey commented on September 7, 2024


pzelasko avatar pzelasko commented on September 7, 2024

I'm thinking of refactoring how the feature extraction configuration is stored: instead of storing a "global" config for the features in the manifest, store only the non-default settings in each Features object's manifest entry (along with the feature type).

It would result in something like:

features:
- channel_id: 0
  config:
    feature_type: fbank
    frame_shift: 12.0
    snip_edges: true
  duration: 4.3275
  recording_id: 100-121669-0026_718-129597-0003
  start: 0.0
  storage_path: librimix/storage/5a77fc36-2ec4-48d2-b2fb-ffc878840c03.llc
  storage_type: lilcom
- channel_id: 1
  config:
    feature_type: fbank
    frame_shift: 10.0
    snip_edges: true
  duration: 4.3275
  recording_id: 100-121669-0026_718-129597-0003
  start: 0.0
  storage_path: librimix/storage/1e19dc6f-9809-4a7a-b9d8-43c1652b0bc1.llc
  storage_type: lilcom
- channel_id: 0
  config:
    feature_type: fbank
    frame_shift: 12.0
    snip_edges: false
  duration: 7.0175
  recording_id: 1025-92820-0032_8410-278217-0015
  start: 0.0
  storage_path: librimix/storage/03aeab68-5605-4731-bcc7-a7e7d84f7f3f.llc
  storage_type: lilcom
...

It'll make it possible (or much easier) to gather together features with perturbed parametrization should we want to explore that.
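As a rough illustration of the idea, here is a minimal sketch of storing only non-default settings and then gathering feature entries by their configuration. Everything in it is an assumption made for the example: the `FbankConfig` class, its default values, and the helper names are hypothetical and do not reflect the actual lhotse implementation.

```python
from collections import defaultdict
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FbankConfig:
    # Hypothetical defaults, chosen for illustration only.
    frame_length: float = 25.0
    frame_shift: float = 10.0
    snip_edges: bool = True
    num_mel_bins: int = 23


def to_manifest_config(feature_type, config):
    """Keep the feature type plus only the fields that differ from the defaults."""
    non_default = {
        f.name: getattr(config, f.name)
        for f in fields(config)
        if getattr(config, f.name) != f.default
    }
    return {"feature_type": feature_type, **non_default}


def from_manifest_config(entry):
    """Rebuild the full config; omitted fields fall back to the defaults."""
    params = {k: v for k, v in entry.items() if k != "feature_type"}
    return FbankConfig(**params)


def group_by_config(feature_entries):
    """Gather feature entries that share exactly the same stored config."""
    groups = defaultdict(list)
    for entry in feature_entries:
        # Sorted items give a hashable key independent of dict ordering.
        key = tuple(sorted(entry["config"].items()))
        groups[key].append(entry)
    return dict(groups)
```

With per-entry configs, features extracted with perturbed parameters can live in one manifest and still be separated again with a single grouping pass. One trade-off of storing only non-default values is that changing a default later silently changes the meaning of old manifests, which ties in with the earlier remark about keeping defaults stable at release time.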
