FeatureSet - scope about lhotse HOT 10 CLOSED

lhotse-speech commented on September 7, 2024

FeatureSet - scope

from lhotse.

Comments (10)

entn-at commented on September 7, 2024

> PyTorch audio also uses librosa
> https://github.com/pytorch/audio/blob/master/requirements.txt#L16

torchaudio only uses librosa for running compatibility tests; they wrote their own (compatible) feature extraction routines as PyTorch jit-able modules (including deltas and sliding CMN). They seem to have implemented support for two backends for reading audio files (sox and libsoundfile; the latter also works on Windows) and are working on replacing sox effects with PyTorch versions (see pytorch/audio#260 for a list of what they have already implemented). I guess the point of that effort is to be able to use them on the fly during training.

danpovey commented on September 7, 2024

Makes sense I guess (although we'd have to make sure the defaults were stable when we do the release).

It might make sense to support writing the manifest files compressed, as they could get large and should be highly compressible.
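A minimal sketch of what compressed manifest I/O could look like, using only the Python standard library. The helper names `write_manifest` and `read_manifest`, the `.gz` suffix convention, and the use of JSON instead of YAML are assumptions made for illustration; they are not lhotse's API.

```python
import gzip
import json
import os
import tempfile


def write_manifest(items, path):
    """Write manifest entries as JSON; gzip-compress when the path ends with '.gz'."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "wt", encoding="utf-8") as f:
        json.dump(items, f)


def read_manifest(path):
    """Read manifest entries written by write_manifest (plain or gzipped)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        return json.load(f)


# Hypothetical entries; the field names mirror the manifest example in this thread.
entries = [
    {
        "recording_id": f"rec-{i:05d}",
        "channel_id": 0,
        "start": 0.0,
        "duration": 4.3275,
        "storage_type": "lilcom",
    }
    for i in range(1000)
]

with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "features.json")
    packed = os.path.join(tmp, "features.json.gz")
    write_manifest(entries, plain)
    write_manifest(entries, packed)
    plain_size = os.path.getsize(plain)
    packed_size = os.path.getsize(packed)
    restored = read_manifest(packed)

print(f"plain: {plain_size} bytes, gzipped: {packed_size} bytes")
```

Because manifest entries repeat the same keys and many of the same values, the gzipped file should come out much smaller than the plain one.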

jtrmal commented on September 7, 2024

On Wed, Apr 29, 2020 at 7:44 PM Piotr Żelasko wrote:

> I'm thinking about the FeatureSet, and I'm not sure what's the scope of operations we'd like to support in lhotse. We will use lilcom to load/store the feature matrices, but what about feature extraction? Should we just use something precomputed e.g. with Kaldi, or also extract them from the FeatureSet API level? If the second is true, we'll either need to use some other library (e.g. librosa) or delegate feature extraction to Kaldi by running it as a subprocess (unless there are some Python bindings available). I guess the same questions apply to data augmentation (we'll get to that after having something initial working for features and having some example dataset represented in lhotse).
>
> Of course, having the whole data augmentation + feature extraction pipeline as a part of lhotse would be more convenient in the long run. It'll just take longer to get there. @danpovey @jtrmal WDYT?

pzelasko commented on September 7, 2024

Yes, it does, thanks. I might code up a prototype with librosa and we'll see then.

On Wed, Apr 29, 2020 at 8:05 PM jtrmal wrote:

> I think librosa is a sane choice. I think we need support at the Set level, because at this stage you still cannot handle the chunks/blocks independently, due to some edge effects during the feature extraction process. Does that count as an opinion on the question you asked, or am I answering a different question? :)
> y.
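One way to see the edge effects mentioned in the quoted reply is to count frames. The sketch below assumes framing without padding, where a frame is emitted only if a full window fits into the signal, and assumes a 25 ms window with a 10 ms shift at 16 kHz. The function name and all parameter values are illustrative, not taken from lhotse.

```python
def num_frames(num_samples, window=400, shift=160):
    """Number of full analysis windows that fit into the signal (no padding)."""
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // shift


sampling_rate = 16000
total_samples = 4 * sampling_rate   # a 4-second recording
chunk_samples = sampling_rate       # processed as independent 1-second chunks

# Frames from the whole recording vs. the sum over independently processed chunks.
whole = num_frames(total_samples)
chunked = sum(num_frames(chunk_samples) for _ in range(total_samples // chunk_samples))

print(whole, chunked)  # 398 392
```

Under these assumptions the independently processed chunks yield fewer frames than the whole recording, because the windows that would straddle a chunk boundary are never computed.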

csukuangfj commented on September 7, 2024

PyTorch audio also uses librosa
https://github.com/pytorch/audio/blob/master/requirements.txt#L16

pzelasko commented on September 7, 2024

Thanks, that's actually very useful. It seems they put some effort into being compatible with Kaldi since the last time I checked. Given that PyTorch is going to be our core dependency anyway, I'll just use torchaudio.

csukuangfj commented on September 7, 2024

Another useful tool from PyTorch audio is the wrapper around sox, which can perform on the fly data augmentation.
https://pytorch.org/audio/sox_effects.html

danpovey commented on September 7, 2024

That looks useful.

danpovey commented on September 7, 2024

Thanks for the info, Ewald!

pzelasko commented on September 7, 2024

I'm thinking of refactoring how the feature extraction configuration is stored: instead of storing a "global" config for the features in the manifest, store only the non-default settings in each Features object's manifest entry (along with the feature type).

It would result in something like:

features:
- channel_id: 0
  config:
    feature_type: fbank
    frame_shift: 12.0
    snip_edges: true
  duration: 4.3275
  recording_id: 100-121669-0026_718-129597-0003
  start: 0.0
  storage_path: librimix/storage/5a77fc36-2ec4-48d2-b2fb-ffc878840c03.llc
  storage_type: lilcom
- channel_id: 1
  config:
    feature_type: fbank
    frame_shift: 10.0
    snip_edges: true
  duration: 4.3275
  recording_id: 100-121669-0026_718-129597-0003
  start: 0.0
  storage_path: librimix/storage/1e19dc6f-9809-4a7a-b9d8-43c1652b0bc1.llc
  storage_type: lilcom
- channel_id: 0
  config:
    feature_type: fbank
    frame_shift: 12.0
    snip_edges: false
  duration: 7.0175
  recording_id: 1025-92820-0032_8410-278217-0015
  start: 0.0
  storage_path: librimix/storage/03aeab68-5605-4731-bcc7-a7e7d84f7f3f.llc
  storage_type: lilcom
...

It'll make it possible (or much easier) to gather together features with perturbed parametrization should we want to explore that.
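A rough sketch of how "store only the non-default settings" could work, assuming the extractor configuration is a dataclass with default values. `FbankConfig`, its default values, and the helper functions are hypothetical names made up for this illustration; they are not lhotse's actual classes.

```python
from dataclasses import dataclass, fields


@dataclass
class FbankConfig:
    # Hypothetical defaults, chosen only for this example.
    frame_length: float = 25.0
    frame_shift: float = 10.0
    snip_edges: bool = True
    num_mel_bins: int = 40


def non_default_settings(config):
    """Return only the fields whose values differ from the dataclass defaults."""
    defaults = type(config)()
    return {
        f.name: getattr(config, f.name)
        for f in fields(config)
        if getattr(config, f.name) != getattr(defaults, f.name)
    }


def to_manifest_config(feature_type, config):
    """Build the per-entry 'config' mapping: feature type plus non-default settings."""
    return {"feature_type": feature_type, **non_default_settings(config)}


def from_manifest_config(entry):
    """Rebuild the full config by applying the stored overrides on top of the defaults."""
    overrides = {k: v for k, v in entry.items() if k != "feature_type"}
    return entry["feature_type"], FbankConfig(**overrides)


entry = to_manifest_config("fbank", FbankConfig(frame_shift=12.0, snip_edges=False))
print(entry)  # {'feature_type': 'fbank', 'frame_shift': 12.0, 'snip_edges': False}

feature_type, restored = from_manifest_config(entry)
```

Restoring the full configuration this way only works if the defaults do not change between releases, which is the stability concern raised earlier in the thread.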
