Comments (10)
PyTorch audio also uses librosa
https://github.com/pytorch/audio/blob/master/requirements.txt#L16
torchaudio only uses librosa for running compatibility tests; they wrote their own (compatible) feature extraction routines as PyTorch jit-able modules (including deltas and sliding CMN). They seem to have implemented support for two backends for reading audio files (sox and libsoundfile, the latter also works on Windows) and are working on replacing sox effects with PyTorch versions (see pytorch/audio#260 for a list of what they already implemented). I guess the point of that effort is to be able to use them on the fly during training.
from lhotse.
Makes sense I guess (although we'd have to make sure the defaults were stable when we do the release).
It might make sense to support writing the manifest files compressed, as they could get large and should be highly compressible.
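A minimal sketch of what compressed manifest writing could look like, assuming the manifest is a plain dict or list and using gzip-compressed JSON to stay within the standard library (the example manifest in this thread is YAML). The helper names `save_manifest_gz` and `load_manifest_gz` are hypothetical and not part of lhotse:

```python
import gzip
import json
from pathlib import Path
from typing import Any


def save_manifest_gz(manifest: Any, path: Path) -> None:
    """Write a JSON-serializable manifest as gzip-compressed text."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(manifest, f)


def load_manifest_gz(path: Path) -> Any:
    """Read back a manifest written by save_manifest_gz."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```

Because manifest entries repeat the same keys and many of the same values, gzip should shrink them considerably; the exact ratio depends on the data.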
from lhotse.
PyTorch audio also uses librosa
https://github.com/pytorch/audio/blob/master/requirements.txt#L16
from lhotse.
Thanks, that's actually very useful. It seems they put some effort into being compatible with Kaldi since the last time I checked. Given that PyTorch is going to be our core dependency anyway, I'll just use torchaudio.
from lhotse.
Another useful tool from PyTorch audio is the wrapper around sox, which can perform on-the-fly data augmentation.
https://pytorch.org/audio/sox_effects.html
from lhotse.
That looks useful.
from lhotse.
I'm thinking of refactoring how the feature extraction configuration is stored: instead of storing a "global" config for the features in the manifest, store only the non-default settings in each Features object's manifest (along with the feature type).
It would result in something like:
features:
- channel_id: 0
  config:
    feature_type: fbank
    frame_shift: 12.0
    snip_edges: true
  duration: 4.3275
  recording_id: 100-121669-0026_718-129597-0003
  start: 0.0
  storage_path: librimix/storage/5a77fc36-2ec4-48d2-b2fb-ffc878840c03.llc
  storage_type: lilcom
- channel_id: 1
  config:
    feature_type: fbank
    frame_shift: 10.0
    snip_edges: true
  duration: 4.3275
  recording_id: 100-121669-0026_718-129597-0003
  start: 0.0
  storage_path: librimix/storage/1e19dc6f-9809-4a7a-b9d8-43c1652b0bc1.llc
  storage_type: lilcom
- channel_id: 0
  config:
    feature_type: fbank
    frame_shift: 12.0
    snip_edges: false
  duration: 7.0175
  recording_id: 1025-92820-0032_8410-278217-0015
  start: 0.0
  storage_path: librimix/storage/03aeab68-5605-4731-bcc7-a7e7d84f7f3f.llc
  storage_type: lilcom
...
It'll make it possible (or much easier) to gather together features with perturbed parametrization should we want to explore that.
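A rough sketch of the idea, assuming each feature entry is a dict with a `config` mapping like the one above. The `FbankConfig` class, its default values, and the helper functions are hypothetical names chosen for illustration, not the actual lhotse implementation:

```python
from collections import defaultdict
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class FbankConfig:
    # Default values are assumptions made for this example only.
    frame_shift: float = 10.0
    snip_edges: bool = False


def non_default_settings(config: FbankConfig) -> Dict[str, Any]:
    """Return only the settings that differ from the dataclass defaults."""
    return {
        f.name: getattr(config, f.name)
        for f in fields(config)
        if getattr(config, f.name) != f.default
    }


def full_config(entry_config: Dict[str, Any]) -> FbankConfig:
    """Rebuild the complete config from the stored overrides."""
    overrides = {k: v for k, v in entry_config.items() if k != "feature_type"}
    return FbankConfig(**overrides)


def group_by_config(
    features: List[Dict[str, Any]]
) -> Dict[Tuple[str, FbankConfig], List[Dict[str, Any]]]:
    """Gather feature entries that share a feature type and parametrization."""
    groups = defaultdict(list)
    for feat in features:
        cfg = feat["config"]
        groups[(cfg["feature_type"], full_config(cfg))].append(feat)
    return dict(groups)
```

Since the config dataclass is frozen, it is hashable and can serve directly as a grouping key, so entries extracted with a perturbed `frame_shift` end up in their own group.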
from lhotse.