Comments (9)
https://github.com/lhotse-speech/lhotse/blob/master/lhotse/kaldi.py#L68
Why `duration - start` rather than just `duration`?
My bad - the variable should be called `end`, the code should work fine otherwise.
https://github.com/lhotse-speech/lhotse/blob/master/lhotse/audio.py#L178
Why not `[n_sources, n_channels, n_samples]`?
I think that might be a naming collision - "source" in this context is supposed to mean "some storage containing audio samples for one or more channels" (e.g. a file), rather than an acoustic sound source in some physical space... So if some corpus has recordings from an array of 4 mics stored in separate wav files, there would be 4 `AudioSource` objects inside a `Recording` to represent that. Does that make it clearer?
I'll make a PR soon with updates to fix/clarify based on your feedback - thanks!
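To make the "source as storage" reading concrete, here is a minimal NumPy-only sketch. It assumes four mono files that have already been loaded as arrays of shape (1, num_samples); `load_fake_source`, `per_file` and the 16 kHz length are made up for illustration and are not lhotse API.

```python
import numpy as np

NUM_SAMPLES = 16000  # assumption: one second of audio at 16 kHz


def load_fake_source(num_channels: int, num_samples: int = NUM_SAMPLES) -> np.ndarray:
    """Hypothetical stand-in for reading one storage unit (e.g. a wav file).

    Returns an array of shape (num_channels, num_samples).
    """
    return np.zeros((num_channels, num_samples), dtype=np.float32)


# Four mics stored in four separate mono files: one 2D array per source.
per_file = [load_fake_source(num_channels=1) for _ in range(4)]

# Concatenating along the channel axis yields a single 2D array
# for the whole recording, not a 3D [n_sources, n_channels, n_samples] one.
recording_samples = np.vstack(per_file)
print(recording_samples.shape)  # (4, 16000)
```

Under this reading there is no separate "sources" axis in the output: sources only describe where the channels are stored.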
Thanks! What is the size of each element in the list `samples_per_source`? I assume it is already 2D?
Yes - for mono files it'd be (1, num_samples), for stereo files (2, num_samples), and the `stack` operation concats across the channel dim.
I think `np.vstack()` will add one more dimension on top of the existing tensor, so the resulting tensor will be 3D?
I don't think so:
import numpy
x = numpy.ones((1, 1000))   # mono: one channel
y = numpy.zeros((2, 1000))  # stereo: two channels
numpy.vstack([x, y]).shape  # rows are concatenated, no new axis is added
Out[5]: (3, 1000)
Oh OK, maybe only for 1D vectors it adds one more dimension.
"I think np.vstack() will add one more dimension on top of the existing tensor"
`np.stack` will add an extra dimension, not `np.vstack`.
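A small sketch of the difference, using only NumPy; the array names are arbitrary. It also covers the 1D case raised earlier in the thread, where `np.vstack` promotes each vector to shape (1, N) before concatenating.

```python
import numpy as np

mono_a = np.ones((1, 1000))
mono_b = np.zeros((1, 1000))

# np.vstack concatenates along the existing first axis: the result stays 2D.
vstack_2d = np.vstack([mono_a, mono_b])
print(vstack_2d.shape)  # (2, 1000)

# np.stack inserts a new leading axis: the result is 3D.
stack_2d = np.stack([mono_a, mono_b])
print(stack_2d.shape)  # (2, 1, 1000)

# For 1D inputs, np.vstack first treats each vector as a (1, N) row,
# which is why it can look as if it adds a dimension.
vec_a = np.ones(1000)
vec_b = np.zeros(1000)
vstack_1d = np.vstack([vec_a, vec_b])
print(vstack_1d.shape)  # (2, 1000)
```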
np.vstack seems to have a very confusing interface. If np.stack is usable, I'd prefer it.
... or `np.concatenate` if not, maybe?
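For what it's worth, a quick check of both options on the mono + stereo case from earlier in the thread. This is only a sketch with made-up array names: `np.stack` requires all inputs to have the same shape, so it rejects sources with different channel counts, while `np.concatenate` with an explicit `axis=0` gives the same result as `np.vstack` for 2D inputs.

```python
import numpy as np

mono = np.ones((1, 1000))
stereo = np.zeros((2, 1000))

# np.stack needs identically shaped inputs, so mixing mono and stereo fails.
try:
    np.stack([mono, stereo])
    stack_ok = True
except ValueError:
    stack_ok = False

# np.concatenate joins along the channel axis, which is stated explicitly here.
combined = np.concatenate([mono, stereo], axis=0)

print(stack_ok)        # False
print(combined.shape)  # (3, 1000)
```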