I came across a few confusions while I was reading the code in order to write an examp

A few questions... about lhotse HOT 9 CLOSED

lhotse-speech commented on September 7, 2024

A few questions...

from lhotse.

Comments (9)

pzelasko commented on September 7, 2024

https://github.com/lhotse-speech/lhotse/blob/master/lhotse/kaldi.py#L68
why duration - start rather than just duration?

My bad - the variable should be called end, the code should work fine otherwise.
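To illustrate the naming point: in a Kaldi `segments` file each line holds `<utterance-id> <recording-id> <segment-begin> <segment-end>`, so the fourth field is an end time and the duration has to be derived from it. The following is only a minimal sketch of that arithmetic, not the actual code in `lhotse/kaldi.py`; the function name `parse_segment` and the returned dict keys are hypothetical.

```python
def parse_segment(line: str) -> dict:
    """Parse one line of a Kaldi `segments` file (hypothetical helper).

    Expected layout: <utterance-id> <recording-id> <segment-begin> <segment-end>
    """
    utt_id, rec_id, start, end = line.split()
    start, end = float(start), float(end)
    return {
        "id": utt_id,
        "recording_id": rec_id,
        "start": start,
        # The fourth field is the END time, so the duration is end - start.
        "duration": end - start,
    }


seg = parse_segment("utt1 rec1 0.5 3.0")
print(seg["start"], seg["duration"])  # 0.5 2.5
```

Naming the fourth field `end` makes the subtraction read naturally; calling it `duration` is what made the original expression look like a bug.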

https://github.com/lhotse-speech/lhotse/blob/master/lhotse/audio.py#L178
why not [n_sources, n_channels, n_samples]?

I think that might be a naming collision - "source" in this context is supposed to mean "some storage containing audio samples for one or more channels" (e.g. a file), rather than an acoustic sound source in some physical space... So if some corpus has recordings from an array of 4 mics stored in separate wav files, there would be 4 AudioSource objects inside a Recording to represent that. Does that make it clearer?

I'll make a PR soon with updates to fix/clarify based on your feedback - thanks!

freewym commented on September 7, 2024

Thanks! What is the shape of each element in the list samples_per_source? I assume it is already 2D?

pzelasko commented on September 7, 2024

Thanks! What is the shape of each element in the list samples_per_source? I assume it is already 2D?

Yes - for mono files it'd be (1, num_samples), for stereo files (2, num_samples), and the stack operation concatenates along the channel dimension.
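As a rough illustration of the case mentioned earlier (an array of 4 mics stored in separate mono wav files), here is a sketch with synthetic data standing in for the loaded audio. The variable names and the sample count are assumptions for the example; no lhotse API is used.

```python
import numpy as np

num_samples = 16000  # assumed length, e.g. 1 second at 16 kHz

# Each source (one mono file per microphone) yields a 2D array
# of shape (num_channels_in_that_source, num_samples) = (1, num_samples).
samples_per_source = [np.zeros((1, num_samples)) for _ in range(4)]

# Stacking vertically joins the sources along the channel axis.
audio = np.vstack(samples_per_source)
print(audio.shape)  # (4, 16000)
```

The result stays 2D with shape (total_channels, num_samples), regardless of how many files the channels came from.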

freewym commented on September 7, 2024

I think np.vstack() will add one more dimension on top of the existing tensor, so the resulting tensor will be 3D?

pzelasko commented on September 7, 2024

I don't think so:

import numpy
x = numpy.ones((1, 1000))
y = numpy.zeros((2, 1000))
numpy.vstack([x, y]).shape
Out[5]: (3, 1000)

freewym commented on September 7, 2024

Oh OK, maybe only for 1D vectors it adds one more dimension.
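A quick check of that guess, as a small sketch: np.vstack treats 1D inputs of shape (N,) as rows of shape (1, N) before joining them, which is why the result gains a dimension only in the 1D case.

```python
import numpy as np

# 1D inputs: each (3,) vector is treated as a (1, 3) row, so the result is 2D.
a = np.arange(3)
b = np.arange(3)
from_1d = np.vstack([a, b])
print(from_1d.shape)  # (2, 3)

# 2D inputs: rows are simply joined along axis 0, no new dimension appears.
a2 = np.ones((1, 3))
b2 = np.zeros((2, 3))
from_2d = np.vstack([a2, b2])
print(from_2d.shape)  # (3, 3)
```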

csukuangfj commented on September 7, 2024

I think np.vstack() will add one more dimension on top of the existing tensor

np.stack will add an extra dimension, not np.vstack.

danpovey commented on September 7, 2024

np.vstack seems to have a very confusing interface. If np.stack is usable, I'd prefer it.

danpovey commented on September 7, 2024

.. or np.concatenate if not, maybe?
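For comparison, a small sketch of how the two alternatives behave on the shapes from this thread. np.stack requires all inputs to have the same shape and always adds a new axis, so it would not fit a mix of mono and stereo sources; np.concatenate with an explicit axis gives the same result as np.vstack for 2D inputs. The variable names are just for the example.

```python
import numpy as np

x = np.ones((1, 1000))   # mono source
y = np.zeros((2, 1000))  # stereo source

# Explicit concatenation along the channel axis: stays 2D.
joined = np.concatenate([x, y], axis=0)
print(joined.shape)  # (3, 1000)

# np.stack adds a new leading axis, even for same-shaped inputs.
stacked = np.stack([x, x])
print(stacked.shape)  # (2, 1, 1000)

# With mismatched shapes np.stack raises instead of joining.
try:
    np.stack([x, y])
    stack_failed = False
except ValueError:
    stack_failed = True
print(stack_failed)  # True
```

So np.stack only applies when every source has the same number of channels, and even then it yields a 3D array; np.concatenate(..., axis=0) keeps the (channels, samples) layout and states the axis explicitly.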
