A few questions... about lhotse (closed)

lhotse-speech commented on September 7, 2024
A few questions...

Comments (9)

pzelasko commented on September 7, 2024

> https://github.com/lhotse-speech/lhotse/blob/master/lhotse/kaldi.py#L68
> Why duration - start rather than just duration?

My bad: the variable should be called end (the segments file stores each segment's end time, so the duration is end - start). The code should work fine otherwise.
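
For context, each line of a Kaldi segments file has the form <segment-id> <recording-id> <start> <end>, with times in seconds, so the last field is an end time and the duration has to be derived from it. A minimal sketch assuming that format; parse_segment_line and Segment are hypothetical names for illustration, not the code in kaldi.py:

```python
from typing import NamedTuple


class Segment(NamedTuple):
    segment_id: str
    recording_id: str
    start: float
    duration: float


def parse_segment_line(line: str) -> Segment:
    """Parse one line of a Kaldi 'segments' file (hypothetical helper).

    Expected format: <segment-id> <recording-id> <start> <end>
    The last field is an END time, not a duration.
    """
    segment_id, recording_id, start_str, end_str = line.split()
    start, end = float(start_str), float(end_str)
    if end < start:
        raise ValueError(f"end ({end}) precedes start ({start}) in: {line!r}")
    # The duration is derived from the end time, hence "end - start".
    return Segment(segment_id, recording_id, start, end - start)
```

For example, parse_segment_line("utt1 rec1 0.50 2.75") yields start=0.5 and duration=2.25.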

> https://github.com/lhotse-speech/lhotse/blob/master/lhotse/audio.py#L178
> Why not [n_sources, n_channels, n_samples]?

I think this is a naming ambiguity: "source" here means "some storage containing audio samples for one or more channels" (e.g. a file), not an acoustic sound source in a physical space. So if a corpus has recordings from a 4-mic array stored in separate WAV files, a Recording would contain 4 AudioSource objects to represent them. Does that make it clearer?
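
To make the distinction concrete, here is a simplified data model, assuming each source lists the channel indices it stores. The classes and fields below are illustrative stand-ins, not lhotse's actual AudioSource and Recording definitions:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AudioSource:
    """One storage location (e.g. a file) holding one or more channels."""
    path: str
    channels: List[int]


@dataclass
class Recording:
    """A recording assembled from one or more AudioSource objects."""
    id: str
    sources: List[AudioSource] = field(default_factory=list)

    @property
    def num_channels(self) -> int:
        # Channels are summed across storage sources; a "source" is a file,
        # not a physical sound source.
        return sum(len(s.channels) for s in self.sources)


# A 4-mic array stored as four mono WAV files: four sources, one channel each.
array_rec = Recording(
    id="meeting1",
    sources=[AudioSource(path=f"mic{i}.wav", channels=[i]) for i in range(4)],
)
```

Under this model a single stereo file would be one AudioSource with channels=[0, 1], so the channel count, not the source count, determines the first dimension of the loaded audio.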

I'll open a PR soon with updates to fix and clarify these points based on your feedback. Thanks!

freewym commented on September 7, 2024

Thanks! What is the shape of each element in the list samples_per_source? I assume it is already 2D?

pzelasko commented on September 7, 2024

Yes: for mono files it is (1, num_samples), for stereo files (2, num_samples), and the stack operation concatenates them along the channel dimension.
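
A minimal sketch of that combining step, assuming each per-source array is already shaped (num_channels, num_samples) and all sources have the same length. combine_sources is a hypothetical helper, not lhotse's function; the shape checks are added for illustration:

```python
from typing import List

import numpy as np


def combine_sources(samples_per_source: List[np.ndarray]) -> np.ndarray:
    """Concatenate per-source arrays along the channel axis (hypothetical helper).

    Each element must be 2D: (num_channels, num_samples).
    Returns an array of shape (total_channels, num_samples).
    """
    if not samples_per_source:
        raise ValueError("no sources given")
    for arr in samples_per_source:
        if arr.ndim != 2:
            raise ValueError(f"expected a 2D (channels, samples) array, got shape {arr.shape}")
    lengths = {arr.shape[1] for arr in samples_per_source}
    if len(lengths) != 1:
        raise ValueError(f"sources have different lengths: {sorted(lengths)}")
    # For 2D inputs, vstack joins along axis 0 (channels) and adds no new axis.
    return np.vstack(samples_per_source)
```

With one mono source of shape (1, 16000) and one stereo source of shape (2, 16000), the result has shape (3, 16000).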

freewym commented on September 7, 2024

I think np.vstack() will add one more dimension on top of the existing tensor, so the resulting tensor will be 3D?

pzelasko commented on September 7, 2024

I don't think so:

import numpy

x = numpy.ones((1, 1000))   # mono-like: 1 channel
y = numpy.zeros((2, 1000))  # stereo-like: 2 channels
print(numpy.vstack([x, y]).shape)  # (3, 1000): rows are concatenated, no new axis

freewym commented on September 7, 2024

Oh OK, maybe it only adds a dimension for 1D vectors.
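
That guess matches NumPy's documented behavior: np.vstack first promotes each 1D input of shape (N,) to (1, N), then concatenates along the first axis. So stacking 1D vectors produces a new leading axis, while 2D inputs are only joined. A quick check (the array names are arbitrary):

```python
import numpy as np

a = np.ones(1000)   # 1D, shape (1000,)
b = np.zeros(1000)  # 1D, shape (1000,)

# Each 1D vector is promoted to a (1, 1000) row, so a new leading axis appears.
stacked_1d = np.vstack([a, b])
print(stacked_1d.shape)  # (2, 1000)

# Promotion also happens when 1D and 2D inputs are mixed.
mixed = np.vstack([a, np.zeros((2, 1000))])
print(mixed.shape)  # (3, 1000)
```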

csukuangfj commented on September 7, 2024

> I think np.vstack() will add one more dimension on top of the existing tensor

np.stack will add an extra dimension, not np.vstack.
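
The difference is easy to check: np.stack always inserts a new axis and requires every input to have exactly the same shape, so it cannot merge a mono (1, N) array with a stereo (2, N) one. The variable names below are arbitrary:

```python
import numpy as np

mono = np.ones((1, 1000))
stereo = np.zeros((2, 1000))

# np.stack inserts a new leading axis: two (1, 1000) inputs -> (2, 1, 1000).
new_axis = np.stack([mono, mono])
print(new_axis.shape)  # (2, 1, 1000)

# Inputs with different shapes are rejected with a ValueError.
try:
    np.stack([mono, stereo])
    stack_mixed_ok = True
except ValueError as err:
    stack_mixed_ok = False
    print("np.stack failed:", err)
```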

danpovey commented on September 7, 2024

np.vstack seems to have a very confusing interface. If np.stack is usable, I'd prefer it.

danpovey commented on September 7, 2024

...or np.concatenate if not, maybe?
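
One possible reason to prefer np.concatenate here, offered as an illustration rather than as the thread's conclusion: for 2D inputs, np.concatenate(..., axis=0) gives the same result as np.vstack but names the axis explicitly, and it never promotes 1D inputs, so a wrongly shaped source fails loudly instead of being silently reshaped:

```python
import numpy as np

mono = np.ones((1, 1000))
stereo = np.zeros((2, 1000))

# Same result as np.vstack for 2D inputs, with the channel axis spelled out.
combined = np.concatenate([mono, stereo], axis=0)
print(combined.shape)  # (3, 1000)
print(np.array_equal(combined, np.vstack([mono, stereo])))  # True

# Unlike np.vstack, a 1D input is not promoted to (1, N); mixing ranks fails.
try:
    np.concatenate([np.ones(1000), stereo], axis=0)
    mixed_ok = True
except ValueError:
    mixed_ok = False
print("mixing 1D and 2D inputs allowed:", mixed_ok)
```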
