Comments (24)
This looks like the Chrome-incompatibility issue in #22 and #55 which was just merged and deployed a few hours ago. Would you mind clearing your browser cache, trying again and reporting back?
from pyodide.
See docs/new_packages.md for info on building new packages. Pure python tends to be pretty straightforward, but things with C or Cython extensions can be trickier.
from pyodide.
Thanks for your patience with the rough edges on this very new project.
I'm not surprised to hear that Chrome is slower than Firefox -- that's been my experience thus far. It is also less memory efficient, since at the moment we are forced to load all dynamic libraries in Chrome whether they are used or not. Plus, we just got it working at all yesterday :)
But, in general, a slowdown of 5x is not outside the range I've seen in my own benchmarking. See http://droettboom.com/blog/2018/04/11/profiling-webassembly/
That said, there are a few places here that might help (though it's hard to say by how much without measuring):
- Create a function in Python, import that to Javascript (using pyodide.pyimport(...)) and just call it. That will prevent the need to parse the Python each time.
- This code is converting the audioBuffer array to a string and back when it's passed from Javascript to Python. Creating a Python function and calling it should convert the buffer in a much more efficient way. Going even further, you could place the audioBuffer on the C/wasm heap and then just create the Numpy array to point at that, avoiding the copy altogether. I keep meaning to create some helper functions for that, but haven't got to it yet.
Are you able to share your work thus far (including the Javascript side) so I could use it as a starting point?
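The idea of pointing a NumPy array at memory that already exists, rather than copying it, can be illustrated outside the browser. This is only a sketch under stated assumptions: the bytearray named fake_heap is a hypothetical stand-in for a region of the C/wasm heap, not a Pyodide API. It shows that np.frombuffer returns a view sharing memory with its source.

```python
import struct
import numpy as np

# Hypothetical stand-in for a region of the C/wasm heap: room for four float32 values.
fake_heap = bytearray(4 * 4)

# np.frombuffer wraps the existing memory; no samples are copied.
view = np.frombuffer(fake_heap, dtype=np.float32)

# Writing into the underlying bytes is visible through the array immediately.
# "=f" means a native-byte-order 32-bit float; offset 4 is the second slot.
struct.pack_into("=f", fake_heap, 4, 2.5)

print(view)
```

Because the array is a view, whatever writes the samples into that memory and whatever reads them through NumPy are looking at the same bytes, which is what avoids the copy.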
from pyodide.
When you pass an array from Javascript to Python, it becomes a raw memory pointer, represented as a Uint8ClampedArray. To get the type back, you need to use np.frombuffer (which initializes a numpy array from existing memory), rather than np.array, which would interpret each byte as an array element. So try:
np.frombuffer(buffer, dtype='float32')
(This is the sort of low-level detail that we need to hide inside a helper library at some point -- that just doesn't exist yet...)
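A small sketch of the difference between reading the same memory per byte and per float32. The sample values are made up for illustration. The 4:1 length ratio it shows matches the sizes reported elsewhere in this thread, where a Float32Array(46908) arrived as a Uint8ClampedArray(187632), since 46908 * 4 = 187632.

```python
import numpy as np

# Made-up samples; tobytes() gives the raw memory behind them.
samples = np.array([0.0, 0.5, -0.25, 1.0], dtype=np.float32)
raw = samples.tobytes()

as_bytes = np.frombuffer(raw, dtype=np.uint8)     # one element per byte
as_floats = np.frombuffer(raw, dtype=np.float32)  # four bytes per element

print(len(as_bytes), len(as_floats))
print(as_floats)
```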
from pyodide.
Hi @mdboom, thank you for your response, everything is working as expected.
I have a question though.
Is it possible for me to load any Python package I want? Is there a place where I can learn how to generate those package files (numpy.data/numpy.js)?
from pyodide.
Hey @mdboom, the library works beautifully, and it's awesome to be able to execute Python code in the browser, but it's much slower than native Python. See the snippet below:
function py (code) {
  return pyodide.runPython(code);
}
py('import numpy as np');
py('from numpy.lib.stride_tricks import as_strided');
// http://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html
function _spectrogramFromAudioBuffer (audioBuffer, step, wind, sampleRate) {
  py(`w = ${wind}`);
  py(`step = ${step}`);
  py(`max_freq = 8000`);
  py(`eps = 1e-14`);
  py(`sample_rate = ${sampleRate}`);
  py(`samples = np.array([${audioBuffer.toString()}])`);
  py(`assert not np.iscomplexobj(samples)`);
  py(`hop_length = int(0.001 * step * sample_rate)`);
  py(`fft_length = int(0.001 * w * sample_rate)`);
  py(`window = np.hanning(fft_length)`);
  py(`window_norm = np.sum(window ** 2)`);
  py(`scale = window_norm * sample_rate`);
  py(`trunc = (len(samples) - fft_length) % hop_length`);
  py(`x = samples[:len(samples) - trunc]`);
  py(`nshape = (fft_length, (len(x) - fft_length) // hop_length + 1)`);
  py(`nstrides = (x.strides[0], x.strides[0] * hop_length)`);
  py(`x = as_strided(x, shape=nshape, strides=nstrides)`);
  py(`x = np.fft.rfft(x, axis=0)`);
  py(`x = np.absolute(x)**2`);
  py(`x[1:-1, :] *= (2.0 / scale)`);
  py(`x[(0, -1), :] /= scale`);
  py(`freqs = float(sample_rate) / fft_length * np.arange(x.shape[0])`);
  py(`ind = np.where(freqs <= max_freq)[0][-1] + 1`);
  py(`result = np.transpose(np.log(x[:ind, :] + eps))`);
  let x = py('x');
  let freqs = py('freqs');
  let spectrogram = py('result');
  return { x: x, freqs: freqs, spectrogram: spectrogram };
}
I'm recording audio in the browser and extracting the spectrogram. It works great, but it takes between 1.5 and 2.5 seconds to give me the results, which is much slower than the 0.5s or less of native Python. So I wonder: am I doing something wrong, or is this the current state of WebAssembly? Should I do something like:
py(`import numpy as np\nstep=${step}\nmax_freq=8000\n...`)
Do you think it's because I'm making multiple calls to pyodide.runPython?
p.s. i just had the idea above and haven't tried yet
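The framing step in the snippet above (the nshape, nstrides and as_strided lines) is the least obvious part, so here is a tiny illustration of what it produces. The toy signal and the small fft_length and hop_length values are assumptions chosen to keep the output readable. They are not the values used for real audio.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.arange(10, dtype=np.float32)  # toy signal, not real audio
fft_length = 4   # samples per frame
hop_length = 2   # samples between the starts of consecutive frames

# Same formulas as in the snippet: one column per frame, overlapping by
# fft_length - hop_length samples, built as a view without copying.
nshape = (fft_length, (len(x) - fft_length) // hop_length + 1)
nstrides = (x.strides[0], x.strides[0] * hop_length)
frames = as_strided(x, shape=nshape, strides=nstrides)

# Column k holds x[k * hop_length : k * hop_length + fft_length].
for k in range(frames.shape[1]):
    print(frames[:, k])
```

Taking np.fft.rfft along axis 0 of such an array then transforms every frame in one call.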
from pyodide.
Calling pyodide.runPython multiple times doesn't seem to be the problem.
I tried this:
let pythonCode = (audioBuffer, step, wind, sampleRate) => {
return `w = ${wind}
step = ${step}
max_freq = 8000
eps = 1e-14
sample_rate = ${sampleRate}
samples = np.array([${audioBuffer.toString()}])
assert not np.iscomplexobj(samples)
hop_length = int(0.001 * step * sample_rate)
fft_length = int(0.001 * w * sample_rate)
window = np.hanning(fft_length)
window_norm = np.sum(window ** 2)
scale = window_norm * sample_rate
trunc = (len(samples) - fft_length) % hop_length
x = samples[:len(samples) - trunc]
nshape = (fft_length, (len(x) - fft_length) // hop_length + 1)
nstrides = (x.strides[0], x.strides[0] * hop_length)
x = as_strided(x, shape=nshape, strides=nstrides)
x = np.fft.rfft(x, axis=0)
x = np.absolute(x)**2
x[1:-1, :] *= (2.0 / scale)
x[(0, -1), :] /= scale
freqs = float(sample_rate) / fft_length * np.arange(x.shape[0])
ind = np.where(freqs <= max_freq)[0][-1] + 1
result = np.transpose(np.log(x[:ind, :] + eps))
`
};
function _spectrogramFromAudioBuffer (audioBuffer, step, wind, sampleRate) {
  py(pythonCode(audioBuffer, step, wind, sampleRate));
  let x = py('x');
  let freqs = py('freqs');
  let spectrogram = py('result');
  return { x: x, freqs: freqs, spectrogram: spectrogram };
}
I got almost the same speed, sometimes faster (1.002s was the fastest).
It's faster on Firefox though, I get 0.4s sometimes.
from pyodide.
I really loved your tips. It sounds like it could run much faster this way, but I ran into some problems.
I ran this experiment with a function:
let spectrogram = () => {
  return `def spectrogram(audioBuffer, step, wind, sampleRate):
    w = wind
    step = step
    max_freq = 8000
    eps = 1e-14
    sample_rate = sampleRate
    # samples = np.array(audioBuffer)
    return audioBuffer`
}
py(spectrogram());
But when I pass my audioBuffer, a Float32Array(46908), it becomes a bigger and different array of type Uint8ClampedArray(187632). See the code below:
function _spectrogramFromAudioBuffer (audioBuffer, step, wind, sampleRate) {
  let spec = pyodide.pyimport('spectrogram'); // TODO: move this away from here
  let spectrogram = spec(audioBuffer, step, wind, sampleRate);
  console.log(spectrogram) // Uint8ClampedArray(187632)
  return { spectrogram: spectrogram };
}
I think that by doing it this way I lost the ability to parse whatever I need from the C method _runpython, and I can't pass my array back and forth the way I used to. Maybe there is some hacky way? I tried to pass a Buffer to .decode() in Python and json.loads(), but then I noticed I wasn't in Node.js and would need to install some extra stuff (Buffer is undefined). Would you have a "genius" solution for passing data around that I haven't been able to think of?
--- "you could place the audioBuffer on the C/wasm heap"
This sounds like Japanese to me, but I'll google it and try to learn how to do this.
--- "Are you able to share your work thus far (including the Javascript side) so I could use it as a starting point?"
yes of course, i'll just try to organize the code a little better because right now it looks like a war, i'm just experimenting a lot of things out, as soon as it looks more presentable i'll put it on github and let you know.
from pyodide.
Great tip, thanks! god bless numpy
i got between 0.1s to 0.5s, AWESOME!
that's my class so far, i'm still having some problems with the spectrogram now, getting some Null values on the array, but this is probably some small fix
const py = (code) => {
  return pyodide.runPython(code);
}
class AudioGenerator {
  constructor() {
    this.context = new AudioContext();
    this.numpyLoaded = false;
    this.cache = {};
    pyodide.loadPackage('numpy').then(() => {
      py(`import json\nimport numpy as np\nfrom numpy.lib.stride_tricks import as_strided`);
      py(this._spectrogram());
      this.pySpectrogram = pyodide.pyimport('spectrogram');
      this.numpyLoaded = true;
      console.log('numpy loaded');
    });
  }
  _soundFile(filePath) {
    return new Promise((resolve, reject) => {
      console.info('_soundFile()');
      const request = new XMLHttpRequest();
      request.open('GET', filePath, true);
      request.responseType = 'arraybuffer';
      request.onreadystatechange = function(event) {
        if (request.readyState == 4) {
          if (request.status == 200 || request.status == 0) {
            resolve(request.response);
          } else {
            reject({error: '404 Not found'});
          }
        }
      };
      request.send(null);
    });
  }
  _spectrogram () {
    // The Python body is indented relative to column 0 of the template string.
    return `def spectrogram(audioBuffer, step, wind, sampleRate):
    w = wind
    step = step
    max_freq = 8000
    eps = 1e-14
    sample_rate = sampleRate
    samples = np.frombuffer(audioBuffer, dtype='float32')
    assert not np.iscomplexobj(samples)
    hop_length = int(0.001 * step * sample_rate)
    fft_length = int(0.001 * w * sample_rate)
    window = np.hanning(fft_length)
    window_norm = np.sum(window ** 2)
    scale = window_norm * sample_rate
    trunc = (len(samples) - fft_length) % hop_length
    x = samples[:len(samples) - trunc]
    nshape = (fft_length, (len(x) - fft_length) // hop_length + 1)
    nstrides = (x.strides[0], x.strides[0] * hop_length)
    x = as_strided(x, shape=nshape, strides=nstrides)
    x = np.fft.rfft(x, axis=0)
    x = np.absolute(x)**2
    x[1:-1, :] *= (2.0 / scale)
    x[(0, -1), :] /= scale
    freqs = float(sample_rate) / fft_length * np.arange(x.shape[0])
    ind = np.where(freqs <= max_freq)[0][-1] + 1
    result = np.transpose(np.log(x[:ind, :] + eps))
    return result`
  }
  // http://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html
  _spectrogramFromAudioBuffer (audioBuffer, step, wind, sampleRate) {
    let spectrogram = this.pySpectrogram(audioBuffer, step, wind, sampleRate);
    return { spectrogram: spectrogram };
  }
  spectrogramFromFile (filePath, step, wind, sampleRate) {
    const self = this;
    let spec = null;
    return new Promise((resolve, reject) => {
      if(!this.cache[filePath]){
        this._soundFile(filePath).then((arrayBuffer) => {
          this.context.decodeAudioData(arrayBuffer, (audioBuffer) => {
            let buffer = audioBuffer.getChannelData(0);
            this.cache[filePath] = buffer;
            // Return here so the spectrogram is not computed before numpy is ready.
            if(!this.numpyLoaded) return reject({ message: "Numpy was not loaded yet, try again in a few seconds" });
            spec = this._spectrogramFromAudioBuffer(buffer, step, wind, sampleRate);
            console.log(spec);
            resolve(spec);
          });
        });
      } else {
        spec = this._spectrogramFromAudioBuffer(self.cache[filePath], step, wind, sampleRate);
        console.log(spec);
        resolve(spec);
      }
    });
  }
}
The language is initialized in index.html:
languagePluginLoader.then(() => {
  console.warn('Python initialized');
  let audioGen = new AudioGenerator();
  // ..... mess and more mess
});
Any tips are welcome, this library/webassembly is awesome, thanks
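For checking values away from the browser, the Python function embedded in the class above can be run on its own under plain CPython. The sketch below follows the posted steps as they are. The 440 Hz test tone, the 16 kHz sample rate, and the step and wind values of 10 and 20 milliseconds are assumptions picked for the example, not values taken from the thread.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def spectrogram(samples, step, wind, sample_rate, max_freq=8000, eps=1e-14):
    """Log power spectrogram, following the steps of the function posted above."""
    hop_length = int(0.001 * step * sample_rate)   # step is in milliseconds
    fft_length = int(0.001 * wind * sample_rate)   # wind is in milliseconds
    window = np.hanning(fft_length)
    scale = np.sum(window ** 2) * sample_rate

    # Drop trailing samples that do not fill a whole frame, then frame the signal.
    trunc = (len(samples) - fft_length) % hop_length
    x = samples[:len(samples) - trunc]
    nshape = (fft_length, (len(x) - fft_length) // hop_length + 1)
    nstrides = (x.strides[0], x.strides[0] * hop_length)
    x = as_strided(x, shape=nshape, strides=nstrides)

    # Power spectrum of every frame, scaled as in the posted code.
    x = np.absolute(np.fft.rfft(x, axis=0)) ** 2
    x[1:-1, :] *= 2.0 / scale
    x[[0, -1], :] /= scale

    # Keep only the bins up to max_freq and return frames as rows.
    freqs = float(sample_rate) / fft_length * np.arange(x.shape[0])
    ind = np.where(freqs <= max_freq)[0][-1] + 1
    return np.transpose(np.log(x[:ind, :] + eps))


# Synthetic input (an assumption for testing): one second of a 440 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t).astype(np.float32)

result = spectrogram(tone, step=10, wind=20, sample_rate=sample_rate)
print(result.shape)
print(np.isfinite(result).all())
```

Running the same input through this and through the in-browser function is one way to tell whether unexpected values come from the math or from how the buffer crosses the Javascript/Python boundary.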
from pyodide.
hi @mdboom
np.frombuffer(audioBuffer, dtype='float32') returns the right audio buffer but with a lot of NaN values. Would you know why that is?
i found this answer on stackoverflow but it doesn't make much sense to me
https://stackoverflow.com/questions/24601014/behaviour-of-custom-nan-floats-in-python-and-numpy
from pyodide.
I'm not sure why there'd be a lot of NaNs, except if there were a data type mismatch here. Are they there on the Javascript side before sending to Python? Maybe try printing the first few values from Javascript and then from Python to make sure they match. You're a bit in uncharted territory here, but it's helpful to hear about what you're doing.
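As an illustration of the "print the first few values on both sides" check, here is a sketch of what a data type mismatch looks like when the same bytes are read back with the wrong dtype. The values are made up. It only demonstrates the symptom of a mismatch and makes no claim about the actual cause in this case, which a later comment in the thread attributes to a bug in how typed arrays were passed from Javascript to Python.

```python
import numpy as np

# Made-up values standing in for the first few samples on the sending side.
original = np.array([0.1, 0.2, 0.3], dtype=np.float64)
raw = original.tobytes()

right = np.frombuffer(raw, dtype=np.float64)  # dtype matches the sender
wrong = np.frombuffer(raw, dtype=np.float32)  # deliberate mismatch

print("sent:    ", original[:3])
print("matching:", right[:3])
print("mismatch:", wrong[:3], "length", len(wrong))
```

A mismatch shows up both as a changed element count and as values that bear no resemblance to what was sent, so comparing lengths is a quick first check.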
from pyodide.
Hey @mdboom, I'm using np.nan_to_num() on my buffer. The NaNs are only on the Python side after np.frombuffer(); the Javascript side looks OK.
I'm experimenting inside a small machine learning library I implemented in Javascript, so I created a new branch for my experiments:
https://github.com/gabrielfreire/neuralnet.js/tree/more_experimentation
you can take a look if you wish, all the client code is inside www/
If you wish to run the code, follow these steps:
- npm install (after cloning)
- npm run tsc (don't mind the errors/warnings from TypeScript)
- npm run start
index.html will be available on localhost:5000 with a button called Generate Spectrogram that, when pressed, will run the Python code using www/data/example.wav.
Python is loaded in www/index.html and the Python code is all inside www/lib/AudioGenerator.js.
from pyodide.
I've tried to pass the buffer to Python in some other ways and I found a good solution, not the fastest but the best I could find so far. I'll list below what I've tried.
1 - I tried to figure out what was going on with np.frombuffer(): why was it returning so many nulls and wrong values compared with the runPython parser, pyodide.runPython(`np.array([${audioBuffer.toString()}])`), which was returning the right values, the same as in native Python. I couldn't find any reason and didn't manage to fix it.
2 - I tried this approach:
this.pySpectrogram(audioBuffer.toString(), step, wind, sampleRate)
import ast
def spectrogram(audioBuffer, step, wind, sample_rate):
    max_freq = 8000
    eps = 1e-14
    samples = ast.literal_eval('[{}]'.format(audioBuffer))
It worked, but the whole process was taking between 2.5s and 3s, not good at all.
3 - I finally found a good solution. It's not as fast as using np.frombuffer() (0.02s to 0.07s; maybe because that wasn't converting the right way?), but it returns all the right values and the whole process takes between 0.1s and 0.7s, which is acceptable.
What I'm doing is passing the audioBuffer to Python as a string, audioBuffer.toString(), and in Python:
samples = np.fromstring(audioBuffer, dtype='float32', sep=',')
This is the best solution I've found so far to pass a Float32Array to a Python method using pyodide.pyimport, without the need to directly parse my variables with pyodide.runPython.
I hope this is helpful.
Thanks
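The two routes discussed in this comment can be compared side by side in plain Python. In this sketch the comma-joined text is a stand-in for what audioBuffer.toString() produces on the Javascript side, and the sample values are made up. The text route has to parse every number, while the binary route only reinterprets bytes that already hold the values.

```python
import numpy as np

samples = np.array([0.0, 0.5, -0.25, 1.0], dtype=np.float32)

# Stand-in for audioBuffer.toString(): comma-separated decimal text.
text = ",".join(repr(float(v)) for v in samples)

# Text route: every number is parsed from its decimal representation.
from_text = np.fromstring(text, dtype="float32", sep=",")

# Binary route: the existing bytes are reinterpreted as float32.
from_bytes = np.frombuffer(samples.tobytes(), dtype="float32")

print(text)
print(from_text)
print(from_bytes)
```

Both routes recover the same values here. The difference is the amount of work per sample, which is consistent with the timings reported above.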
from pyodide.
I think you're running into a bug in how typed arrays are going from javascript to Python. I'm looking into it now and will get back.
from pyodide.
See #63 for a fix. Once this makes it through testing, I'll deploy and you should be good to go. I made the following changes to your code to make this work:
diff --git a/www/lib/AudioGenerator.js b/www/lib/AudioGenerator.js
index f001d6c..504066f 100644
--- a/www/lib/AudioGenerator.js
+++ b/www/lib/AudioGenerator.js
@@ -39,8 +39,7 @@ class AudioGenerator {
return `def spectrogram(audioBuffer, step, wind, sample_rate):
max_freq = 8000
eps = 1e-14
- samples = np.fromstring(audioBuffer, dtype='float32', sep=',')
-
+ samples = np.frombuffer(audioBuffer, dtype='float32')
assert not np.iscomplexobj(samples), "Must not pass in complex numbers"
hop_length = int(0.001 * step * sample_rate)
@@ -69,8 +68,7 @@ class AudioGenerator {
// http://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html
_spectrogramFromAudioBuffer (audioBuffer, step, wind, sampleRate) {
- let buff = audioBuffer.toString();
- let spectrogram = this.pySpectrogram(buff, step, wind, sampleRate);
+ let spectrogram = this.pySpectrogram(audioBuffer, step, wind, sampleRate);
return { spectrogram: spectrogram };
}
spectrogramFromFile (filePath, step, wind) {
from pyodide.
Nice, thanks for your help. looking forward to this deploy
from pyodide.
i created a notebook that could be useful as an example for future users
https://github.com/gabrielfreire/neuralnet.js/blob/more_experimentation/spectrogram-feature-extraction.html
from pyodide.
Cool notebook. If you're using Iodide like that, you can just put Python in a Python cell -- no need to use runPython at all! :) When you're ready, we'd love to have a pull request over at https://github.com/iodide-project/iodide-examples where we are collecting a bunch of cool iodide examples like this.
from pyodide.
The fix in #64 is now merged and deployed. You may need to clear your browser cache to take advantage of it.
from pyodide.
cool, thanks.
Yes, I wanted that notebook to focus on a Javascript-only project that makes use of Python code. You already have example notebooks with Python cells, but I guess it's confusing if a user (me) wants to write a full web app using Python on the client.
from pyodide.
This fix was huge.
On my desktop (very fast hardware) I'm getting between 0.001s and 0.02s to extract a spectrogram from an audio buffer.
On my work laptop (slower hardware) I'm getting between 0.07s and 0.7s to extract a spectrogram from an audio buffer.
this is pretty much native performance, great stuff!! really good.
from pyodide.
Have you played with using the Web Audio API to collect audio from the microphone and do real-time spectral analysis? It seems this might be fast enough, and would make a really cool demo... :)
from pyodide.
Hi @mdboom , yes i've played but i was sending the recorded blob to a nodejs server connected with python through zeroMQ and getting the spectrogram back to the client,
this spectrogram is useful for neural network inference on a speech recognition model that was trained using spectrogram features, doing it this way is VERY fast, don't get me wrong, but if i could do everything on the client (offline), would be AWESOME to not depend on connection.
That's how i got here, i was looking for a way to use python/numpy/pandas on the client and make fast data transformations without the need for open internet connection which is a big use case for hospitals for example.
Maybe i could do all of this using javascript, probably with numjs, but i just like python too much for this kind of task, maybe i'm not doing the right thing on trying to use python on the client for this but with this WebAssembly hype i had to try.
A great win would be to decrease the initial download size and speed up startup.
I'll write some code for that on the notebook
from pyodide.
i created a PR in iodide-examples
from pyodide.