numpy fails to import about pyodide HOT 24 CLOSED

gabrielfreire avatar gabrielfreire commented on July 30, 2024
numpy fails to import

from pyodide.

Comments (24)

mdboom avatar mdboom commented on July 30, 2024 1

This looks like the Chrome-incompatibility issue in #22 and #55 which was just merged and deployed a few hours ago. Would you mind clearing your browser cache, trying again and reporting back?

mdboom avatar mdboom commented on July 30, 2024 1

See docs/new_packages.md for info on building new packages. Pure python tends to be pretty straightforward, but things with C or Cython extensions can be trickier.

mdboom avatar mdboom commented on July 30, 2024 1

Thanks for your patience with the rough edges on this very new project.

I'm not surprised to hear that Chrome is slower than Firefox -- that's been my experience thus far. It is also less memory efficient, since at the moment we are forced to load all dynamic libraries in Chrome whether they are used or not. Plus, we just got it working at all yesterday :)

But, in general, a slowdown of 5x is not outside the range I've seen in my own benchmarking. See http://droettboom.com/blog/2018/04/11/profiling-webassembly/

That said, there are a few places here that might help (though it's hard to say by how much without measuring):

  • Create a function in Python, import that to Javascript (using pyodide.pyimport(...)) and just call it. That will prevent the need to parse the Python each time.
  • This code is converting the audioBuffer array to a string and back when it's passed from Javascript to Python. Creating a Python function and calling it should convert the buffer in a much more efficient way. Going even further, you could place the audioBuffer on the C/wasm heap and then just create the Numpy array to point at that, avoiding the copy altogether. I keep meaning to create some helper functions for that, but haven't got to it yet. Are you able to share your work thus far (including the Javascript side) so I could use it as a starting point?
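
As a rough illustration of the second point, the sketch below contrasts the two routes for getting samples into Python: embedding the values in source text that has to be parsed, versus handing over the raw bytes and reinterpreting them. It uses only the standard library's array module as a stand-in for a Float32Array; the sample values are made up, and nothing here is Pyodide API.

```python
from array import array

# Hypothetical stand-in for the Float32Array coming from JavaScript.
samples = array("f", [0.0, 0.25, -0.5, 1.0])

# Route 1: embed the values in source text and execute it.
# Every call pays for number-to-text formatting and for parsing the source.
source = "parsed = [" + ",".join(repr(v) for v in samples) + "]"
namespace = {}
exec(source, namespace)
via_text = array("f", namespace["parsed"])

# Route 2: hand over the raw bytes and reinterpret them as 32-bit floats.
# No text is produced or parsed.
raw = samples.tobytes()
via_bytes = array("f")
via_bytes.frombytes(raw)

print(via_text == samples, via_bytes == samples)
```

Both routes recover the same values; the second skips the text round trip, which is likely where much of the time goes for large buffers.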

mdboom avatar mdboom commented on July 30, 2024 1

When you pass an array from Javascript to Python, it becomes a raw memory pointer, represented as a Uint8ClampedArray. To get the type back, you need to use np.frombuffer (which initializes a numpy array from existing memory), rather than np.array, which would interpret each byte as an array element. So try:

np.frombuffer(buffer, dtype=np.float32)

(This is the sort of low-level detail that we need to hide inside a helper library at some point -- that just doesn't exist yet...)
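
The difference between the two calls can be sketched outside the browser. This is a minimal example in plain NumPy, assuming a bytes object as a stand-in for the raw memory that arrives from JavaScript; the three sample values are made up.

```python
import numpy as np

# Three float32 samples and their raw memory (4 bytes each).
original = np.array([0.5, -1.0, 2.0], dtype=np.float32)
raw = original.tobytes()

# Treating every byte as its own element gives 12 small integers.
per_byte = np.array(list(raw), dtype=np.uint8)

# Reinterpreting the same memory as float32 recovers the 3 samples.
as_floats = np.frombuffer(raw, dtype=np.float32)

print(per_byte.shape, as_floats.shape)
```

Note that np.frombuffer returns a read-only view over the bytes object here; call .copy() on the result if the array needs to be modified in place.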

gabrielfreire avatar gabrielfreire commented on July 30, 2024

Hi @mdboom, thank you for your response, everything is working as expected.

I have a question, though.
Is it possible to load any Python package I want? Is there a place where I can learn how to generate those package files (numpy.data/numpy.js)?

gabrielfreire avatar gabrielfreire commented on July 30, 2024

Hey @mdboom, the library works beautifully, and it's awesome to be able to execute Python code in the browser, but it's much slower than native Python. See the snippet below:

function py (code) {
    return pyodide.runPython(code);
}
py('import numpy as np');
py('from numpy.lib.stride_tricks import as_strided');

// http://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html
function _spectrogramFromAudioBuffer (audioBuffer, step, wind, sampleRate) {
    py(`w = ${wind}`);
    py(`step = ${step}`);
    py(`max_freq = 8000`);
    py(`eps = 1e-14`);
    py(`sample_rate = ${sampleRate}`);
    py(`samples = np.array([${audioBuffer.toString()}])`);
    py(`assert not np.iscomplexobj(samples)`);
    py(`hop_length = int(0.001 * step * sample_rate)`);
    py(`fft_length = int(0.001 * w * sample_rate)` );
    py(`window = np.hanning(fft_length)` );
    py(`window_norm = np.sum(window ** 2)` );
    py(`scale = window_norm * sample_rate` );
    py(`trunc = (len(samples) - fft_length) % hop_length`);
    py(`x = samples[:len(samples) - trunc]`);
    py(`nshape = (fft_length, (len(x) - fft_length) // hop_length + 1)`);
    py(`nstrides = (x.strides[0], x.strides[0] * hop_length)`);
    py(`x = as_strided(x, shape=nshape, strides=nstrides)`);
    py(`x = np.fft.rfft(x, axis=0)`);
    py(`x = np.absolute(x)**2`);
    py(`x[1:-1, :] *= (2.0 / scale)`);
    py(`x[(0, -1), :] /= scale`);
    py(`freqs = float(sample_rate) / fft_length * np.arange(x.shape[0])`);
    py(`ind = np.where(freqs <= max_freq)[0][-1] + 1`);
    py(`result = np.transpose(np.log(x[:ind, :] + eps))`);
    let x = py('x');
    let freqs = py('freqs');
    let spectrogram = py('result');
    return { x: x, freqs: freqs, spectrogram: spectrogram };
}

I'm recording audio in the browser and extracting the spectrogram. It works great but takes between 1.5 and 2.5 seconds to give me the results, which is much slower than the 0.5 s or less in native Python. So I wonder: am I doing something wrong, or is this the current state of WebAssembly? Should I do something like:

py(`import numpy as np\nstep=${step}\nmax_freq=8000\n...`)

do you think it's because i'm making multiple calls to pyodide.runPython?

p.s. i just had the idea above and haven't tried yet
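
The framing arithmetic in the spectrogram snippet above can be checked in isolation. This is a worked example only: the sample rate (16000 Hz), step (10 ms) and window (20 ms) are assumed values that do not appear in this thread, and the buffer length is the one quoted elsewhere in the discussion.

```python
sample_rate = 16000   # Hz (assumed)
step = 10             # hop between frames in ms (assumed)
wind = 20             # window length in ms (assumed)
n_samples = 46908     # buffer length quoted elsewhere in this thread

# Same formulas as in the snippet above.
hop_length = int(0.001 * step * sample_rate)
fft_length = int(0.001 * wind * sample_rate)

# Drop trailing samples that do not fill a whole frame.
trunc = (n_samples - fft_length) % hop_length
usable = n_samples - trunc

# Number of overlapping frames and of rfft frequency bins per frame.
n_frames = (usable - fft_length) // hop_length + 1
n_bins = fft_length // 2 + 1

print(hop_length, fft_length, trunc, n_frames, n_bins)
```

With these assumed settings the result would have 292 frames of 161 frequency bins. The bins are spaced sample_rate / fft_length = 50 Hz apart, so all of them fall at or below max_freq = 8000.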

gabrielfreire avatar gabrielfreire commented on July 30, 2024

Calling pyodide.runPython multiple times doesn't seem to be the problem
i tried this

let pythonCode = (audioBuffer, step, wind, sampleRate) => { 
    return `w = ${wind}
step = ${step}
max_freq = 8000
eps = 1e-14
sample_rate = ${sampleRate}
samples = np.array([${audioBuffer.toString()}])
assert not np.iscomplexobj(samples)
hop_length = int(0.001 * step * sample_rate)
fft_length = int(0.001 * w * sample_rate)
window = np.hanning(fft_length)
window_norm = np.sum(window ** 2)
scale = window_norm * sample_rate
trunc = (len(samples) - fft_length) % hop_length
x = samples[:len(samples) - trunc]
nshape = (fft_length, (len(x) - fft_length) // hop_length + 1)
nstrides = (x.strides[0], x.strides[0] * hop_length)
x = as_strided(x, shape=nshape, strides=nstrides)
x = np.fft.rfft(x, axis=0)
x = np.absolute(x)**2
x[1:-1, :] *= (2.0 / scale)
x[(0, -1), :] /= scale
freqs = float(sample_rate) / fft_length * np.arange(x.shape[0])
ind = np.where(freqs <= max_freq)[0][-1] + 1
result = np.transpose(np.log(x[:ind, :] + eps))
    `
};

function _spectrogramFromAudioBuffer (audioBuffer, step, wind, sampleRate) {
    py(pythonCode(audioBuffer, step, wind, sampleRate));
    let x = py('x');
    let freqs = py('freqs');
    let spectrogram = py('result');
    return { x: x, freqs: freqs, spectrogram: spectrogram };
}

I got almost the same speed, sometimes slightly faster (1.002 s was the fastest).

It's faster on Firefox, though; I get 0.4 s sometimes.

gabrielfreire avatar gabrielfreire commented on July 30, 2024

I really loved your tips. It sounds like it could run much faster this way, but I ran into some problems.
I tried this experiment with a function:

let spectrogram = () => { 
    return `def spectrogram(audioBuffer, step, wind, sampleRate):
    w = wind
    step = step
    max_freq = 8000
    eps = 1e-14
    sample_rate = sampleRate
    # samples = np.array(audioBuffer)
    
    return audioBuffer`
}
py(spectrogram());

But when I pass my audioBuffer, a Float32Array(46908), it becomes a larger array of a different type, Uint8ClampedArray(187632).
See the code below:

function _spectrogramFromAudioBuffer (audioBuffer, step, wind, sampleRate) {
    let spec = pyodide.pyimport('spectrogram'); // TODO: move this away from here
    let spectrogram = spec(audioBuffer, step, wind, sampleRate);
    console.log(spectrogram) // Uint8ClampedArray(187632)
    return { spectrogram: spectrogram };
}
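
The two sizes reported above are consistent with the same memory being viewed byte by byte: 46908 floats at 4 bytes each is 187632 bytes. Here is a small standard-library check, using array("f") as a stand-in for the Float32Array and zeros in place of real samples:

```python
from array import array

# Stand-in for Float32Array(46908); the values themselves do not matter.
floats = array("f", [0.0] * 46908)

# Each 32-bit float occupies 4 bytes.
bytes_per_sample = floats.itemsize

# Viewing the same memory one byte at a time gives 4x as many elements.
raw = floats.tobytes()

print(len(floats), bytes_per_sample, len(raw))
```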

I think that by doing it this way I lost the ability to parse whatever I need through the C method _runpython, so I can't pass my array back and forth the way I used to. Maybe there is some hacky way? I tried passing a Buffer to .decode() and json.loads() in Python, but then I realized I wasn't in Node.js, so Buffer is undefined and I would need to install some extra stuff. Would you have a "genius" solution for passing data around that I haven't been able to think of?

--- "you could place the audioBuffer on the C/wasm heap"
This sounds like Japanese to me, but I'll google it and try to learn how to do it.

--- "Are you able to share your work thus far (including the Javascript side) so I could use it as a starting point?"
Yes, of course. I'll just try to organize the code a little better first, because right now it looks like a war zone; I'm experimenting with a lot of things. As soon as it looks more presentable I'll put it on GitHub and let you know.

gabrielfreire avatar gabrielfreire commented on July 30, 2024

Great tip, thanks! God bless numpy.
I got between 0.1 s and 0.5 s, AWESOME!

This is my class so far. I'm still having some problems with the spectrogram, getting some null values in the array, but this is probably a small fix.

const py = (code) => {
    return pyodide.runPython(code);
}
class AudioGenerator {
    constructor() {
        this.context = new AudioContext();
        this.numpyLoaded = false;
        this.cache = {};
        pyodide.loadPackage('numpy').then(() => {
            py(`import json\nimport numpy as np\nfrom numpy.lib.stride_tricks import as_strided`);
            py(this._spectrogram());
            this.pySpectrogram = pyodide.pyimport('spectrogram');
            this.numpyLoaded = true;
            console.log('numpy loaded');
        });
    }
    _soundFile(filePath) {
        return new Promise((resolve, reject) => {
            console.info('_soundFile()');
            const request = new XMLHttpRequest();
            request.open('GET', filePath, true);
            request.responseType = 'arraybuffer';
            request.onreadystatechange = function(event) {
              if (request.readyState == 4) {
                if (request.status == 200 || request.status == 0) {
                  resolve(request.response); 
                } else {
                  reject({error: '404 Not found'});
                }
              }
            };
            request.send(null);
          });
    }
    _spectrogram () {
        return `def spectrogram(audioBuffer, step, wind, sampleRate):
        w = wind
        step = step
        max_freq = 8000
        eps = 1e-14
        sample_rate = sampleRate
        samples = np.frombuffer(audioBuffer, dtype='float32')
        assert not np.iscomplexobj(samples)
        hop_length = int(0.001 * step * sample_rate)
        fft_length = int(0.001 * w * sample_rate)
        window = np.hanning(fft_length)
        window_norm = np.sum(window ** 2)
        scale = window_norm * sample_rate
        trunc = (len(samples) - fft_length) % hop_length
        x = samples[:len(samples) - trunc]
        nshape = (fft_length, (len(x) - fft_length) // hop_length + 1)
        nstrides = (x.strides[0], x.strides[0] * hop_length)
        x = as_strided(x, shape=nshape, strides=nstrides)
        x = np.fft.rfft(x, axis=0)
        x = np.absolute(x)**2
        x[1:-1, :] *= (2.0 / scale)
        x[(0, -1), :] /= scale
        freqs = float(sample_rate) / fft_length * np.arange(x.shape[0])
        ind = np.where(freqs <= max_freq)[0][-1] + 1
        result = np.transpose(np.log(x[:ind, :] + eps))
        return result`
    }

    // http://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html
    _spectrogramFromAudioBuffer (audioBuffer, step, wind, sampleRate) {
        let spectrogram = this.pySpectrogram(audioBuffer, step, wind, sampleRate);
        return { spectrogram: spectrogram };
    }
    spectrogramFromFile (filePath, step, wind, sampleRate) {
        const self = this;
        let spec = null;
        return new Promise((resolve, reject) => {
            if(!this.cache[filePath]){
                this._soundFile(filePath).then((arrayBuffer) => {
                    this.context.decodeAudioData(arrayBuffer, (audioBuffer) => {
                        let buffer = audioBuffer.getChannelData(0);
                        this.cache[filePath] = buffer;
                        if(!this.numpyLoaded) return reject({ message: "Numpy was not loaded yet, try again in a few seconds" });
                        spec = this._spectrogramFromAudioBuffer(buffer, step, wind, sampleRate);
                        console.log(spec);
                        resolve(spec);
                    });
                });
            } else {
                spec = this._spectrogramFromAudioBuffer(self.cache[filePath], step, wind, sampleRate);
                console.log(spec);
                resolve(spec);
            }
        });
    }
}

The language is initialized in index.html:

languagePluginLoader.then(() => {
   console.warn('Python initialized');
   let audioGen = new AudioGenerator();
   // ..... mess and more mess
});

Any tips are welcome, this library/webassembly is awesome, thanks

gabrielfreire avatar gabrielfreire commented on July 30, 2024

Hi @mdboom,
np.frombuffer(audioBuffer, dtype='float32') returns the right audio buffer but with a lot of NaN values. Do you know why that is?

I found this answer on Stack Overflow, but it doesn't make much sense to me:
https://stackoverflow.com/questions/24601014/behaviour-of-custom-nan-floats-in-python-and-numpy

mdboom avatar mdboom commented on July 30, 2024

I'm not sure why there'd be a lot of NaNs, except if there were a data type mismatch here. Are they there on the Javascript side before sending to Python? Maybe try printing the first few values from Javascript and then from Python to make sure they match. You're a bit in uncharted territory here, but it's helpful to hear about what you're doing.
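
The suggested check can be sketched with the standard library alone. In this hypothetical example the "JavaScript side" is a short list of made-up samples packed as 32-bit floats; reading the same bytes back with the wrong element type produces values that no longer match. The helper name first_mismatch is invented for illustration.

```python
import struct

def first_mismatch(sent, received, tol=1e-6):
    """Return the index of the first differing value, or None if all match."""
    for i, (a, b) in enumerate(zip(sent, received)):
        # NaN fails every comparison, so it is reported as a mismatch too.
        if not abs(a - b) <= tol:
            return i
    return None

# Made-up samples, packed as little-endian float32 (4 bytes each).
sent = [0.5, -0.25, 1.0, 0.125]
raw = struct.pack("<4f", *sent)

# Reading the bytes back with the matching type recovers the samples.
same_type = struct.unpack("<4f", raw)

# Reading the same 16 bytes as two float64 values yields unrelated numbers.
wrong_type = struct.unpack("<2d", raw)

ok = first_mismatch(sent, same_type)
bad = first_mismatch(sent, wrong_type)
print(ok, bad)
```

Here ok is None and bad is 0, meaning the mismatch shows up at the very first element.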

gabrielfreire avatar gabrielfreire commented on July 30, 2024

Hey @mdboom, I'm using np.nan_to_num() on my buffer. The NaNs appear only on the Python side after np.frombuffer(); the JavaScript side looks OK.

I'm experimenting inside a small machine learning library I implemented in JavaScript, so I created a new branch for my experiments:
https://github.com/gabrielfreire/neuralnet.js/tree/more_experimentation
You can take a look if you wish; all the client code is inside www/.
If you want to run the code, follow these steps:
1 - npm install (after cloning)
2 - npm run tsc (don't mind the errors/warnings from TypeScript)
3 - npm run start
index.html will be available on localhost:5000, with a button called Generate Spectrogram that runs the Python code on www/data/example.wav when pressed.
Python is loaded in www/index.html, and the Python code is all inside www/lib/AudioGenerator.js.

gabrielfreire avatar gabrielfreire commented on July 30, 2024

I've tried passing the buffer to Python in some other ways and found a good solution. It's not the fastest, but it's the best I could manage so far. I'll list what I've tried below.

1 - I tried to figure out what was going on with np.frombuffer(): why was it returning so many nulls and wrong values compared with the runPython parser pyodide.runPython('np.array(${audioBuffer.toString()})'), which returned the right values, the same as in native Python? I couldn't find any reason and didn't manage to fix it.

2 - I tried this approach:

this.pySpectrogram(audioBuffer.toString(), step, wind, sampleRate)
import ast
def spectrogram(audioBuffer, step, wind, sample_rate):
     max_freq = 8000
     eps = 1e-14
     samples = ast.literal_eval('[{}]'.format(audioBuffer))

It worked, but the whole process was taking between 2.5s and 3s, not good at all.

3 - I finally found a good solution. It's not as fast as np.frombuffer() (0.02 s to 0.07 s, maybe because it wasn't converting the right way?), but it returns all the right values and the whole process takes between 0.1 s and 0.7 s, which is acceptable.
What I'm doing is passing the audioBuffer to Python as a string, audioBuffer.toString(),
and in Python:

samples = np.fromstring(audioBuffer, dtype='float32', sep=',')

this is the best solution i got so far to pass a float32array to a python method using pyodide.pyimport without the need to directly parse my variables with pyodide.runPython
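
Approaches 2 and 3 above can be compared side by side in plain NumPy. This sketch assumes a short made-up string in the comma-separated form that Float32Array.toString() produces; it only shows that both parsers yield the same float32 array and makes no claim about their relative speed.

```python
import ast
import numpy as np

# Hypothetical output of audioBuffer.toString() for four samples.
text = "0.5,-0.25,1,0.125"

# Approach 2: wrap the text in brackets and evaluate it as a Python list.
via_ast = np.array(ast.literal_eval("[{}]".format(text)), dtype="float32")

# Approach 3: let NumPy parse the separated text directly.
via_fromstring = np.fromstring(text, dtype="float32", sep=",")

print(np.array_equal(via_ast, via_fromstring))
```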

I hope this is helpful.

Thanks

mdboom avatar mdboom commented on July 30, 2024

I think you're running into a bug in how typed arrays are going from javascript to Python. I'm looking into it now and will get back.

mdboom avatar mdboom commented on July 30, 2024

See #63 for a fix. Once this makes it through testing, I'll deploy and you should be good to go. I made the following changes to your code to make this work:

diff --git a/www/lib/AudioGenerator.js b/www/lib/AudioGenerator.js
index f001d6c..504066f 100644
--- a/www/lib/AudioGenerator.js
+++ b/www/lib/AudioGenerator.js
@@ -39,8 +39,7 @@ class AudioGenerator {
         return `def spectrogram(audioBuffer, step, wind, sample_rate):
         max_freq = 8000
         eps = 1e-14
-        samples = np.fromstring(audioBuffer, dtype='float32', sep=',')
-
+        samples = np.frombuffer(audioBuffer, dtype='float32')
         assert not np.iscomplexobj(samples), "Must not pass in complex numbers"
 
         hop_length = int(0.001 * step * sample_rate)
@@ -69,8 +68,7 @@ class AudioGenerator {
 
     // http://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html
     _spectrogramFromAudioBuffer (audioBuffer, step, wind, sampleRate) {
-        let buff = audioBuffer.toString();
-        let spectrogram = this.pySpectrogram(buff, step, wind, sampleRate);
+        let spectrogram = this.pySpectrogram(audioBuffer, step, wind, sampleRate);
         return { spectrogram: spectrogram };
     }
     spectrogramFromFile (filePath, step, wind) {

gabrielfreire avatar gabrielfreire commented on July 30, 2024

Nice, thanks for your help. looking forward to this deploy

gabrielfreire avatar gabrielfreire commented on July 30, 2024

i created a notebook that could be useful as an example for future users
https://github.com/gabrielfreire/neuralnet.js/blob/more_experimentation/spectrogram-feature-extraction.html

mdboom avatar mdboom commented on July 30, 2024

Cool notebook. If you're using Iodide like that, you can just put Python in a Python cell -- no need to use runPython at all! :) When you're ready, we'd love to have a pull request over at https://github.com/iodide-project/iodide-examples where we are collecting a bunch of cool iodide examples like this.

mdboom avatar mdboom commented on July 30, 2024

The fix in #64 is now merged and deployed. You may need to clear your browser cache to take advantage of it.

gabrielfreire avatar gabrielfreire commented on July 30, 2024

cool, thanks.

Yes, I wanted that notebook to focus on a JavaScript-only project that makes use of Python code. You already have example notebooks with Python cells, but I guess it's confusing if a user (me) wants to write a full web app using Python on the client.

gabrielfreire avatar gabrielfreire commented on July 30, 2024

This fix was huge.
On my desktop (very fast hardware) I'm getting between 0.001 s and 0.02 s to extract a spectrogram from an audio buffer.
On my work laptop (slower hardware) I'm getting between 0.07 s and 0.7 s to extract a spectrogram from an audio buffer.
This is pretty much native performance. Great stuff!

mdboom avatar mdboom commented on July 30, 2024

Have you played with using the Web Audio API to collect audio from the microphone and do real-time spectral analysis? It seems this might be fast enough, and would make a really cool demo... :)

gabrielfreire avatar gabrielfreire commented on July 30, 2024

Hi @mdboom, yes, I've played with it, but I was sending the recorded blob to a Node.js server connected to Python through ZeroMQ and getting the spectrogram back on the client.

This spectrogram is useful for neural network inference with a speech recognition model that was trained on spectrogram features. Doing it this way is VERY fast, don't get me wrong, but it would be AWESOME to do everything on the client (offline) and not depend on a connection.

That's how I got here: I was looking for a way to use python/numpy/pandas on the client and make fast data transformations without needing an open internet connection, which is a big use case for hospitals, for example.

Maybe I could do all of this in JavaScript, probably with numjs, but I just like Python too much for this kind of task. Maybe using Python on the client isn't the right thing for this, but with all this WebAssembly hype I had to try.

A great win would be to decrease the initial download/loading size and improve the startup speed.

I'll write some code for that in the notebook.

gabrielfreire avatar gabrielfreire commented on July 30, 2024

Done!
https://github.com/gabrielfreire/neuralnet.js/blob/more_experimentation/spectrogram-feature-extraction.html

i created a PR in iodide-examples
