Comments (16)
This seems to be an issue where the web worker is terminated when the phone goes into standby, but your frontend logic's state is still preserved, so it sends a request directly, expecting the model to still be loaded. We had a similar issue with the service worker before: #471.
This PR #533 ports the service worker fix to the web worker as well. You can test it locally, or try it out when the new npm package is published.
The main logic is that when the backend realizes there is a mismatch between the model the frontend expects to be loaded and the model the backend has actually loaded, the backend calls reload() internally.
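In pseudocode, the check looks roughly like this (ensureModelLoaded and the loadedModelId argument are illustrative names, not the actual web-llm internals; engine.reload() is the real method):

```javascript
// Hypothetical sketch of the mismatch check described above.
// ensureModelLoaded / loadedModelId are illustrative, not web-llm API;
// engine.reload(modelId) is the real recovery call.
async function ensureModelLoaded(engine, expectedModelId, loadedModelId) {
  if (loadedModelId !== expectedModelId) {
    // The worker was restarted (e.g. phone standby), so the model is gone:
    // reload it before serving the request.
    await engine.reload(expectedModelId);
    return 'reloaded';
  }
  return 'ok';
}
```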
from web-llm.
Do you happen to have the console log? Also, what does webgpureport.org report as your maxStorageBufferBindingSize?
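You can also check that limit without leaving the page; the standard WebGPU adapter API exposes it. This sketch takes the GPU entry point as a parameter (in a browser you would pass navigator.gpu):

```javascript
// Query the same limit that webgpureport.org shows, via the WebGPU API.
// Call as checkBufferLimit(navigator.gpu) in a page.
async function checkBufferLimit(gpu) {
  const adapter = await gpu.requestAdapter();
  if (!adapter) return null; // no WebGPU adapter available
  const bytes = adapter.limits.maxStorageBufferBindingSize;
  return { bytes, gib: bytes / 2 ** 30 };
}
```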
It's 2GB.
// full screenshots:
It may be due to one of the limits being exceeded (not necessarily the buffer size; 2GB sounds like enough). Gemma requires larger sizes for certain buffers than other models due to its large vocab size of 256K, compared to models like Llama 3.1 at 128K. I might have to look into this later.
Edit: actually, I just saw that you mentioned Phi 3 Mini crashes as well. I will try to look into this. Meanwhile, if you have some sort of log, it would be very helpful, perhaps obtained via remote debugging.
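A rough back-of-envelope for why the vocab size matters (the Gemma 2 2B dimensions below are approximate assumptions, not confirmed numbers):

```javascript
// Back-of-envelope: why a 256K vocab stresses buffer limits more than 128K.
// Dimensions are approximate assumptions for Gemma 2 2B in f16.
const vocabSize = 256000;  // ~256K vocabulary entries
const hiddenSize = 2304;   // assumed hidden dimension
const bytesPerParam = 2;   // f16

// The embedding / lm_head matrix is one of the large buffers involved.
const lmHeadBytes = vocabSize * hiddenSize * bytesPerParam;
console.log((lmHeadBytes / 2 ** 30).toFixed(2), 'GiB'); // prints "1.10 GiB"
// Over a common 1 GiB default maxStorageBufferBindingSize,
// but under the 2 GiB reported above.
```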
I'm already using USB debugging, so I can help you there.
What kind of info would you like? Is there a debug logging mode I can activate?
// edit: I went through my recent error screenshots and got a few that belong to Web-LLM. Not sure to what degree these relate to this issue though.
Ahh yes, there is a DEBUG mode here: #519 (comment)
Any log that may relate to the crash would be helpful, thanks!
I'm using a slightly different UI, my own project :-)
Can I enable debug mode from JavaScript?
Ah yes! There is a logLevel option in EngineConfig. You can set it to INFO like here: https://github.com/mlc-ai/web-llm/blob/main/examples/simple-chat-ts/src/simple_chat.ts#L345
Already found it, thanks :-)
window.web_llm_worker = new Worker(
  new URL('./web_llm_worker.js', import.meta.url), { type: 'module' }
)

// Creating the WebLLM engine
window.web_llm_engine = await webllm.CreateWebWorkerMLCEngine(
  window.web_llm_worker,
  web_llm_model_id,
  {
    initProgressCallback: function (mes) {
      //console.log('WebLLM init progress message received: ', mes);
      window.handle_web_llm_init_progress(mes);
    },
    appConfig: window.web_llm_app_config,
    logLevel: "DEBUG"
  },
  chatOpts
);
What the heck.. now that I've enabled debugging.. Gemma 2 2B suddenly works 0_0.
Phi 3 mini crashed, but retrying a few times I managed to get a response!
So strange.
// ..and then it crashed again. No interesting output in the debug log though.
I see... thanks for the info!
There are various issues similar to this on mobile devices, probably something related to WebGPU in Chrome on Android. Nothing specific comes to mind right now. Not sure if updating the Android version and using the latest Chrome Canary would help.
The phone went into standby, and then when I woke it up and tried running inference I saw this:
It seems to be related to 'losing the WebGPU'. Should I call MLCEngine.reload(model) before each inference? Or can I somehow detect that the model has been removed from memory by the OS? How can I hook into the "A valid external Instance reference no longer exist" error?
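For detecting the loss in general, plain WebGPU exposes a `lost` promise on every GPUDevice. web-llm does not hand you its device directly, so this is only a generic sketch of the mechanism; the onLost callback is where a reload would be triggered:

```javascript
// Standard WebGPU device-loss detection: GPUDevice.lost resolves when the
// browser/OS reclaims the device (e.g. after phone standby).
function watchDeviceLoss(device, onLost) {
  return device.lost.then((info) => {
    // reason is 'destroyed' when you called device.destroy() yourself;
    // anything else means the GPU was taken away from you.
    if (info.reason !== 'destroyed') onLost(info);
    return info;
  });
}
```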
Quick question, are you using WebWorker, ServiceWorker, or the plain MLCEngine? For ServiceWorker, my understanding is that this PR has fixed this: #471
WebWorker.
I noticed I hadn't put a try-catch around WebLLM there (a testament to its quality), but I've added one now in the hopes of catching the 'GPU disappeared' event and then simply restarting the engine.
WebLLM says "please initialize again", but what about a setting to let WebLLM do this by itself? "Stay alive until told otherwise" could even be a default?
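The try-catch-and-restart pattern can be sketched like this (chat.completions.create() and reload() are real web-llm calls; the error-matching regex is a guess at the message wording and may need tuning):

```javascript
// Wrap inference in try/catch and rebuild the engine when the GPU was lost.
async function completeWithRecovery(engine, modelId, messages) {
  try {
    return await engine.chat.completions.create({ messages });
  } catch (err) {
    // Heuristic match on device-loss style errors (assumed wording).
    if (/instance|device|initialize/i.test(String(err))) {
      await engine.reload(modelId); // bring the model back
      return await engine.chat.completions.create({ messages }); // retry once
    }
    throw err; // unrelated error: surface it
  }
}
```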
This should be added to npm 0.2.56. Let me know if the issue is fixed!