Comments (6)
How much does the usage differ?
Additionally, is there a way to obtain or modify the KV-Cache settings of Web-LLM?
Good question; there isn't a way as of now, but it should be a TODO for us. Currently, we usually provide 2 context lengths for each model, 4k and 1k, and only Mistral uses sliding-window attention for now.
from web-llm.
> we usually provide 2 context lengths for each model, 4k and 1k
For example, Llama3-8B-q4f32-1 uses around 7800 MB of VRAM natively and around 5600 MB on the web, without changing any of the original example configurations. My input prompt is "what is the meaning of life?". I suspect the KV cache settings are different, but I can't view VRAM usage details on the web. How can I check the KV cache size on the web?
This is the data I provided when launching natively (mlc-llm).
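A back-of-the-envelope estimate can show how much of a VRAM gap different KV cache settings alone could explain. The sketch below uses the commonly published Llama-3-8B shape (32 layers, 8 KV heads via grouped-query attention, head dimension 128) as an assumption, not values read out of WebLLM or MLC LLM:

```typescript
// Rough KV-cache size estimate: K and V each store one head_dim vector
// per token, per KV head, per layer.
function kvCacheBytes(
  numLayers: number,
  numKvHeads: number,
  headDim: number,
  contextLen: number,
  bytesPerElem: number,
): number {
  return 2 * numLayers * numKvHeads * headDim * contextLen * bytesPerElem;
}

const MiB = 1024 * 1024;
// Assumed Llama-3-8B shape, f32 KV cache, at the two context lengths
// mentioned earlier in the thread (4k and 1k):
console.log(kvCacheBytes(32, 8, 128, 4096, 4) / MiB); // 1024 MiB
console.log(kvCacheBytes(32, 8, 128, 1024, 4) / MiB); // 256 MiB
```

Under these assumptions, shrinking an f32 cache from 4k to 1k context saves about 768 MiB, so a different default KV cache size could plausibly account for a good part of the native-vs-web difference, though not necessarily all of it.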
I see; I'm guessing this is probably due to the KV cache size. For WebLLM, if you are using the web app, you can set Log Level to Debug in Settings, and you can then see the KV cache size in the console log; here we have 2048 for TinyLlama.
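If you are embedding WebLLM in your own page rather than using the web app, recent versions of the library expose a log-level option on the engine config; whether your version has it, and the exact model id, are assumptions to check against your installed `@mlc-ai/web-llm`. A sketch:

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Assumption: your web-llm version supports the logLevel engine-config
// option. With it set to "DEBUG", details such as the KV cache size are
// printed to the browser console during model load.
// The model id below is illustrative; pick a real one from the
// prebuilt model list shipped with your web-llm version.
const engine = await CreateMLCEngine("TinyLlama-1.1B-Chat-q4f16_1-MLC", {
  logLevel: "DEBUG",
});
```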
Got it! Thank you!
If you use MLC LLM, note that it defaults to "local" mode, which sets a bigger KV cache for concurrent access; you can change that via --mode interactive, which maps to batch size 1.
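As a sketch of the flag described above (the model id is a placeholder; substitute your own):

```shell
# Default "local" mode sizes the KV cache for several concurrent
# requests; "interactive" sizes it for a single sequence (batch size 1),
# which lowers VRAM use.
mlc_llm serve HF://mlc-ai/Llama-3-8B-Instruct-q4f32_1-MLC --mode interactive
```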
Thank you for your answer!
Related Issues (20)
- Model request (Aya-23)
- Abort `reload()` if receiving another `reload()` call
- Inconsistent and unreliable outputs on mobile as opposed to on pc/laptop for -1k models HOT 1
- Error: Module has already been disposed HOT 2
- Are old models being removed? HOT 2
- How to fine tuning the model in the browser?
- New error: DXGI_ERROR_DEVICE_HUNG (0x887A0006)
- model request: Llama-3-8B-Web
- How to actually use WebLLM HOT 3
- wasm optimization? HOT 1
- Microsoft just released a more capable new version over Phi 3 Mini
- Example for using web worker with next js HOT 1
- Error: Failed to execute 'mapAsync' on 'GPUBuffer'
- How to let the user cancel loading the model and stop it from fetching params HOT 3
- Which LLM models can run on 6GB RTX 4050? HOT 2
- [Bug] Converted model outputs gibberish text HOT 1
- TOO SLOW in downloading models from huggingface when running 'mlc_llm package'
- Can I initialize existing model with random weights?
- Deply llama 3 40 billion parameters model HOT 4
- Sending raw text to the model HOT 1