Comments (6)
How much does the usage differ?
Additionally, is there a way to obtain or modify the KV-Cache settings of Web-LLM?
Good question; there isn't a way as of now, but it should be a TODO for us. Currently, we usually provide 2 context lengths for each model, 4k and 1k, and only Mistral uses sliding-window attention for now.
from web-llm.
> we usually provide 2 context lengths for each model, 4k and 1k
For example, Llama3-8B-q4f32-1 uses around 7800 MB of VRAM natively and around 5600 MB on the web, without changing any of the original example configurations. My input prompt is "what is the meaning of life?". I suspect the KV cache settings are different, but I can't view VRAM usage details on the web. How can I check the KV cache size on the web?
This is the data I provided when launching natively (mlc-llm).
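A back-of-the-envelope estimate can show how much of a VRAM gap different KV cache settings alone could explain. The sketch below uses the commonly published Llama-3-8B shape (32 layers, 8 KV heads via grouped-query attention, head dimension 128) as an assumption, not values read out of WebLLM or MLC LLM:

```typescript
// Rough KV-cache size estimate: K and V each store one head_dim vector
// per token, per KV head, per layer.
function kvCacheBytes(
  numLayers: number,
  numKvHeads: number,
  headDim: number,
  contextLen: number,
  bytesPerElem: number,
): number {
  return 2 * numLayers * numKvHeads * headDim * contextLen * bytesPerElem;
}

const MiB = 1024 * 1024;
// Assumed Llama-3-8B shape, f32 KV cache, at the two context lengths
// mentioned earlier in the thread (4k and 1k):
console.log(kvCacheBytes(32, 8, 128, 4096, 4) / MiB); // 1024 MiB
console.log(kvCacheBytes(32, 8, 128, 1024, 4) / MiB); // 256 MiB
```

Under these assumptions, shrinking an f32 cache from 4k to 1k context saves about 768 MiB, so a different default KV cache size could plausibly account for a good part of the native-vs-web difference, though not necessarily all of it.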
I see; I'm guessing this is probably due to the KV cache size. For WebLLM, if you are using the web app, you can set Log Level to Debug in Settings, and you can then see the KV cache size in the console log; here we have 2048 for TinyLlama.
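If you are embedding WebLLM in your own page rather than using the web app, recent versions of the library expose a log-level option on the engine config; whether your version has it, and the exact model id, are assumptions to check against your installed `@mlc-ai/web-llm`. A sketch:

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Assumption: your web-llm version supports the logLevel engine-config
// option. With it set to "DEBUG", details such as the KV cache size are
// printed to the browser console during model load.
// The model id below is illustrative; pick a real one from the
// prebuilt model list shipped with your web-llm version.
const engine = await CreateMLCEngine("TinyLlama-1.1B-Chat-q4f16_1-MLC", {
  logLevel: "DEBUG",
});
```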
Got it! Thank you!
If you use MLC LLM, note that it defaults to "local" mode, which sets a bigger KV cache for concurrent access; you can change that via --mode interactive, which maps to batch size 1.
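As a sketch of the flag described above (the model id is a placeholder; substitute your own):

```shell
# Default "local" mode sizes the KV cache for several concurrent
# requests; "interactive" sizes it for a single sequence (batch size 1),
# which lowers VRAM use.
mlc_llm serve HF://mlc-ai/Llama-3-8B-Instruct-q4f32_1-MLC --mode interactive
```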
Thank you for your answer!
Related Issues (20)
- Model request (Aya-23)
- Abort `reload()` if receiving another `reload()` call
- Inconsistent and unreliable outputs on mobile as opposed to on pc/laptop for -1k models HOT 1
- Error: Module has already been disposed HOT 2
- Are old models being removed? HOT 2
- How to fine tuning the model in the browser?
- New error: DXGI_ERROR_DEVICE_HUNG (0x887A0006)
- model request: Llama-3-8B-Web
- How to actually use WebLLM HOT 3
- wasm optimization? HOT 1
- Microsoft just released a more capable new version over Phi 3 Mini
- Example for using web worker with next js HOT 1
- Error: Failed to execute 'mapAsync' on 'GPUBuffer'
- How to let the user cancel loading the model and stop it from fetching params HOT 3
- Which LLM models can run on 6GB RTX 4050? HOT 2
- [Bug] Converted model outputs gibberish text HOT 1
- TOO SLOW in downloading models from huggingface when running 'mlc_llm package'
- Can I initialize existing model with random weights?
- Deply llama 3 40 billion parameters model HOT 4
- Sending raw text to the model HOT 1