
Comments (6)

CharlieFRuan avatar CharlieFRuan commented on July 30, 2024

How much does the usage differ?

Additionally, is there a way to obtain or modify the KV-Cache settings of Web-LLM?

Good question; there isn't a way as of now, but it should be a TODO for us. Currently, we usually provide two context lengths for each model, 4k and 1k. Only Mistral uses sliding windows as of now.

from web-llm.

137591 avatar 137591 commented on July 30, 2024

we usually provide two context lengths for each model, 4k and 1k

How much does the usage differ?

Additionally, is there a way to obtain or modify the KV-Cache settings of Web-LLM?

Good question; there isn't a way as of now, but it should be a TODO for us. Currently, we usually provide two context lengths for each model, 4k and 1k. Only Mistral uses sliding windows as of now.

For example, Llama3-8B-q4f32-1 uses around 7800MB of VRAM natively and around 5600MB on the web, without changing any of the original example configurations. My input prompt is "what is the meaning of life?" I suspect the KV Cache settings are different, but I can't view VRAM usage details on the web. How can I check the KV Cache size on the web?
This is the data I provided when launching natively (mlc-llm).
img_v3_02bc_3098a712-0484-4966-b208-c284c9187edg


CharlieFRuan avatar CharlieFRuan commented on July 30, 2024

I see; I'm guessing this is probably due to the KV cache size. For WebLLM, if you are using the web app, you can set Logging Level to Debug in Settings, and the console log will then show the KV cache size; here we have 2048 for TinyLlama.
image
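
To make the KV cache guess above concrete, here is a rough back-of-the-envelope sketch. It is not WebLLM's or MLC LLM's actual allocation logic: `KvCacheShape` and `kvCacheBytes` are hypothetical names, the Llama-3-8B shape (32 layers, 8 KV heads, head dimension 128) is taken from the public model config, and f32 cache entries are an assumption based on the q4f32 naming.

```typescript
// Hypothetical helper: rough KV cache size estimate for a decoder-only transformer.
// This is an illustration only, not how WebLLM or MLC LLM size their caches.
interface KvCacheShape {
  numLayers: number;       // transformer layers
  numKvHeads: number;      // key/value heads (fewer than query heads under GQA)
  headDim: number;         // dimension of each head
  bytesPerElement: number; // 4 for f32, 2 for f16
}

function kvCacheBytes(shape: KvCacheShape, totalTokens: number): number {
  // Factor 2: one key vector and one value vector per token, per layer.
  const bytesPerToken =
    2 * shape.numLayers * shape.numKvHeads * shape.headDim * shape.bytesPerElement;
  return bytesPerToken * totalTokens;
}

// Assumed Llama-3-8B shape with f32 KV entries (assumption, see lead-in).
const llama3_8b_f32: KvCacheShape = {
  numLayers: 32,
  numKvHeads: 8,
  headDim: 128,
  bytesPerElement: 4,
};

const MiB = 1024 * 1024;
for (const tokens of [1024, 4096, 8192]) {
  const sizeMiB = kvCacheBytes(llama3_8b_f32, tokens) / MiB;
  console.log(`${tokens} tokens -> ${sizeMiB} MiB`);
}
```

Under these assumptions each cached token costs 256 KiB, so a 4096-token cache comes to about 1 GiB and every doubling of the token capacity adds the same amount again. A gap of a couple of GB between two runtimes is therefore plausible from KV cache capacity alone.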


137591 avatar 137591 commented on July 30, 2024

I see; I'm guessing this is probably due to the KV cache size. For WebLLM, if you are using the web app, you can set Logging Level to Debug in Settings, and the console log will then show the KV cache size; here we have 2048 for TinyLlama.

Got it! Thank you!


tqchen avatar tqchen commented on July 30, 2024

If you use MLC LLM, note that it defaults to "local" mode, which sets a bigger KV cache for concurrent access. You can change that via --mode interactive, which maps to batch size 1.


137591 avatar 137591 commented on July 30, 2024

If you use MLC LLM, note that it defaults to "local" mode, which sets a bigger KV cache for concurrent access. You can change that via --mode interactive, which maps to batch size 1.

Thank you for your answer!

