Comments (5)
Powerhouse!
from web-llm.
Thanks for reporting this! I'll look into fixing it, perhaps by blocking subsequent chatCompletion()
calls until the previous one finishes, maintaining FCFS order. The engine does not currently support continuous batching, so this may be the only way to resolve it for now. That is, although you can call multiple chatCompletion()
s concurrently, they have to be executed sequentially to ensure correctness.
However, if you instantiate multiple engines, two requests can be processed concurrently. We will soon support loading multiple models in a single engine, in which case the same principle applies: one request at a time per model, but multiple requests per engine.
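The FCFS serialization described above can be sketched as a simple promise chain. This is an illustrative model, not web-llm's actual implementation; the `FcfsQueue` class is hypothetical.

```typescript
// Minimal sketch of FCFS serialization: each queued task starts only after
// the previous one settles, so callers may fire requests concurrently while
// execution stays strictly sequential.
class FcfsQueue {
  // Tail of the chain: settles when the most recently queued task settles.
  private tail: Promise<unknown> = Promise.resolve();

  run<T>(task: () => Promise<T>): Promise<T> {
    // Start the new task after the previous one settles,
    // whether it resolved or rejected.
    const next = this.tail.then(task, task);
    this.tail = next.catch(() => undefined); // keep the chain alive on errors
    return next;
  }
}
```

An engine could wrap each incoming chatCompletion() in `queue.run(...)`, so concurrent callers still get their results in submission order.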
Thank you.
I don't strictly need this to run in parallel (though that would be nice), but the concurrency bug is quite unintuitive and worth fixing.
I did some investigation, and this is a rough start, but I think that if you separate out this.outputIds.push
per completion, that could fix it: some sort of key to identify the specific request and keep track of its own outputIds.
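The suggestion above amounts to keying the generated token ids by request instead of sharing one buffer. A hypothetical sketch (the `RequestState` class and its method names are invented for illustration, not web-llm's code):

```typescript
// Track output token ids per request so concurrent completions
// cannot interleave their tokens into a single shared array.
class RequestState {
  private outputIdsByRequest = new Map<string, number[]>();
  private nextId = 0;

  // Allocate a fresh key and an empty output buffer for a new request.
  begin(): string {
    const id = `req-${this.nextId++}`;
    this.outputIdsByRequest.set(id, []);
    return id;
  }

  // Append a generated token id to the buffer owned by this request only.
  push(requestId: string, tokenId: number): void {
    const ids = this.outputIdsByRequest.get(requestId);
    if (!ids) throw new Error(`unknown request ${requestId}`);
    ids.push(tokenId);
  }

  // Return the request's tokens and release its buffer.
  finish(requestId: string): number[] {
    const ids = this.outputIdsByRequest.get(requestId) ?? [];
    this.outputIdsByRequest.delete(requestId);
    return ids;
  }
}
```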
Hi @LEXNY, this should be fixed in #549 and reflected in npm 0.2.61. You can check the PR description for the specifics of the problem and the solution.
Your example now works, though the second request does not start until the first one finishes, since we maintain an FCFS schedule with only one request running per model. However, multiple models can run in an engine, hence multiple requests can run per engine. For more, you can try examples/multi-models.
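The resulting scheduling policy, FCFS per model but concurrent across models, can be sketched as follows. This is an illustrative model only, not web-llm's actual scheduler; `PerModelScheduler` and the model-name strings are hypothetical.

```typescript
// One FCFS promise chain per model: requests to the same model run in
// submission order, while requests to different models may overlap.
class PerModelScheduler {
  private tails = new Map<string, Promise<unknown>>();

  run<T>(model: string, task: () => Promise<T>): Promise<T> {
    // Chain onto this model's tail; other models' chains are unaffected.
    const tail = this.tails.get(model) ?? Promise.resolve();
    const next = tail.then(task, task);
    this.tails.set(model, next.catch(() => undefined));
    return next;
  }
}
```

With this shape, a slow request to one model never delays requests addressed to a different model loaded in the same engine.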
Closing this issue as completed. Feel free to reopen/open new ones if issues arise!
Related Issues
- Can I initialize an existing model with random weights?
- Deploy Llama 3 40-billion-parameter model
- Sending raw text to the model
- Deploy a small LLM in a Chrome extension
- Running an LLM in a web worker fails due to loglevel dependency
- Support concurrent inference from multiple models
- Has anyone tried to run web-llm in Tauri?
- Request: Allow deletion of individual cached models.
- Llama 3.1 Error: Device was lost during reload. This can happen due to insufficient memory or other GPU constraints. Detailed error: [object GPUDeviceLostInfo]. Please try to reload WebLLM with a less resource-intensive model.
- Custom model outputs garbage in Firefox Nightly, works fine in Chrome.
- Phi 3 Mini output near random (Phi-3-mini-4k-instruct-q4f16_1-MLC)
- Gemma 2 2B crashes on mobile phone
- [Tracking][WebLLM] Function calling (beta) and Embeddings
- Feature request: engine.preload()
- I can't find a method to stop a conversation in progress.
- TypeError: Cannot read properties of undefined (reading 'origin')
- vercel/ai provider integration
- Use subgroup operations when possible
- DuckDB-NSQL-7B Model