2.6.24 looks good!
Glad I could help, and thanks for all your work on Twinny!
Hey @rcgtnick, thanks for the report. I'm confused by this, because twinny uses the `stream: true` option for the Ollama API, which should always return multiple small JSON responses.
As far as I can tell you are running some kind of proxy server between twinny and the Ollama instance? If so, is the proxy piping the request through unchanged, or is it taking the input and making its own API call without `stream: true`? That is the only reason I can think of that any response would exceed 15kB.
Many thanks.
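For reference, a minimal sketch of the kind of streaming call described above (the model, prompt, and default port 11434 are placeholder assumptions):

```ts
import http from "node:http";

// Sketch of a streaming Ollama generate request; with stream: true the
// server answers with one small JSON object per line until done: true.
const body = JSON.stringify({
  model: "codellama:7b-code", // placeholder model
  prompt: "// print hello\n",
  stream: true,
});

const req = http.request(
  {
    host: "localhost",
    port: 11434, // Ollama's default port
    path: "/api/generate",
    method: "POST",
    headers: { "Content-Type": "application/json" },
  },
  (res) => {
    res.on("data", (chunk) => {
      // Each line is a self-contained object such as
      // {"model":"...","response":" ","done":false}
      process.stdout.write(chunk.toString());
    });
  }
);
req.end(body);
```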
I may be wrong about the cause, then.
The only thing between Twinny and Ollama is a reverse proxy, which shouldn't be modifying headers or request bodies at all. I captured some network traffic for a request between the proxy and Ollama, to see what is happening on the wire:
Looks like it's not the chunking: while the last chunk is quite large, it's still just one chunk. However, if I enable developer tools in VSCode and add a `console.log` to print the chunk right before it's passed to `JSON.parse`, I can see that `chunk` contains only part of the complete chunk from the server, and the error from the JSON module accurately describes the problem (the data is cut off in the middle of an array, where a comma or a `]` should come next).
Interestingly, though, a capture from the local Ollama request shows an even larger response, but it's received by Twinny all together and parsed just fine:
Any other ideas about what could cause this?
After looking at both captures, all of the JSON that is returned/printed here seems to be parseable, which makes it harder for me to understand what could be causing this. If large responses are the issue, as you say, I think a fix might be to check for `done: true` and cut off the `context` before parsing.
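As an illustration, a sketch of that workaround (the function name is hypothetical, and it assumes `context` is the last key in the final response object, as in the captures in this thread):

```ts
// Hypothetical sketch of the workaround: drop the large "context" array
// from the final (done: true) response before handing it to JSON.parse.
// Assumes "context" is the last key in the object, as in Ollama's output.
function stripContext(raw: string): string {
  const idx = raw.indexOf(',"context":');
  if (idx === -1) return raw; // intermediate chunks have no context
  return raw.slice(0, idx) + "}"; // re-close the object without it
}

stripContext('{"response":"","done":true,"context":[32007,29871]}');
// => '{"response":"","done":true}'
```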
Just to rule out the proxy, I opened a port directly to Ollama and captured a Twinny request from the server side. The results look like the two captures above: complete JSON objects in each chunk, but the last chunk is very large.
The same error occurred: a JSON parsing error expecting a comma or `]`.
2024-01-23 15:45:55.399 [error] SyntaxError: Expected ',' or ']' after array element in JSON at position 1434
at JSON.parse (<anonymous>)
at onData (/Users/nick/.vscode/extensions/rjmacarthy.twinny-2.6.17/out/extension.js:684:46)
at IncomingMessage.<anonymous> (/Users/nick/.vscode/extensions/rjmacarthy.twinny-2.6.17/out/extension.js:987:17)
at IncomingMessage.emit (node:events:513:28)
at addChunk (node:internal/streams/readable:324:12)
at readableAddChunk (node:internal/streams/readable:297:9)
at Readable.push (node:internal/streams/readable:234:10)
at HTTPParser.parserOnBody (node:_http_common:131:24)
at Socket.socketOnData (node:_http_client:542:22)
at Socket.emit (node:events:513:28)
at addChunk (node:internal/streams/readable:324:12)
at readableAddChunk (node:internal/streams/readable:297:9)
at Readable.push (node:internal/streams/readable:234:10)
at TCP.onStreamRead (node:internal/stream_base_commons:190:23)
And just so it's all here, here's an example of a cut-off response object right before Twinny tries to parse it as JSON:
{"model":"codellama:7b-code","created_at":"2024-01-23T20:52:47.488087387Z","response":"","done":true,"context":[32007,29871,13,13,458,17088,29901,12728,313,7729,29897,29871,13,458,3497,21333,29901,934,597,29914,5959,29914,19254,6294,4270,401,29914,24299,29914,29878,29926,8628,279,21155,29889,29873,5080,1460,29899,29906,29889,29953,29889,29896,29955,29914,449,29914,17588,29889,1315,313,7729,29897,29871,13,18884,716,21501,3552,29872,29897,1149,23597,14885,1149,321,11864,29900,511,29871,29896,29872,29941,876,13,795,3482,13,9651,5615,890,13,4706,3980,13,418,2981,13,539,29906,29946,29929,29901,313,29872,29892,260,29897,1149,426,13,4706,376,1509,9406,1769,13,4706,4669,29889,7922,4854,29898,29873,29892,376,1649,267,7355,613,426,995,29901,1738,29900,500,511,13,3986,313,29873,29889,29880,8737,353,1780,29871,29900,511,13,3986,313,29873,29889,29880,8737,353,426,13,9651,23741,29901,426,13,795,1024,29901,376,10562,924,613,13,795,17752,29901,518,1642,1372,613,11393,312,29879,613,11393,29885,1372,12436,13,795,3440,29901,426,1369,29901,376,458,29908,2981,13,9651,2981,13,9651,23741,8423,29901,426,13,795,1024,29901,376,10562,924,9537,613,13,795,17752,29901,518,1642,1372,29916,12436,13,795,3440,29901,426,1369,29901,376,458,29908,2981,13,9651,2981,13,9651,3513,29901,426,13,795,1024,29901,376,29967,2516,613,13,795,17752,29901,518,1642,1315,613,11393,1315,29916,613,11393,29883,1315,12436,13,795,3440,29901,426,1369,29901,376,458,29908,2981,13,9651,2981,13,9651,6965,29916,29901,426,13,795,1024,29901,376,8700,29990,613,13,795,17752,29901,518,1642,1315,29916,12436,13,795,3440,29901,426,1369,29901,376,458,29908,2981,13,9651,2981,13,9651,3472,29901,426,13,795,1024,29901,376,7020,613,13,795,17752,29901,518,1642,13357,613,11393,1420,12436,13,795,3440,29901,426,1369,29901,6634,29916,29941,29883,6172,613,1095,29901,376,489,29905,29916,29941,29872,29908,2981,13,9651,2981,13,9651,5997,29901,426,1024,29901,376,19407,613,17752,29901,518,1642,4268,3108,2981,13,9651,269,465,29901,426,13,795,1024,29901,376,8132,1799,613,13,795,17752,29901,518,1642,29879,465,12436,13,795,3440,29901,426,1369,29901,376,458,29908,2981,13,9651,2981,13,9651,885,893,29901,426,13,795,1024,29901,376,7187,1799,613,13,795,17752,29901,518,1642,1557,893,12436,13,795,3440,29901,426,1369,29901,376,458,29908,2981,13,9651,2981,13,9651,4390,29901,426,1024,29901,376,7249,613,17752,29901,518,1642,3126,613,11393,3126,29880,613,11393,24756,3126,3108,2981,13,9651,343,8807,29901,426,13,795,1024,29901,376,29979,23956,613,13,795,17752,29901,518,1642,21053,613,11393,25162,12436,13,795,3440,29901,426,1369,29901,12305,29908,2981,13,9651,2981,13,9651,4903,29901,426,13,795,1024,29901,376,9165,613,13,795,17752,29901,518,1642,3134,12436,13,795,3440,29901,426,1369,29901,6634,29916,29941,29883,6172,613,1095,29901,376,489,29905,29916,29941,29872,29908,2981,13,9651,2981,13,9651,2115,29901,426,13,795,1024,29901,376,8404,613,13,795,17752,29901,518,1642,1645,12436,13,795,3440,29901,426,1369,29901,376,458,29908,2981,13,9651,2981,13,9651,413,13961,29901,426,13,795,1024,29901,376,29968,13961,613,13,795,17752,29901,518,1642,1193,613,11393,1193,29885,613,11393,1193,29879,12436,13,795,3440,29901,426,1369,29901,376,458,29908,2981,13,9651,2981,13,9651,12086,29901,426,13,795,1024,29901,376,10840,2027,613,13,795,17752,29901,518,1642,26792,12436,13,795,3440,29901,426,1369,29901,376,458,29908,2981,13,9651,2981,13,9651,376,3318,573,29899,29883,1115,426,13,795,1024,29901,376,2061,573,315,613,13,795,17752,29901,518,1642,29882,613,11393,29885,613,11393,4317,12436,13,795,3440,29901,426,1369,29901,376,
458,29908,2981,13,9651,2981,13,9651,21580,29901,426,13,795,1024,29901,376,29934,504,613,13,795,17752,29901,518,1642,2288,613,11393,2288,29889,262,12436,13,795,3440,29901,426,1369,29901,376,458,29908,2981,13,9651,2981,13,9651,3017,29901,426,13,795,1024,29901,376,11980,613,13,795,17752,29901,518,1642,2272,12436,13,795,3440,29901,426,1369,29901,12305,29908,2981,13,9651,2981,13,9651,274,29901,426,13,795,1024,29901,376,29907,613,13,795,17752,29901,518,1642,29883,613,11393,29882,12436,13,795,3440,29901,426,1369,29901,376,458,29908,2981
(That's for a different request than the capture, so the responses won't line up exactly.)
I just added a new version that cuts off everything after `context` in the JSON response. If the response is getting cut off as you say, this might help; please let me know. It's a bit of a hack but should work.
Thanks, I'll give it a try.
@rcgtnick all good?
EDIT: This was on v2.6.18.
Now it looks like multiple chunks are getting processed at the same time.
Here is the `console.log` of the object just before it's passed to `JSON.parse`:
{"model":"codellama:13b-code","created_at":"2024-01-24T13:40:28.671507727Z","response":"\n","done":false}
{"model":"codellama:13b-code","created_at":"2024-01-24T13:40:28.683629072Z","response":" ","done":false}
And here's the stack trace from the extension upon calling `JSON.parse`:
log.ts:441 ERR [Extension Host] SyntaxError: Unexpected non-whitespace character after JSON at position 106
at JSON.parse (<anonymous>)
at onData (/Users/nick/.vscode/extensions/rjmacarthy.twinny-2.6.18/out/extension.js:690:48)
at IncomingMessage.<anonymous> (/Users/nick/.vscode/extensions/rjmacarthy.twinny-2.6.18/out/extension.js:1001:17)
at IncomingMessage.emit (node:events:513:28)
at Readable.read (node:internal/streams/readable:539:10)
at flow (node:internal/streams/readable:1023:34)
at resume_ (node:internal/streams/readable:1004:3)
at process.processTicksAndRejections (node:internal/process/task_queues:82:21)
Based on the packet captures, the server sometimes sends multiple JSON objects in one response, and it looks like the extension is processing them as a single JSON object.
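That concatenation alone is enough to reproduce the error:

```ts
// Two complete stream objects arriving in a single 'data' event, as in
// the console.log output above, cannot be parsed as one document:
const event =
  '{"model":"codellama:13b-code","response":"\\n","done":false}\n' +
  '{"model":"codellama:13b-code","response":" ","done":false}\n';

JSON.parse(event);
// SyntaxError: Unexpected non-whitespace character after JSON
```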
On v2.6.21 I see the large, truncated JSON object and a JSON parsing error, same as before 2.6.18.
ERR [Extension Host] SyntaxError: Unexpected end of JSON input
at JSON.parse (<anonymous>)
at onData (/Users/nick/.vscode/extensions/rjmacarthy.twinny-2.6.21/out/extension.js:709:50)
at IncomingMessage.<anonymous> (/Users/nick/.vscode/extensions/rjmacarthy.twinny-2.6.21/out/extension.js:1020:17)
at IncomingMessage.emit (node:events:513:28)
at Readable.read (node:internal/streams/readable:539:10)
at flow (node:internal/streams/readable:1023:34)
at resume_ (node:internal/streams/readable:1004:3)
at process.processTicksAndRejections (node:internal/process/task_queues:82:21)
How about I give you a server you can send requests to?
I'd be happy to try a server-hosted version in case there is anything further that can be done. However, I'm tempted to revert the flimsy changes introduced to remove the context in the previous release, as this appears to be just a server issue. Feel free to email me at [email protected]
If this issue isn't high priority for you, I get it! I've stuck with this so far because I need a VSCode extension that will use a remote Ollama API, and most extensions I've tried only work with local Ollama.
I've done a bunch of testing with past versions, and I've found:
- the problem only occurs when response chunks reach a certain size: <1kB chunks are fine, but >30kB chunks are not
- as I go through versions from 2.6.0 to 2.6.15 I get longer responses back
I can't reproduce the issue with 2.6.14, but I can with 2.6.15 and later versions. The prompts are longer - I have a capture of 2.6.15 sending 25kB to the server and getting a 32kB response back (which triggers the JSON parsing issue).
In 2.6.21, if I set the Context Length from 300 (the default) down to 20 I do not encounter the issue. I get responses of around 5kB, which seem to be fine. At 100 lines of context I get responses around 11kB, which do cause the issue. I'll try using the extension with small values here and see if it's still helpful.
I'll send you an email and get you set up with a dev server in case you want to try it out.
EDIT: I found the "Num Predict Fim" setting, looks like that gets me back smaller responses too, probably more reliably than limiting the context length.
As for the code, I don't know JS, but it looks like maybe this pattern would be better than calling `onData` every time data is available in the response. The callback handler for the `on('data')` event is definitely getting called before the full HTTP chunk is available from the wire.
In the SO post, the `.on('data')` callback just builds a string until the `.on('end')` callback fires, at which point it processes the data. Maybe that's when you should call the `onData` callback?
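Something like this sketch, assuming `onData` is the extension's existing handler (the names and types here are illustrative):

```ts
import type { IncomingMessage } from "node:http";

// Accumulate-then-process pattern: collect every 'data' event into one
// string and defer all parsing until the 'end' event fires.
function handleResponse(res: IncomingMessage, onData: (raw: string) => void) {
  let buffered = "";
  res.on("data", (chunk) => {
    buffered += chunk; // partial JSON is fine here; nothing is parsed yet
  });
  res.on("end", () => {
    onData(buffered); // full body: newline-separated JSON objects
  });
}
```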
I tried this ^^ and ended up where you did in an earlier attempt: all of the chunks sent at once, which is a series of JSON objects but not one valid JSON object.
I tried looking for chunk delimiters in the stream instead, and that seems to have done it. I'll put up a PR.
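A sketch of that delimiter approach (the actual PR may differ): since the stream is newline-delimited JSON, split on `\n` and carry any trailing partial line over to the next event.

```ts
import type { IncomingMessage } from "node:http";

// Split the stream on newlines and keep any trailing partial line in a
// carry buffer until the rest of it arrives in a later 'data' event.
function streamJson(res: IncomingMessage, onObject: (obj: unknown) => void) {
  let carry = "";
  res.on("data", (chunk) => {
    const lines = (carry + chunk).split("\n");
    carry = lines.pop() ?? ""; // last element may be an incomplete object
    for (const line of lines) {
      if (line.trim()) onObject(JSON.parse(line));
    }
  });
}
```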
I see; I think there may be a better way. The suggestion to add the callback in `end` is not the correct approach, but I think buffering the response until it's parseable would be better. I will release a new version shortly to remove the flimsy fix and try this approach. If your PR makes it in time I will review it as well. Thanks.
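A sketch of that buffer-until-parseable idea (names here are illustrative, not the actual twinny code):

```ts
// Append each incoming chunk to a buffer and try to parse on every event;
// only clear the buffer once it forms a complete JSON document.
let pending = "";

function onChunk(chunk: string, onObject: (obj: unknown) => void) {
  pending += chunk;
  try {
    onObject(JSON.parse(pending));
    pending = ""; // parsed successfully; start fresh
  } catch {
    // incomplete JSON so far; wait for more data
  }
}
```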
I just released version v2.6.23. Please could you test it and let me know how you get on?
I don't see v2.6.23 yet, but I tested on v2.6.22.
It's not raising the error any longer, but it seems harder to get it to suggest a completion. Maybe it's not related, but I really have to coax it now.
Version v2.6.22 includes the fix. For me on localhost this works fast, the same as before; the extra logic makes no difference. For you, it may be that your server takes some time to send parseable data before the completion callback is executed.
I think it would be better to look for the delimiters in the HTTP chunked transfer encoding than to try parsing every response as JSON. Trying to parse everything as JSON is less efficient and won't work if you get a packet that is the end of one chunk and the start of another.
Newlines are built into the protocol and are meant to tell you when a chunk ends, so splitting up the data based on newlines is more correct, more reliable, and more efficient.
I don't care if you use my PR or not, but there's a packet trace in the description that shows exactly what I mean.
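To make that failure mode concrete, a small illustration with made-up field values:

```ts
// One read can carry the tail of one object plus the head of the next.
// From that point on the accumulated buffer is never a single valid JSON
// document, so "parse the whole buffer until it succeeds" stalls forever,
// while splitting on "\n" still recovers each complete line.
let buffer = '{"response":"a","done"'; // partial first object
buffer += ':false}\n{"response":"b","done"'; // tail + head in one read

// JSON.parse(buffer)    -> SyntaxError, now and after every later read
// buffer.split("\n")[0] -> '{"response":"a","done":false}' (parseable)
```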
OK, thanks @rcgtnick, I'll take another look soon.
Hey @rcgtnick, I just merged #48, which includes your changes. You're correct that this is a better solution for speed and efficiency. Please let me know if it works and we'll close the issue.
Many thanks
A funny thing happened to me while I was writing a small LLM chat app that streamed responses: after a while of chatting with an LLM, I started getting JSON parsing errors on my chunks.
It took me a minute, but eventually I realized I already knew what the problem was and how to solve it!