2.6.24 looks good!
Glad I could help, and thanks for all your work on Twinny!
Hey @rcgtnick, thanks for the report. I'm confused by this, because twinny uses the `stream: true` option for the Ollama API, which should always return multiple small JSON responses.
As far as I can tell you are running some kind of proxy server between twinny and the Ollama instance? If so, is the proxy piping the request through unchanged, or is it taking the input and making its own API call without `stream: true`? That is the only reason I can think of that any response would exceed 15kB.
Many thanks.
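For reference, a minimal sketch of the kind of streaming call described above (the model, prompt, and default port 11434 are placeholder assumptions):

```ts
import http from "node:http";

// Sketch of a streaming Ollama generate request; with stream: true the
// server answers with one small JSON object per line until done: true.
const body = JSON.stringify({
  model: "codellama:7b-code", // placeholder model
  prompt: "// print hello\n",
  stream: true,
});

const req = http.request(
  {
    host: "localhost",
    port: 11434, // Ollama's default port
    path: "/api/generate",
    method: "POST",
    headers: { "Content-Type": "application/json" },
  },
  (res) => {
    res.on("data", (chunk) => {
      // Each line is a self-contained object such as
      // {"model":"...","response":" ","done":false}
      process.stdout.write(chunk.toString());
    });
  }
);
req.end(body);
```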
I may be wrong about the cause, then.
The only thing between Twinny and Ollama is a reverse proxy, which shouldn't be modifying headers or request bodies at all. I captured some network traffic for a request between the proxy and Ollama, to see what is happening on the wire:
Looks like it's not the chunking: while the last chunk is quite large, it's still just one chunk. However, if I enable developer tools in VSCode and add a `console.log` to print the chunk right before it's passed to `JSON.parse`, I can see that `chunk` contains only part of the complete chunk from the server, and the error from the JSON module accurately describes the problem (the data is cut off in the middle of an array, where a comma or a `]` should come next).
Interestingly, though, a capture from the local Ollama request shows an even larger response, but it's received by Twinny all together and parsed just fine:
Any other ideas about what could cause this?
After looking at both captures, all of the JSON that is returned/printed here seems to be parseable, which makes it harder for me to understand what could be causing this. If large responses are the issue, as you say, I think a fix might be to check for `done: true` and cut off the `context` before parsing.
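As an illustration, a sketch of that workaround (the function name is hypothetical, and it assumes `context` is the last key in the final response object, as in the captures in this thread):

```ts
// Hypothetical sketch of the workaround: drop the large "context" array
// from the final (done: true) response before handing it to JSON.parse.
// Assumes "context" is the last key in the object, as in Ollama's output.
function stripContext(raw: string): string {
  const idx = raw.indexOf(',"context":');
  if (idx === -1) return raw; // intermediate chunks have no context
  return raw.slice(0, idx) + "}"; // re-close the object without it
}

stripContext('{"response":"","done":true,"context":[32007,29871]}');
// => '{"response":"","done":true}'
```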
Just to rule out the proxy, I opened a port directly to Ollama and captured a Twinny request from the server side. The results look like the two captures above: complete JSON objects in each chunk, but the last chunk is very large.
The same error occurred: a JSON parsing error expecting a comma or `]`.
2024-01-23 15:45:55.399 [error] SyntaxError: Expected ',' or ']' after array element in JSON at position 1434
at JSON.parse (<anonymous>)
at onData (/Users/nick/.vscode/extensions/rjmacarthy.twinny-2.6.17/out/extension.js:684:46)
at IncomingMessage.<anonymous> (/Users/nick/.vscode/extensions/rjmacarthy.twinny-2.6.17/out/extension.js:987:17)
at IncomingMessage.emit (node:events:513:28)
at addChunk (node:internal/streams/readable:324:12)
at readableAddChunk (node:internal/streams/readable:297:9)
at Readable.push (node:internal/streams/readable:234:10)
at HTTPParser.parserOnBody (node:_http_common:131:24)
at Socket.socketOnData (node:_http_client:542:22)
at Socket.emit (node:events:513:28)
at addChunk (node:internal/streams/readable:324:12)
at readableAddChunk (node:internal/streams/readable:297:9)
at Readable.push (node:internal/streams/readable:234:10)
at TCP.onStreamRead (node:internal/stream_base_commons:190:23)
And just so it's all here, here's an example of a cut-off response object right before Twinny tries to parse it as JSON:
{"model":"codellama:7b-code","created_at":"2024-01-23T20:52:47.488087387Z","response":"","done":true,"context":[32007,29871,13,13,458,17088,29901,12728,313,7729,29897,29871,13,458,3497,21333,29901,934,597,29914,5959,29914,19254,6294,4270,401,29914,24299,29914,29878,29926,8628,279,21155,29889,29873,5080,1460,29899,29906,29889,29953,29889,29896,29955,29914,449,29914,17588,29889,1315,313,7729,29897,29871,13,18884,716,21501,3552,29872,29897,1149,23597,14885,1149,321,11864,29900,511,29871,29896,29872,29941,876,13,795,3482,13,9651,5615,890,13,4706,3980,13,418,2981,13,539,29906,29946,29929,29901,313,29872,29892,260,29897,1149,426,13,4706,376,1509,9406,1769,13,4706,4669,29889,7922,4854,29898,29873,29892,376,1649,267,7355,613,426,995,29901,1738,29900,500,511,13,3986,313,29873,29889,29880,8737,353,1780,29871,29900,511,13,3986,313,29873,29889,29880,8737,353,426,13,9651,23741,29901,426,13,795,1024,29901,376,10562,924,613,13,795,17752,29901,518,1642,1372,613,11393,312,29879,613,11393,29885,1372,12436,13,795,3440,29901,426,1369,29901,376,458,29908,2981,13,9651,2981,13,9651,23741,8423,29901,426,13,795,1024,29901,376,10562,924,9537,613,13,795,17752,29901,518,1642,1372,29916,12436,13,795,3440,29901,426,1369,29901,376,458,29908,2981,13,9651,2981,13,9651,3513,29901,426,13,795,1024,29901,376,29967,2516,613,13,795,17752,29901,518,1642,1315,613,11393,1315,29916,613,11393,29883,1315,12436,13,795,3440,29901,426,1369,29901,376,458,29908,2981,13,9651,2981,13,9651,6965,29916,29901,426,13,795,1024,29901,376,8700,29990,613,13,795,17752,29901,518,1642,1315,29916,12436,13,795,3440,29901,426,1369,29901,376,458,29908,2981,13,9651,2981,13,9651,3472,29901,426,13,795,1024,29901,376,7020,613,13,795,17752,29901,518,1642,13357,613,11393,1420,12436,13,795,3440,29901,426,1369,29901,6634,29916,29941,29883,6172,613,1095,29901,376,489,29905,29916,29941,29872,29908,2981,13,9651,2981,13,9651,5997,29901,426,1024,29901,376,19407,613,17752,29901,518,1642,4268,3108,2981,13,9651,269,465,29901,426,13,795,1024,29901,376,8132,1799,613,13,795,17752,29901,518,1642,29879,465,12436,13,795,3440,29901,426,1369,29901,376,458,29908,2981,13,9651,2981,13,9651,885,893,29901,426,13,795,1024,29901,376,7187,1799,613,13,795,17752,29901,518,1642,1557,893,12436,13,795,3440,29901,426,1369,29901,376,458,29908,2981,13,9651,2981,13,9651,4390,29901,426,1024,29901,376,7249,613,17752,29901,518,1642,3126,613,11393,3126,29880,613,11393,24756,3126,3108,2981,13,9651,343,8807,29901,426,13,795,1024,29901,376,29979,23956,613,13,795,17752,29901,518,1642,21053,613,11393,25162,12436,13,795,3440,29901,426,1369,29901,12305,29908,2981,13,9651,2981,13,9651,4903,29901,426,13,795,1024,29901,376,9165,613,13,795,17752,29901,518,1642,3134,12436,13,795,3440,29901,426,1369,29901,6634,29916,29941,29883,6172,613,1095,29901,376,489,29905,29916,29941,29872,29908,2981,13,9651,2981,13,9651,2115,29901,426,13,795,1024,29901,376,8404,613,13,795,17752,29901,518,1642,1645,12436,13,795,3440,29901,426,1369,29901,376,458,29908,2981,13,9651,2981,13,9651,413,13961,29901,426,13,795,1024,29901,376,29968,13961,613,13,795,17752,29901,518,1642,1193,613,11393,1193,29885,613,11393,1193,29879,12436,13,795,3440,29901,426,1369,29901,376,458,29908,2981,13,9651,2981,13,9651,12086,29901,426,13,795,1024,29901,376,10840,2027,613,13,795,17752,29901,518,1642,26792,12436,13,795,3440,29901,426,1369,29901,376,458,29908,2981,13,9651,2981,13,9651,376,3318,573,29899,29883,1115,426,13,795,1024,29901,376,2061,573,315,613,13,795,17752,29901,518,1642,29882,613,11393,29885,613,11393,4317,12436,13,795,3440,29901,426,1369,29901,376,
458,29908,2981,13,9651,2981,13,9651,21580,29901,426,13,795,1024,29901,376,29934,504,613,13,795,17752,29901,518,1642,2288,613,11393,2288,29889,262,12436,13,795,3440,29901,426,1369,29901,376,458,29908,2981,13,9651,2981,13,9651,3017,29901,426,13,795,1024,29901,376,11980,613,13,795,17752,29901,518,1642,2272,12436,13,795,3440,29901,426,1369,29901,12305,29908,2981,13,9651,2981,13,9651,274,29901,426,13,795,1024,29901,376,29907,613,13,795,17752,29901,518,1642,29883,613,11393,29882,12436,13,795,3440,29901,426,1369,29901,376,458,29908,2981
(That's for a different request than the capture, so the responses won't line up exactly.)
I just added a new version that cuts off everything after `context` in the JSON response. If the response is getting cut off as you say, this might help; please let me know. It's a bit of a hack but should work.
Thanks, I'll give it a try.
@rcgtnick all good?
EDIT: This was on v2.6.18.
Now it looks like multiple chunks are getting processed at the same time.
Here is the `console.log` of the object just before it's passed to `JSON.parse`:
{"model":"codellama:13b-code","created_at":"2024-01-24T13:40:28.671507727Z","response":"\n","done":false}
{"model":"codellama:13b-code","created_at":"2024-01-24T13:40:28.683629072Z","response":" ","done":false}
And here's the stack trace from the extension upon calling `JSON.parse`:
log.ts:441 ERR [Extension Host] SyntaxError: Unexpected non-whitespace character after JSON at position 106
at JSON.parse (<anonymous>)
at onData (/Users/nick/.vscode/extensions/rjmacarthy.twinny-2.6.18/out/extension.js:690:48)
at IncomingMessage.<anonymous> (/Users/nick/.vscode/extensions/rjmacarthy.twinny-2.6.18/out/extension.js:1001:17)
at IncomingMessage.emit (node:events:513:28)
at Readable.read (node:internal/streams/readable:539:10)
at flow (node:internal/streams/readable:1023:34)
at resume_ (node:internal/streams/readable:1004:3)
at process.processTicksAndRejections (node:internal/process/task_queues:82:21)
Based on the packet captures, the server sometimes sends multiple JSON objects in one response, and it looks like the extension is processing them as a single JSON object.
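That concatenation alone is enough to reproduce the error:

```ts
// Two complete stream objects arriving in a single 'data' event, as in
// the console.log output above, cannot be parsed as one document:
const event =
  '{"model":"codellama:13b-code","response":"\\n","done":false}\n' +
  '{"model":"codellama:13b-code","response":" ","done":false}\n';

JSON.parse(event);
// SyntaxError: Unexpected non-whitespace character after JSON
```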
On v2.6.21 I see the large, truncated JSON object and a JSON parsing error, same as before 2.6.18.
ERR [Extension Host] SyntaxError: Unexpected end of JSON input
at JSON.parse (<anonymous>)
at onData (/Users/nick/.vscode/extensions/rjmacarthy.twinny-2.6.21/out/extension.js:709:50)
at IncomingMessage.<anonymous> (/Users/nick/.vscode/extensions/rjmacarthy.twinny-2.6.21/out/extension.js:1020:17)
at IncomingMessage.emit (node:events:513:28)
at Readable.read (node:internal/streams/readable:539:10)
at flow (node:internal/streams/readable:1023:34)
at resume_ (node:internal/streams/readable:1004:3)
at process.processTicksAndRejections (node:internal/process/task_queues:82:21)
How about I give you a server you can send requests to?
I'd be happy to try a server-hosted version in case there is anything further that can be done. However, I'm tempted to revert the flimsy changes introduced to remove the context in the previous release, as this appears to be just a server issue. Feel free to email me at [email protected]
If this issue isn't high priority for you, I get it! I've stuck with this so far because I need a VSCode extension that will use a remote Ollama API, and most extensions I've tried only work with local Ollama.
I've done a bunch of testing with past versions, and I've found:
- the problem only occurs when response chunks reach a certain size: <1kB chunks are fine, but >30kB chunks are not
- as I go through versions from 2.6.0 to 2.6.15 I get longer responses back
I can't reproduce the issue with 2.6.14, but I can with 2.6.15 and later versions. The prompts are longer - I have a capture of 2.6.15 sending 25kB to the server and getting a 32kB response back (which triggers the JSON parsing issue).
In 2.6.21, if I set the Context Length from 300 (the default) down to 20 I do not encounter the issue. I get responses of around 5kB, which seem to be fine. At 100 lines of context I get responses around 11kB, which do cause the issue. I'll try using the extension with small values here and see if it's still helpful.
I'll send you an email and get you set up with a dev server in case you want to try it out.
EDIT: I found the "Num Predict Fim" setting, looks like that gets me back smaller responses too, probably more reliably than limiting the context length.
As for the code, I don't know JS, but it looks like maybe this pattern would be better than calling `onData` every time data is available in the response. The callback handler for the `on('data')` event is definitely getting called before the full HTTP chunk is available from the wire.
In the SO post, the `.on('data')` callback just builds a string until the `.on('end')` callback fires, at which point it processes the data. Maybe that's when you should call the `onData` callback?
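Something like this sketch, assuming `onData` is the extension's existing handler (the names and types here are illustrative):

```ts
import type { IncomingMessage } from "node:http";

// Accumulate-then-process pattern: collect every 'data' event into one
// string and defer all parsing until the 'end' event fires.
function handleResponse(res: IncomingMessage, onData: (raw: string) => void) {
  let buffered = "";
  res.on("data", (chunk) => {
    buffered += chunk; // partial JSON is fine here; nothing is parsed yet
  });
  res.on("end", () => {
    onData(buffered); // full body: newline-separated JSON objects
  });
}
```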
I tried this ^^ and ended up where you did in an earlier attempt: all of the chunks sent at once, which is a series of JSON objects but not one valid JSON object.
I tried looking for chunk delimiters in the stream instead, and that seems to have done it. I'll put up a PR.
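A sketch of that delimiter approach (the actual PR may differ): since the stream is newline-delimited JSON, split on `\n` and carry any trailing partial line over to the next event.

```ts
import type { IncomingMessage } from "node:http";

// Split the stream on newlines and keep any trailing partial line in a
// carry buffer until the rest of it arrives in a later 'data' event.
function streamJson(res: IncomingMessage, onObject: (obj: unknown) => void) {
  let carry = "";
  res.on("data", (chunk) => {
    const lines = (carry + chunk).split("\n");
    carry = lines.pop() ?? ""; // last element may be an incomplete object
    for (const line of lines) {
      if (line.trim()) onObject(JSON.parse(line));
    }
  });
}
```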
I see; I think there may be a better way. The suggestion to add the callback in `end` is not the correct approach, but I think buffering the response until it's parseable would be better. I will release a new version shortly to remove the flimsy fix and try this approach. If your PR makes it in time I will review it as well. Thanks.
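A sketch of that buffer-until-parseable idea (names here are illustrative, not the actual twinny code):

```ts
// Append each incoming chunk to a buffer and try to parse on every event;
// only clear the buffer once it forms a complete JSON document.
let pending = "";

function onChunk(chunk: string, onObject: (obj: unknown) => void) {
  pending += chunk;
  try {
    onObject(JSON.parse(pending));
    pending = ""; // parsed successfully; start fresh
  } catch {
    // incomplete JSON so far; wait for more data
  }
}
```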
I just released version v2.6.23. Please could you test it and let me know how you get on?
I don't see v2.6.23 yet, but I tested on v2.6.22.
It's not raising the error any longer, but it seems harder to get it to suggest a completion. Maybe it's not related, but I really have to coax it now.
Version v2.6.22 includes the fix. For me on localhost this works fast, the same as before; the extra logic makes no difference. For you, it may be that your server takes some time to send parseable data before the completion callback is executed.
I think it would be better to look for the delimiters in the HTTP chunked transfer encoding than to try parsing every response as JSON. Trying to parse everything as JSON is less efficient and won't work if you get a packet that is the end of one chunk and the start of another.
Newlines are built into the protocol and are meant to tell you when a chunk ends, so splitting up the data based on newlines is more correct, more reliable, and more efficient.
I don't care if you use my PR or not, but there's a packet trace in the description that shows exactly what I mean.
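To make that failure mode concrete, a small illustration with made-up field values:

```ts
// One read can carry the tail of one object plus the head of the next.
// From that point on the accumulated buffer is never a single valid JSON
// document, so "parse the whole buffer until it succeeds" stalls forever,
// while splitting on "\n" still recovers each complete line.
let buffer = '{"response":"a","done"'; // partial first object
buffer += ':false}\n{"response":"b","done"'; // tail + head in one read

// JSON.parse(buffer)    -> SyntaxError, now and after every later read
// buffer.split("\n")[0] -> '{"response":"a","done":false}' (parseable)
```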
OK, thanks @rcgtnick, I'll take another look soon.
Hey @rcgtnick, I just merged #48, which includes your changes. You're correct that this is a better solution for speed and efficiency. Please let me know if it works and we'll close the issue.
Many thanks
A funny thing happened to me while I was writing a small LLM chat app that streamed responses: after a while of chatting with an LLM, I started getting JSON parsing errors on my chunks.
It took me a minute, but eventually I realized I already knew what the problem was and how to solve it!