
Comments (5)

mscdex commented on August 20, 2024

even if the streams for previous files haven't been read yet

This is due to how node.js streams work. They buffer up to highWaterMark bytes (16 KiB is currently the default for readable streams, which is what file streams are). The defaults used by this library are the same as node.js's stream defaults. You can, however, configure them to whatever you like via the fileHwm option for individual file streams and highWaterMark for the parser.
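
For example (a hypothetical sketch; the values and the incoming request object req are illustrative, but both options are part of busboy's constructor):

    const busboy = require('busboy');

    const bb = busboy({
      headers: req.headers,           // `req` is assumed to be an incoming HTTP request
      highWaterMark: 2 * 1024 * 1024, // buffer for the parser itself
      fileHwm: 1024 * 1024            // buffer for each individual file stream
    });
    req.pipe(bb);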


KernelDeimos commented on August 20, 2024

That makes sense, but why would more than 16 KiB worth of files be emitted before a previous file's stream has ended?


mscdex commented on August 20, 2024

If you're suggesting that busboy stop parsing immediately when it sees the end of a part (stashing the current chunk somewhere temporarily) until that file's stream has been completely read, that would complicate things: writable streams do not have an unshift() method like readable streams do, and even relying on one would cause performance issues. More importantly, it could potentially break things for existing users who may not be expecting that kind of behavior.

My suggestion would be to override one or both of the limits I previously mentioned.


KernelDeimos commented on August 20, 2024

That's close to what I was thinking, but the condition you describe for pausing sounds different, and I don't understand why it implies that a chunk needs to be stored somewhere temporarily.

Although highWaterMark defaults to 16 KiB, I get a lot more than 16 KiB worth of files after a file whose stream is still unread. I would imagine busboy only emits file events for the chunks it has consumed; that would be 16 (maybe 32 or 48) file events for my 1 KiB files, right? Or, since the most common chunk size I measured from file streams was 64 KiB, it would be 64 file events (assuming the request stream produces chunks of the same size). I can also see why it might be 128 file events, because grabbing the next chunk before any of the previous chunk's file streams are read might be more performant.

Instead I get file events for all the files in my test case (8000 files) when I've only read the streams for the first < 100 files. Given this result, I assume busboy would read ahead indefinitely and consume all server memory if the request contained enough really small files (as it is, my test case consumes 8 MB of server memory immediately).
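
For reference, a hypothetical repro of that behavior (the boundary, file count, and sizes are made up, and busboy v1's API is assumed): build a synthetic multipart body with many tiny files, feed it to busboy, and count file events without ever reading the streams:

    const busboy = require('busboy');

    const boundary = 'xxxxxxxxxxxxxxxxxxxxxxxx';
    const part = (i) =>
      `--${boundary}\r\n` +
      `Content-Disposition: form-data; name="f${i}"; filename="f${i}.bin"\r\n` +
      `Content-Type: application/octet-stream\r\n\r\n` +
      'a'.repeat(1024) + '\r\n';

    let body = '';
    for (let i = 0; i < 8000; i++) body += part(i);
    body += `--${boundary}--\r\n`;

    const bb = busboy({
      headers: { 'content-type': `multipart/form-data; boundary=${boundary}` }
    });

    let events = 0;
    bb.on('file', (name, stream, info) => {
      events++; // deliberately never read `stream`
    });
    bb.on('close', () => console.log('file events before any reads:', events));
    bb.end(body);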


mscdex commented on August 20, 2024

I think if you really want to minimize this sort of thing, the best you're going to be able to do is just have something that stops writing to the busboy instance until you've sufficiently determined that it's ok to continue parsing.

This could be accomplished by creating a custom Transform stream (or similar) that sits between your upstream and the busboy instance. It would check how many file streams have not been completely read, and then either keep passing data along or stop until they have all been read (or apply whatever logic you want); see the sketch below.
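
A minimal sketch of that idea, assuming busboy v1's file event signature, an incoming request req, and a busboy instance bb; the FileGate name and the "no unread file streams" policy are illustrative, not part of busboy's API:

    const { Transform } = require('node:stream');

    // Hypothetical gate between the request and busboy. It stops
    // forwarding bytes (by withholding the _transform callback, which
    // eventually applies backpressure to the upstream pipe) while any
    // previously emitted file stream is still being read. Granularity
    // is one chunk: a single forwarded chunk may still contain several
    // small files, so several file events can fire per chunk.
    class FileGate extends Transform {
      constructor(opts) {
        super(opts);
        this.unread = 0;     // file streams emitted but not fully read
        this.resume = null;  // deferred _transform callback
      }
      _transform(chunk, enc, cb) {
        if (this.unread === 0) return cb(null, chunk);
        this.resume = () => cb(null, chunk); // hold this chunk for now
      }
      fileStarted() { this.unread++; }
      fileFinished() {
        if (--this.unread === 0 && this.resume !== null) {
          const cb = this.resume;
          this.resume = null;
          cb(); // release the held chunk and let more data in
        }
      }
    }

    // Usage sketch:
    const gate = new FileGate();
    bb.on('file', (name, stream, info) => {
      gate.fileStarted();
      stream.on('end', () => gate.fileFinished());
      // ...consume `stream` at your own pace...
    });
    req.pipe(gate).pipe(bb);

Note that the gate only blocks once its own write buffer fills, so busboy can still read ahead by roughly one chunk plus the gate's highWaterMark; the point is to bound the read-ahead, not eliminate it.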

