
Comments (5)

mscdex commented on August 20, 2024

even if the streams for previous files haven't been read yet

This is due to how node.js streams work. They buffer up to highWaterMark bytes (16 KiB is currently the default for readable streams, which is what file streams are). The defaults used by this library are the same as node.js's stream defaults. You can, however, configure them to whatever you like via the fileHwm option for individual file streams and highWaterMark for the parser.
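
For example (a hypothetical sketch; the values and the incoming request object req are illustrative, but both options are part of busboy's constructor):

    const busboy = require('busboy');

    const bb = busboy({
      headers: req.headers,           // `req` is assumed to be an incoming HTTP request
      highWaterMark: 2 * 1024 * 1024, // buffer for the parser itself
      fileHwm: 1024 * 1024            // buffer for each individual file stream
    });
    req.pipe(bb);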


KernelDeimos commented on August 20, 2024

That makes sense, but why would more than 16 KiB worth of files be emitted before a previous file's stream has ended?


mscdex commented on August 20, 2024

If you're suggesting that busboy stop parsing immediately when it sees the end of a part (stashing the current chunk somewhere temporarily) until that file's stream has been completely read, that would complicate things: writable streams do not have an unshift() method like readable streams do, and even relying on one would cause performance issues. More importantly, it could potentially break things for existing users who may not be expecting that kind of behavior.

My suggestion would be to override one or both of the limits I previously mentioned.


KernelDeimos commented on August 20, 2024

That's close to what I was thinking, but the condition you describe for pausing sounds different, and I don't understand why it implies that a chunk needs to be stored somewhere temporarily.

Although highWaterMark defaults to 16 KiB, I get a lot more than 16 KiB worth of files after a file whose stream is still unread. I would imagine busboy only emits file events for the chunks it has consumed; that would be 16 (maybe 32 or 48) file events for my 1 KiB files, right? Or, since the most common chunk size I measured from file streams was 64 KiB, it would be 64 file events (assuming the request stream produces chunks of the same size). I can also see why it might be 128 file events, because grabbing the next chunk before any of the previous chunk's file streams are read might be more performant.

Instead I get file events for all the files in my test case (8000 files) when I've only read the streams for the first < 100 files. Given this result, I assume busboy would read ahead indefinitely and consume all server memory if the request contained enough really small files (as it is, my test case consumes 8 MB of server memory immediately).
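
For reference, a hypothetical repro of that behavior (the boundary, file count, and sizes are made up, and busboy v1's API is assumed): build a synthetic multipart body with many tiny files, feed it to busboy, and count file events without ever reading the streams:

    const busboy = require('busboy');

    const boundary = 'xxxxxxxxxxxxxxxxxxxxxxxx';
    const part = (i) =>
      `--${boundary}\r\n` +
      `Content-Disposition: form-data; name="f${i}"; filename="f${i}.bin"\r\n` +
      `Content-Type: application/octet-stream\r\n\r\n` +
      'a'.repeat(1024) + '\r\n';

    let body = '';
    for (let i = 0; i < 8000; i++) body += part(i);
    body += `--${boundary}--\r\n`;

    const bb = busboy({
      headers: { 'content-type': `multipart/form-data; boundary=${boundary}` }
    });

    let events = 0;
    bb.on('file', (name, stream, info) => {
      events++; // deliberately never read `stream`
    });
    bb.on('close', () => console.log('file events before any reads:', events));
    bb.end(body);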


mscdex commented on August 20, 2024

I think if you really want to minimize this sort of thing, the best you're going to be able to do is just have something that stops writing to the busboy instance until you've sufficiently determined that it's ok to continue parsing.

This could be accomplished by creating a custom Transform stream (or similar) that sits between your upstream and the busboy instance. It would check how many file streams have not been completely read, and then either keep passing data along or stop until they have all been read (or apply whatever logic you want); see the sketch below.
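
A minimal sketch of that idea, assuming busboy v1's file event signature, an incoming request req, and a busboy instance bb; the FileGate name and the "no unread file streams" policy are illustrative, not part of busboy's API:

    const { Transform } = require('node:stream');

    // Hypothetical gate between the request and busboy. It stops
    // forwarding bytes (by withholding the _transform callback, which
    // eventually applies backpressure to the upstream pipe) while any
    // previously emitted file stream is still being read. Granularity
    // is one chunk: a single forwarded chunk may still contain several
    // small files, so several file events can fire per chunk.
    class FileGate extends Transform {
      constructor(opts) {
        super(opts);
        this.unread = 0;     // file streams emitted but not fully read
        this.resume = null;  // deferred _transform callback
      }
      _transform(chunk, enc, cb) {
        if (this.unread === 0) return cb(null, chunk);
        this.resume = () => cb(null, chunk); // hold this chunk for now
      }
      fileStarted() { this.unread++; }
      fileFinished() {
        if (--this.unread === 0 && this.resume !== null) {
          const cb = this.resume;
          this.resume = null;
          cb(); // release the held chunk and let more data in
        }
      }
    }

    // Usage sketch:
    const gate = new FileGate();
    bb.on('file', (name, stream, info) => {
      gate.fileStarted();
      stream.on('end', () => gate.fileFinished());
      // ...consume `stream` at your own pace...
    });
    req.pipe(gate).pipe(bb);

Note that the gate only blocks once its own write buffer fills, so busboy can still read ahead by roughly one chunk plus the gate's highWaterMark; the point is to bound the read-ahead, not eliminate it.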

