Comments (7)

ashutosh-narkar commented on May 9, 2024

OPA seems to be doing the right thing here, i.e. rejecting the large file and returning a bundle load error, which in turn should kick off successive downloads with an exponential back-off delay. What's probably happening is that the memory is not getting released quickly enough. I guess if GC were to happen we'd see the memory being released. Go 1.19 introduced the concept of a soft memory limit, which helps control GC behavior and can be set via the GOMEMLIMIT env variable. I haven't played around with this, so I'm not sure if it will help, but is this something you've looked into?

from opa.

anderseknert commented on May 9, 2024

@ashutosh-narkar do we really need to load the file at all if the file size exceeds size_limit_bytes? That seems to defeat the point of the setting. I haven't looked into it, but I'm assuming there's a way to check the file size of an item in the tarball before reading the bytes. Is that not the case?

anderseknert commented on May 9, 2024

Looking at the implementation now, and maybe I'm missing something, but it seems like the NextFile() function greedily reads all files on the first invocation, then serves them from cache on subsequent calls as long as there are more to return. We then call readFile() on each of the files (which we've already read), which copies them to yet another buffer. Here we use Read with a limit, which will fail if the size limit is exceeded. But at this point we've already read the entire file once, and now we read it again up to the size limit. So if we've set a size limit of 1 GB and we have a file in the tarball which is 5 GB, we'll now have 6 GB of data buffered before we return an error. In case we don't hit the size limit, we'll effectively have each file buffered twice, spending twice as much memory as required.

Some alternative approaches I can think of:

  1. Have NextFile read lazily, i.e. only one file from the tarball per invocation, and return the reader without copying the buffer. This seems like a reasonable expectation for a function called "NextFile", but it's a breaking change in behavior, as callers would now be required to read the stream of the "file" returned before calling NextFile again.
  2. Keep the behavior of NextFile, but store the bytes.Buffer on the descriptor rather than an io.Reader, and make it accessible. That way we can reuse the buffer we've read elsewhere, and avoid the second copy.
  3. Pass the size limit to the tarball loader, and have it fail immediately when the size reported exceeds the limit, i.e. before we read anything more than the header.

2 and 3 aren't mutually exclusive, and it would be good to do both: 2 to avoid keeping two copies of the file, and 3 to avoid reading files that exceed the limit.

As for 2, perhaps there is some more elegant non-intrusive way to do it. I'm open to suggestions.

dolevf commented on May 9, 2024

Thanks for investigating, Anders. Do we need a CVE assigned to this?

anderseknert commented on May 9, 2024

Anytime, Dolev 🙂 Quite pleased with the result!

As for a CVE, I'd lean towards no. OPA will need to run under the premise that a remote bundle server can be trusted. If that is not the case, an OOM is about the least harmful thing a malicious actor could accomplish. If they can tamper with the contents of bundles, they could e.g. change an authorization policy to allow them access, or in the case of discovery bundles, change OPA's configuration to e.g. turn off decision logging or whatnot.

In real-world deployments, the size limit is most likely to be exceeded by accident, e.g. when a user includes a big file by mistake. The fact that a mistake could cause an OOM is of course not good, and I'm happy to see this fixed. But similarly, there are many mistakes one might make as a bundle server "owner" which could have quite severe consequences, so I can't say I'm more worried about this than other imaginable scenarios.

Happy to discuss more if you think differently!

dolevf commented on May 9, 2024

Hi,

I agree that if a bundle server is compromised, DoS is the least interesting abuse case. But then again, bundle signing is also a thing, so there's some assumption things can get risky even if you're supposedly trusting your bundle server, no?

Totally up to the project to decide at the end of the day :) what's important is that it's fixed!

anderseknert commented on May 9, 2024

You're right — I didn't really make the distinction between whoever creates the bundle and who hosts it, and I should have! Indeed, bundle signing is a good extra measure. Assuming that is in place, the only actor who could accomplish this would be whoever built and signed the bundle, and if that's a malicious actor... not a whole lot we can do then.

Let's leave it as a (soon to be fixed!) issue for now. If others have opinions here, please make your voices heard :)

Thanks Dolev!
