Comments (7)
OPA seems to be doing the right thing here, i.e. rejecting the large file and returning a bundle load error, which in turn should kick off successive download attempts with an exponential back-off delay. What's probably happening here is that the memory is not getting released quickly enough. I'd guess that if a GC cycle were to happen, we'd see the memory being released. Go 1.19 introduced the concept of a soft memory limit, which helps control GC behavior and can be set via the `GOMEMLIMIT` env variable. I haven't played around with this, so I'm not sure if it will help, but is this something you've looked into?
from opa.
@ashutosh-narkar do we really need to load the file at all if the file size exceeds `size_limit_bytes`? That seems to defeat the point of the setting. I haven't looked into it, but I'm assuming there's a way to check the file size of an item in the tarball before reading the bytes. Is that not the case?
Looking at the implementation now, and maybe I'm missing something, but it seems like the `NextFile()` function greedily reads all files on the first invocation, then serves them from cache on subsequent calls as long as there are more to return. We then call `readFile()` on each of the files (which we've already read), which copies them to yet another buffer. Here we use `Read` with a limit, which will fail if the size limit is exceeded. But at this point we've already read the entire file once, and now we read it again up to the size limit. So if we've set a size limit of 1 GB and we have a file in the tarball which is 5 GB, we'll have 6 GB of data buffered before we return an error. And even when we don't hit the size limit, we'll effectively have each file buffered twice, spending twice as much memory as required.
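To make the cost concrete, the pattern described amounts to something like this (a simplified, hypothetical sketch; `nextFile` and `readFile` here only mimic the described behavior and are not OPA's actual functions):

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// nextFile mimics the greedy behavior: the whole entry is copied
// into a buffer up front, regardless of any size limit.
func nextFile(content string) io.Reader {
	var buf bytes.Buffer
	io.Copy(&buf, strings.NewReader(content)) // first full copy
	return &buf
}

// readFile then copies the already-buffered data again, up to the limit.
func readFile(r io.Reader, sizeLimitBytes int64) ([]byte, error) {
	var buf bytes.Buffer
	n, err := io.Copy(&buf, io.LimitReader(r, sizeLimitBytes+1)) // second copy
	if err != nil {
		return nil, err
	}
	if n > sizeLimitBytes {
		// By now the first buffer holds the whole file, and this
		// buffer holds another copy up to the limit.
		return nil, fmt.Errorf("size limit exceeded")
	}
	return buf.Bytes(), nil
}

func main() {
	r := nextFile("0123456789") // 10 bytes buffered once
	_, err := readFile(r, 4)    // up to 5 more bytes buffered before failing
	fmt.Println(err)            // prints: size limit exceeded
}
```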
Some alternative approaches I can think of:

1. Have `NextFile` read lazily, i.e. only one file from the tarball per invocation, and return the reader without copying the buffer. This seems like a reasonable expectation for a function called "NextFile", but it's a breaking change in behavior, as callers would now be required to read the stream of the "file" returned before calling `NextFile` again.
2. Keep the behavior of `NextFile`, but store the `bytes.Buffer` on the descriptor rather than an `io.Reader`, and make it accessible. That way we can reuse the buffer we've already read elsewhere, and avoid the second copy.
3. Pass the size limit to the tarball loader, and have it fail immediately when the size reported exceeds the limit, i.e. before we read anything more than the header.

2 and 3 aren't mutually exclusive; rather, it would be good to do both: avoid keeping two copies of each file, and don't read files exceeding the limit at all.

As for 2, perhaps there is some more elegant, non-intrusive way to do it. I'm open to suggestions.
Thanks for investigating, Anders. Do we need a CVE assigned to this?
Anytime, Dolev 🙂 Quite pleased with the result!
As for a CVE, I'd lean towards no. OPA needs to run under the premise that a remote bundle server can be trusted. If that is not the case, an OOM is about the least harmful thing a malicious actor could accomplish. If they can tamper with the contents of bundles, they could e.g. change an authorization policy to allow themselves access, or, in the case of discovery bundles, change OPA's configuration to turn off decision logging or whatnot.
Exceeding the size limit in real-world deployments is most likely going to happen by accident, e.g. a user unintentionally including a big file in a bundle. The fact that a mistake could cause an OOM is of course not good, and I'm happy to see this fixed. But similarly, there are many mistakes one might make as a bundle server "owner" which could have quite severe consequences, so I can't say I'm more worried about this than other imaginable scenarios.
Happy to discuss more if you think differently!
Hi,
I agree that if a bundle server is compromised, DoS is the least interesting abuse case. But then again, bundle signing is also a thing, so there's already an assumption that things can get risky even if you supposedly trust your bundle server, no?
Totally up to the project to decide at the end of the day :) what's important is that it's fixed!
You're right — I didn't really make the distinction between whoever creates the bundle and who hosts it, and I should have! Indeed, bundle signing is a good extra measure. Assuming that is in place, the only actor who could accomplish this would be whoever built and signed the bundle, and if that's a malicious actor... not a whole lot we can do then.
Let's leave it as a (soon to be fixed!) issue for now. If others have opinions here, please make your voices heard :)
Thanks Dolev!