Comments (5)
From my reading of this bug, I still don't understand why supporting gRPC-level in-transit compression (https://github.com/grpc/grpc-go/blob/master/Documentation/compression.md) is not desired/allowed. This feature should be easy to enable, and it is completely invisible to the application-level logic, correct?
The reason I ask is that we face a situation where some of our artifacts are too large to download in a reasonable amount of time. However, these artifacts compress very well and almost never change (so they stay in the cache for a long time), so we don't really care about on-disk compression. The ability to compress in transit only would solve 90% of our problems with slow builds due to downloads, and I think this should be a fairly simple change?
More specifically, what we'd be looking at is having compression between the frontend and bb-clientd, and nothing else. Neither Bazel nor the storage nodes would even have to know that the compression is happening.
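For reference, on a Go client this is a one-line dial option (a fragment, not a full program; it assumes grpc-go's built-in gzip encoder, and zstd would need a third-party `encoding` registration):

```go
import (
	"google.golang.org/grpc"
	// Importing this package registers the "gzip" compressor.
	"google.golang.org/grpc/encoding/gzip"
)

// Every RPC on this connection sends gzip-compressed messages; the
// server transparently decompresses, and compresses its responses
// if it supports the codec.
conn, err := grpc.Dial(address,
	grpc.WithDefaultCallOptions(grpc.UseCompressor(gzip.Name)))
```

On the server side, linking in the same encoding package is all that's needed to accept compressed requests.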
from bb-storage.
+1 to gRPC-level in-transit compression.
Our biggest artifacts (~1GB) are large binaries with debug information that we know compress really well: adding "-gz=zlib" to our gcc options shrinks these by 60% or more. Unfortunately "-gz=zlib" adds 40+ seconds to the link time, which is a dealbreaker. We're optimistic gRPC compression would perform better.
(gcc 13 adds a "-gz=zstd" option which might be better, but we're on gcc 12.3 for now).
Hey @eagleonhill,
When compression support was added to the Remote Execution protocol, I was opposed to adding it in its given form. The reason being that the transmitting side does not properly send the size of the compressed object to the receiving side. With the way LocalBlobAccess is designed, it would be pretty hard to allocate space for objects as they are being written, as it tries to store objects consecutively. This means that compression would either be limited to just on wire (not at rest), or bb-storage would need to do unnecessary copying of data to guarantee consecutive storage.
Furthermore, one downside of compression is that it's a lot harder to provide fast random access to blobs. As Buildbarn relies heavily on virtual file systems (based on FUSE/NFSv4), this is something that we need. Even without compression the protocol's story around random access is a bit messy, as there's no way to guarantee data integrity by reading CAS objects partially.
Because of that, I am currently spending time with the remote API working group to get chunking/decomposition of CAS objects added. See these PRs:
- bazelbuild/remote-apis#236 <- In my opinion good to land
- bazelbuild/remote-apis#235 <- In my opinion also good to land
- bazelbuild/remote-apis#233 <- Still needs to be rewritten on top of the two PRs above
Once we have proper support for chunking/decomposition, the story around compression will become a lot simpler. All (primitive) CAS objects will become bounded in size. This means that even if the receiving side doesn't know the size of a compressed CAS object, it can safely ingest it into memory to compute its size, before letting LocalBlobAccess write it to disk.
So in summary, the reason Buildbarn doesn't support compression is not because we don't care. We want to do it right, and that still takes a bit more time to sort out. I will keep this issue open, so that people can remain informed on updates.
Hey @EdSchouten
compression is also a very useful feature for us.
I have read the code of bazel-remote. In their design, when it receives a bytestream.Write request with a resource path like
{instance_name}/uploads/{uuid}/compressed-blobs/{compressor}/{uncompressed_hash}/{uncompressed_size}{/optional_metadata}
it decodes the compressed data and writes it to the file when using the diskCache backend.
zstd supports "streaming decompression", so they can decode and write in a loop, like this:
for {
    compressedData := src.Recv()
    uncompressedData := zstd.Decode(compressedData)
    dst.Write(uncompressedData)
}
Maybe Buildbarn could try this: when it receives a WriteRequest, allocate space using the uncompressed_hash, then receive, decode, and write to the target Block in a loop.
That’s the point: if we add compression support, I don’t just want compression in transit. Also at rest. And that’s problematic, as we can’t allocate space for something we don’t know the size of.