
Comments (5)

jmmv commented on June 11, 2024

From my reading of this bug, I still don't understand why supporting gRPC-level in-transit compression (https://github.com/grpc/grpc-go/blob/master/Documentation/compression.md) is not desired/allowed. This feature should be easy to enable and it is completely invisible to the application-level logic, correct?

The reason I ask is that we face a situation where some of our artifacts are too large to download in a reasonable amount of time. However, these artifacts compress very well and they almost never change (so they stay in the cache for a long time), which means we don't really care about on-disk compression. Having the ability to compress in transit only would solve 90% of our problems with slow builds due to downloads, and I think this should be a fairly simple change?

More specifically, what we'd be looking at is having compression between the frontend and bb-clientd, and nothing else. Neither Bazel nor the storage nodes would even have to know that the compression is happening.
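
For what it's worth, in grpc-go enabling this is a single call option on the client side. A minimal sketch, assuming an insecure channel and a placeholder bb-clientd address (neither is a real default):

import (
    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
    "google.golang.org/grpc/encoding/gzip" // importing this registers the gzip compressor
)

// "bb-clientd.example:8981" is a placeholder address for illustration.
conn, err := grpc.Dial(
    "bb-clientd.example:8981",
    grpc.WithTransportCredentials(insecure.NewCredentials()),
    // Compress every RPC on this connection; the receiving side
    // decompresses transparently as long as gzip is registered there too.
    grpc.WithDefaultCallOptions(grpc.UseCompressor(gzip.Name)),
)

The server side only needs to import the gzip encoding package so the codec is registered; no application-level changes are required.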


kjteske commented on June 11, 2024

+1 to gRPC-level in-transit compression.

Our biggest artifacts (~1GB) are large binaries with debug information that we know compress really well: adding "-gz=zlib" to our gcc options shrinks these by 60% or more. Unfortunately "-gz=zlib" adds 40+ seconds to the link time, which is a dealbreaker. We're optimistic gRPC compression would perform better.

(gcc 13 adds a "-gz=zstd" option, which might be better, but we're on gcc 12.3 for now.)


EdSchouten commented on June 11, 2024

Hey @eagleonhill,

When compression support was added to the Remote Execution protocol, I was opposed to adding it in its given form. The reason is that the transmitting side does not send the size of the compressed object to the receiving side. With the way LocalBlobAccess is designed, it would be pretty hard to allocate space for objects as they are being written, because it tries to store objects consecutively. This means that compression would either be limited to the wire only (not at rest), or bb-storage would need to do unnecessary copying of data to guarantee consecutive storage.
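
To make the constraint concrete, here is a minimal sketch of a simplified model of consecutive block storage; the names below are illustrative, not the actual LocalBlobAccess API:

// A block that stores blobs consecutively, as LocalBlobAccess aims to.
type block struct {
    data []byte
    off  int // offset of the first free byte
}

// allocate reserves a consecutive region for one blob. It cannot be
// called without knowing sizeBytes up front, which is exactly what the
// protocol fails to provide for compressed uploads.
func (b *block) allocate(sizeBytes int) ([]byte, bool) {
    if b.off+sizeBytes > len(b.data) {
        return nil, false // does not fit; caller must try another block
    }
    region := b.data[b.off : b.off+sizeBytes]
    b.off += sizeBytes
    return region, true
}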

Furthermore, one downside of compression is that it's a lot harder to provide fast random access to blobs. As Buildbarn relies heavily on virtual file systems (based on FUSE/NFSv4), this is something that we need. Even without compression the protocol's story around random access is a bit messy, as there's no way to guarantee data integrity by reading CAS objects partially.

Because of that, I am currently spending time with the remote API working group to get chunking/decomposition of CAS objects added. See these PRs:

Once we have proper support for chunking/decomposition, the story around compression will become a lot simpler. All (primitive) CAS objects will become bounded in size. This means that even if the receiving side doesn't know the size of a compressed CAS object, it can safely ingest it into memory to compute its size, before letting LocalBlobAccess write it to disk.
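
A hedged sketch of that ingest path, reusing the illustrative block type above and assuming chunks are bounded to a size that is safe to buffer (uses bytes, io, and errors from the standard library):

// Buffer the bounded-size compressed chunk in memory first; only then
// is its compressed size known, so consecutive space can be allocated.
var buf bytes.Buffer
if _, err := io.Copy(&buf, chunk); err != nil { // chunk: io.Reader over one upload
    return err
}
region, ok := blk.allocate(buf.Len())
if !ok {
    return errors.New("block full") // illustrative error handling
}
copy(region, buf.Bytes())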

So in summary, the reason Buildbarn doesn't support compression is not because we don't care. We want to do it right, and that still takes a bit more time to sort out. I will keep this issue open, so that people can remain informed on updates.


bytearth commented on June 11, 2024

Hey @EdSchouten,
Compression is also a very useful feature for us.
I have read the code of bazel-remote. In their design, when they receive a bytestream.Write request with a resource path like
{instance_name}/uploads/{uuid}/compressed-blobs/{compressor}/{uncompressed_hash}/{uncompressed_size}{/optional_metadata}, they decode the compressed data and write it to a file when using the diskCache backend.
zstd supports "streaming decompression", so they can decode and write in a loop, like this:

// Streaming decompression with github.com/klauspost/compress/zstd:
// wrap the incoming chunks as an io.Reader, then decode and write in a loop.
dec, err := zstd.NewReader(src) // src: io.Reader over the received compressed chunks
if err != nil {
    return err
}
defer dec.Close()
_, err = io.Copy(dst, dec) // writes decompressed data to dst as it is produced

Maybe Buildbarn could give this a try: when receiving a WriteRequest, allocate space using the uncompressed_size from the resource name, then receive, decode, and write to the target Block.
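
Sketched against the illustrative block type from earlier in this thread (hypothetical names; this assumes the client-supplied uncompressed_size can be trusted for allocation and is verified against the digest afterwards):

// Allocate exactly uncompressedSize bytes up front, then stream the
// decoded data into that region as chunks arrive.
region, ok := blk.allocate(int(uncompressedSize))
if !ok {
    return errors.New("no space left in this block")
}
dec, err := zstd.NewReader(src) // src: io.Reader over the received chunks
if err != nil {
    return err
}
defer dec.Close()
if _, err := io.ReadFull(dec, region); err != nil {
    return err // a short payload fails here; digest verification must still follow
}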


EdSchouten commented on June 11, 2024

That’s the point: if we add compression support, I don’t just want compression in transit. Also at rest. And that’s problematic, as we can’t allocate space for something we don’t know the size of.

