Remote execution uses over ~50x more cache transfers than local upstreaming (buildbuddy, closed)

buildbuddy-io commented on June 15, 2024

Remote execution uses over ~50x more cache transfers than local upstreaming


Comments (5)

siggisim commented on June 15, 2024

Totally agree that execution metadata can be easy to miss, will work on making it more readable!


siggisim commented on June 15, 2024

Hey @aaronmondal - if a remote executor doesn't already have an input it needs for a remote action in its local disk cache, it must fetch that input from the remote cache. If the executor already has the artifact in its local cache, it will not fetch it again. There can be many (dozens, hundreds, or more) executors running at any given time, so it can take a few builds for a given cache artifact to end up on every executor. You can explore the Executions tab and click on an execution to see its inputs and how large they are.

If much of this size comes from toolchains or other large inputs that get pulled in for every action, you can consider putting them on a custom container image, which gets pulled from a Docker registry rather than from the cache: https://www.buildbuddy.io/docs/rbe-platforms/#using-a-custom-docker-image
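To make the executor caching behavior described above concrete, here is a minimal Python sketch. It assumes an unbounded local cache on each executor and a scheduler that picks executors at random; real schedulers and cache eviction policies differ. `Executor`, `run_action` and `simulate_builds` are hypothetical names for illustration, not BuildBuddy APIs.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Executor:
    """Hypothetical remote executor with an unbounded local disk cache."""
    local_cache: set[str] = field(default_factory=set)

    def run_action(self, inputs: dict[str, int]) -> int:
        """Return the bytes fetched from the remote cache for one action.

        Only inputs missing from the local cache are fetched. Fetched inputs
        stay in the local cache, so later actions on this executor skip them.
        """
        fetched = 0
        for digest, size in inputs.items():
            if digest not in self.local_cache:
                fetched += size
                self.local_cache.add(digest)
        return fetched


def simulate_builds(num_executors: int, num_builds: int, actions_per_build: int,
                    inputs: dict[str, int], seed: int = 0) -> list[int]:
    """Bytes fetched per build when every action needs the same inputs."""
    rng = random.Random(seed)
    executors = [Executor() for _ in range(num_executors)]
    per_build = []
    for _ in range(num_builds):
        total = 0
        for _ in range(actions_per_build):
            # Assumption: the scheduler assigns each action to a random executor.
            total += rng.choice(executors).run_action(inputs)
        per_build.append(total)
    return per_build


# Made-up example: one 700 MB toolchain input shared by every action.
toolchain = {"toolchain.tar": 700_000_000}
print(simulate_builds(num_executors=50, num_builds=3, actions_per_build=100, inputs=toolchain))
```

The first build pays for the input once per executor it touches; later builds only pay on executors that had not seen the input yet, which is why it can take a few builds before every executor has it cached.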


aaronmondal commented on June 15, 2024

@siggisim Thanks for the swift reply! I might have an idea where that download size is coming from, though this might be a "bug" in the UI.

After looking at the build again, it just didn't make sense that the Bazel artifacts would account for such a large download size:

https://app.buildbuddy.io/invocation/be9102f9-6977-47b0-8ab9-25ddb38c6265

The executors tab here shows ~6000 actions with a max read size of ~0.28KB, which would amount to a max download of ~2MB. That looks a bit strange, but since this was a clean build, it might just be profiling artifacts exchanged between executors.

The large number of CAS hits probably plays some part in this. But ~500,000 hits at a max artifact size of ~130KB still add up to at most ~65GB.

However, we already use a custom image, and that image is quite large at ~2.8GB. If the docker pull of that image on 50 workers is counted in the download size, that would explain the ~140GB that I can't find in the logs (though would 50 jobs really mean 50 separate pulls?).

Another part is that http_archive fetches seem to be missing from the logs as well. I expect ~700MB of fetches there, which across 50 workers could again contribute to the overall download size.
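The back-of-envelope estimates in this comment can be reproduced with a few lines of Python. This is a sketch using decimal units (1 KB = 1,000 bytes) and only the figures quoted above; `total_bytes` is a hypothetical helper. When the per-item size is a maximum, the product is an upper bound rather than an exact total.

```python
KB = 1_000
MB = 1_000 * KB
GB = 1_000 * MB


def total_bytes(count: int, size_bytes: float) -> float:
    """count x per-item size; with a maximum item size this is an upper bound."""
    return count * size_bytes


# Figures quoted in the comment above.
estimates = {
    "action_reads": total_bytes(6_000, 0.28 * KB),   # ~6000 actions, max read ~0.28 KB
    "cas_hits": total_bytes(500_000, 130 * KB),      # ~500,000 hits, max ~130 KB each
    "image_pulls": total_bytes(50, 2.8 * GB),        # 2.8 GB image pulled on 50 workers
    "http_archives": total_bytes(50, 700 * MB),      # ~700 MB of fetches on 50 workers
}

for name, total in estimates.items():
    print(f"{name:>13}: {total / MB:>10,.1f} MB")
```

Of these rows, only the image pulls land near the unexplained ~140GB.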

So one or both of these might be missing from the UI metrics (or I just couldn't find them?):

- Docker pull size
- http_archive (and similar) fetch size

It might also be relevant that we use bzlmod for all of this, and the log might be missing information because fetches triggered by module resolution are not tracked correctly.


siggisim commented on June 15, 2024

Hey @aaronmondal - you can take a look at the Executions tab and sort by "File size downloaded".

It shows that many of the 5,323 remote actions have 700MB+ of inputs (these numbers only count network downloads and skip artifacts that are already present on the remote executor). You can click on individual actions and explore their input files to see where this is coming from.
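The difference between an action's total input size and what it actually downloads can be sketched in Python as follows. `ActionStats` is a hypothetical record loosely modeled on one row of the Executions tab, not a real BuildBuddy type, and the sample rows are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionStats:
    """Hypothetical per-action record, loosely modeled on one Executions-tab row."""
    action_id: str
    input_bytes: int       # total size of all inputs the action needs
    downloaded_bytes: int  # bytes fetched over the network (inputs not already on the executor)


def top_downloaders(actions: list[ActionStats], n: int = 10) -> list[ActionStats]:
    """Largest network downloads first, like sorting by "File size downloaded"."""
    return sorted(actions, key=lambda a: a.downloaded_bytes, reverse=True)[:n]


def total_downloaded(actions: list[ActionStats]) -> int:
    """Sum of network downloads; inputs already cached on an executor add nothing."""
    return sum(a.downloaded_bytes for a in actions)


# Made-up rows: two actions with identical 808 MB inputs, only one on a cold executor.
actions = [
    ActionStats("compile_a", input_bytes=808_000_000, downloaded_bytes=808_000_000),
    ActionStats("compile_b", input_bytes=808_000_000, downloaded_bytes=0),
    ActionStats("link", input_bytes=1_200, downloaded_bytes=280),
]

for a in top_downloaders(actions, n=2):
    print(a.action_id, a.downloaded_bytes)
```

Sorting on the downloaded column rather than the input size is what surfaces the actions that actually drive network transfer.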

Docker pull size doesn't affect these stats, since images are pulled from a Docker registry rather than from the cache.

http_archive fetches don't count because they are downloaded to the machine hosting Bazel, not to the remote executor (unless the fetched files are listed as inputs to a particular remote action).
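The rule of thumb in the two paragraphs above can be written as a small predicate. This is a minimal sketch of that explanation only; `Transfer` and `counts_as_executor_download` are hypothetical names, not part of BuildBuddy or Bazel.

```python
from enum import Enum, auto


class Transfer(Enum):
    """Hypothetical categories of downloads that happen during a remote build."""
    CAS_ACTION_INPUT = auto()     # action input fetched by an executor from the remote cache
    REGISTRY_IMAGE_PULL = auto()  # container image pulled from a Docker registry
    HOST_REPO_FETCH = auto()      # http_archive and similar, fetched on the machine running Bazel


def counts_as_executor_download(kind: Transfer, used_as_remote_input: bool = False) -> bool:
    """True if a transfer would show up in the per-action "downloaded" numbers."""
    if kind is Transfer.CAS_ACTION_INPUT:
        return True
    if kind is Transfer.HOST_REPO_FETCH:
        # Host-side fetches only matter once their files are inputs to a remote action.
        return used_as_remote_input
    # Image pulls come from the registry, not from the cache.
    return False
```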


aaronmondal commented on June 15, 2024

@siggisim Ahh, now I see it. OK, this is all clear then, and it fully explains everything.

Maybe it would be a good idea to make that small text larger and higher contrast. On a (fairly high quality) 4K display it is so small and low contrast that I overlooked it even after looking at these logs for a really long time. I always read the 0.28KB number, which takes all the visual focus since it is so much more pronounced. I assumed those 0.28KB values were the download size and completely overlooked the 808 MB value.

I don't have a visual impairment, just an occasional case of "being a very dumb user". Maybe this still qualifies as an accessibility issue 😅

