
Comments (9)

siggisim commented on June 3, 2024

Hey @amit-mittal - thanks for reporting!

This type of error message most commonly occurs when your BuildBuddy instance restarts for whatever reason. Are you able to see if your BuildBuddy app has any restarts? Could it be running out of memory?

from buildbuddy.

amit-mittal commented on June 3, 2024

Hey @siggisim - thanks for looking into it!

No, I don't see any crashes or restarts on the BuildBuddy server side. The node on which BuildBuddy is running has 60 GB of memory, of which only 16 GB is in use.

So far, we are able to repro this issue only while running tests of one of our Go services (if that is relevant).


siggisim commented on June 3, 2024

Interesting, is there any chance this is a CI run that runs multiple Bazel invocations back-to-back?

By default, Bazel will try to re-use the build event stream connection across invocations - which could be leading to the issues we're seeing here. You can disable this behavior with the bazel flag --keep_backend_build_event_connections_alive=false. I suspect that might solve this issue.
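
As a minimal sketch of that suggestion, the flag could go in the workspace .bazelrc. Attaching it to the `build` command (which `test` and `run` inherit from) is an assumption about how the project's .bazelrc is organised, not something stated in this thread:

```
# .bazelrc (sketch; placement under "build" is an assumption)
# Open a fresh build event stream connection for each invocation
# instead of re-using the previous one.
build --keep_backend_build_event_connections_alive=false
```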


amit-mittal commented on June 3, 2024

I added --keep_backend_build_event_connections_alive=false to our .bazelrc, but now I see the errors below when running bazel build ... commands one after another, which never used to happen for us. The errors go away if I remove the new setting, so I don't think the overhead of creating a new connection on every run would work for us.

WARNING: The background upload of the Build Event Protocol for the previous invocation failed with the following exception: 'com.google.devtools.build.lib.util.AbruptExitException: The Build Event Protocol upload failed: All retry attempts failed. UNAVAILABLE: UNAVAILABLE: Channel shutdown invoked UNAVAILABLE: UNAVAILABLE: Channel shutdown invoked'. Ignoring the failure and starting a new invocation..
WARNING: The background upload of the Build Event Protocol for the previous invocation failed to complete in 5.003 seconds. Cancelling and starting a new invocation...

I don't think it should matter, but the run is happening on a developer machine (macOS).

Regarding the multiple invocations: we are NOT running bazel commands in parallel, but as part of the usual developer workflow we do run bazel commands one after another. That is one of the reasons we have --bes_upload_mode=nowait_for_upload_complete set, so that developers are not blocked while the events are being uploaded.


siggisim commented on June 3, 2024

There are (unfortunately) lots of Bazel bugs with the --bes_upload_mode= flag:

There is some work being done to improve the BES artifact uploader:

Do you see the same error without that --bes_upload_mode= flag?

Do you see the same error if you add the flags --remote_timeout=3600 and --bes_timeout=3600s (wondering if a timeout is being hit and not handled gracefully)?
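
One way to run both experiments is a temporary .bazelrc change along these lines. This is only a sketch: the flag values are the ones quoted above, while putting them under `build` and commenting out the existing upload-mode line are assumptions about the local setup:

```
# .bazelrc (temporary debugging sketch)
# Experiment 1: disable the async upload mode to see whether the error persists.
# build --bes_upload_mode=nowait_for_upload_complete

# Experiment 2: raise both timeouts to rule out an ungracefully handled timeout.
build --remote_timeout=3600
build --bes_timeout=3600s
```

Trying the two experiments separately makes it easier to tell which change, if either, affects the error.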


amit-mittal commented on June 3, 2024

That's true! 😞

I don't think we would be able to change the --bes_upload_mode to a blocking call, but we can try it out. As we upload the events in async mode, we'll also try increasing the timeout and share the findings.

We will also prioritize upgrading bazel to v4.2.1, to pick up the fixes in the BES uploader, if there were any. Thanks for helping to investigate the issue!


BalestraPatrick commented on June 3, 2024

We also see this crash only on our Linux CI (macOS CI is fine). We're on Bazel 4.2.0 and we don't even set --bes_upload_mode, but we still see the exact same crash. Setting --keep_backend_build_event_connections_alive=false did not make a difference for us.


siggisim commented on June 3, 2024

This is likely fixed by bazelbuild/bazel#13959 which hasn't made it into any Bazel releases yet.

Are either of you able to share your grpc log for one of these invocations captured with Bazel's --experimental_remote_grpc_log=? You can send it to [email protected]
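
For reference, capturing the log might look like the fragment below. The output path /tmp/bazel_grpc.log is a hypothetical placeholder; any writable location should do:

```
# .bazelrc (sketch; the log path is a hypothetical example)
# Write a log of Bazel's remote gRPC calls for the invocation to this file.
build --experimental_remote_grpc_log=/tmp/bazel_grpc.log
```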


siggisim commented on June 3, 2024

Going to close this issue now that bazelbuild/bazel@e855a26 seems to have made it into the Bazel 5.0 release candidates (bazelbuild/bazel#14013).

Please re-open this issue if you're able to reproduce this with Bazel 5.0 (in which case either a grpc log or a BuildBuddy Cloud invocation would be super helpful).

