buildbarn / bb-storage
Storage daemon, capable of storing data for the Remote Execution protocol
License: Apache License 2.0
Buildbarn's story with regard to authentication is becoming a lot more complete: Bazel gained support for sending
Authorization: Bearer ${foo}
headers in bazelbuild/bazel#10015 and bazelbuild/bazel#10634. All we need to do now is add a proper implementation of Authenticator in pkg/grpc that uses OAuth2, OIDC, or just plain JWTs. Whatever the big trend is nowadays. Maybe some of the code in #6 can be repurposed.
During the monthly https://github.com/bazelbuild/remote-apis meeting 2020-04-14, the following was decided:
We should standardize this. Servers should allow references to the empty blobs, and should support serving the empty blob even if it has not been uploaded. Clients should feel free to optimize within this to avoid uploading or downloading empty blobs.
Implementing this will sort out the Bazel behaviour mentioned in bazelbuild/bazel#11063.
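As a sketch of what this optimization could look like on the server side, assuming a simplified digest type (the real bb-storage digest type differs): FindMissing can filter out the empty blob up front, so it is never reported as missing and clients never need to upload it.

```go
package main

// Digest is a simplified stand-in for bb-storage's digest type.
type Digest struct {
	Hash      string
	SizeBytes int64
}

// emptySHA256 is the SHA-256 of the empty string.
const emptySHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

func isEmptyBlob(d Digest) bool {
	return d.SizeBytes == 0 && d.Hash == emptySHA256
}

// filterEmptyBlobs removes empty-blob digests from a FindMissing
// request, so the backend never reports them as missing and clients
// never have to upload them.
func filterEmptyBlobs(digests []Digest) []Digest {
	out := make([]Digest, 0, len(digests))
	for _, d := range digests {
		if !isEmptyBlob(d) {
			out = append(out, d)
		}
	}
	return out
}

func main() {}
```

The complementary half would be a Get() path that serves a zero-length buffer for this digest without consulting storage.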
Within bb_worker, I sometimes experience a panic when writing to the local blockstore.
bb-storage@6bd6d5d
panic: runtime error: slice bounds out of range [:-53506]
goroutine 170339 [running]:
github.com/buildbarn/bb-storage/pkg/blobstore/local.(*blockDeviceBackedBlockWriter).Write(0xc001ca96c0, {0xc002860000?, 0x10000, 0x10000?})
external/com_github_buildbarn_bb_storage/pkg/blobstore/local/block_device_backed_block_allocator.go:339 +0x745
github.com/buildbarn/bb-storage/pkg/blobstore/buffer.intoWriterViaChunkReader({0x14b7998?, 0xc001ca9700}, {0x14ab820, 0xc001ca96c0})
external/com_github_buildbarn_bb_storage/pkg/blobstore/buffer/common_conversions.go:24 +0x135
github.com/buildbarn/bb-storage/pkg/blobstore/buffer.(*casClonedBuffer).IntoWriter(0xc00231b650?, {0x14ab820, 0xc001ca96c0})
external/com_github_buildbarn_bb_storage/pkg/blobstore/buffer/cas_cloned_buffer.go:91 +0x3c
github.com/buildbarn/bb-storage/pkg/blobstore/local.(*blockDeviceBackedBlock).Put.func1({0x14ca098?, 0xc0001ce850?})
external/com_github_buildbarn_bb_storage/pkg/blobstore/local/block_device_backed_block_allocator.go:255 +0x4d
github.com/buildbarn/bb-storage/pkg/blobstore/local.(*OldCurrentNewLocationBlobMap).Put.func1({0x14ca098?, 0xc0001ce850?})
external/com_github_buildbarn_bb_storage/pkg/blobstore/local/old_current_new_location_blob_map.go:374 +0x47
github.com/buildbarn/bb-storage/pkg/blobstore/local.(*flatBlobAccess).Put(0xc0007b1100, {0x411823d000000000?, 0xc002623fa8?}, {{0xc000cd0f50?, 0x0?}}, {0x14ca098, 0xc0001ce850})
external/com_github_buildbarn_bb_storage/pkg/blobstore/local/flat_blob_access.go:294 +0x11c
github.com/buildbarn/bb-storage/pkg/blobstore.(*metricsBlobAccess).Put(0xc0003e13f0, {0x14bbe18, 0xc000bc1f20}, {{0xc000cd0f50?, 0x0?}}, {0x14ca098?, 0xc0001ce850?})
external/com_github_buildbarn_bb_storage/pkg/blobstore/metrics_blob_access.go:140 +0x132
github.com/buildbarn/bb-storage/pkg/blobstore/replication.(*localBlobReplicator).ReplicateSingle.func1()
external/com_github_buildbarn_bb_storage/pkg/blobstore/replication/local_blob_replicator.go:38 +0x3e
github.com/buildbarn/bb-storage/pkg/blobstore/buffer.newCASBufferWithBackgroundTask.func1()
external/com_github_buildbarn_bb_storage/pkg/blobstore/buffer/cas_buffer_with_background_task.go:42 +0x26
created by github.com/buildbarn/bb-storage/pkg/blobstore/buffer.newCASBufferWithBackgroundTask
external/com_github_buildbarn_bb_storage/pkg/blobstore/buffer/cas_buffer_with_background_task.go:41 +0x1aa
I cannot yet reliably reproduce this, but my guess is that it occurs when the pwrite system call performs a short write without reporting an error, i.e. it returns n < len(p) with err == nil.
func (bd *memoryMappedBlockDevice) WriteAt(p []byte, off int64) (int, error) {
// Let write actions go through the file descriptor. Doing so
// yields better performance, as writes through a memory map
// would trigger a page fault that causes data to be read.
//
// TODO: Maybe it makes sense to let unaligned writes that would
// trigger reads anyway to go through the memory map?
return unix.Pwrite(bd.fd, p, off)
}
However, io.WriterAt requires that err != nil whenever n < len(p).
This could potentially be solved by adding a loop to memoryMappedBlockDevice.WriteAt() that calls pwrite repeatedly until the entire buffer has been written.
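A sketch of such a loop, written against the plain io.WriterAt interface rather than the actual memoryMappedBlockDevice; the shortWriter test double models the hypothesized pwrite behaviour of writing only part of the buffer without reporting an error.

```go
package main

import (
	"errors"
	"io"
)

// writeAtFull repeatedly calls w.WriteAt() until all of p has been
// written. This converts short writes (n < len(p) with err == nil)
// into further calls, so callers see the io.WriterAt contract upheld:
// n == len(p) whenever err == nil.
func writeAtFull(w io.WriterAt, p []byte, off int64) (int, error) {
	written := 0
	for written < len(p) {
		n, err := w.WriteAt(p[written:], off+int64(written))
		written += n
		if err != nil {
			return written, err
		}
		if n == 0 {
			return written, errors.New("WriteAt made no progress")
		}
	}
	return written, nil
}

// shortWriter is a test double that, like the hypothesized pwrite
// behaviour, writes at most 4 bytes per call without reporting an error.
type shortWriter struct{ buf [16]byte }

func (s *shortWriter) WriteAt(p []byte, off int64) (int, error) {
	n := len(p)
	if n > 4 {
		n = 4
	}
	copy(s.buf[off:], p[:n])
	return n, nil
}

func main() {}
```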
When a remoteexecution.Execute gRPC call is forwarded to bb-scheduler, the Bazel invocation ID gets lost.
The scheduler then creates only a single queue for all Bazel invocations.
Whilst configuring my bb-storage to use the S3 cloud blob storage, I noticed that Bazel reported io.grpc.StatusRuntimeException: UNKNOWN: EOF without any error reporting from Buildbarn.
This turned out to be because I had set my bucket parameter incorrectly inside the s3 configuration.
The storage instance should produce some form of log to report this, which may or may not be propagated to Bazel. The likely source of this issue is at cloud_blob_access.go#L33, and it likely affects the other functions in this file as well. This Get should definitely be reporting some form of issue, but perhaps gocloud.dev isn't providing any sort of error?
The code at create_blob_access.go#L193 suggests that when provided with this incorrect configuration, gocloud.dev isn't returning an err value.
Using https://blog.bazel.build/2019/05/07/builds-without-bytes.html requires all action outputs to persist for the duration of a build. When the circular disk storage gets full, Bazel experiences sporadic failures as blobs are evicted from the cache.
To solve this, one could rewrite the CAS blobs requested through Get() or FindMissing() whenever they get close to being evicted. The threshold could be configured as a percentage of the storage space.
Action cache requests could also trigger a FindMissing() call against the CAS for all outputs. As this introduces latency, it should be a configurable option, as some users may not want this behaviour.
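As a toy model of the proposed refresh-on-read behaviour (generation counters stand in for the real block/location bookkeeping, which works differently in bb-storage):

```go
package main

// refreshingStore models the proposal: blobs read through Get() are
// rewritten when they get close to eviction. Here "close to eviction"
// is modelled as the blob's generation lagging the current generation
// by more than refreshThreshold; the real implementation would compare
// a blob's location against a configurable percentage of the storage
// space instead.
type refreshingStore struct {
	data             map[string][]byte
	generation       map[string]int
	currentGen       int
	refreshThreshold int
}

func newRefreshingStore(threshold int) *refreshingStore {
	return &refreshingStore{
		data:             map[string][]byte{},
		generation:       map[string]int{},
		refreshThreshold: threshold,
	}
}

func (s *refreshingStore) Put(key string, blob []byte) {
	s.data[key] = blob
	s.generation[key] = s.currentGen
}

func (s *refreshingStore) Get(key string) ([]byte, bool) {
	blob, ok := s.data[key]
	if !ok {
		return nil, false
	}
	// Rewrite blobs that are close to being evicted, so that all
	// outputs referenced during a build stay resident.
	if s.currentGen-s.generation[key] > s.refreshThreshold {
		s.Put(key, blob)
	}
	return blob, true
}

func main() {}
```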
Any comments or shall I start implementing?
Also see: buildbarn/bb-remote-execution#37
To reduce duplication, we could also consider breaking this all out into a parent set of actions that the bb-* repositories can call upon. This would probably be a new repository in the buildbarn namespace.
An attempt to connect to the GCS bucket with enabled Workload Identity fails with the following log:
ERROR: /gcp/namespaces/vpn/certs-generator/BUILD.bazel:4:1 Executing genrule //build/linters/metalinter_lib:generated-metalinter failed (Exit 34). Note: Remote connection/protocol failed with: execution failed io.grpc.StatusRuntimeException: UNKNOWN: blob (key "ace0c0360d8d2c139fbcb5abfcf437743c458f1065d152ecbdf7b72bcbb21a54c5-142-remote-execution") (code=Unknown):
Get "https://storage.googleapis.com/bazel-remote-execution-trusted.project-id.organisation-name/ace0c0360d8d2c139fbcb5abfcf437743c458f1065d152ecbdf7b72bcbb21a54c5-142-remote-execution": metadata: GCE metadata "instance/service-accounts/default/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdevstorage.read_write" not defined
Am I doing something wrong or does bb-storage not support this connection method?
Storage Configuration:
{
blobstore: {
contentAddressableStorage: {
cloud: {
key_prefix: "cas",
gcs: {
bucket: "bazel-remote-execution-trusted.project-id.organisation-name",
},
},
},
actionCache: {
cloud: {
key_prefix: "ac",
gcs: {
bucket: "bazel-remote-execution-trusted.project-id.organisation-name",
},
},
},
},
httpListenAddress: ':9980',
grpcServers: [{
listenAddresses: [':8981'],
authenticationPolicy: { allow: {} },
}],
allowAcUpdatesForInstances: ['remote-execution'],
maximumMessageSizeBytes: 16 * 1024 * 1024,
}
When switching to local storage, everything works correctly.
According to the instructions, the GCP service account is successfully connected to the pod:
I have no name!@storage-0:/$ gcloud auth list
Credentialed Accounts
ACTIVE ACCOUNT
* buildbarn-trusted-storage-service@project-id.iam.gserviceaccount.com
The documentation on the main page states that this repository contains a copy of the storage daemon. Is this really a copy (if so, where is the original), or could the documentation be improved?
I'm actually seeking help.
I run remote execution with docker-compose, referring to the deployment repo:
It runs two shared storage instances. One of them went down unexpectedly. It works well after I restart the storage service.
But I always get a "Failed to obtain input file "external/xxx": Blob not found" error, and the error is gone after I re-run the "bazel build xxx" command.
I'm wondering whether there is a problem with one of the storage services. How can I figure out which storage service is the bad one? How can I fix this problem? Should I clean all the disk storage of the bad one?
An example of doing this is at https://github.com/actions/starter-workflows/blob/master/ci/docker-push.yml
It would be very valuable to support active-active or even active-passive deployments to provide an HA deployment configuration option. From my reading of the other blobstore options, I do not see a configuration that would provide HA characteristics, especially for a local backend.
bb-storage/pkg/proto/configuration/blobstore/blobstore.proto
Lines 53 to 60 in a575837
It appears that the mirrored blobstore configuration does not provide highly available deployments, although it's not clear to me why. This would seem like the most likely candidate for HA and would provide an active-active configuration.
My goal is to be able to upgrade storage nodes with zero downtime. Without backing from an external store (e.g. S3), I am not seeing an option that would enable zero-downtime storage deployments.
Do you have alternative suggestions?
As discussed during office hours, it would be great if, in addition to the code comments in the .proto files, you could provide documentation with examples of how to use the available configuration parameters. The specific example I'm looking for is demultiplexing to allow for different remote_instance_names.
The use case I'm trying to solve for is allowing remote execution from two different types of Bazel clients: CI clients coming from Jenkins and user clients coming from individual dev VMs.
One option would be to use two different instance_names so their caches are stored separately. Ed mentioned a better option, which would be to allow ReadWrite access to the cache by the CI clients and ReadOnly access by the user clients. Documentation for this second option would also be extremely helpful.
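For what it's worth, a demultiplexing CAS configuration might look roughly like the following. This is a hypothetical sketch only: the field names are recalled from blobstore.proto and must be checked against the version you are running.

```jsonnet
// Hypothetical sketch; verify field names against blobstore.proto.
{
  blobstore: {
    contentAddressableStorage: {
      demultiplexing: {
        instanceNamePrefixes: {
          // CI traffic from Jenkins.
          "ci": { backend: { grpc: { address: "ci-storage:8981" } } },
          // Individual developer VMs.
          "dev": { backend: { grpc: { address: "dev-storage:8981" } } },
        },
      },
    },
  },
}
```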
After our Buildbarn (combined with Goma) update in November/December, which changed the cache system to the new "local" system, we have had to restart the cache backend several times a week, frequently at least once a day, because the system slows down as memory usage grows.
Until this week, the backend was installed on an older server with 256 GB RAM, working on a 1.6 TB cache on SATA SSDs in a RAID.
This week, the backend (which was also updated to a Jun 12 state) was moved to a modern machine, also with 256 GB RAM, with much more disk space on NVMe drives, although only ~400 GB is currently used.
What I am observing is that as the memory usage of the bb-storage process passes ~70% of RAM (~180 GB), builds slow down significantly (today I restarted at 75% because a colleague reported slowness when building). RAM usage seems to top out just over ~80% of RAM (~205 GB).
Based on the classification of RAM usage, e.g. by top, the memory usage seems to overlap with the memory allocated to file caching, which also matches my impression that "local" uses memory mapping of files.
My current policy is to restart the backend if I notice bb-storage memory usage starting to inch towards 60%.
Would it be possible to include https://github.com/uber-go/automaxprocs with all BuildBarn services? When running BuildBarn in Kubernetes (which limits CPU via cgroups), the Go runtime sees the host's CPU count rather than the container's quota and does not correctly use the available resources.
In reference to: https://buildteamworld.slack.com/archives/CD6HZC750/p1645457888693499
Based on the discussion, this code should also consider the acGetAuthorizer permissions.
Additional context in case slack messages disappear:
repro authorizers:
actionCacheAuthorizers: {
get: { instanceNamePrefix: {allowedInstanceNamePrefixes: ["allowed", "not-allowed"] }},
put: { instanceNamePrefix: {allowedInstanceNamePrefixes: ["allowed"] }},
},
contentAddressableStorageAuthorizers: {
get: { instanceNamePrefix: {allowedInstanceNamePrefixes: ["allowed", "not-allowed"] }},
put: { instanceNamePrefix: {allowedInstanceNamePrefixes: ["allowed", "not-allowed"] }},
findMissing: { instanceNamePrefix: {allowedInstanceNamePrefixes: ["allowed", "not-allowed"] }},
},
executeAuthorizer:{ instanceNamePrefix: {allowedInstanceNamePrefixes: ["allowed", "not-allowed"] }},
If a client tries to do simple remote caching (without execution) against the not-allowed instance prefix, they will receive a permission-denied error and their build will fail. The expected behavior is that they are allowed to read from the action cache, but not write to it.
We have recently started exploring Buildbarn. Sorry for the silly question, but can we get shell access to the pods for insight into the storage in particular (what does the data in the CAS/AC look like in the case of circular storage)? I understand that logging is only enabled for failures, as per best practices and to avoid unnecessary logging. Thanks!
I've been hitting this restriction occasionally, I think correlated with large/wide builds (Bazel 0.26.0). The Buildbarn error is reported here: https://github.com/buildbarn/bb-storage/blob/master/pkg/cas/byte_stream_server.go#L63
I don't know under what circumstances Bazel would request part of a file, but in my experience it's infrequent but not never.
Bazel error message:
(10:11:45) ERROR: …/BUILD.bazel:1:1: Extracting interface … failed (Exit 34). Note: Remote connection/protocol failed with: execution failed io.grpc.StatusRuntimeException: UNIMPLEMENTED: This service does not support downloading partial files: java.io.IOException: io.grpc.StatusRuntimeException: UNIMPLEMENTED: This service does not support downloading partial files
at com.google.devtools.build.lib.remote.GrpcRemoteCache.lambda$downloadBlob$4(GrpcRemoteCache.java:318)
at com.google.common.util.concurrent.AbstractCatchingFuture$AsyncCatchingFuture.doFallback(AbstractCatchingFuture.java:175)
at com.google.common.util.concurrent.AbstractCatchingFuture$AsyncCatchingFuture.doFallback(AbstractCatchingFuture.java:162)
at com.google.common.util.concurrent.AbstractCatchingFuture.run(AbstractCatchingFuture.java:107)
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:398)
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1024)
at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:866)
at com.google.common.util.concurrent.AbstractFuture.setFuture(AbstractFuture.java:752)
at com.google.common.util.concurrent.AbstractCatchingFuture$AsyncCatchingFuture.setResult(AbstractCatchingFuture.java:185)
at com.google.common.util.concurrent.AbstractCatchingFuture$AsyncCatchingFuture.setResult(AbstractCatchingFuture.java:162)
at com.google.common.util.concurrent.AbstractCatchingFuture.run(AbstractCatchingFuture.java:116)
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:398)
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1024)
at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:866)
at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:711)
at com.google.common.util.concurrent.SettableFuture.setException(SettableFuture.java:54)
at com.google.devtools.build.lib.remote.GrpcRemoteCache$2.onError(GrpcRemoteCache.java:358)
at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:434)
at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
at io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:700)
at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
at io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:398)
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459)
at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:467)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:584)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: io.grpc.StatusRuntimeException: UNIMPLEMENTED: This service does not support downloading partial files
at io.grpc.Status.asRuntimeException(Status.java:532)
... 19 more
A slight downside is that we currently can't easily simulate cases where a token first becomes cached and then expires. If we wanted to do things a hundred percent tidily (and I'd leave it up to you whether you'd be up for that), we could remove all clock access from the Buildbarn codebase (is there any apart from the JWT handling?) by using:
https://github.com/benbjohnson/clock
We could let auth.NewJWTAuthCache() take a clock parameter.
But then the next problem becomes: jwt-go only has some very rudimentary code to override the clock. It can only be replaced globally, as opposed to per parser. Here's a local patch we could add/try to upstream.
Thoughts?
Originally posted by @EdSchouten in #6
We have a setup that looks something like:
Is less than 4mb:
-> Use Redis
Else:
-> Use S3
We are also using remote execution nodes. We are trying to keep our points of failure low, so our CAS servers only serve clients, and our workers talk directly to S3 and Redis. This allows us to scale very easily with minimal points of failure.
We are also trying to get our clients set up (in Bazel) with --remote_download_toplevel and --remote_download_minimal; however, we have found that S3 is caching our 404s (cache misses). This is especially problematic for us because what happens is:
Client -> CAS -> S3 -> Do you have ObjectA?
S3 -> CAS -> Client -> No (404)
Client -> Worker -> Go build ObjectA
Worker -> S3 -> Put ObjectA
Worker -> Client -> Done (client does not download ObjectA)
Client -> Worker -> Go build ObjectB that uses ObjectA
Worker -> S3 -> Do you have ObjectA?
S3 -> Worker -> No (404) (cached result).
Worker -> Client -> Give me ObjectA
Client -> .... Failure because it expected ObjectA to exist.
I would really prefer not to put another point of failure between S3 and the workers/CAS instances. The CAS instances are really lightweight in our setup and are only used to forward data from S3 and Redis, not to cache anything locally (yet). If we put a CAS node in front of S3, we add a point of failure plus the maintenance of it, and we'd have to figure out a way to synchronize their consistency.
What I think would be the best solution is to extend the way ExistenceCachingBlobAccessConfiguration works with an additional, optional BlobAccessConfiguration that is used for the key lookups, while the value lookups always go to backend. In our case we'd use Redis as our existence database and S3 as our blob store. We'd also need to add a salt of some kind to prevent this "cache" entry from being a positive hit with no data associated with it when sharing the blobstore with the real data.
We are happy to implement this, but figured I'd query here before we go off and do it.
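A sketch of the proposed split, with plain maps standing in for Redis and S3 (all type and method names here are made up for illustration, not bb-storage's real interfaces):

```go
package main

// splitExistenceStore models the proposal: existence checks go to a
// fast backend (Redis in the setup above) while blob contents always
// go to the blob backend (S3). Keys in the existence backend carry a
// salt so that a positive existence hit can never be confused with
// blob data if the two ever share a store.
type splitExistenceStore struct {
	existence map[string]bool   // stand-in for Redis
	blobs     map[string][]byte // stand-in for S3
	salt      string
}

func newSplitExistenceStore(salt string) *splitExistenceStore {
	return &splitExistenceStore{
		existence: map[string]bool{},
		blobs:     map[string][]byte{},
		salt:      salt,
	}
}

func (s *splitExistenceStore) Put(key string, blob []byte) {
	s.blobs[key] = blob
	s.existence[s.salt+key] = true
}

// FindMissing consults only the existence backend, so S3's negative
// caching of 404s never comes into play.
func (s *splitExistenceStore) FindMissing(keys []string) []string {
	var missing []string
	for _, key := range keys {
		if !s.existence[s.salt+key] {
			missing = append(missing, key)
		}
	}
	return missing
}

func main() {}
```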
The most common meaning of "circular file" is a trash can. This is even what you get when you search "circular file computing" or "circular file storage" on Google. It would be bad if the file-backed storage was actually deleting its contents. Different terminology might be better.
When trying to do a simple test of recc + buildgrid + bb-storage, all uploads fail due to recc making use of the BatchUpdateBlobs endpoint for collections of blobs under a certain size. There doesn't seem to be a way specified in the API to tell clients to use bytestream.Write only[1], so it would be great if it were supported.
[1] I suppose the Capabilities API could report a very small non-zero number for max_batch_total_size_bytes, but having support for BatchUpdateBlobs would be better IMO.
I'm not seeing much in the way of logging in the bb-storage codebase, and my bb-storage instances aren't outputting any logs at all. Is this intentional?
Happy to open a PR that adds additional logging (perhaps hidden behind a flag). We're attempting to add Bazel remote execution to thought-machine/please, hence the desire to see what buildbarn is doing.
As reported in buildbarn/bb-clientd#7
#149 added support for using client and server certificates from files on disk, for both gRPC and HTTP connections. It also added support for refreshing these certificates on an interval.
Specifically, for gRPC client certificates, the rotation is not working as intended: the original certificate is used for the duration of the process.
This is due to the same issue seen in grpc/grpc-go#5791, where tls.Config.GetClientCertificate is not respected within grpc-go. The documented solution is to use advancedtls.NewServerCreds instead of credentials.NewTLS.
Currently, we cannot easily switch to this alternate API, since it drops support for configuring the minimum TLS version as well as the TLS cipher suites, and thus would be a breaking change. There is an open ticket, grpc/grpc-go#5667, for adding support for configuring cipher suites with advancedtls, but until then, gRPC client certificates will not properly rotate.
It would be useful to have some ability to split the config files. E.g. in my setup, the storage configuration for all the barn components is the same, while the other settings are mostly different (these are currently command-line params).
The config file format doesn't support features like including separate files. Having a simple feature in the barn config loading to read multiple config files and merge them is one reasonable approach.
Following the change to unconditionally call blob.Copy in FindMissing, objects bigger than 5 GB result in an error from AWS (#82). This is because they require a multipart upload, and updating metadata on a multipart upload takes multiple calls:
The limitation is documented here: https://docs.aws.amazon.com/AmazonS3/latest/API/API_CopyObject.html. Specifically:
You create a copy of your object up to 5 GB in size in a single atomic operation using this API. However, to copy an object greater than 5 GB, you must use the multipart upload Upload Part - Copy API.
This also means that objects > 5 GB will require multiple calls on every pass through FindMissing (since x-amz-copy-source-if-unmodified-since has to be passed to every UploadPartCopy call -- https://docs.aws.amazon.com/AmazonS3/latest/API/API_UploadPartCopy.html).
Ultimately this needs to be fixed in gocloud, but currently bb-storage will not work as intended for objects > 5 GB.
The error will appear as follows from AWS:
---[ REQUEST POST-SIGN ]-----------------------------
PUT /junk HTTP/1.1
Host: 1space-test.s3.amazonaws.com
User-Agent: aws-sdk-go/1.34.5 (go1.14.4; linux; amd64)
Content-Length: 0
Authorization: AWS4-HMAC-SHA256 Credential=<key>/20200902/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-copy-source;x-amz-copy-source-if-unmodified-since;x-amz-date;x-amz-meta-used;x-amz-metadata-directive, Signature=5f21ff6a41f0aae38c1a562d7d43f195deb0d572755fbc8ab93151147309fee4
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Copy-Source: <object>
X-Amz-Copy-Source-If-Unmodified-Since: Wed, 02 Sep 2020 16:39:02 GMT
X-Amz-Date: 20200902T163902Z
X-Amz-Meta-Used: 2020-09-02 16:39:02.003101746 +0000 UTC
X-Amz-Metadata-Directive: REPLACE
Accept-Encoding: gzip
-----------------------------------------------------
2020/09/02 09:39:02 DEBUG: Response s3/CopyObject Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 400 Bad Request
Connection: close
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Wed, 02 Sep 2020 16:39:02 GMT
Server: AmazonS3
X-Amz-Id-2: uYk9/YTJ16he14eMk1DslODt1TrksjLk/1burYQ+gMo9gOZgeSCXhvUiT3qv+5Bx2Y1/vvfSU1Q=
X-Amz-Request-Id: E871C91CA74390BD
-----------------------------------------------------
2020/09/02 09:39:02 <Error><Code>InvalidRequest</Code><Message>The specified copy source is larger than the maximum allowable size for a copy source: 5368709120</Message><RequestId>E871C91CA74390BD</RequestId><HostId>uYk9/YTJ16he14eMk1DslODt1TrksjLk/1burYQ+gMo9gOZgeSCXhvUiT3qv+5Bx2Y1/vvfSU1Q=</HostId></Error>
ret: blob (key "<key> -> <key>") (code=Unknown): InvalidRequest: The specified copy source is larger than the maximum allowable size for a copy source: 5368709120
status code: 400, request id: E871C91CA74390BD, host id: uYk9/YTJ16he14eMk1DslODt1TrksjLk/1burYQ+gMo9gOZgeSCXhvUiT3qv+5Bx2Y1/vvfSU1Q=
It would be really handy to have TTFB data. Because of the nature of Bazel caching, it's expected that some requests will take multiple orders of magnitude longer than others, so it's somewhat challenging to draw useful conclusions about the cache's performance from the timing/latency data in traces. Having TTFB would give a more reliable view into how long the cache takes to begin doing its job.
I think it's a very useful option to have. AFAICS the jsonpb library doesn't have an option to accept comments in the JSON.
This is a placeholder for a feature request: a gRPC "GetStats" channel for getting statistics for remote execution and the remote Bazel cache.
For remote cache lookups using Buildbarn, one can use --remote_cache_header for a cache get/put/update. "GetStats" could take the key used in the cache operation and return statistics on how well the cache is performing (hit count, hit rate, etc.).
The Bazel remote protocol now has a compression option. That would be very useful for cases where Bazel and Buildbarn are in different clusters, and it also saves storage.
It appears that all replicator types, other than the local replicator, are incompatible with the ISCC and AC data storage. The deduplicating/concurrencyLimiting replicators rely on FindMissingBlobs and thus fail at runtime when used.
Relatedly, it would be nice if the proto config guarded against the use of incorrect decorators like these.
The unit test TestLocalDirectoryEnterSymlink fails on our Red Hat Enterprise Linux 7 system.
The test checks whether the error code ENOTDIR is returned, but on our system ELOOP is returned instead.
https://github.com/buildbarn/bb-storage/blob/master/pkg/filesystem/local_directory_test.go#L73
When reading about the issue, we found that ELOOP should be returned when too many symbolic links were encountered while resolving the pathname, or when O_NOFOLLOW was specified but the pathname was a symbolic link (see source).
When investigating the code, we found that O_NOFOLLOW is specified in the syscall, and therefore we think the correct behavior is to return ELOOP, not ENOTDIR, since the test tries to open a symlink to the root folder ("/").
https://github.com/buildbarn/bb-storage/blob/master/pkg/filesystem/local_directory.go#L54
We are not sure whether this behavior is due to our specific environment or whether it should be seen as a bug in the test. For those of you who got this test to pass, it would be really helpful if you could tell us which OS you are running on.
Following the discussions over on bazelbuild/remote-apis#165, it seems the API has reason to use the Directory message over Tree. GetTree provides a means of retrieving a tree through an API call. Whilst it would be relatively expensive to construct this tree for deep directory hierarchies, caching the constructed Tree objects within the CAS should significantly reduce the required work for bb-storage.
I think the primary problem here is the additional maintenance burden of an additional API surface that currently only has one dependent (as far as I am aware).
From your statement in #27 I would expect that a shell, e.g. sh, is baked into the Docker image published to Docker Hub. Because kubectl exec ... was failing for me, I extracted the image, but couldn't find any shell.
I assume this is because go_image is called:
bb-storage/cmd/bb_storage/BUILD.bazel
Line 40 in 81407c8
From my understanding, the pure flag ensures that an app-only image is built.
I currently work for Toyota Research Institute, where we have been using bb-storage & bb-remote-execution (modified to our needs) for over a year; it handles all Bazel builds and tests, which include GPU and CPU jobs. We run about 100k-200k tests per day and an additional 200k-300k build jobs per day (rough estimates), with about one infra failure every 2-3 weeks.
We use only S3 as our AC and CAS (yes, we have thought about moving the AC to faster storage, but we found there's currently no need). Sadly, this backend was removed in commit 0efcb2a. However, as of December 1, 2020, all of the original issues we had with S3 are gone, as AWS S3 now offers strong consistency (see: https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-read-after-write-consistency/).
We do, however, still have two remaining issues:
We would be happy to post the changes we have made that make S3 work for our needs (and fix the above issues).
This could be solved fairly simply by adding a limit on the number of concurrent outstanding requests to S3.
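A minimal sketch of such a limit, using a buffered channel as a counting semaphore. This is a standalone illustration of the idea, not bb-storage's actual decorator:

```go
package main

// concurrencyLimiter caps the number of outstanding requests to a
// backend (here: S3) using a buffered channel as a counting semaphore.
type concurrencyLimiter chan struct{}

func newConcurrencyLimiter(max int) concurrencyLimiter {
	return make(concurrencyLimiter, max)
}

// Do runs f, blocking first if max requests are already in flight.
func (l concurrencyLimiter) Do(f func()) {
	l <- struct{}{}        // acquire a slot
	defer func() { <-l }() // release it when f returns
	f()
}

func main() {
	l := newConcurrencyLimiter(16)
	l.Do(func() {
		// e.g. issue an S3 request here.
	})
}
```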
I am posting this to get a feel for the likelihood of bringing S3 store support back into bb-storage.
(As a side note, Toyota recently acquired Lyft's self-driving division (L5), and it turns out they also use bb-storage with an S3 backend, which hints that there might be more teams out there that use S3 heavily.)
Thank you!
When the underlying storage is full (due to a bad configuration), bb-storage throws the following error back to the worker:
"status":{"code":2, "message":"Failed to store cached action result: Shard 1: Backend A: no space left on device"}}
However, in Prometheus, the storage reports the "Unknown" gRPC code. The same holds for the frontend write call (see the screenshot below).
The validation code for filenames is quite restrictive, including preventing access to parent directories via "..". This can be seen here.
This issue is related to buildbarn/bb-remote-execution#18.
As discussed at FOSDEM, we could research the use of gRPC for the sharding of bb-storage. The overview here provides some methods; I believe the specific setup we considered was a service providing the mapping, which would have been the "lookaside", although haproxy might be suitable too.
Conversation in buildteam #buildbarn Slack Channel.
EdSchouten 7:59 PM
Not entirely. CircularBlobAccess also supports persistency across restarts
Chris Phang 8:00 PM
Ahh true, the digest map is inmemory
Chris Phang 8:06 PM
@EdSchouten to support persistency, what would be the blockers in having this map written to disk instead of in memory?
EdSchouten 8:09 PM
1) We’d need to add code to store the map on disk. That’s not too hard.
2) We’d need to add code to NewLocalBlobAccess() (or some helper function?) to reload the existing blocks from disk. Right now it’s only possible to create a LocalBlobAccess that assumes the initial block layout
8:10
Some readdir() code that parses a directory listing, figures out which blocks are present, opens them up, spreads them out across old/current/new/...
Chris Phang 8:16 PM
Makes sense, would you accept PRs that went in this direction? That would mean transient restarts of bb-storage with the Replicator decorator would have far fewer repairs to make. Plus, if both replicas failed, it wouldn't result in complete data loss? Plus I'm guessing that, coupled with the directory-backed listing, this would then allow for deprecation of CircularBlobAccess?
EdSchouten 8:17 PM
Sure!
EdSchouten 8:17 PM
Kicking out CircularBlobAccess would be nice
There is a TODO to use a container image instead of installing Bazel:
// TODO: Switch back to l.gcr.io/google/bazel once updated
// container images get published once again.
Could Bazelisk be a suitable replacement?
Consider the following situation: an action gets cached. Its outputs are eventually pushed out of the CAS, so it has to be rerun. Everything is fine if the action is deterministic (exactly the same outputs get regenerated) or if the AC entry gets updated with the new ActionResult, pointing to the new outputs.
However, when updating the key-location map, bb-storage only allows replacing objects with newer objects, as determined by Location.IsOlder. It uses the block index, but this isn't reliable when writes are smeared across multiple new blocks. The old entry may end up never being updated, which would cause the action to be rerun every time.
Given the massive performance improvements in GCS, would it be interesting to revisit its deprecation as a storage backend?
Thanks!
When writing to a CloudBlobAccess, the entire blob gets loaded into memory due to a buffer.ToReader call. On systems without a large amount of memory this can cause an OOM, especially in workloads that push many large blobs (e.g. Docker images).
We are exploring Buildbarn at this point and see that the default implementation is circular storage. We think Redis might be a better and more stable alternative to disk-based storage. Moreover, we can actually see what's in the CAS and AC for a better understanding. So we are using Redis at this point on a Minikube cluster as a POC to explore.
Has anybody used Redis with Buildbarn?
I'm currently trying to plug together SPIFFE (https://github.com/spiffe) and Buildbarn in a deployment on GKE, to give me well-managed mTLS. My current approach is to use the spiffe-helper (https://github.com/spiffe/spiffe-helper) to keep PEM files up to date in one container, with spiffe-helper just invoking 'sleep infinity', and then having the Buildbarn container share a filesystem so that it can read the resulting certs. Note that SPIFFE rotates the leaf certificates very frequently, on the order of once per hour. This is pretty painful, however, as Buildbarn uses Jsonnet to pipe creds through to the process, which means that a hot reload of the rotated certs is impossible without a full restart.
I'm thinking that the best option here would be for Buildbarn to allow specifying a set of files for certs, and then just use gRPC's hot-reloading code as demonstrated here: https://github.com/grpc/grpc-go/blob/master/security/advancedtls/examples/credential_reloading_from_files/server/main.go
Does that seem right? Or am I missing something obvious?
Thank you!
Over the past few weeks I have investigated an issue following an upgrade of Buildbarn/Goma to their early-November state, testing with our Chromium-based project.
The issue concerned error reports from the workers about "Failed to obtain" directories, usually directories that AFAIK had only subfolders, no files.
Initially, the problem seemed to be caused by Goma, since I was not able to reproduce it in earlier versions, but it turned out that the error was still reported there, just not treated as fatal until later (I have not yet been able to locate that modification).
Eventually, I started testing changes to the cache size, and significantly increased the size of our keyLocationMapOnBlockDevice configuration from its current 256K × 4 × 2 items (2M), which was actually supposed to be able to handle two platforms with two build profiles (2×2=4). 256K × 16 entries was the lowest cache size that worked on the test system for a single branch; 256K × 10 items did not work, at least in one or two tries.
My guess is that relatively frequently there will be collisions for files, and while source entries are checked before each command, the intermediate subdirectory-only directories, e.g. "out", can get kicked out (in fact, in one case it seems like the top directory was kicked out).
Chromium is a fairly big project: the source tree totals more than 650K files, and the out dir adds 250K more. Of these, approximately 180K are C/C++ files, and there are up to 60K Ninja work items for Linux (some platforms have 90K), so an estimate of 256K files in a build isn't that far off.
The KeyLocationMapInMemory.entries documentation, which I assume also applies to the keyLocationMapOnBlockDevice.sizeBytes entry (with a factor of 64 bytes for each entry), says:
// Recommended value: between 2 and 10 times the expected number of
// objects stored.
My testing seems to indicate that this factor is too low, at least for a project the size of Chromium. I haven't tried factors between 10 and 16; perhaps I could reduce it to 14, maybe 12, but I don't see any reason to try. I will probably raise the factor to 32 to be on the safe side for the production system.
I would suggest updating the documentation to indicate that even higher factors, such as 16 or 32, are needed for big projects. The keyLocationMapOnBlockDevice documentation should also reference this, as well as the size of each entry.
I would also suggest documenting this sizing information, along with all possible configuration file fields, either in the README.md file or an associated file, with links to their definitions in the proto files.
Continuation of https://github.com/EdSchouten/bazel-buildbarn/issues/24
The label is present on buildbarn_blobstore_old_new_current_location_blob_map_last_removed_old_block_insertion_time_seconds but not on bazel.buildcache.buildbarn_blobstore_block_device_backed_block_allocator_allocations_total, for example.
Zero urgency; just filing so it's documented.