buildbarn / bb-storage
Storage daemon, capable of storing data for the Remote Execution protocol
License: Apache License 2.0
Buildbarn's story with regard to authentication is becoming a lot more complete: Bazel gained support for sending
Authorization: Bearer ${foo}
headers in bazelbuild/bazel#10015 and bazelbuild/bazel#10634. All we need to do now is add a proper implementation of Authenticator in pkg/grpc that uses OAuth2, OIDC, or just plain JWTs. Whatever the big trend is nowadays. Maybe some of the code in #6 can be repurposed.
During the monthly https://github.com/bazelbuild/remote-apis meeting 2020-04-14, the following was decided:
We should standardize this. Servers should allow references to the empty blobs, and should support serving the empty blob even if it has not been uploaded. Clients should feel free to optimize within this to avoid uploading or downloading empty blobs.
Implementing this will sort out the Bazel behaviour mentioned in bazelbuild/bazel#11063.
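As a sketch of what this optimization could look like on the server side, assuming a simplified digest type (the real bb-storage digest type differs): FindMissing can filter out the empty blob up front, so it is never reported as missing and clients never need to upload it.

```go
package main

// Digest is a simplified stand-in for bb-storage's digest type.
type Digest struct {
	Hash      string
	SizeBytes int64
}

// emptySHA256 is the SHA-256 of the empty string.
const emptySHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

func isEmptyBlob(d Digest) bool {
	return d.SizeBytes == 0 && d.Hash == emptySHA256
}

// filterEmptyBlobs removes empty-blob digests from a FindMissing
// request, so the backend never reports them as missing and clients
// never have to upload them.
func filterEmptyBlobs(digests []Digest) []Digest {
	out := make([]Digest, 0, len(digests))
	for _, d := range digests {
		if !isEmptyBlob(d) {
			out = append(out, d)
		}
	}
	return out
}

func main() {}
```

The complementary half would be a Get() path that serves a zero-length buffer for this digest without consulting storage.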
Within bb_worker, I sometimes experience a panic when writing to the local blockstore.
bb-storage@6bd6d5d
panic: runtime error: slice bounds out of range [:-53506]
goroutine 170339 [running]:
github.com/buildbarn/bb-storage/pkg/blobstore/local.(*blockDeviceBackedBlockWriter).Write(0xc001ca96c0, {0xc002860000?, 0x10000, 0x10000?})
external/com_github_buildbarn_bb_storage/pkg/blobstore/local/block_device_backed_block_allocator.go:339 +0x745
github.com/buildbarn/bb-storage/pkg/blobstore/buffer.intoWriterViaChunkReader({0x14b7998?, 0xc001ca9700}, {0x14ab820, 0xc001ca96c0})
external/com_github_buildbarn_bb_storage/pkg/blobstore/buffer/common_conversions.go:24 +0x135
github.com/buildbarn/bb-storage/pkg/blobstore/buffer.(*casClonedBuffer).IntoWriter(0xc00231b650?, {0x14ab820, 0xc001ca96c0})
external/com_github_buildbarn_bb_storage/pkg/blobstore/buffer/cas_cloned_buffer.go:91 +0x3c
github.com/buildbarn/bb-storage/pkg/blobstore/local.(*blockDeviceBackedBlock).Put.func1({0x14ca098?, 0xc0001ce850?})
external/com_github_buildbarn_bb_storage/pkg/blobstore/local/block_device_backed_block_allocator.go:255 +0x4d
github.com/buildbarn/bb-storage/pkg/blobstore/local.(*OldCurrentNewLocationBlobMap).Put.func1({0x14ca098?, 0xc0001ce850?})
external/com_github_buildbarn_bb_storage/pkg/blobstore/local/old_current_new_location_blob_map.go:374 +0x47
github.com/buildbarn/bb-storage/pkg/blobstore/local.(*flatBlobAccess).Put(0xc0007b1100, {0x411823d000000000?, 0xc002623fa8?}, {{0xc000cd0f50?, 0x0?}}, {0x14ca098, 0xc0001ce850})
external/com_github_buildbarn_bb_storage/pkg/blobstore/local/flat_blob_access.go:294 +0x11c
github.com/buildbarn/bb-storage/pkg/blobstore.(*metricsBlobAccess).Put(0xc0003e13f0, {0x14bbe18, 0xc000bc1f20}, {{0xc000cd0f50?, 0x0?}}, {0x14ca098?, 0xc0001ce850?})
external/com_github_buildbarn_bb_storage/pkg/blobstore/metrics_blob_access.go:140 +0x132
github.com/buildbarn/bb-storage/pkg/blobstore/replication.(*localBlobReplicator).ReplicateSingle.func1()
external/com_github_buildbarn_bb_storage/pkg/blobstore/replication/local_blob_replicator.go:38 +0x3e
github.com/buildbarn/bb-storage/pkg/blobstore/buffer.newCASBufferWithBackgroundTask.func1()
external/com_github_buildbarn_bb_storage/pkg/blobstore/buffer/cas_buffer_with_background_task.go:42 +0x26
created by github.com/buildbarn/bb-storage/pkg/blobstore/buffer.newCASBufferWithBackgroundTask
external/com_github_buildbarn_bb_storage/pkg/blobstore/buffer/cas_buffer_with_background_task.go:41 +0x1aa
I cannot yet reliably reproduce this, but my guess is that it occurs when the pwrite system call performs a short write without reporting an error, i.e. it returns n < len(p) with err == nil.
func (bd *memoryMappedBlockDevice) WriteAt(p []byte, off int64) (int, error) {
// Let write actions go through the file descriptor. Doing so
// yields better performance, as writes through a memory map
// would trigger a page fault that causes data to be read.
//
// TODO: Maybe it makes sense to let unaligned writes that would
// trigger reads anyway to go through the memory map?
return unix.Pwrite(bd.fd, p, off)
}
However, io.WriterAt requires that err != nil whenever n < len(p).
This could potentially be solved by adding a loop to memoryMappedBlockDevice.WriteAt() that calls pwrite repeatedly until the entire buffer has been written.
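A sketch of such a loop, written against the plain io.WriterAt interface rather than the actual memoryMappedBlockDevice; the shortWriter test double models the hypothesized pwrite behaviour of writing only part of the buffer without reporting an error.

```go
package main

import (
	"errors"
	"io"
)

// writeAtFull repeatedly calls w.WriteAt() until all of p has been
// written. This converts short writes (n < len(p) with err == nil)
// into further calls, so callers see the io.WriterAt contract upheld:
// n == len(p) whenever err == nil.
func writeAtFull(w io.WriterAt, p []byte, off int64) (int, error) {
	written := 0
	for written < len(p) {
		n, err := w.WriteAt(p[written:], off+int64(written))
		written += n
		if err != nil {
			return written, err
		}
		if n == 0 {
			return written, errors.New("WriteAt made no progress")
		}
	}
	return written, nil
}

// shortWriter is a test double that, like the hypothesized pwrite
// behaviour, writes at most 4 bytes per call without reporting an error.
type shortWriter struct{ buf [16]byte }

func (s *shortWriter) WriteAt(p []byte, off int64) (int, error) {
	n := len(p)
	if n > 4 {
		n = 4
	}
	copy(s.buf[off:], p[:n])
	return n, nil
}

func main() {}
```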
When a remoteexecution.Execute gRPC call is forwarded to bb-scheduler, the Bazel invocation ID gets lost.
The scheduler then creates only a single queue for all Bazel invocations.
Whilst configuring my bb-storage to use the S3 cloud blob storage, I noticed that Bazel reported io.grpc.StatusRuntimeException: UNKNOWN: EOF without any error reporting from Buildbarn.
This turned out to be because I had set my bucket parameter incorrectly inside the s3 configuration.
The storage instance should produce some form of log to report this, which may or may not be propagated to Bazel. The likely source of this issue is at cloud_blob_access.go#L33, and it likely affects the other functions in this file as well. This Get should definitely be reporting some form of issue, but perhaps gocloud.dev isn't providing any sort of error?
The code at create_blob_access.go#L193 suggests that when provided with this incorrect configuration, gocloud.dev isn't returning an err value.
Using https://blog.bazel.build/2019/05/07/builds-without-bytes.html requires all action outputs to persist for the duration of a build. When the circular disk storage gets full, Bazel experiences sporadic failures as blobs are evicted from the cache.
To solve this, one could rewrite the CAS blobs requested through Get() or FindMissing() whenever they get close to being evicted. The threshold could be configured as a percentage of the storage space.
Action cache requests could also trigger a FindMissing() call against the CAS for all outputs. As this introduces latency, it should be a configurable option, as some users may not want this behaviour.
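As a toy model of the proposed refresh-on-read behaviour (generation counters stand in for the real block/location bookkeeping, which works differently in bb-storage):

```go
package main

// refreshingStore models the proposal: blobs read through Get() are
// rewritten when they get close to eviction. Here "close to eviction"
// is modelled as the blob's generation lagging the current generation
// by more than refreshThreshold; the real implementation would compare
// a blob's location against a configurable percentage of the storage
// space instead.
type refreshingStore struct {
	data             map[string][]byte
	generation       map[string]int
	currentGen       int
	refreshThreshold int
}

func newRefreshingStore(threshold int) *refreshingStore {
	return &refreshingStore{
		data:             map[string][]byte{},
		generation:       map[string]int{},
		refreshThreshold: threshold,
	}
}

func (s *refreshingStore) Put(key string, blob []byte) {
	s.data[key] = blob
	s.generation[key] = s.currentGen
}

func (s *refreshingStore) Get(key string) ([]byte, bool) {
	blob, ok := s.data[key]
	if !ok {
		return nil, false
	}
	// Rewrite blobs that are close to being evicted, so that all
	// outputs referenced during a build stay resident.
	if s.currentGen-s.generation[key] > s.refreshThreshold {
		s.Put(key, blob)
	}
	return blob, true
}

func main() {}
```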
Any comments or shall I start implementing?
Also see: buildbarn/bb-remote-execution#37
To reduce duplication, we could also consider breaking this all out into a parent set of actions that the bb-* repositories can call upon. This would probably be a new repository in the buildbarn namespace.
An attempt to connect to the GCS bucket with enabled Workload Identity fails with the following log:
ERROR: /gcp/namespaces/vpn/certs-generator/BUILD.bazel:4:1 Executing genrule //build/linters/metalinter_lib:generated-metalinter failed (Exit 34). Note: Remote connection/protocol failed with: execution failed io.grpc.StatusRuntimeException: UNKNOWN: blob (key "ace0c0360d8d2c139fbcb5abfcf437743c458f1065d152ecbdf7b72bcbb21a54c5-142-remote-execution") (code=Unknown):
Get "https://storage.googleapis.com/bazel-remote-execution-trusted.project-id.organisation-name/ace0c0360d8d2c139fbcb5abfcf437743c458f1065d152ecbdf7b72bcbb21a54c5-142-remote-execution": metadata: GCE metadata "instance/service-accounts/default/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdevstorage.read_write" not defined
Am I doing something wrong or does bb-storage not support this connection method?
Storage Configuration:
{
blobstore: {
contentAddressableStorage: {
cloud: {
key_prefix: "cas",
gcs: {
bucket: "bazel-remote-execution-trusted.project-id.organisation-name",
},
},
},
actionCache: {
cloud: {
key_prefix: "ac",
gcs: {
bucket: "bazel-remote-execution-trusted.project-id.organisation-name",
},
},
},
},
httpListenAddress: ':9980',
grpcServers: [{
listenAddresses: [':8981'],
authenticationPolicy: { allow: {} },
}],
allowAcUpdatesForInstances: ['remote-execution'],
maximumMessageSizeBytes: 16 * 1024 * 1024,
}
When switching to local storage, everything works correctly.
According to the instructions, the GCP service account is successfully connected to the pod:
I have no name!@storage-0:/$ gcloud auth list
Credentialed Accounts
ACTIVE ACCOUNT
* buildbarn-trusted-storage-service@project-id.iam.gserviceaccount.com
The documentation on the main page states that this repository contains a copy of the storage daemon. Is this really a copy (if so, where is the original), or could the documentation be improved?
I'm actually seeking help.
I run remote execution with docker-compose, referring to the deployment repo:
It runs two shared storage instances. One of them went down unexpectedly. It works well after I restart the storage service.
But I always get a "Failed to obtain input file "external/xxx": Blob not found" error, and the error is gone after I re-run the "bazel build xxx" command.
I'm wondering whether there is a problem with one of the storage services. How can I figure out which storage service is the bad one? How can I fix this problem? Should I clean all the disk storage of the bad one?
An example of doing this is at https://github.com/actions/starter-workflows/blob/master/ci/docker-push.yml
It would be very valuable to support active-active or even active-passive deployments to provide an HA deployment configuration option. From my reading of the other blobstore options, I do not see a configuration that would provide HA characteristics, especially for a local backend.
bb-storage/pkg/proto/configuration/blobstore/blobstore.proto
Lines 53 to 60 in a575837
It appears that the mirrored blobstore configuration does not provide highly available deployments, although it's not clear to me why. This would seem like the most likely candidate for HA and would provide an active-active configuration.
My goal is to be able to upgrade storage nodes with zero downtime. Without backing from an external store (e.g. S3), I am not seeing an option that would enable zero-downtime storage deployments.
Do you have alternative suggestions?
As discussed during office hours, it would be great if, in addition to the code comments in the .proto files, you could provide documentation with examples of how to use the available configuration parameters. The specific example I'm looking for is demultiplexing to allow for different remote_instance_names.
The use case I'm trying to solve for is allowing remote execution from two different types of Bazel clients: CI clients coming from Jenkins and user clients coming from individual dev VMs.
One option would be to use two different instance_names so their caches are stored separately. Ed mentioned a better option, which would be to allow ReadWrite access to the cache by the CI clients and ReadOnly access by the user clients. Documentation for this second option would also be extremely helpful.
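For what it's worth, a demultiplexing CAS configuration might look roughly like the following. This is a hypothetical sketch only: the field names are recalled from blobstore.proto and must be checked against the version you are running.

```jsonnet
// Hypothetical sketch; verify field names against blobstore.proto.
{
  blobstore: {
    contentAddressableStorage: {
      demultiplexing: {
        instanceNamePrefixes: {
          // CI traffic from Jenkins.
          "ci": { backend: { grpc: { address: "ci-storage:8981" } } },
          // Individual developer VMs.
          "dev": { backend: { grpc: { address: "dev-storage:8981" } } },
        },
      },
    },
  },
}
```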
After our Buildbarn (combined with Goma) update in November/December, which changed the cache system to the new "local" system, we have had to restart the cache backend several times a week, frequently at least once a day, because the system slows down as memory usage grows.
Until this week, the backend was installed on an older server with 256 GB RAM, working on a 1.6 TB cache on SATA SSDs in a RAID.
This week, the backend (which was also updated to a Jun 12 state) was moved to a modern machine, also with 256 GB RAM, with much more disk space on NVMe drives, although only ~400 GB is currently used.
What I am observing is that as the memory usage of the bb-storage process passes ~70% of RAM (~180 GB), builds slow down significantly (today I restarted at 75% because a colleague reported slowness when building). RAM usage seems to top out just over ~80% of RAM (~205 GB).
Based on the classification of RAM usage, e.g. by top, the memory usage seems to overlap with the memory allocated to file caching, which also matches my impression that "local" uses memory mapping of files.
My current policy is to restart the backend if I notice bb-storage memory usage starting to inch towards 60%.
Would it be possible to include https://github.com/uber-go/automaxprocs with all BuildBarn services? When running BuildBarn in Kubernetes (which limits CPU via cgroups), the Go runtime sees the host's CPU count rather than the container's quota and does not correctly use the available resources.
In reference to: https://buildteamworld.slack.com/archives/CD6HZC750/p1645457888693499
Based on the discussion, this code should also consider the acGetAuthorizer permissions.
Additional context in case slack messages disappear:
repro authorizers:
actionCacheAuthorizers: {
get: { instanceNamePrefix: {allowedInstanceNamePrefixes: ["allowed", "not-allowed"] }},
put: { instanceNamePrefix: {allowedInstanceNamePrefixes: ["allowed"] }},
},
contentAddressableStorageAuthorizers: {
get: { instanceNamePrefix: {allowedInstanceNamePrefixes: ["allowed", "not-allowed"] }},
put: { instanceNamePrefix: {allowedInstanceNamePrefixes: ["allowed", "not-allowed"] }},
findMissing: { instanceNamePrefix: {allowedInstanceNamePrefixes: ["allowed", "not-allowed"] }},
},
executeAuthorizer:{ instanceNamePrefix: {allowedInstanceNamePrefixes: ["allowed", "not-allowed"] }},
If a client tries to do simple remote caching (without execution) against the not-allowed instance prefix, they will receive a permission-denied error and their build will fail. The expected behavior is that they are allowed to read from the action cache, but not write to it.
We have recently started exploring Buildbarn. Sorry for the silly question, but can we get shell access to the pods for insight into the storage in particular (what does the data in the CAS/AC look like in the case of circular storage)? I understand that logging is only enabled for failures, as per best practices and to avoid unnecessary logging. Thanks!
I've been hitting this restriction occasionally, I think correlated with large/wide builds (Bazel 0.26.0). The Buildbarn error is reported here: https://github.com/buildbarn/bb-storage/blob/master/pkg/cas/byte_stream_server.go#L63
I don't know under what circumstances Bazel would request part of a file, but in my experience it's infrequent but not never.
Bazel error message:
(10:11:45) ERROR: …/BUILD.bazel:1:1: Extracting interface … failed (Exit 34). Note: Remote connection/protocol failed with: execution failed io.grpc.StatusRuntimeException: UNIMPLEMENTED: This service does not support downloading partial files: java.io.IOException: io.grpc.StatusRuntimeException: UNIMPLEMENTED: This service does not support downloading partial files
at com.google.devtools.build.lib.remote.GrpcRemoteCache.lambda$downloadBlob$4(GrpcRemoteCache.java:318)
at com.google.common.util.concurrent.AbstractCatchingFuture$AsyncCatchingFuture.doFallback(AbstractCatchingFuture.java:175)
at com.google.common.util.concurrent.AbstractCatchingFuture$AsyncCatchingFuture.doFallback(AbstractCatchingFuture.java:162)
at com.google.common.util.concurrent.AbstractCatchingFuture.run(AbstractCatchingFuture.java:107)
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:398)
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1024)
at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:866)
at com.google.common.util.concurrent.AbstractFuture.setFuture(AbstractFuture.java:752)
at com.google.common.util.concurrent.AbstractCatchingFuture$AsyncCatchingFuture.setResult(AbstractCatchingFuture.java:185)
at com.google.common.util.concurrent.AbstractCatchingFuture$AsyncCatchingFuture.setResult(AbstractCatchingFuture.java:162)
at com.google.common.util.concurrent.AbstractCatchingFuture.run(AbstractCatchingFuture.java:116)
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:398)
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1024)
at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:866)
at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:711)
at com.google.common.util.concurrent.SettableFuture.setException(SettableFuture.java:54)
at com.google.devtools.build.lib.remote.GrpcRemoteCache$2.onError(GrpcRemoteCache.java:358)
at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:434)
at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
at io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:700)
at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
at io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:398)
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459)
at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:467)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:584)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: io.grpc.StatusRuntimeException: UNIMPLEMENTED: This service does not support downloading partial files
at io.grpc.Status.asRuntimeException(Status.java:532)
... 19 more
A slight downside is that we currently can't easily simulate cases where a token first becomes cached and then expires. If we wanted to do things a hundred percent tidily (and I'd leave it up to you whether you'd be up for that), we could remove all clock access from the Buildbarn codebase (is there any apart from the JWT handling?) by using:
https://github.com/benbjohnson/clock
We could let auth.NewJWTAuthCache() take a clock parameter.
But then the next problem becomes: jwt-go only has some very rudimentary code to override the clock. It can only be replaced globally, as opposed to per parser. Here's a local patch we could add/try to upstream.
Thoughts?
Originally posted by @EdSchouten in #6
We have a setup that looks something like:
Is less than 4mb:
-> Use Redis
Else:
-> Use S3
We are also using remote execution nodes. We are trying to keep our points of failure low, so our CAS servers only serve clients, and our workers talk directly to S3 and Redis. This allows us to scale very easily with minimal points of failure.
We are also trying to get our clients set up (in Bazel) with --remote_download_toplevel and --remote_download_minimal; however, we have found that S3 is caching our 404s (cache misses). This is especially problematic for us because what happens is:
Client -> CAS -> S3 -> Do you have ObjectA?
S3 -> CAS -> Client -> No (404)
Client -> Worker -> Go build ObjectA
Worker -> S3 -> Put ObjectA
Worker -> Client -> Done (client does not download ObjectA)
Client -> Worker -> Go build ObjectB that uses ObjectA
Worker -> S3 -> Do you have ObjectA?
S3 -> Worker -> No (404) (cached result).
Worker -> Client -> Give me ObjectA
Client -> .... Failure because it expected ObjectA to exist.
I would really prefer not to put another point of failure between S3 and the workers/CAS instances. The CAS instances are really lightweight in our setup and are only used to forward data from S3 and Redis, not to cache anything locally (yet). If we put a CAS node in front of S3, we add a point of failure plus the maintenance of it, and we'd have to figure out a way to synchronize their consistency.
What I think would be the best solution is to extend the way ExistenceCachingBlobAccessConfiguration works with an additional, optional BlobAccessConfiguration that is used for the key lookups, while the value lookups always go to backend. In our case we'd use Redis as our existence database and S3 as our blob store. We'd also need to add a salt of some kind to prevent this "cache" entry from being a positive hit with no data associated with it when sharing the blobstore with the real data.
We are happy to implement this, but figured I'd query here before we go off and do it.
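A sketch of the proposed split, with plain maps standing in for Redis and S3 (all type and method names here are made up for illustration, not bb-storage's real interfaces):

```go
package main

// splitExistenceStore models the proposal: existence checks go to a
// fast backend (Redis in the setup above) while blob contents always
// go to the blob backend (S3). Keys in the existence backend carry a
// salt so that a positive existence hit can never be confused with
// blob data if the two ever share a store.
type splitExistenceStore struct {
	existence map[string]bool   // stand-in for Redis
	blobs     map[string][]byte // stand-in for S3
	salt      string
}

func newSplitExistenceStore(salt string) *splitExistenceStore {
	return &splitExistenceStore{
		existence: map[string]bool{},
		blobs:     map[string][]byte{},
		salt:      salt,
	}
}

func (s *splitExistenceStore) Put(key string, blob []byte) {
	s.blobs[key] = blob
	s.existence[s.salt+key] = true
}

// FindMissing consults only the existence backend, so S3's negative
// caching of 404s never comes into play.
func (s *splitExistenceStore) FindMissing(keys []string) []string {
	var missing []string
	for _, key := range keys {
		if !s.existence[s.salt+key] {
			missing = append(missing, key)
		}
	}
	return missing
}

func main() {}
```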
The most common meaning of "circular file" is a trash can. This is even what you get when you search "circular file computing" or "circular file storage" on Google. It would be bad if the file-backed storage was actually deleting its contents. Different terminology might be better.
When trying to do a simple test of recc + buildgrid + bb-storage, all uploads fail due to recc making use of the BatchUpdateBlobs endpoint for collections of blobs under a certain size. There doesn't seem to be a way specified in the API to tell clients to use bytestream.Write only[1], so it would be great if it were supported.
[1] I suppose the Capabilities API could report a very small non-zero number for max_batch_total_size_bytes, but having support for BatchUpdateBlobs would be better IMO.
I'm not seeing much in the way of logging in the bb-storage codebase, and my bb-storage instances aren't outputting any logs at all. Is this intentional?
Happy to open a PR that adds additional logging (perhaps hidden behind a flag). We're attempting to add Bazel remote execution to thought-machine/please, hence the desire to see what buildbarn is doing.
As reported in buildbarn/bb-clientd#7
#149 added support for using client and server certificates from files on disk, for both gRPC and HTTP connections. It also added support for refreshing these certificates on an interval.
Specifically, for gRPC client certificates, the rotation is not working as intended: the original certificate is used for the duration of the process.
This is due to the same issue seen in grpc/grpc-go#5791, where tls.Config.GetClientCertificate is not respected within grpc-go. The documented solution is to use advancedtls.NewServerCreds instead of credentials.NewTLS.
Currently, we cannot easily switch to this alternate API, since it drops support for configuring the minimum TLS version as well as the TLS cipher suites, and thus would be a breaking change. There is an open ticket, grpc/grpc-go#5667, for adding support for configuring cipher suites with advancedtls, but until then, gRPC client certificates will not properly rotate.
It would be useful to have some ability to split the config files. E.g. in my setup, the storage configuration for all the barn components is the same, while the other settings are mostly different (these are currently command-line params).
The config file format doesn't support features like including separate files. Having a simple feature in the barn config loading to read multiple config files and merge them is one reasonable approach.
Following the change to unconditionally call blob.Copy in FindMissing, objects bigger than 5 GB result in an error from AWS (#82). This is because they require a multipart upload, and updating metadata on a multipart upload takes multiple calls:
The limitation is documented here: https://docs.aws.amazon.com/AmazonS3/latest/API/API_CopyObject.html. Specifically:
You create a copy of your object up to 5 GB in size in a single atomic operation using this API. However, to copy an object greater than 5 GB, you must use the multipart upload Upload Part - Copy API.
This also means that objects > 5 GB will require multiple calls on every pass through FindMissing (since x-amz-copy-source-if-unmodified-since has to be passed to every UploadPartCopy call -- https://docs.aws.amazon.com/AmazonS3/latest/API/API_UploadPartCopy.html).
Ultimately this needs to be fixed in gocloud, but currently bb-storage will not work as intended for objects > 5 GB.
The error will appear as follows from AWS:
---[ REQUEST POST-SIGN ]-----------------------------
PUT /junk HTTP/1.1
Host: 1space-test.s3.amazonaws.com
User-Agent: aws-sdk-go/1.34.5 (go1.14.4; linux; amd64)
Content-Length: 0
Authorization: AWS4-HMAC-SHA256 Credential=<key>/20200902/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-copy-source;x-amz-copy-source-if-unmodified-since;x-amz-date;x-amz-meta-used;x-amz-metadata-directive, Signature=5f21ff6a41f0aae38c1a562d7d43f195deb0d572755fbc8ab93151147309fee4
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Copy-Source: <object>
X-Amz-Copy-Source-If-Unmodified-Since: Wed, 02 Sep 2020 16:39:02 GMT
X-Amz-Date: 20200902T163902Z
X-Amz-Meta-Used: 2020-09-02 16:39:02.003101746 +0000 UTC
X-Amz-Metadata-Directive: REPLACE
Accept-Encoding: gzip
-----------------------------------------------------
2020/09/02 09:39:02 DEBUG: Response s3/CopyObject Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 400 Bad Request
Connection: close
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Wed, 02 Sep 2020 16:39:02 GMT
Server: AmazonS3
X-Amz-Id-2: uYk9/YTJ16he14eMk1DslODt1TrksjLk/1burYQ+gMo9gOZgeSCXhvUiT3qv+5Bx2Y1/vvfSU1Q=
X-Amz-Request-Id: E871C91CA74390BD
-----------------------------------------------------
2020/09/02 09:39:02 <Error><Code>InvalidRequest</Code><Message>The specified copy source is larger than the maximum allowable size for a copy source: 5368709120</Message><RequestId>E871C91CA74390BD</RequestId><HostId>uYk9/YTJ16he14eMk1DslODt1TrksjLk/1burYQ+gMo9gOZgeSCXhvUiT3qv+5Bx2Y1/vvfSU1Q=</HostId></Error>
ret: blob (key "<key> -> <key>") (code=Unknown): InvalidRequest: The specified copy source is larger than the maximum allowable size for a copy source: 5368709120
status code: 400, request id: E871C91CA74390BD, host id: uYk9/YTJ16he14eMk1DslODt1TrksjLk/1burYQ+gMo9gOZgeSCXhvUiT3qv+5Bx2Y1/vvfSU1Q=
It would be really handy to have TTFB data. Because of the nature of Bazel caching, it's expected that some requests will take multiple orders of magnitude longer than others, so it's somewhat challenging to draw useful conclusions about the cache's performance from the timing/latency data in traces. Having TTFB would give a more reliable view into how long the cache takes to begin doing its job.
I think it's a very useful option to have. AFAICS the jsonpb library doesn't have an option to accept comments in the JSON.
This is a placeholder for a feature request: a gRPC "GetStats" channel for getting statistics for remote execution and the remote Bazel cache.
For remote cache lookups using Buildbarn, one can use --remote_cache_header for a cache get/put/update. "GetStats" could take the key used in the cache operation and return statistics on how well the cache is performing (hit count, hit rate, etc.).
The Bazel remote protocol now has a compression option. That would be very useful for cases where Bazel and Buildbarn are in different clusters, and it also saves storage.
It appears that all replicator types, other than the local replicator, are incompatible with the ISCC and AC data storage. The deduplicating/concurrencyLimiting replicators rely on FindMissingBlobs and thus fail at runtime when used.
Relatedly, it would be nice if the proto config guarded against the use of incorrect decorators like these.
The unit test TestLocalDirectoryEnterSymlink fails on our Red Hat Enterprise Linux 7 system.
The test checks whether the error code ENOTDIR is returned, but on our system ELOOP is returned instead.
https://github.com/buildbarn/bb-storage/blob/master/pkg/filesystem/local_directory_test.go#L73
When reading about the issue, we found that ELOOP should be returned when too many symbolic links were encountered while resolving the pathname, or when O_NOFOLLOW was specified but the pathname was a symbolic link (see source).
When investigating the code, we found that O_NOFOLLOW is specified in the syscall, and therefore we think the correct behavior is to return ELOOP, not ENOTDIR, since the test tries to open a symlink to the root folder ("/").
https://github.com/buildbarn/bb-storage/blob/master/pkg/filesystem/local_directory.go#L54
We are not sure whether this behavior is due to our specific environment or whether it should be seen as a bug in the test. For those of you who got this test to pass, it would be really helpful if you could tell us which OS you are running on.
Following the discussions over on bazelbuild/remote-apis#165, it seems the API has reason to use the Directory message over Tree. GetTree provides a means of retrieving a tree through an API call. Whilst it would be relatively expensive to construct this tree for deep directory hierarchies, caching the constructed Tree objects within the CAS should significantly reduce the required work for bb-storage.
I think the primary problem here is the additional maintenance burden of an additional API surface that currently only has one dependent (as far as I am aware).
From your statement in #27 I would expect that a shell, e.g. sh, is baked into the Docker image published to Docker Hub. Because kubectl exec ... was failing for me, I extracted the image, but couldn't find any shell.
I assume this is because go_image is called:
bb-storage/cmd/bb_storage/BUILD.bazel
Line 40 in 81407c8
From my understanding, the pure flag ensures that an app-only image is built.
I currently work for Toyota Research Institute, where we have been using bb-storage & bb-remote-execution (modified to our needs) for over a year; it handles all Bazel builds and tests, which include GPU and CPU jobs. We run about 100k-200k tests per day and an additional 200k-300k build jobs per day (rough estimates), with about one infra failure every 2-3 weeks.
We use only S3 as our AC and CAS (yes, we have thought about moving the AC to faster storage, but we found there's currently no need). Sadly, this backend was removed in commit 0efcb2a. However, as of December 1, 2020, all of the original issues we had with S3 are gone, as AWS S3 now offers strong consistency (see: https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-read-after-write-consistency/).
We do, however, still have two remaining issues:
We would be happy to post the changes we have made that make S3 work for our needs (and fix the above issues).
This could be solved fairly simply by adding a limit on the number of concurrent outstanding requests to S3.
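A minimal sketch of such a limit, using a buffered channel as a counting semaphore. This is a standalone illustration of the idea, not bb-storage's actual decorator:

```go
package main

// concurrencyLimiter caps the number of outstanding requests to a
// backend (here: S3) using a buffered channel as a counting semaphore.
type concurrencyLimiter chan struct{}

func newConcurrencyLimiter(max int) concurrencyLimiter {
	return make(concurrencyLimiter, max)
}

// Do runs f, blocking first if max requests are already in flight.
func (l concurrencyLimiter) Do(f func()) {
	l <- struct{}{}        // acquire a slot
	defer func() { <-l }() // release it when f returns
	f()
}

func main() {
	l := newConcurrencyLimiter(16)
	l.Do(func() {
		// e.g. issue an S3 request here.
	})
}
```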
I am posting this to get a feel for the likelihood of bringing S3 store support back into bb-storage.
(As a side note, Toyota recently acquired Lyft's self-driving division (L5), and it turns out they also use bb-storage with an S3 backend, which hints that there might be more teams out there that use S3 heavily.)
Thank you!
When the underlying storage is full (due to a bad configuration), bb-storage throws the following error back to the worker:
"status":{"code":2, "message":"Failed to store cached action result: Shard 1: Backend A: no space left on device"}}
However, in Prometheus, the storage reports the "Unknown" gRPC code. The same holds for the frontend write call (see the screenshot below).
The validation code for filenames is quite restrictive, including preventing access to parent directories via "..". This can be seen here.
This issue is related to buildbarn/bb-remote-execution#18.
As discussed at FOSDEM, we could research the use of gRPC for the sharding of bb-storage. The overview here provides some methods; I believe the specific setup we considered was a service providing the mapping, which would have been the "lookaside", although haproxy might be suitable too.
Conversation in buildteam #buildbarn Slack Channel.
EdSchouten 7:59 PM
Not entirely. CircularBlobAccess also supports persistency across restarts
Chris Phang 8:00 PM
Ahh true, the digest map is inmemory
Chris Phang 8:06 PM
@EdSchouten to support persistency, what would be the blockers in having this map written to disk instead of in memory?
EdSchouten 8:09 PM
1) We’d need to add code to store the map on disk. That’s not too hard.
2) We’d need to add code to NewLocalBlobAccess() (or some helper function?) to reload the existing blocks from disk. Right now it’s only possible to create a LocalBlobAccess that assumes the initial block layout
8:10
Some readdir() code that parses a directory listing, figures out which blocks are present, opens them up, spreads them out across old/current/new/...
Chris Phang 8:16 PM
Makes sense, would you accept PRs that went in this direction? That would mean transient restarts of bb-storage with the Replicator decorator would have far fewer repairs to make. Plus, if both replicas failed, it wouldn't result in complete data loss? Plus I'm guessing that, coupled with the directory-backed listing, this would then allow for deprecation of CircularBlobAccess?
EdSchouten 8:17 PM
Sure!
EdSchouten 8:17 PM
Kicking out CircularBlobAccess would be nice
There is a TODO to use a container image instead of installing Bazel:
// TODO: Switch back to l.gcr.io/google/bazel once updated
// container images get published once again.
Could Bazelisk be a suitable replacement?
Consider the following situation: an action gets cached. Its outputs are eventually pushed out of the CAS, so it has to be rerun. Everything is fine if the action is deterministic (exactly the same outputs get regenerated) or if the AC entry gets updated with the new ActionResult, pointing to the new outputs.
However, when updating the key-location map, bb-storage only allows replacing objects with newer objects, as determined by Location.IsOlder. It uses the block index, but this isn't reliable when writes are smeared across multiple new blocks. The old entry may end up never being updated, which would cause the action to be rerun every time.
Given the massive performance improvements in GCS, would it be interesting to revisit its deprecation as a storage backend?
Thanks!
When writing to a CloudBlobAccess, the entire blob gets loaded into memory due to a buffer.ToReader call. On systems without a large amount of memory this can cause an OOM, especially in workloads that push many large blobs (e.g. Docker images).
We are exploring Buildbarn at this point and see that the default implementation is circular storage. We think Redis might be a better and more stable alternative to disk-based storage. Moreover, we can actually see what's in the CAS and AC for a better understanding. So we are using Redis at this point on a Minikube cluster as a POC to explore.
Has anybody used Redis with Buildbarn?
I'm currently trying to plug together SPIFFE (https://github.com/spiffe) and Buildbarn in a deployment on GKE, to give me well-managed mTLS. My current approach is to use the spiffe-helper (https://github.com/spiffe/spiffe-helper) to keep PEM files up to date in one container, with spiffe-helper just invoking 'sleep infinity', and then having the Buildbarn container share a filesystem so that it can read the resulting certs. Note that SPIFFE rotates the leaf certificates very frequently, on the order of once per hour. This is pretty painful, however, as Buildbarn uses Jsonnet to pipe creds through to the process, which means that a hot reload of the rotated certs is impossible without a full restart.
I'm thinking that the best option here would be for Buildbarn to allow specifying a set of files for certs, and then just use gRPC's hot-reloading code as demonstrated here: https://github.com/grpc/grpc-go/blob/master/security/advancedtls/examples/credential_reloading_from_files/server/main.go
Does that seem right? Or am I missing something obvious?
Thank you!
Over the past few weeks I have investigated an issue following an upgrade of Buildbarn/Goma to their early-November state, testing with our Chromium-based project.
The issue concerned error reports from the workers about "Failed to obtain" directories, usually directories that AFAIK had only subfolders, no files.
Initially, the problem seemed to be caused by Goma, since I was not able to reproduce it in earlier versions, but it turned out that the error was still reported there, just not treated as fatal until later (I have not yet been able to locate that modification).
Eventually, I started testing changes to the cache size, and significantly increased the size of our keyLocationMapOnBlockDevice configuration from its current 256K × 4 × 2 items (2M), which was actually supposed to be able to handle two platforms with two build profiles (2×2=4). 256K × 16 entries was the lowest cache size that worked on the test system for a single branch; 256K × 10 items did not work, at least in one or two tries.
My guess is that relatively frequently there will be collisions for files, and while source entries are checked before each command, the intermediate subdirectory-only directories, e.g. "out", can get kicked out (in fact, in one case it seems like the top directory was kicked out).
Chromium is a fairly big project: the source tree totals more than 650K files, and the out dir adds 250K more. Of these, approximately 180K are C/C++ files, and there are up to 60K Ninja work items for Linux (some platforms have 90K), so an estimate of 256K files in a build isn't that far off.
The KeyLocationMapInMemory.entries documentation, which I assume also applies to the keyLocationMapOnBlockDevice.sizeBytes entry (with a factor of 64 bytes for each entry), says:
// Recommended value: between 2 and 10 times the expected number of
// objects stored.
My testing seems to indicate that this factor is too low, at least for a project the size of Chromium. I haven't tried factors between 10 and 16; perhaps I could reduce it to 14, maybe 12, but I don't see any reason to try. I will probably raise the factor to 32 to be on the safe side for the production system.
I would suggest updating the documentation to indicate that even higher factors, such as 16 or 32, are needed for big projects. The keyLocationMapOnBlockDevice documentation should also reference this, as well as the size of each entry.
I would also suggest documenting this sizing information, along with all possible configuration file fields, either in the README.md file or an associated file, with links to their definitions in the proto files.
Continuation of https://github.com/EdSchouten/bazel-buildbarn/issues/24
The label is present on buildbarn_blobstore_old_new_current_location_blob_map_last_removed_old_block_insertion_time_seconds but not on bazel.buildcache.buildbarn_blobstore_block_device_backed_block_allocator_allocations_total, for example.
Zero urgency; just filing so it's documented.