qinusty / bb-asset-hub

A Remote Asset API-compatible server supporting REv2-compatible gRPC caches

License: Apache License 2.0

Languages: Starlark 50.11%, Go 45.89%, Jsonnet 3.81%, Shell 0.19%
Topics: build, buildbarn, remote-asset, remote-execution

bb-asset-hub's People

Contributors: qinusty, tomcoldrick-ct
Forkers: othko97

bb-asset-hub's Issues

Add Prometheus metrics for fetch/pull requests

To provide relevant information to Buildbarn users, we should expose metrics via Prometheus so that usage data is available in a similar manner to other BuildStream projects.
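
As a hedged sketch of what this might look like with the standard Go Prometheus client (the metric names and labels here are suggestions, not an agreed scheme):

  package main

  import (
  	"net/http"

  	"github.com/prometheus/client_golang/prometheus"
  	"github.com/prometheus/client_golang/prometheus/promhttp"
  )

  // Hypothetical metrics; names and labels are illustrative only.
  var (
  	fetchRequestsTotal = prometheus.NewCounterVec(
  		prometheus.CounterOpts{
  			Name: "bb_asset_hub_fetch_requests_total",
  			Help: "Number of Fetch requests received, by kind and status.",
  		},
  		[]string{"kind", "status"}, // kind: blob|directory, status: gRPC code
  	)
  	pushRequestsTotal = prometheus.NewCounterVec(
  		prometheus.CounterOpts{
  			Name: "bb_asset_hub_push_requests_total",
  			Help: "Number of Push requests received, by kind and status.",
  		},
  		[]string{"kind", "status"},
  	)
  )

  func main() {
  	prometheus.MustRegister(fetchRequestsTotal, pushRequestsTotal)

  	// A Fetch handler would then record, for example:
  	fetchRequestsTotal.WithLabelValues("blob", "OK").Inc()

  	// Expose the metrics endpoint for Prometheus to scrape.
  	http.Handle("/metrics", promhttp.Handler())
  	http.ListenAndServe(":9090", nil)
  }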

Add CompletenessChecking for Fetch/Push services

Currently we naïvely assume that the blobs referenced in requests/responses exist in the CAS without checking. We should add something analogous to CompletenessCheckingBlobAccess to ensure that the blobs we reference actually exist.

One thing to be careful of while implementing this is that PushBlob and PushDirectory requests contain an additional pair of digest lists which we will also need to ensure exist. Given this, we'll need to add an extra field (or two, if we want to distinguish between Blob/Directory) to the Asset proto to store these digests so we can check at push and fetch time.
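
A rough sketch of the shape such a check could take; the ContentAddressableStorage interface and FindMissing method here are illustrative stand-ins modeled on bb-storage's completeness checking, not the actual types:

  package completeness

  import (
  	"context"
  	"fmt"
  )

  // Digest is a stand-in for a REv2 content digest (hash + size).
  type Digest struct {
  	Hash      string
  	SizeBytes int64
  }

  // ContentAddressableStorage is a hypothetical subset of a CAS client;
  // FindMissing is modeled on the REv2 FindMissingBlobs call.
  type ContentAddressableStorage interface {
  	FindMissing(ctx context.Context, digests []Digest) ([]Digest, error)
  }

  // CheckCompleteness returns an error if any referenced digest is absent
  // from the CAS. For PushBlob/PushDirectory, the caller would also pass
  // the extra blob/directory digest lists stored on the Asset.
  func CheckCompleteness(ctx context.Context, cas ContentAddressableStorage, referenced []Digest) error {
  	missing, err := cas.FindMissing(ctx, referenced)
  	if err != nil {
  		return err
  	}
  	if len(missing) > 0 {
  		return fmt.Errorf("%d referenced blobs are missing from the CAS", len(missing))
  	}
  	return nil
  }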

Address scalability concerns

There is an obvious need to address scalability concerns for a production use case. This issue should be used as a tracking issue towards reaching a production-ready deployment configuration for the asset hub daemon.

Initial Thoughts

I'm going to split these out, as there is a range of ideas for ensuring scalability, each with its own complexities and advantages.

gRPC Fetcher implementation

[Diagram: multiple-fetchers architecture]

The idea here is to handle asset storage (just the references to digests) in a frontend-style entrypoint, which distributes fetch requests across many deployed bb-asset-hub instances; these perform the fetch operations, return results for caching, and place fetched blobs into the CAS. A rough sketch of this entrypoint follows the lists below.

Positives:

  • Independent of remote execution infrastructure
  • Fetchers can be brought up and down as load demands

Negatives:

  • Requires a server implementation on the bb-asset-hub daemon to accept connections from other daemons (similar to workers connecting to the scheduler)
  • Single point of failure at the entrypoint
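
As an illustration only, here is what such a frontend might look like. The round-robin pooling is entirely hypothetical, and the proto types are assumed to come from the Remote Asset API's Go bindings:

  package frontend

  import (
  	"context"
  	"sync/atomic"

  	remoteasset "github.com/bazelbuild/remote-apis/build/bazel/remote/asset/v1"
  )

  // FetchFrontend fans Fetch requests out across downstream fetchers.
  // The pool is assumed non-empty; this is a sketch of the entrypoint idea,
  // not a production load balancer.
  type FetchFrontend struct {
  	fetchers []remoteasset.FetchClient
  	next     uint64
  }

  func (f *FetchFrontend) pick() remoteasset.FetchClient {
  	n := atomic.AddUint64(&f.next, 1)
  	return f.fetchers[n%uint64(len(f.fetchers))]
  }

  // FetchBlob forwards the request to one of the deployed fetcher
  // instances, which performs the actual download and CAS upload.
  func (f *FetchFrontend) FetchBlob(ctx context.Context, req *remoteasset.FetchBlobRequest) (*remoteasset.FetchBlobResponse, error) {
  	return f.pick().FetchBlob(ctx, req)
  }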

Remote Asset Store


Since we're unable to use gRPC as a backend for the AssetStore without a custom BlobAccess implementation, we're restricted to the blob accesses made available through newNestedBlobAccess(). However, Redis or S3 would suffice here as a means of removing our single point of failure and moving our only state elsewhere.

Advantages:

  • Stateless, scale as much as we want
  • Independent of remote execution infrastructure

Disadvantages:

  • Relies on alternative means of storage (no Local or Circular) due to the lack of gRPC backend support (this could be added)
  • Requires an external load balancer (traefik, nginx, etc.) to distribute requests to asset-hub

Other Ideas

Other suggestions are welcome; we should track discussions and developments against this issue with the goal of producing an example deployment for a stable production environment. As things stand, I think configuring the AssetStore to use a Redis instance as the backend and scaling the asset-hub daemon will provide a scalable solution that integrates with an existing bb-storage deployment.

Implement asset reference storage

To ensure that we can persist the references which link the URI/qualifiers to the CAS digests, we should implement a persistent storage mechanism, likely taking advantage of bb-storage's BlobAccess in the same way as the action cache.
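
A hedged sketch of how such a reference key might be derived, analogous to action cache keys: hash the URI together with the sorted qualifiers and use the result as the key into a BlobAccess. All names here are illustrative:

  package storage

  import (
  	"crypto/sha256"
  	"encoding/hex"
  	"fmt"
  	"sort"
  )

  // Qualifier mirrors the Remote Asset API's name/value pair.
  type Qualifier struct {
  	Name  string
  	Value string
  }

  // AssetKey derives a stable digest from a URI and its qualifiers,
  // suitable as a key into a BlobAccess. Qualifiers are sorted so that
  // key derivation is order-independent.
  func AssetKey(uri string, qualifiers []Qualifier) string {
  	sorted := append([]Qualifier(nil), qualifiers...)
  	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })

  	h := sha256.New()
  	fmt.Fprintf(h, "%s\n", uri)
  	for _, q := range sorted {
  		fmt.Fprintf(h, "%s=%s\n", q.Name, q.Value)
  	}
  	return hex.EncodeToString(h.Sum(nil))
  }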

Send BadRequest for unsupported qualifiers

The relevant part of the spec:

  // In the case of unsupported qualifiers, the server *SHOULD* additionally
  // send a [BadRequest][google.rpc.BadRequest] error detail where, for each
  // unsupported qualifier, there is a `FieldViolation` with a `field` of
  // `qualifiers.name` and a `description` of `"{qualifier}" not supported`
  // indicating the name of the unsupported qualifier.
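
As a sketch (not the project's actual code), this could be implemented with grpc-go's status package and the standard errdetails types:

  package fetch

  import (
  	"fmt"

  	"google.golang.org/genproto/googleapis/rpc/errdetails"
  	"google.golang.org/grpc/codes"
  	"google.golang.org/grpc/status"
  )

  // unsupportedQualifiersError builds an InvalidArgument status carrying a
  // BadRequest detail with one FieldViolation per unsupported qualifier,
  // as the spec comment above suggests.
  func unsupportedQualifiersError(names []string) error {
  	st := status.New(codes.InvalidArgument, "unsupported qualifiers")
  	br := &errdetails.BadRequest{}
  	for _, name := range names {
  		br.FieldViolations = append(br.FieldViolations, &errdetails.BadRequest_FieldViolation{
  			Field:       "qualifiers.name",
  			Description: fmt.Sprintf("%q not supported", name),
  		})
  	}
  	withDetails, err := st.WithDetails(br)
  	if err != nil {
  		return st.Err() // fall back to the bare status
  	}
  	return withDetails.Err()
  }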

Allow Fetch to match on a subset of qualifiers from a Push'd asset

Currently we demand that a Fetch request match all the qualifiers of a previously Pushed blob for it to be found; however, this doesn't line up with how BuildStream plans to use the Remote Asset API. In https://gitlab.com/BuildStream/buildstream/-/issues/1274 it's suggested that BuildStream will use a different set of qualifiers for Push and Fetch requests, with fetch qualifiers being a subset of those pushed. As a result, Push'd sources will never be Fetch'd by BuildStream.

Our implementation currently uses a list of all qualifiers as part of the inputs to a hash, which is in turn used as the key for a BlobAccess. As a result, matching on just a subset of qualifiers is a non-trivial change. I'll outline a few possibilities that have come to mind to square this circle:

1. Allow a fetcher to set which qualifiers it deems important enough to hash.

As BuildStream presumably plans (from the discussion linked) to have the qualifiers it Fetches with well-defined on a per-source-type basis, we may be able to add some API to specify which qualifiers a given fetcher finds important. Given we'll likely need a whole bunch of different fetchers adapted to specific fetching cases, this isn't so bad, although it does potentially focus on this use case to the detriment of generality.

We could expose this as configuration for server operators to maintain, at the risk of ballooning configuration.
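
Continuing the AssetKey sketch above, option 1 might look something like this, where the allow-list of significant qualifier names is a hypothetical configuration knob:

  package storage

  // Qualifier mirrors the Remote Asset API's name/value pair
  // (same shape as in the earlier sketch).
  type Qualifier struct {
  	Name  string
  	Value string
  }

  // keyQualifiers filters the request's qualifiers down to those a fetcher
  // (or operator configuration) has declared significant for key
  // derivation. The result would then be fed into AssetKey(uri, kept).
  func keyQualifiers(qualifiers []Qualifier, significant map[string]bool) []Qualifier {
  	var kept []Qualifier
  	for _, q := range qualifiers {
  		if significant[q.Name] {
  			kept = append(kept, q)
  		}
  	}
  	return kept
  }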

2. Hash only the URI, and mitigate the collisions

This way we retrieve assets based only on the URI, and add additional logic to handle matching the qualifiers. This keeps things general and means we match qualifiers consistently in all cases. However, it will clearly cause a performance decrease.

To mitigate collisions, I see two possibilities: use a form of cuckoo hashing, or modify the AssetStore to store a list of Assets corresponding to the URI, along with their qualifiers. Cuckoo hashing has the benefit of allowing us to expire references automatically as part of the Put, but is more complex and means more I/O, as we have to read from the underlying blob store more. Making the AssetStore take a list of Assets would let us load only once, but may cause entries to grow considerably in size, and also means that we have to modify the content stored in the blobstore under a single digest.

Of these possibilities, I'm leaning towards extending the AssetStore to store a list of Assets.
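
A sketch of the subset-matching side of option 2, with illustrative stand-in types rather than the actual Asset proto:

  package storage

  // Asset pairs a stored blob digest with the qualifiers it was pushed
  // with. (Illustrative stand-in for the Asset proto discussed above.)
  type Asset struct {
  	Digest     string
  	Qualifiers map[string]string
  }

  // matchSubset returns the first stored Asset whose qualifiers are a
  // superset of the ones requested by Fetch, implementing option 2's
  // "URI-only key, list of Assets" scheme.
  func matchSubset(stored []Asset, requested map[string]string) (Asset, bool) {
  	for _, a := range stored {
  		ok := true
  		for name, value := range requested {
  			if a.Qualifiers[name] != value {
  				ok = false
  				break
  			}
  		}
  		if ok {
  			return a, true
  		}
  	}
  	return Asset{}, false
  }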

Allow fetch to match on a subset of pushed qualifiers

BuildStream intends to introduce functionality which will expect the remote asset server to fetch a pushed blob given only a subset of the pushed qualifiers.

This is currently not supported by bb-asset-hub and will require a rethink of asset reference storage, which currently uses the URI/qualifiers as the key to access the blob digest stored in a CAS.

Tidy import blocks

One nit we might want to address is tidying up the Go imports; I think the other bb-* repos split local imports into a separate block.

Agreed. We can address this in a follow-up PR.

Originally posted by @Qinusty in #6 (comment)
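
For reference, a sketch of the grouping convention under discussion (shown as an import block only; the local import path is illustrative):

  import (
  	// Standard library imports.
  	"context"
  	"fmt"

  	// Third-party imports.
  	"google.golang.org/grpc"

  	// Local imports in their own block (path is illustrative).
  	"github.com/qinusty/bb-asset-hub/pkg/storage"
  )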
