qinusty / bb-asset-hub
A remote asset compatible server supporting REv2-compatible gRPC caches
License: Apache License 2.0
In order to provide relevant information to Buildbarn users, we should expose metrics via Prometheus to ensure that usage data is available in a manner similar to other BuildStream projects.
Currently we naïvely assume that the blobs referenced in requests/responses exist in the CAS without checking. We should add something analogous to CompletenessCheckingBlobAccess to ensure that the blobs we reference actually exist.
One thing to be careful of while implementing this is that PushBlob and PushDirectory requests contain an additional pair of digest lists whose entries we will also need to verify. Given this, we'll need to add an extra field (or two, if we want to distinguish between Blob/Directory) to the Asset proto to store these digests so we can check them at push and fetch time.
There is an obvious need to address scalability concerns for a production use case. This issue should be used as a tracking issue towards reaching a production-ready deployment configuration for the asset hub daemon.
I'm going to split these out, as there is a range of ideas for ensuring scalability, each with its own complexities and advantages.
The idea here is to handle asset storage (just the references to digests) in a frontend-style entrypoint which distributes fetch requests across many deployed instances of bb-asset-hub, returning results for caching and placing fetched blobs into the CAS.
Positives:
Negatives:
Since we're unable to make use of gRPC as a backend for the AssetStore without a custom BlobAccess implementation, we're restricted to the blob accesses made available through newNestedBlobAccess(). However, Redis or S3 would suffice here as a means of removing our single point of failure and moving our only state elsewhere.
Advantages:
Disadvantages:
Other suggestions are welcome; we should track discussions and developments against this issue with the goal of producing an example deployment for a stable production environment. As things stand, I think configuring the AssetStore to use a Redis instance as the backend and scaling the asset-hub daemon will provide a scalable solution that integrates with an existing bb-storage deployment.
To ensure that we can persist the references which link the URI/qualifiers to the CAS digests, we should implement a persistent storage mechanism for them, likely taking advantage of bb-storage's BlobAccess in the same way the action cache does.
// In the case of unsupported qualifiers, the server *SHOULD* additionally
// send a [BadRequest][google.rpc.BadRequest] error detail where, for each
// unsupported qualifier, there is a `FieldViolation` with a `field` of
// `qualifiers.name` and a `description` of `"{qualifier}" not supported`
// indicating the name of the unsupported qualifier.
Currently we demand that a Fetch request match all the qualifiers of a previously Pushed blob in order for it to be found; however, this doesn't line up with how BuildStream plans to use the Remote Asset API. In https://gitlab.com/BuildStream/buildstream/-/issues/1274 it's suggested that BuildStream will use a different set of qualifiers for Push and Fetch requests, with the fetch qualifiers being a subset of those pushed. As a result, Push'd sources would never be Fetch'd by BuildStream.
Our implementation currently uses the list of all qualifiers as part of the input to a hash, which is in turn used as the key for a BlobAccess. As a result, matching on just a subset of qualifiers is a non-trivial change. I'll outline a few possibilities that have come to mind to square this circle:
As BuildStream presumably (from the discussion linked) plans to have the qualifiers it Fetches with well defined on a per-source-type basis, we may be able to add some API to specify which qualifiers a given fetcher considers important. Given that we'll likely need a whole bunch of different fetchers adapted to specific fetching cases, this isn't so bad, although it does potentially focus on this use case to the detriment of generality.
We could expose this as configuration for server operators to maintain, at the risk of ballooning configuration.
This way we retrieve assets based only on the URI and add additional logic to handle matching the qualifiers. This keeps things general and means we match qualifiers consistently in all cases. However, it will clearly cause a performance decrease.
To mitigate collisions, I see two possibilities: use a form of cuckoo hashing, or modify the AssetStore to store a list of Assets corresponding to the URI, along with their qualifiers. Cuckoo hashing has the benefit of allowing us to expire references automatically as part of the Put, but it is more complex and means more I/O, as we have to read from the underlying blob store more often. Making the AssetStore take a list of Assets will allow us to load only once, but may cause the entries to grow considerably in size, and will also mean that we have to modify the content stored in the blobstore under a single digest.
Of these possibilities, I'm leaning towards extending the AssetStore to store a list of Assets.
BuildStream intends to introduce functionality which will expect the remote asset server to fetch a pushed blob given only a subset of the pushed qualifiers.
This is currently not supported by bb-asset-hub and will require a rethink of asset reference storage, which currently uses the URI/qualifiers as a key to access the blob digest stored in a CAS.
Currently we have a verbose logger; we should replace this with Prometheus metrics.
One nit we might want to address is tidying up the Go imports; I think the other bb-* repos split local imports into a separate block.
Agreed. We can address this in a follow up PR.
Originally posted by @Qinusty in #6 (comment)
We should support the push service for both Blobs and Directories.
We should support the fetch service for both Blobs and Directories.