
cache-buildkite-plugin's Introduction

Cache Buildkite Plugin

A Buildkite plugin to store ephemeral cache files between builds.

Builds often involve fetching and processing large amounts of data that don't change much between builds: npm/gem/pip/cocoapod packages downloaded from central registries, shared compilation caches for tools like ccache, or large virtual machine images that can be re-used.

Buildkite recommends using Artifacts for build artifacts that are the result of a build and useful for humans, whereas we see a cache as an optional byproduct of a build that doesn't need to be content-addressable.

For example, caching the node_modules folder as long as the package-lock.json file does not change can be done as follows:

steps:
  - label: ':nodejs: Install dependencies'
    command: npm ci
    plugins:
      - cache#v1.1.0:
          manifest: package-lock.json
          path: node_modules
          restore: file
          save: file

Mandatory parameters

path (string)

The file or folder to cache.

At least one of the following

restore (string, specific values)

The maximum caching level to restore, if available. See the available caching levels.

save (string or array of strings, specific values)

The level(s) to use for saving the cache. See the available caching levels.

You can specify multiple levels in an array to save the same artifact as a cache for all those levels.
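
For example, a sketch that saves the same node_modules cache at both the file and pipeline levels:

steps:
  - label: ':nodejs: Install dependencies'
    command: npm ci
    plugins:
      - cache#v1.1.0:
          manifest: package-lock.json
          path: node_modules
          restore: file
          save:
            - file
            - pipeline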

Options

backend (string)

Defines how the cache is stored and restored. Can be any string (see Customizable Backends), but the plugin natively supports the following:

  • fs (default)
  • s3
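
For example, a minimal sketch of the first example above switched to the s3 backend (bucket configuration is described below):

steps:
  - label: ':nodejs: Install dependencies'
    command: npm ci
    plugins:
      - cache#v1.1.0:
          backend: s3
          manifest: package-lock.json
          path: node_modules
          restore: file
          save: file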

fs

Very basic local filesystem backend.

The BUILDKITE_PLUGIN_FS_CACHE_FOLDER environment variable defines where the cached copies are stored (default: /var/cache/buildkite). If you don't change it, you will need to make sure that the folder exists and that buildkite-agent has the proper permissions, otherwise the plugin will fail.

IMPORTANT: the fs backend just copies files to a different location on the current agent. Because it is not a shared or external resource, its caching possibilities are quite limited.
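
For instance, a sketch of preparing the default folder on an agent host, assuming the agent runs as the buildkite-agent user:

sudo mkdir -p /var/cache/buildkite
sudo chown buildkite-agent /var/cache/buildkite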

s3

Stores the cache in an S3 bucket. You need to make sure that the aws command is available and appropriately configured.

You also need the agent to have access to the following defined environment variables:

  • BUILDKITE_PLUGIN_S3_CACHE_BUCKET: the bucket to use (backend will fail if not defined)
  • BUILDKITE_PLUGIN_S3_CACHE_PREFIX: optional prefix to use for the cache within the bucket
  • BUILDKITE_PLUGIN_S3_CACHE_ENDPOINT: optional S3 custom endpoint to use

Setting the BUILDKITE_PLUGIN_S3_CACHE_ONLY_SHOW_ERRORS environment variable will reduce logging of file operations towards S3.
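
A sketch of exporting these variables, for instance from an agent environment hook; the bucket name and prefix are placeholders:

export BUILDKITE_PLUGIN_S3_CACHE_BUCKET="my-cache-bucket"
export BUILDKITE_PLUGIN_S3_CACHE_PREFIX="ci-cache"  # optional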

compression (string, optional)

Allows the cached file/folder to be saved/restored as a single compressed file. Make sure to use the same compression value when saving and restoring, or it will cause a cache miss.

Assuming the underlying executables are available, the allowed values are:

  • tgz: tar with gzip compression
  • zip: (un)zip compression
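
For example, a sketch that caches node_modules as a single gzipped tarball; every step that restores this cache must use the same compression value:

steps:
  - label: ':nodejs: Install dependencies'
    command: npm ci
    plugins:
      - cache#v1.1.0:
          manifest: package-lock.json
          path: node_modules
          compression: tgz
          restore: file
          save: file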

force (boolean, optional, save only)

Force saving the cache even if it exists. Default: false.
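
For example, as a configuration fragment:

      - cache#v1.1.0:
          path: node_modules
          save: branch
          force: true  # re-save even if a branch-level cache already exists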

manifest (string, required if using file caching level)

A path to a file or folder that will be hashed to create file-level caches.

It will cause an unrecoverable error if either save or restore is set to file and this option is not specified.

Caching levels

This plugin uses the following hierarchy of caching levels to determine when a cache is valid (meaning usable), from the most specific to the most general:

  • file: only as long as a manifest file does not change (see the manifest option)
  • step: valid only for the current step
  • branch: when the pipeline executes in the context of the current branch
  • pipeline: all builds and steps of the pipeline
  • all: all the time

When restoring from cache, all levels up to the one specified are checked, in the order described above. The first available cache is restored, and no further levels or checks are attempted.
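
For example, with the sketch below (the step itself is illustrative), a restore first checks for a file-level cache using the hash of the manifest, then a step-level cache, then a branch-level cache, and stops at the first hit; pipeline- and all-level caches are never consulted:

steps:
  - label: ':hammer: Build'
    command: make build
    plugins:
      - cache#v1.1.0:
          manifest: package-lock.json
          path: node_modules
          restore: branch
          save: branch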

Customizable backends

One of this plugin's greatest strengths is its flexible backend architecture. You can provide any value you want for the backend option (X, for example) as long as there is an executable script accessible to the agent named cache_X that respects the following execution protocol:

  • cache_X exists $KEY

Should exit successfully (0 return code) if any previous call to this very same plugin was made with cache_X save $KEY. Any other exit code means that there is no valid cache, and the level will be ignored.

  • cache_X get $KEY $FILENAME

Will restore whatever was previously saved under $KEY (by the save call described next) to the file or folder $FILENAME. A non-0 exit code will cause the whole execution to halt and the current step to fail.

You can assume that all calls like this will be preceded by an exists call to ensure that there is something to get.

  • cache_X save $KEY $FILENAME

Will save whatever is in the $FILENAME path (which can be a file or folder) in a way that can be identified by the string $KEY. A non-0 return code will cause the whole execution to halt and the current step to fail.

Any other invocation should fail with exit code 255, preferably without producing output.
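
For instance, a minimal sketch of a custom backend that stores caches under a shared folder; the name (cache_mirror, used via backend: mirror) and the folder location are assumptions of this example:

#!/bin/bash
# cache_mirror: hypothetical custom backend storing caches under a shared folder
set -euo pipefail

CACHE_ROOT="${CACHE_MIRROR_ROOT:-/mnt/shared/buildkite-cache}"  # assumed location

case "${1:-}" in
  exists)
    # succeed only if a previous save created this key
    [ -e "${CACHE_ROOT}/${2}" ]
    ;;
  get)
    # restore whatever was saved under $KEY to the requested file or folder
    cp -a "${CACHE_ROOT}/${2}" "${3}"
    ;;
  save)
    # save the file or folder under $KEY, replacing any previous copy
    mkdir -p "${CACHE_ROOT}"
    rm -rf "${CACHE_ROOT:?}/${2}"
    cp -a "${3}" "${CACHE_ROOT}/${2}"
    ;;
  *)
    # any other invocation fails with 255, without output
    exit 255
    ;;
esac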

Examples

You can implement more complicated logic by using the plugin multiple times, with different levels and on different steps. In the following example, the node_modules folder will be saved and restored with the following logic:

  • first step:
    • if the package-lock.json file has not changed, node_modules will be restored as is, npm install will run (and should do nothing, because no dependencies changed), and saving the cache will be skipped because it already exists
    • if the package-lock.json file has changed, the step-level, branch-level and pipeline-level caches of the node_modules folder will be checked and the first one that exists restored, npm install will run (and should be quick, installing only the differences), and the resulting node_modules folder will be saved as file-level and branch-level caches
  • second step:
    • will restore the file-level cache of the node_modules folder saved by the first step and run npm test
  • third step (which will only run on the master branch):
    • will restore the file-level cache saved by the first step, run npm run deploy, and finally save the contents of the node_modules folder as both a pipeline-level and a global (all-level) cache, for use as a basis even when the lockfile changes (in the first step)

steps:
  - label: ':nodejs: Install dependencies'
    command: npm ci
    plugins:
      - cache#v1.1.0:
          manifest: package-lock.json
          path: node_modules
          restore: pipeline
          save:
            - file
            - branch
  - wait: ~
  - label: ':test_tube: Run tests'
    command: npm test # does not save cache, not necessary
    plugins:
      - cache#v1.1.0:
          manifest: package-lock.json
          path: node_modules
          restore: file
  - wait: ~  # don't run deploy until tests pass
  - label: ':goal_net: Save stable cache after deployment'
    if: build.branch == "master"
    command: npm run deploy
    plugins:
      - cache#v1.1.0:
          manifest: package-lock.json
          path: node_modules
          restore: file
          save:
            - pipeline
            - all

License

MIT (see LICENSE)


cache-buildkite-plugin's Issues

Permission denied when restoring cache with S3

We're restoring files from an S3 cache with v0.3.2 and seeing this error when the cache has a hit and the restore is attempted:

Cache hit at file level, restoring /bundle...

[Errno 13] Permission denied: '/bundle'

Here's the relevant part of our setup:

steps:
  - label: ":docker: :bundler: Bundler"
    key: docker-gem-build
    env:
      BUILDKIT_PROGRESS: plain
      COMPOSE_DOCKER_CLI_BUILD: 1
      DOCKER_BUILDKIT: 1
    plugins:
      - ecr#v2.6.0:
          login: true
          account_ids: <REDACTED>
          region: <REDACTED>
      - cache#v0.3.2:
          backend: s3
          manifest: Gemfile.lock
          path: /bundle
          restore: file
          save: file
      - docker#v5.6.0:
          image: <REDACTED>
          command: ["bin/ci/bundle_install"]
          mount-checkout: true
          env-propagation-list: BUNDLE_ENTERPRISE__CONTRIBSYS__COM
          volumes:
            - "/bundle:/bundle"

The `s3` backend is not very suitable for common use cases

I've tried using the S3 backend with a configuration like the one from the README:

steps:
  - label: ':nodejs: Install dependencies'
    command: npm ci
    plugins:
      - cache#v0.4.0:
          backend: s3
          manifest: package-lock.json
          path: node_modules
          restore: file
          save: file

There are problems with using S3 to cache node_modules, especially via the AWS CLI:

  1. S3 does not preserve file permissions
  2. S3 does not preserve symlinks

These problems are pretty noticeable for TypeScript projects, since tsc is no longer executable once it is restored from S3:

[screenshot omitted]

If we manage to fix that problem (e.g. by using the MinIO client instead of the AWS CLI to restore the file permissions), we run into the next problem: restoring node_modules from S3 replaces all of the symlinks in node_modules/.bin with duplicate files containing the contents of the symlink target, which leads to obscure errors when running executables:

[screenshot omitted]

I don't know if other build tools rely quite so heavily on symlinks, but it seems unwise to advertise a workflow that is unlikely to work for a lot of folks so prominently in the README.

Perhaps these limitations should be documented until something like #44 is merged?

S3 backend fails to sync with `tgz` compression

The S3 backend is unable to sync when using tgz compression:

[screenshot omitted]

I think this is because the s3 sync command does not expect to receive a single file as an argument, so it tries to find a directory with the same name instead. Perhaps it should create a temporary directory in /tmp and archive a .tgz file named after the cache key into that temporary directory instead, i.e.

ACTUAL_PATH="$(mktemp -d)"
"${COMPRESS_COMMAND[@]}" "${ACTUAL_PATH}/${KEY}.tgz" "${CACHE_PATH}"

I was able to replicate this on the build agent host manually:

$ ACTUAL_PATH=$(mktemp)
$ echo $ACTUAL_PATH 
/tmp/tmp.WUc3AxyL2d
$ tar czf "$ACTUAL_PATH" .gradle
$ aws s3 sync --endpoint-url "https://data.mina-lang.org" "$ACTUAL_PATH" "s3://buildkite-cache/test-key"          
warning: Skipping file /tmp/tmp.WUc3AxyL2d/. File does not exist.

The pipeline YAML can be viewed here.

You can view a gist of a build log exhibiting this problem here.

One other thing that I noticed in this log is that the second cache post-command hook did not run after the first one failed. Is that expected?

Getting a "Waiting for folder lock"

I'm trying to integrate this plugin to the building process in our company with the following:

      - cache#v0.5.0:
          manifest: webapp/npm-shrinkwrap.json # this exists, I've tried anything else and I get a `No such file or directory` message.
          path: webapp/node_modules
          restore: file
          save: file

then when the build starts, as part of the initial groups I can see this:

~~~ Running plugin cache post-checkout hook
$ /var/lib/buildkite-agent/plugins/github-com-buildkite-plugins-cache-buildkite-plugin-v0-5-0/hooks/post-checkout
Cache miss up to file-level, sorry

which I think is normal, because there's no cache created yet. But right after the last step, where the new image is pushed to AWS, I get this new final step, which makes the build fail:

~~~ Running plugin cache post-command hook
$ /var/lib/buildkite-agent/plugins/github-com-buildkite-plugins-cache-buildkite-plugin-v0-5-0/hooks/post-command
Saving file-level cache of webapp/node_modules
Waiting for folder lock
Waiting for folder lock
Waiting for folder lock
Waiting for folder lock
Waiting for folder lock
🚨 Error: The plugin cache post-command hook exited with status 1

Not sure if it's related, but we also use the docker-compose plugin, so we create a container with the built web app that is eventually pushed to AWS. Maybe this plugin operates outside of that context? Is there a way to make it work in such conditions?

Support for backend: artifacts

Is there a reason buildkite artifacts are not a suitable backend for this?

I read

Buildkite recommends using Artifacts for build artifacts that are the result of a build and useful for humans, whereas we see a cache as an optional byproduct of a build that doesn't need to be content-addressable.

but that doesn't explain why we shouldn't/can't use artifacts as the backend for this. To me that seems like a more suitable and native buildkite integration, than relying on S3 for remote caching.

`BUILDKITE_PLUGIN_S3_CACHE_ONLY_SHOW_ERRORS` must be set

At the moment, the following error is produced when using the AWS CLI v1 with the s3 backend, unless BUILDKITE_PLUGIN_S3_CACHE_ONLY_SHOW_ERRORS is set:

[screenshot omitted]

I think this is because of the double quotes around the "$(verbose)" argument in save_cache and restore_cache, which force the argument to be interpreted as an empty option string.
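
A minimal bash illustration of why the quoted empty expansion still produces an argument (function names here are for demonstration only):

verbose() { :; }            # stands in for an option helper that prints nothing
count_args() { echo "$#"; }
count_args "$(verbose)"     # prints 1 -- the quoted empty expansion is passed as an empty argument
count_args $(verbose)       # prints 0 -- the unquoted empty expansion disappears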

Skip save step if manifest hash has not changed

Currently, if the save parameter is configured, a cache will always be saved by the post-command hook.

Saving cache can be a costly operation if the file/folder is large.

It would be good if there was an option to skip saving the cache if it is detected that a cache already exists with the same cache key.

This would mimic what some other CI platforms like CircleCI do:

[screenshot omitted]

Is `fs` backend supposed to work with EBS?

It seems like using the fs backend keeps failing with the following error (the directory exists, and the s3 backend works just fine):

Waiting for folder lock
Waiting for folder lock
Waiting for folder lock
Waiting for folder lock
Waiting for folder lock

I'm guessing this might have something to do with using AWS EBS instead of built-in NVMe storage?

Support for caching multiple directories

Currently, the plugin only takes a single directory/path for caching. It would be great to have support for caching multiple directories, so you don't have to configure the plugin multiple times.
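
As a sketch of the current workaround, each path needs its own plugin instance (the paths and step are illustrative):

steps:
  - label: ':package: Build'
    command: make build
    plugins:
      - cache#v1.1.0:
          path: node_modules
          restore: pipeline
          save: pipeline
      - cache#v1.1.0:
          path: .cache
          restore: pipeline
          save: pipeline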

Simpler scopes?

Can we make the “most basic example” in the readme even simpler, by just having a restrictive scope to start with? Or is there a way to SHA256 things and fully trust the manifest?

For yarn, bundler and similar tools, the only thing that would bust a SHA256-hashed manifest would be an architecture difference between agent machines for native dependencies.

S3 backend fails with status 255 on Elastic CI Stack for AWS

Hi team,

I've tried using the s3 backend on agents running in the Elastic CI Stack for AWS using this configuration:

steps:
  - label: 'test caching'
    command: echo "test caching"
    plugins:
      - cache#v0.5.0:
          backend: s3
          manifest: .buildkite/pipeline.yml
          path: .buildkite
          restore: file
          save: file

and am getting this error: Error: The plugin cache post-command hook exited with status 255.

Permissions to interact with the S3 bucket are granted through the EC2 instance role. I think I've configured the BUILDKITE_PLUGIN_S3_CACHE_BUCKET environment variable correctly; I tested it with a step:

steps:
  - label: 'test s3'
    command: aws s3 sync log "s3://${BUILDKITE_PLUGIN_S3_CACHE_BUCKET}/log"

which completed successfully.

Support saving multiple cache types in one plugin definition

Use case

I've got a single build step that caches a path based off a manifest file. I want to restore the cache based off that manifest, but fall back to a pipeline-level cache.

Unless I've misunderstood some part of the docs, currently the only way to achieve this is to define two separate instances of the plugin. For example:

steps:
  - plugins:
      - cache#v1.0.1:
          manifest: some-file
          path: ./some/path
          restore: pipeline
          save: file
      - cache#v1.0.1:
          path: ./some/path
          save: pipeline

When using the compression option, the above config will compress the same file/folder twice, instead of just uploading differently-named copies of the same compressed file.

Potential implementation

IMO a nice way to represent this would be to allow specifying an array of values to save. For example:

steps:
  - plugins:
      - cache#v1.0.1:
          manifest: some-file
          path: ./some/path
          restore: pipeline
          save:
            - file
            - pipeline

An alternative could be some kind of restore key, similar to what the cache github action provides, though I'm not sure if that would align with this plugin's current configuration API.

Error with S3 backend: "Unknown options: --recursive"

We're running the Elastic CI stack v5.16.1 and trying to use the S3 storage option. At the end of our build, during the cache plugin's post-command hook, we're getting this error:

Running plugin cache post-command hook
$ /var/lib/buildkite-agent/plugins/github-com-buildkite-plugins-cache-buildkite-plugin-v0-3-0/hooks/post-command
Saving file-level cache of /bundle

Unknown options: --recursive
🚨 Error: The plugin cache post-command hook exited with status 255

It looks like the flag is being passed in here: https://github.com/buildkite-plugins/cache-buildkite-plugin/blob/master/backends/cache_s3#L22-L26

Are we doing something wrong?

