Comments (9)

bigdaz commented on July 28, 2024

The dependency cache entries are shared between Jobs only if those Jobs resolve exactly the same set of dependencies.

I might want to clarify the docs a little:

  • The Gradle User Home (GUH) changes on pretty much every invocation of Gradle, so you get a new GUH cache entry on every run. We try to keep this cache entry as small as possible.
  • Before saving the GUH, we extract out important stable parts into separate entries: these can be shared between different Jobs, but are primarily designed to be shared between different executions of the same Job.
    • Ideally, this means that a lot of Job executions will see:
      • A new, small GUH cache entry saved.
      • All of the other extracted entries not saved, since they are not changed.
    • In some cases the 'instrumented jars' and 'transforms' are not stable for a build, and you'll see a new entry being saved on every execution. This is a problem with Gradle or the build script: ideally these should not change if build logic doesn't change.
  • If you run the same Gradle build in 2 Jobs, you should see some sharing between the cache entries for those Jobs. But if you run different tasks, then different dependencies will be resolved, and the dependencies entries will be different.
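
To illustrate the last point, here is a minimal Python sketch of why two Jobs share a dependencies entry only when they resolve exactly the same set. It assumes the entry key is a hash over the sorted list of cached files. The function name `dependencies_entry_key`, the hashing scheme, and the sample paths are hypothetical and are not taken from setup-gradle.

```python
import hashlib


def dependencies_entry_key(resolved_files):
    """Hypothetical key: a hash over the sorted, de-duplicated file list."""
    digest = hashlib.sha256()
    for path in sorted(set(resolved_files)):
        digest.update(path.encode("utf-8"))
        digest.update(b"\n")
    return "gradle-dependencies-v1-" + digest.hexdigest()


# Made-up file lists for three Jobs (illustration only).
job_a = ["org.example/lib-a/1.0/lib-a-1.0.jar", "org.example/lib-b/2.0/lib-b-2.0.jar"]
job_b = list(reversed(job_a))                           # same set, different order
job_c = job_a + ["org.example/lib-c/3.0/lib-c-3.0.jar"]  # one extra dependency

print(dependencies_entry_key(job_a) == dependencies_entry_key(job_b))  # True
print(dependencies_entry_key(job_a) == dependencies_entry_key(job_c))  # False
```

Under this assumption, a single extra dependency in one Job is enough to produce a different key, and therefore a separate entry.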

If the setup-gradle action were smarter, it could use a "layered" approach where the set of dependencies that is common between Jobs would be extracted into a shared entry, and a second entry would hold the dependencies that are unique to the Job.
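
As a rough sketch of that "layered" idea (not something setup-gradle does today), the split could be computed with plain set operations. The function name `layer_dependencies` and the `lib-*` names are hypothetical; only the job names build-first and build-second come from this thread.

```python
def layer_dependencies(deps_by_job):
    """Split each Job's dependencies into one shared layer plus a per-Job layer."""
    all_sets = [set(deps) for deps in deps_by_job.values()]
    shared = set.intersection(*all_sets) if all_sets else set()
    unique = {job: set(deps) - shared for job, deps in deps_by_job.items()}
    return shared, unique


# Made-up dependency sets for two Jobs.
deps_by_job = {
    "build-first": {"lib-a", "lib-b", "lib-c"},
    "build-second": {"lib-a", "lib-b", "lib-d"},
}

shared, unique = layer_dependencies(deps_by_job)
print(sorted(shared))                  # ['lib-a', 'lib-b']
print(sorted(unique["build-first"]))   # ['lib-c']
print(sorted(unique["build-second"]))  # ['lib-d']
```

The shared layer would be stored once, and each Job would add only its small unique layer on top.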

from actions.

bigdaz commented on July 28, 2024

As far as I can see, this is working as expected.

  • Each Job will save and restore its own Gradle User Home cache entry: the only time that build-second will start with a cache entry from build-first is when there's no cache entry from a previous build-second run. The Gradle User Home cache entry then references all of the other entries (dependencies, wrapper-zips, etc.).
  • If the jobs use exactly the same set of dependencies, then the Gradle User Home cache entries will reference the same gradle-dependencies-v1-* entry: this entry will be shared. But since the builds use different dependencies, they have separate dependency entries. Unfortunately, if there is overlap between the sets of dependencies for different Jobs, this results in some duplication of storage. (** see below)
  • The contents of gradle-generated-gradle-jars-* and gradle-wrapper-zips-* entries are identical between the Jobs, so the Gradle User Home entries reference the exact same sub-entries: these sub-entries are shared between the jobs. So there is no duplication in the cache storage of this content.

Note that when you re-run a Job, it doesn't get a new git SHA and thus you don't get a new cache key for Gradle User Home. This is why you see the "Entry not saved: referencing 'Gradle User Home' cache entry not saved". To test this properly, you'll need to push a dummy commit for each run.
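
The re-run behaviour can be sketched in a few lines of Python. The key layout below mirrors the keys quoted in this thread (`gradle-home-v1|<os>|<job>[<hash>]-<git sha>`), but the functions `gradle_home_key` and `should_save` and the save rule are simplifying assumptions, not the action's actual code. The second SHA is a made-up placeholder.

```python
def gradle_home_key(os_name, job_name, job_hash, git_sha):
    """Assumed key layout, modelled on the keys shown in this thread."""
    return f"gradle-home-v1|{os_name}|{job_name}[{job_hash}]-{git_sha}"


def should_save(restored_key, current_key):
    """Assumed rule: an entry is only written when the key has changed."""
    return restored_key != current_key


JOB_HASH = "27df40b80d25f4e0c5be504ed712742e"
first_sha = "dd00fdf2b4588b9de17b10ade0eb6d7891e723b3"
dummy_commit_sha = "a" * 40  # placeholder for the SHA of a new dummy commit

first_run = gradle_home_key("Linux-X64", "build-second", JOB_HASH, first_sha)
re_run = gradle_home_key("Linux-X64", "build-second", JOB_HASH, first_sha)
after_commit = gradle_home_key("Linux-X64", "build-second", JOB_HASH, dummy_commit_sha)

print(should_save(first_run, re_run))        # False: same SHA, same key
print(should_save(first_run, after_commit))  # True: new SHA, new key
```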

Please check out https://github.com/gradle/actions/blob/main/docs/setup-gradle.md#how-gradle-user-home-caching-works for more details on how this works.

** There are definitely ways this could be made more efficient, either by storing one big entry with all of the dependencies for all of the jobs, or by extracting the common dependencies into one entry and only storing the unique dependencies per job.

cloudshiftchris commented on July 28, 2024

Thanks for the clarification; I have re-run the tests (with dummy commits). Still seeing cleanups happening, but presumably these relate to unused items within that build (e.g. removing the unused Groovy DSLs, as it's a Kotlin DSL script). Subsequent runs don't redownload artifacts, so good there.

I'm familiar with the documentation. What led me to believe that, say, dependencies were a shared cache across jobs:

  1. The cache keys, as shown in the GHA cache summary, don't include job/step names, e.g. gradle-dependencies-v1-dadd467fc5b383227281c888d7a80dbc vs gradle-home-v1|Linux-X64|build-second[27df40b80d25f4e0c5be504ed712742e]-dd00fdf2b4588b9de17b10ade0eb6d7891e723b3, indicating (incorrectly) that these are separate from the job. (it isn't obvious that the opaque hash key represents the set of deps)

  2. The documentation has a whole section on cache deduplication that lists certain caches as being "independent" which could be interpreted as "shared across jobs" (the bottom sentence there reinforces that, and indeed some caches are shared across jobs).

If these aren't indeed "shared" caches the documentation should be clearer on that - perhaps a table indicating what items are shared across jobs (where all other attributes match) etc.

bigdaz commented on July 28, 2024

I did try to have a separate cache entry for each dependency, for maximal cache storage efficiency. This approach quickly led to 429 errors in the cache: GHA cache doesn't like too many small entries.

cloudshiftchris commented on July 28, 2024

lol, yea, can see that being a challenge. Lots of complexities here. Thanks for continuing to evolve this.

aSemy commented on July 28, 2024

Thank you both for investigating and explaining.

I understand that the caches are saved 'per job', and I wonder how that works with reusable workflows?

Kotest uses a reusable workflow to call Gradle. The Gradle workflow is called twice in the same job: once to generate a JVM API dump (using BCV), and then again to run all Kotlin Multiplatform tests.

BCV only needs JVM dependencies, while the KMP tests require all KMP dependencies.

How would the dependencies cache for the job be populated? Would a single one be reused for both invocations of the reusable workflow, or a separate one for each? Could it cause the dependency cache to be unnecessarily discarded?

cloudshiftchris commented on July 28, 2024

Here's a Kotest run that shows the problem.

Specifically, this workflow snippet:

jobs:

   validate-api:
      name: Validate API
      if: github.repository == 'kotest/kotest'
      uses: ./.github/workflows/run-gradle.yml
      secrets: inherit
      with:
         runs-on: ubuntu-latest
         ref: ${{ inputs.ref }}
         task: apiCheck

   validate-primary:
      name: Validate on primary runner
      if: github.repository == 'kotest/kotest'
      needs: validate-api
      uses: ./.github/workflows/run-gradle.yml
      secrets: inherit
      with:
         runs-on: ubuntu-latest
         ref: ${{ inputs.ref }}
         task: check

...doesn't store cache information on the second job. Many entries look like this:

Entry: /home/runner/.gradle/caches/modules-*/files-*/*/*/*/*
    Requested Key : gradle-dependencies-v1-a702ec8f84890c954ac77cc8b9b7e1be
    Restored  Key : gradle-dependencies-v1-a702ec8f84890c954ac77cc8b9b7e1be
              Size: 445 MB (466172742 B)
              (Entry restored: exact match found)
    Saved     Key : 
              Size: 
              (Entry not saved: referencing 'Gradle User Home' cache entry not saved)

...perhaps due to Gradle User Home cache not changing (same SHA, same OS, ...):

Entry: Gradle User Home
    Requested Key : gradle-home-v1|Linux|run-tests[2c202e428a822dfa2e29a4f7ab04d507]-0054608c69772bd253aec319bedc2231983e13f1
    Restored  Key : gradle-home-v1|Linux|run-tests[2c202e428a822dfa2e29a4f7ab04d507]-0054608c69772bd253aec319bedc2231983e13f1
              Size: 1 MB (791418 B)
              (Entry restored: exact match found)
    Saved     Key : 
              Size: 
              (Entry not saved: cache key not changed)

cloudshiftchris commented on July 28, 2024

Ahh....

This cache key for GUH: gradle-home-v1|Linux|run-tests[2c202e428a822dfa2e29a4f7ab04d507]-0054608c69772bd253aec319bedc2231983e13f1 contains run-tests as the job name. I would expect it to differ per calling job, i.e. to be validate-api or validate-primary.

run-tests is the job name inside the reusable GitHub workflow that sets up and executes Gradle; the calling jobs invoke it via uses: ./.github/workflows/run-gradle.yml.

So it would appear that setup-gradle has the immediate job name, but not the overall context and we're colliding as a result.
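
A small Python sketch of the suspected collision, using the values from the log above. The key layout is copied from that log. The function `gradle_home_key`, the assumption that the bracketed hash is identical for both callers, and the "qualified" naming scheme are all hypothetical and only illustrate the idea, not how setup-gradle builds or should build its keys.

```python
def gradle_home_key(os_name, job_name, job_hash, git_sha):
    """Assumed key layout, modelled on the key shown in the log above."""
    return f"gradle-home-v1|{os_name}|{job_name}[{job_hash}]-{git_sha}"


SHA = "0054608c69772bd253aec319bedc2231983e13f1"
JOB_HASH = "2c202e428a822dfa2e29a4f7ab04d507"  # assumed equal for both callers
INNER_JOB = "run-tests"                         # job name inside run-gradle.yml
callers = ["validate-api", "validate-primary"]

# Only the inner job name is used, so both callers end up with the same key.
inner_only = {c: gradle_home_key("Linux", INNER_JOB, JOB_HASH, SHA) for c in callers}

# Hypothetical alternative: qualify the job name with the calling job.
qualified = {
    c: gradle_home_key("Linux", f"{c}/{INNER_JOB}", JOB_HASH, SHA) for c in callers
}

print(len(set(inner_only.values())))  # 1
print(len(set(qualified.values())))   # 2
```

With a single shared key, whichever caller runs second finds an exact match and, per the "cache key not changed" message above, saves nothing.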

cloudshiftchris commented on July 28, 2024

(this is the issue I thought was a race condition in the initial part of this ticket)
