Comments (9)

dmikusa commented on September 27, 2024

Thanks for raising this.

Presently, you could make this work by packaging up your own copy of the Bellsoft Liberica buildpack. You would first need to adjust the sha256 and uri to point to the CRaC-enabled JVM you want, then package the buildpack. Lastly, you can consume your buildpack with the instructions in that link. Instead of specifying an alternative Paketo JVM buildpack, point to your image, e.g. docker.io/foo/my-buildpack.
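
For the sha256 adjustment, the digest has to match the archive that the uri points to. A minimal Python sketch for computing it from a downloaded archive; the function name is made up for illustration, and which file you hash depends on the JVM build you picked:

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so a large JVM archive is never fully held in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling sha256_of_file(Path("<downloaded archive>")) gives the value to place next to the matching uri.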

Long term, I think we're waiting to see how folks want to use CRaC with buildpacks. We can certainly bundle and install a CRaC-enabled JVM, but there's more work that needs to happen for CRaC to be useful. In particular, you need to start and run the app to generate the checkpoint. How long the app should run and what it should do before the checkpoint is taken are unclear, and buildpacks would have limits if they were to attempt to generate a checkpoint automatically. Going further, that checkpoint needs to live somewhere. Buildpacks could put it into the image, but it's unclear if that's what people would want/expect.

Since you're asking for this feature, we'd like to hear your thoughts on how you plan to use CRaC & what you'd expect buildpacks to do. That will help us create better support in buildpacks for this functionality. Thanks

from bellsoft-liberica.

bitgully commented on September 27, 2024

Thanks for the hint about packaging a custom buildpack. It would just be more sustainable to have the mentioned support built in.

I would like to use buildpacks as part of a more comprehensive pipeline (e.g. Tekton) for deployments on Kubernetes clusters. This pipeline consists of a sequence of steps.

For example:

  1. Git clone of application source code
  2. Build image using buildpacks (with CRaC enabled JVM)
  3. Deploy newly built image as Kubernetes pod and warm-up
  4. Set CRaC checkpoint
  5. Add the checkpoint file in an additional layer to the previously built image

This could resemble the workflow in the build environment where the first start-up would still be slow. But all future deployments (e.g. staging or production environments) would benefit from the CRaC enabled image.

dmikusa commented on September 27, 2024

The trouble is that you can't do step 5.) there. Your app image is generated in step 2.) when you build with buildpacks. Once the image is written, you can't change it. OCI images are immutable (guaranteed via hashes).

You could build a new one with that information included, but it's a rebuild of the image that produces a new image.

So something like:

  1. Git clone of application source code
  2. Build image using buildpacks (with CRaC enabled JVM), but no checkpoint info
  3. Deploy newly built image as Kubernetes pod and warm-up
  4. Set CRaC checkpoint & save the checkpoint info
  5. Re-run buildpacks providing the checkpoint info. Buildpacks could then include that into the produced image.
  6. Run your image w/CRaC info in K8s

The second build could be pretty quick because of all the caching that we do. It's extra steps though.

Another possibility is:

  1. Git clone of application source code
  2. Build image using buildpacks (with CRaC enabled JVM), but no checkpoint info
  3. Deploy newly built image as Kubernetes pod and warm-up
  4. Set CRaC checkpoint & save the checkpoint info
  5. Run your image w/out CRaC info in K8s but volume mount (or config map) the checkpoint info into the container where it can be used by the application.

This is one step less, but requires a volume mount and those can be a problem/non-starter for some users. The other option might be a config map, but I suspect the checkpoint info might be too large for that.
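
To put a number on the config map concern: Kubernetes documents a 1 MiB limit on the data stored in a ConfigMap. A small sketch that sums up a checkpoint directory and compares it against that limit; the function names are illustrative and the checkpoint location is whatever directory the JVM was told to write to:

```python
from pathlib import Path

# Kubernetes documents a 1 MiB limit on the data stored in a ConfigMap.
CONFIGMAP_LIMIT_BYTES = 1024 * 1024


def dir_size_bytes(root: Path) -> int:
    """Total size of all regular files below root."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def fits_in_configmap(root: Path) -> bool:
    """True if the raw checkpoint files stay within the ConfigMap limit.

    This is optimistic: binary files would be base64-encoded in a
    ConfigMap, which makes them larger than their raw size.
    """
    return dir_size_bytes(root) <= CONFIGMAP_LIMIT_BYTES
```

If this returns False for a real checkpoint, the volume mount is the only variant of this approach that remains.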

and another is:

  1. Git clone of application source code
  2. Build image using buildpacks (with CRaC enabled JVM). As part of the build, the buildpack starts your application (maybe for X seconds?), then stops the process and saves the checkpoint info. The checkpoint info is then included in the image the first time through.
  3. Run your image w/CRaC info in K8s

This has the fewest steps and is the most automated, but it is very difficult for buildpacks to start up a random app successfully. The app might require resources that are not available during the build, like a service (DB, message queue, etc.). We might be able to get a little farther if we constrain the types of apps supported, for example by only supporting Spring apps. In that case, we might be able to start the app more reliably, but even then you could still have issues with required services.

and yet another possibility is:

  1. Git clone of application source code
  2. Build image using buildpacks (with CRaC enabled JVM), but no checkpoint info. Install a helper tool that will run before the app to fetch the checkpoint info from somewhere (maybe an HTTP server?).
  3. Deploy newly built image as Kubernetes pod and warm-up
  4. Set CRaC checkpoint & save the checkpoint info
  5. Run your image w/CRaC info in K8s. Before the app starts up, the helper tool runs and fetches the checkpoint info. The info is then available for the app to start ultra-quick.

I can see some advantages to this approach, but it has the drawback of needing work done before the app starts, which takes time. Ultimately, CRaC is about starting the app super fast, so that extra work negates some of the benefit.
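
To make the helper-tool idea concrete, here is a rough Python sketch of the fetch step. It assumes the checkpoint info is published as a single archive at some URL; the function name, the ".part" temporary suffix, and the fallback behaviour are all assumptions for illustration, not anything the buildpack provides today:

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_checkpoint(url: str, dest: Path, timeout: float = 30.0) -> bool:
    """Download the checkpoint archive at url to dest.

    Returns True on success and False if the download failed, so the
    caller can fall back to a normal (slow) start.
    """
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Download to a temporary name first so a partial file is never
    # mistaken for a complete checkpoint.
    tmp = dest.with_name(dest.name + ".part")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp, tmp.open("wb") as out:
            shutil.copyfileobj(resp, out)
    except OSError:
        # URLError and HTTPError are subclasses of OSError.
        tmp.unlink(missing_ok=True)
        return False
    tmp.replace(dest)
    return True
```

The time this call takes is exactly the pre-start cost discussed above, so it would only pay off when the download is much faster than a cold JVM start.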

Anyway, I appreciate your thoughts and feedback. If anyone else comes across this thread, please add your feedback too.

bitgully commented on September 27, 2024

The two possibilities mentioned in between (second and third) seem a little hard to implement. Possibility 2 would require the creation (and deletion) of an additional Persistent Volume/ConfigMap for each version of the app in every environment. As far as possibility 3 is concerned, I'm afraid it won't be possible for the buildpack to start all containers as part of the build process in enterprise environments, because they often depend on a number of other resources (ConfigMaps, DBs, Leader/Follower instances...) that might only be available in the namespace where they get deployed afterwards (e.g. by a Helm chart that provides these artifacts).

But I would like to pick up on your first and your last proposed possibilities. Let's name them "Heavy" (=first) and "Light" (=last) for now. The heavyweight option includes the checkpoint data in the app's image itself and the lightweight option fetches the checkpoint info separately at startup.

Heavy

Pros:

  • Easy to use/start later on.

Cons:

  • Two image builds necessary (second one is faster).
  • Large image size for future pulls (though layers get pulled in parallel).
  • App's state is bundled with the implementation/image.
    • Checkpoint data can only be updated by rebuilding the image.
    • Potential future removal of CRaC functionality would require a rebuild of the app image.

These cons are not present in the lightweight version. But "Light" comes with other challenges already mentioned.

Light

Pros:

  • Only one image build necessary.
  • Small image size for future pulls (checkpoint data must still be fetched in case CRaC deployment is desired).
  • App's state is separated from its implementation/image.
    • Checkpoint data can be updated without rebuilding the image.
    • Potential future removal of CRaC functionality doesn't require rebuild of image (just rebase of run image).
  • Existing buildpack must only add option for CRaC enabled JDK versions. Everything regarding checkpoint creation/restore is handled by the build/deploy workflow.

Cons:

  • Checkpoint data must be published in addition to the app image.
  • Deployment of app using CRaC needs extra configuration.

The cons of the "Light" version could be dealt with like below.

Checkpoint Data Storage

Since the checkpoint data must be fetched on demand during app deployment, it needs a place to be published alongside its corresponding image version.

Option A:
We could leverage the OCI's "generic artifacts" feature that is already part of container registries. The build pipeline could use ORAS to push the checkpoint file next to its image in the same location like so: oras push.

Option B:
Alternatively, we could simply create two images for every application. One standard app image and one checkpoint image (same name but with "-checkpoint" postfix).
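
The naming rule for Option B could look like the following Python sketch. The function name and the decision to reject digest references are my own assumptions; the "-checkpoint" postfix is the one proposed above:

```python
def checkpoint_image_ref(app_ref: str, suffix: str = "-checkpoint") -> str:
    """Derive the checkpoint image reference from an app image reference.

    The suffix is appended to the repository name and the tag is kept.
    """
    if "@" in app_ref:
        # A digest identifies one specific image, so it cannot be reused
        # for the separately built checkpoint image.
        raise ValueError("digest references are not supported: " + app_ref)
    # A tag is a ":" that appears after the last "/", so a registry port
    # such as "localhost:5000/app" is not mistaken for a tag.
    last_slash = app_ref.rfind("/")
    colon = app_ref.rfind(":")
    if colon > last_slash:
        repo, tag = app_ref[:colon], app_ref[colon:]
    else:
        repo, tag = app_ref, ""
    return repo + suffix + tag
```

For example, an app image docker.io/foo/my-app:1.2.3 would map to docker.io/foo/my-app-checkpoint:1.2.3, so a deployment can find the checkpoint from the app image reference alone.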

Fast App Start of "Light" Option on Kubernetes

The concern about the extra work needed for fetching the checkpoint data on-the-fly at startup might be addressed like this:
It is assumed that many containers will be deployed on orchestration platforms like Kubernetes.
Since containers inside a pod can share a volume, we could inject a sidecar container in addition to the regular app's container.
The sole purpose of this tiny container is to fetch the checkpoint file from the same location as the container image and save it to the shared volume (ephemeral "emptyDir").
This way, the app's image and the checkpoint file are pulled simultaneously (using serializeImagePulls=false), so the startup delay should not be any longer than it already is today.
The normal app's container can then boot and simply restore the checkpoint from the pod's local filesystem.
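
As a sketch of what such a pod could look like, the following Python builds the manifest as a plain dict and renders it as JSON, which kubectl also accepts. The pod name, container names, volume name, and mount path are placeholders I made up; only the emptyDir volume shared by both containers reflects the idea described above:

```python
import json


def pod_manifest(app_image: str, fetcher_image: str, mount_path: str = "/checkpoint") -> dict:
    """Pod with an app container and a fetcher sidecar sharing an emptyDir."""
    volume = {"name": "checkpoint", "emptyDir": {}}
    mount = {"name": "checkpoint", "mountPath": mount_path}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "app-with-checkpoint"},  # placeholder name
        "spec": {
            "volumes": [volume],
            "containers": [
                # Both containers mount the same ephemeral volume.
                {"name": "app", "image": app_image, "volumeMounts": [dict(mount)]},
                {"name": "checkpoint-fetcher", "image": fetcher_image, "volumeMounts": [dict(mount)]},
            ],
        },
    }


def render(manifest: dict) -> str:
    """Serialize the manifest as JSON for use with kubectl apply -f."""
    return json.dumps(manifest, indent=2)
```

The fetcher image here is hypothetical; it would contain whatever tool downloads the checkpoint (an ORAS client for Option A, for instance).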

The sidecar container that fetches the checkpoint data should write an additional file (e.g. "checkpoint-fetch-completed") once it has either found no checkpoint or finished downloading one.
The presence of this "checkpoint-fetch-completed" file can be checked at the app container's startup (e.g. override default entrypoint with some bash command) to prevent it from booting while the checkpoint is still being downloaded.
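
I mentioned a bash command for the entrypoint override; the same logic is sketched here in Python to keep it readable. The function names, the timeout values, and the directory layout are assumptions. -XX:CRaCRestoreFrom is the restore flag documented by the OpenJDK CRaC project:

```python
import time
from pathlib import Path
from typing import List


def wait_for_marker(marker: Path, timeout: float = 60.0, poll: float = 0.2) -> bool:
    """Block until the marker file exists; False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if marker.exists():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


def java_command(checkpoint_dir: Path, jar: str) -> List[str]:
    """Restore if checkpoint files are present, otherwise start normally."""
    if checkpoint_dir.is_dir() and any(checkpoint_dir.iterdir()):
        return ["java", "-XX:CRaCRestoreFrom=" + str(checkpoint_dir)]
    # No checkpoint was available: fall back to a regular JVM start.
    return ["java", "-jar", jar]
```

The entrypoint would call wait_for_marker on the "checkpoint-fetch-completed" file first and then exec whatever java_command returns, so an app without a checkpoint still boots the usual way.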

Conclusion

The disadvantages of storing the app image and the checkpoint data separately can be handled with existing Kubernetes functionality.
Even though this requires extra configuration, it leads to faster builds and allows for a more flexible update/removal of checkpoints or CRaC if need be.

I suppose both the heavyweight and the lightweight solutions come with some trade-offs but could work.

