zachaller commented on June 10, 2024

I know this might not address the root issue, but could you try bumping the --analysis-threads flag on the controller to something like 60 (--analysis-threads=60)? I think it defaults to 30. See if that helps at all.

kevinqian-db commented on June 10, 2024

Our args configuration looks like the following:

- args:
        - --analysis-threads
        - "264"
        - --rollout-threads
        - "88"
        - --qps
        - "80"
        - --burst
        - "160"
        - --leader-elect
        - "true"

IMO that should be a sufficient number of threads.

kevinqian-db commented on June 10, 2024

It seems that with v1.6.3, reconciling completed AnalysisRuns becomes substantially slower under heavy load. Combined with the periodic rescheduling of every completed AnalysisRun for reconciliation (due to the 15-minute resync period of the AnalysisRun informers) and the FIFO workqueue, this repeatedly starves live AnalysisRuns that need to make progress.
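
To make the starvation argument concrete, here is a rough Python model (not Argo Rollouts code). It assumes a single FIFO queue, a resync that enqueues every completed AnalysisRun just ahead of one live run, a fixed reconcile time per completed run, all workers idle at the start, and no rate limiting; live_run_wait is a hypothetical helper. Under those assumptions the live run waits roughly (completed runs × reconcile time) / workers, so extra threads only divide a delay that keeps growing as completed runs accumulate and individual reconciles slow down.

```python
from collections import deque


def live_run_wait(completed_runs: int, reconcile_secs: float, workers: int) -> float:
    """Seconds before a live AnalysisRun is picked up (simplified model).

    Assumptions: one FIFO queue holding `completed_runs` completed runs
    followed by a single live run, a fixed reconcile time per completed run,
    all `workers` idle at t=0, and no rate limiting.
    """
    queue = deque(["completed"] * completed_runs + ["live"])
    free_at = [0.0] * workers  # time at which each worker becomes idle
    while queue:
        item = queue.popleft()  # FIFO: the oldest item is handed out first
        w = min(range(workers), key=free_at.__getitem__)  # earliest idle worker
        if item == "live":
            return free_at[w]
        free_at[w] += reconcile_secs
    raise AssertionError("unreachable: the live run is always in the queue")
```

For example, 6 completed runs at 1 s each with 2 workers delay the live run by 3 s; with 3 workers the delay drops to 2 s, but it still scales with the backlog.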

I think this strengthens the case for #3285. Would you mind checking whether it makes sense? Thanks! @zachaller

zachaller commented on June 10, 2024

I do think your analysis of the issue makes sense, but I'm not sold just yet on Rollouts managing the TTL. I don't think AnalysisRuns were originally designed to be used outside of a Rollout. I'll have to see, and think about, what a proposal for that would look like.

zachaller commented on June 10, 2024

How are you guys creating your AnalysisRuns? What do those specs look like, and does it make sense for that tool to manage the cleanup?

kevinqian-db commented on June 10, 2024

We have an internal tool that directly generates an AnalysisRun manifest (without a template) from a set of configurations, along with the manifest of the resource type the user requested (e.g. a StatefulSet). Internal users don't realistically know Argo is involved, but they configure their own special metrics, and these metrics can be quite volatile, so templates would not help much. After applying both manifests, the tool waits for updates to the AnalysisRun and decides whether to roll back based on its terminal status (or rolls back on timeout). Basically, we rely on Argo's versatility in querying different metrics endpoints and on its periodic metrics collection for this use case. I'm not sure whether other people have tried to use Argo in similar ways.
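
A minimal sketch of the wait-and-decide step described above, written in Python under stated assumptions: get_phase is a hypothetical callable returning the AnalysisRun's status.phase (for example via a Kubernetes client), the sketch polls rather than watches, and any terminal phase other than Successful (including Inconclusive) triggers a rollback. The internal tool's actual rollback policy is not described in the thread.

```python
import time
from typing import Callable, Tuple

# AnalysisRun phases after which the run no longer changes.
TERMINAL_PHASES = {"Successful", "Failed", "Error", "Inconclusive"}


def await_verdict(
    get_phase: Callable[[], str],
    timeout_s: float,
    poll_s: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[bool, str]:
    """Poll until the AnalysisRun is terminal or the timeout expires.

    Returns (rollback, reason). Assumed policy: roll back unless the run
    ends Successful, and roll back on timeout.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        phase = get_phase()  # hypothetical: reads status.phase of the run
        if phase in TERMINAL_PHASES:
            return phase != "Successful", f"analysis finished: {phase}"
        sleep(poll_s)
    return True, "timed out waiting for a terminal phase"
```

The clock and sleep parameters are injectable only so the timeout path can be exercised without real waiting.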

Since this tool is essentially a long-running script with a separate instance per deployment, it has no knowledge of other, historical AnalysisRuns. We could have it delete the AnalysisRun after completion, but we would like to keep completed AnalysisRuns around for 1-2 months until we are sure they are safe to delete. We could also create a separate cron job on each cluster to do the cleanup, but given the number of Kubernetes clusters we maintain and other infrastructure complexity, building this directly into Argo might be the easiest approach, as long as it is a meaningful feature for general Argo use cases.
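
The cleanup being asked for boils down to selecting terminal AnalysisRuns older than a retention window, whether that logic lives in a per-cluster cron job or inside the controller. A rough sketch in Python: RunInfo and expired_runs are hypothetical names, and completed_at stands in for whatever completion timestamp is available; where that timestamp comes from is an assumption, not something established in this thread.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional

TERMINAL_PHASES = {"Successful", "Failed", "Error", "Inconclusive"}


@dataclass
class RunInfo:
    name: str
    phase: str
    completed_at: Optional[datetime]  # hypothetical completion timestamp (UTC)


def expired_runs(runs: Iterable[RunInfo], ttl: timedelta,
                 now: Optional[datetime] = None) -> List[str]:
    """Names of terminal runs that completed at least `ttl` ago."""
    now = now or datetime.now(timezone.utc)
    return [
        r.name
        for r in runs
        if r.phase in TERMINAL_PHASES
        and r.completed_at is not None
        and now - r.completed_at >= ttl
    ]
```

With a 60-day TTL, a run that finished 70 days ago would be selected, while one that finished 5 days ago, or one still Running, would be kept.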

Also cc @gavin-db for other possible input.

gavin-db commented on June 10, 2024

@zachaller Users can apply AnalysisRuns directly to their cluster using kubecfg apply. This is explicitly supported by the API, and the Argo docs describe the following command:

This command creates a new AnalysisRun from an existing AnalysisTemplate resource or from an AnalysisTemplate file.

kubectl argo rollouts create analysisrun [flags]

The use case is performing analyses for non-Rollout resources (e.g. StatefulSets, DaemonSets). Users can simply trigger an AnalysisRun when deploying StatefulSets, DaemonSets, ConfigMaps, etc., and use the result to gauge the health of their system (and make subsequent decisions). We described this in some detail at ArgoCon last year.

Whenever a user triggers an AnalysisRun this way, there is currently no cleanup mechanism: Argo Rollouts leaves the AnalysisRun in a terminal phase indefinitely (and keeps reindexing it every 15 minutes forever, which eventually cripples the controller's performance).

We have a rough proposal for supporting a TTL in #3285 (an alternative would be to change the controller to not reindex terminal AnalysisRuns, but that would require a more significant change). We can contribute it upstream if the interface makes sense.
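
For comparison, the alternative of not reindexing terminal AnalysisRuns could start from a filter on informer update events. The controller itself is written in Go on client-go informers; the Python model below only illustrates the idea and is not how Argo Rollouts currently behaves. It relies on the fact that a periodic resync re-delivers the cached object, so the old and new objects share a resourceVersion; ARSnapshot and should_enqueue_on_update are hypothetical names.

```python
from dataclasses import dataclass

TERMINAL_PHASES = {"Successful", "Failed", "Error", "Inconclusive"}


@dataclass
class ARSnapshot:
    name: str
    resource_version: str
    phase: str


def should_enqueue_on_update(old: ARSnapshot, new: ARSnapshot) -> bool:
    """Decide whether an informer update event should enqueue the run.

    A periodic resync re-delivers the cached object, so old and new carry
    the same resourceVersion. Those events are dropped for terminal runs;
    real changes and non-terminal runs are still enqueued.
    """
    is_resync = old.resource_version == new.resource_version
    return not (is_resync and new.phase in TERMINAL_PHASES)
```

Genuine changes to a terminal run (anything that bumps its resourceVersion) would still be enqueued. A TTL built on top of this would need its own trigger, such as re-adding the run to the queue after a delay, since resyncs would no longer revisit it.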
