openshift / managed-upgrade-operator

V4 cluster upgrade automation

License: Apache License 2.0

Dockerfile 0.13% Shell 11.87% Go 83.02% Makefile 2.96% Python 2.02%
osdv4

managed-upgrade-operator's Introduction

Managed Upgrade Operator


The Managed Upgrade Operator has been created for the OpenShift Dedicated Platform (OSD) to manage the orchestration of automated in-place cluster upgrades.

Whilst the operator's job is to invoke a cluster upgrade, it does not perform any activities of the cluster upgrade process itself. This remains the responsibility of the OpenShift Container Platform. The operator's goal is to satisfy the operating conditions that a managed cluster must hold, both pre- and post-invocation of the cluster upgrade.

Examples of activities that are not core to an OpenShift upgrade process but could be handled by the operator include:

  • Pre- and post-upgrade health checks.
  • Worker capacity scaling during the upgrade period.
  • Alerting silence window management.

If you would like to contribute to the Managed Upgrade Operator, please read our Contribution Policy first.


Info

Documentation

  • FAQ -- Frequently Asked Questions.

For Developers

  • Design -- Describes the interaction between the operator and the custom resource definition.
  • Development -- Instructions for developing and deploying the operator.
  • Metrics -- Prometheus metrics produced by the operator.
  • Testing -- Instructions for writing tests.

Workflow - UpgradeConfig

  1. The operator watches all namespaces for an UpgradeConfig resource.
  2. When an UpgradeConfig is found or modified, the operator checks the Status History to determine if this upgrade has been applied to the cluster.
    • If the UpgradeConfig history indicates that the cluster has been successfully upgraded to the defined version, no further action is taken.
  3. If there is no previous history for this UpgradeConfig, or if it indicates that the upgrade is New, Pending or Ongoing, the operator creates a ClusterUpgrader to either initiate a new upgrade or maintain an ongoing upgrade.
  4. The ClusterUpgrader runs through an ordered series of upgrade steps, executing them or waiting for them to complete.
    • As steps are launched or complete, they are added to the UpgradeConfig's Status History.
  5. Once all steps have been completed, the upgrade is considered complete and a Status History entry is written to indicate that the UpgradeConfig has been applied.
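
The decision made in steps 2 and 3 can be sketched as a small function. This is an illustration of the workflow as described above, not the operator's actual code: the `HistoryEntry` type and the `needs_upgrader` function are hypothetical, and "Upgraded" is an assumed label for a successfully applied upgrade (the text only names the New, Pending and Ongoing phases). Phases the text does not describe are treated as "do nothing" here.

```python
from dataclasses import dataclass

# "New", "Pending" and "Ongoing" come from the workflow text above;
# "Upgraded" is an assumed label for a successfully applied upgrade.
ACTIVE_PHASES = {"New", "Pending", "Ongoing"}
DONE_PHASE = "Upgraded"


@dataclass
class HistoryEntry:
    """Hypothetical, simplified view of one Status History entry."""
    version: str
    phase: str


def needs_upgrader(history, desired_version):
    """Return True when a ClusterUpgrader should be created for desired_version."""
    entry = next((h for h in history if h.version == desired_version), None)
    if entry is None:
        return True  # step 3: no previous history for this UpgradeConfig
    if entry.phase == DONE_PHASE:
        return False  # step 2: already applied, no further action
    return entry.phase in ACTIVE_PHASES  # step 3: start or continue the upgrade


print(needs_upgrader([], "4.4.6"))
print(needs_upgrader([HistoryEntry("4.4.6", "Ongoing")], "4.4.6"))
print(needs_upgrader([HistoryEntry("4.4.6", "Upgraded")], "4.4.6"))
```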

Sample UpgradeConfig CR definition

Example 1 - OSD upgrade to version 4.4.6 using the fast channel

apiVersion: upgrade.managed.openshift.io/v1alpha1
kind: UpgradeConfig
metadata:
  name: managed-upgrade-config
spec:
  type: "OSD"
  upgradeAt: "2020-01-01T00:00:00Z"
  PDBForceDrainTimeout: 60
  capacityReservation: true
  desired:
    channel: "fast-4.4"
    version: "4.4.6"

Example 2 - OSD upgrade to version 4.7.13 using an image digest

apiVersion: upgrade.managed.openshift.io/v1alpha1
kind: UpgradeConfig
metadata:
  name: managed-upgrade-config
spec:
  type: "OSD"
  upgradeAt: "2021-01-01T00:00:00Z"
  PDBForceDrainTimeout: 60
  desired:
    image: "quay.io/openshift-release-dev/ocp-release@sha256:783a2c963f35ccab38e82e6a8c7fa954c3a4551e07d2f43c06098828dd986ed4"
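
The two examples differ only in how the target is named under `desired`: a channel plus version, or an image digest. The following sketch reads a spec as a plain dictionary, reports which form it uses, and checks whether `upgradeAt` has passed. The `summarize_spec` helper is hypothetical and only mirrors the fields shown in the examples; it does not reproduce the operator's own validation.

```python
from datetime import datetime, timezone


def summarize_spec(spec, now):
    """Describe an UpgradeConfig spec dict: upgrade target and whether it is due."""
    # Python before 3.11 does not parse a trailing "Z", so normalize it first.
    upgrade_at = datetime.fromisoformat(spec["upgradeAt"].replace("Z", "+00:00"))
    desired = spec["desired"]
    if "image" in desired:
        # Keep only the digest part after "@" for readability.
        target = "image " + desired["image"].split("@", 1)[-1]
    else:
        target = "version {} from channel {}".format(
            desired["version"], desired["channel"]
        )
    return {"target": target, "due": now >= upgrade_at}


# Spec block of Example 1, written as a Python dict.
example_1 = {
    "type": "OSD",
    "upgradeAt": "2020-01-01T00:00:00Z",
    "PDBForceDrainTimeout": 60,
    "capacityReservation": True,
    "desired": {"channel": "fast-4.4", "version": "4.4.6"},
}
print(summarize_spec(example_1, datetime(2020, 1, 2, tzinfo=timezone.utc)))
```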

managed-upgrade-operator's Issues

Failed to get original machineset in ARO

I created a brand new ARO cluster and attempted to apply an UpgradeConfig CR to it, using the capacityReservation: true setting. I'm encountering an error failed to get original machineset, which doesn't make sense to me because I haven't touched the original machinesets other than to scale out machines from 1 to 2. Based on the stacktrace error, it's also unclear what actions I need to take to resolve the issue.

UpgradeConfig CR:

apiVersion: upgrade.managed.openshift.io/v1alpha1
kind: UpgradeConfig
metadata:
  name: managed-upgrade-config
  namespace: openshift-managed-upgrade-operator
spec:
  type: "ARO"
  upgradeAt: "2024-03-08T15:35:00Z"
  PDBForceDrainTimeout: 60
  capacityReservation: true
  desired:
    channel: "stable-4.12"
    version: "4.12.26"

The machineset:

$ oc get machineset -A
NAMESPACE               NAME                                     DESIRED   CURRENT   READY   AVAILABLE   AGE
openshift-machine-api   aro-cluster-5c2tn-k4hhb-worker-eastus1   2         2         2       2           16h
openshift-machine-api   aro-cluster-5c2tn-k4hhb-worker-eastus2   2         2         2       2           16h
openshift-machine-api   aro-cluster-5c2tn-k4hhb-worker-eastus3   2         2         2       2           16h

Logs from MUO pod:

$ oc logs managed-upgrade-operator-6d7d6d8d65-2mwlx -f -n openshift-managed-upgrade-operator
ts=2024-03-08T15:40:46.358441985Z level=info logger=controller_upgradeconfig msg="Reconciling UpgradeConfig" Request.Namespace=openshift-managed-upgrade-operator Request.Name=managed-upgrade-config
ts=2024-03-08T15:40:46.994941661Z level=info logger=controller_upgradeconfig msg="Current cluster status" Request.Namespace=openshift-managed-upgrade-operator Request.Name=managed-upgrade-config status=Upgrading
ts=2024-03-08T15:40:46.994972261Z level=info logger=controller_upgradeconfig msg="Cluster detected as already upgrading." Request.Namespace=openshift-managed-upgrade-operator Request.Name=managed-upgrade-config
ts=2024-03-08T15:40:47.578525781Z level=info logger=controller_upgradeconfig msg="running step StartedNotificationSent" Request.Namespace=openshift-managed-upgrade-operator Request.Name=managed-upgrade-config
ts=2024-03-08T15:40:47.701591604Z level=info logger=controller_upgradeconfig msg="running step ClusterHealthyBeforeUpgrade" Request.Namespace=openshift-managed-upgrade-operator Request.Name=managed-upgrade-config
ts=2024-03-08T15:40:47.77726004Z level=info logger=controller_upgradeconfig msg="running step ExternalDependenciesAvailable" Request.Namespace=openshift-managed-upgrade-operator Request.Name=managed-upgrade-config
ts=2024-03-08T15:40:47.874410242Z level=info logger=controller_upgradeconfig msg="No external dependencies configured for availability checks. Skipping." Request.Namespace=openshift-managed-upgrade-operator Request.Name=managed-upgrade-config
ts=2024-03-08T15:40:47.874441943Z level=info logger=controller_upgradeconfig msg="running step ComputeCapacityReserved" Request.Namespace=openshift-managed-upgrade-operator Request.Name=managed-upgrade-config
ts=2024-03-08T15:40:47.899053847Z level=info logger=controller_upgradeconfig msg="failed to get machineset" Request.Namespace=openshift-managed-upgrade-operator Request.Name=managed-upgrade-config
ts=2024-03-08T15:40:47.899081947Z level=error logger=controller_upgradeconfig msg="error when ComputeCapacityReserved" Request.Namespace=openshift-managed-upgrade-operator Request.Name=managed-upgrade-config error="failed to get original machineset" stacktrace="github.com/openshift/managed-upgrade-operator/pkg/upgradesteps.Run\n\t/workdir/pkg/upgradesteps/runner.go:30\ngithub.com/openshift/managed-upgrade-operator/pkg/upgraders.(*clusterUpgrader).runSteps\n\t/workdir/pkg/upgraders/upgrader.go:63\ngithub.com/openshift/managed-upgrade-operator/pkg/upgraders.(*aroUpgrader).UpgradeCluster\n\t/workdir/pkg/upgraders/aroupgrader.go:86\ngithub.com/openshift/managed-upgrade-operator/controllers/upgradeconfig.(*ReconcileUpgradeConfig).upgradeCluster\n\t/workdir/controllers/upgradeconfig/upgradeconfig_controller.go:253\ngithub.com/openshift/managed-upgrade-operator/controllers/upgradeconfig.(*ReconcileUpgradeConfig).Reconcile\n\t/workdir/controllers/upgradeconfig/upgradeconfig_controller.go:235\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:314\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:226"
ts=2024-03-08T15:40:47.910533589Z level=error msg="Reconciler error" controller=upgradeconfig controllerGroup=upgrade.managed.openshift.io controllerKind=UpgradeConfig UpgradeConfig="{managed-upgrade-config openshift-managed-upgrade-operator}" namespace=openshift-managed-upgrade-operator name=managed-upgrade-config reconcileID=b3169141-583d-4d3b-be9c-8efbae168f3d error="1 error occurred:\n\t* failed to get original machineset\n\n" stacktrace="sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:226"

I'm running 4.12.25 on ARO subscribed to stable-4.12 channel:

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.25   True        False         15h     Cluster version is 4.12.25
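
For long logs like the one above, it can help to pull out only the `level=error` lines and the step they belong to. The sketch below parses the key=value log format with Python's standard `shlex` module; the `parse_logfmt` and `failed_steps` helpers are hypothetical, and the two sample lines are shortened copies of lines from the log above (the `Request.*` fields and stack trace are dropped).

```python
import shlex


def parse_logfmt(line):
    """Split a key=value log line into a dict, honoring double-quoted values."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def failed_steps(lines):
    """Yield (timestamp, message, error) for every level=error line."""
    for line in lines:
        f = parse_logfmt(line)
        if f.get("level") == "error":
            yield f.get("ts"), f.get("msg"), f.get("error")


# Shortened copies of two lines from the MUO pod log in this issue.
sample = [
    'ts=2024-03-08T15:40:47.874441943Z level=info logger=controller_upgradeconfig msg="running step ComputeCapacityReserved"',
    'ts=2024-03-08T15:40:47.899081947Z level=error logger=controller_upgradeconfig msg="error when ComputeCapacityReserved" error="failed to get original machineset"',
]
for ts, msg, err in failed_steps(sample):
    print(ts, msg, err)
```

This only narrows down where the failure is reported (the ComputeCapacityReserved step); it does not explain why the machineset lookup fails.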

Operator unable to find an available upgrade

The managed-upgrade-operator is unable to find an available upgrade to the requested version, in this case clusterversion 4.13.19.

UpgradeConfig

apiVersion: upgrade.managed.openshift.io/v1alpha1
kind: UpgradeConfig
metadata:
  name: managed-upgrade-config
  namespace: openshift-managed-upgrade-operator
spec:
  PDBForceDrainTimeout: 60
  capacityReservation: false
  desired:
    channel: stable-4.13
    version: 4.13.19
  type: OSD
  upgradeAt: "2023-11-14T08:00:00Z"

The operator reports the following error:

ts=2023-11-14T08:46:01Z level=info logger=controller_upgradeconfig msg="An error occurred while validating UpgradeConfig: no available upgrade for the given clusterversion 4.13.19" Request.Namespace=openshift-managed-upgrade-operator Request.Name=managed-upgrade-config
ts=2023-11-14T08:46:01Z level=error logger=controller.upgradeconfig-controller msg="Reconciler error" name=managed-upgrade-config namespace=openshift-managed-upgrade-operator error="no available upgrade for the given clusterversion 4.13.19" stacktrace="sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227"
ts=2023-11-14T08:48:33Z level=info logger=controller_machineconfigpool msg="Reconciling MachineConfigPool" Request.Namespace= Request.Name=worker
ts=2023-11-14T08:50:45Z level=info logger=controller_upgradeconfig msg="Reconciling UpgradeConfig" Request.Namespace=openshift-managed-upgrade-operator Request.Name=managed-upgrade-config
ts=2023-11-14T08:50:45Z level=info logger=controller_upgradeconfig msg="Current cluster status" Request.Namespace=openshift-managed-upgrade-operator Request.Name=managed-upgrade-config status=New
ts=2023-11-14T08:50:45Z level=info logger=controller_upgradeconfig msg="Validating UpgradeConfig" Request.Namespace=openshift-managed-upgrade-operator Request.Name=managed-upgrade-config

Environment:
OpenShift Cluster Version: 4.13.14

Manual Upgrade via Console UI
If you upgrade the cluster manually through the console UI, you have the option to select the desired version and execute the update manually. This is an alternative workaround.

Any information would be helpful.
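
One thing that may be worth checking for this kind of error is whether the requested version is listed under `status.availableUpdates` of the cluster's ClusterVersion resource (for example in the output of `oc get clusterversion version -o json`). Whether the operator's validation relies on exactly this field is an assumption here, not something the issue confirms. In the sketch below, the `is_update_available` helper is hypothetical and the versions in the sample document are made up for illustration.

```python
import json


def is_update_available(clusterversion, desired_version):
    """Check whether desired_version is listed in status.availableUpdates."""
    # availableUpdates may be absent or null when no updates are known.
    updates = clusterversion.get("status", {}).get("availableUpdates") or []
    return any(u.get("version") == desired_version for u in updates)


# Hypothetical, trimmed ClusterVersion document; the versions listed under
# availableUpdates are invented for illustration, not taken from the issue.
doc = json.loads("""
{
  "status": {
    "desired": {"version": "4.13.14"},
    "availableUpdates": [
      {"version": "4.13.15"},
      {"version": "4.13.17"}
    ]
  }
}
""")
print(is_update_available(doc, "4.13.19"))
print(is_update_available(doc, "4.13.17"))
```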

Add Managed Upgrade Operator to OperatorHub

Hello team,

The Managed Upgrade Operator can also be used by Red Hat's customers to partially automate the upgrades for their own OpenShift clusters. Customers must be aware that this is not a supported Operator, but this has so far never been a problem.

However, customers have asked about having the Managed Upgrade Operator available in the OpenShift OperatorHub so it can be installed and managed via the OperatorHub.

Likely it would make sense to add the Operator to the "Community" marketplace in OperatorHub, as Red Hat does not officially support this Operator.

What do you think, would that be a possible enhancement?

Thanks.

Documentation to include ARO and ROSA

Can we include documentation that notes this is also available on ARO and ROSA and also include any examples needed for ARO or ROSA?

ARO is mentioned in the design doc but not ROSA.

We should call out directly whether ARO and ROSA are supported with this, in addition to OSD.

UpgradeConfig CR with LOCAL config does not sync. #881

Operator Image: quay.io/app-sre/managed-upgrade-operator:latest

Operator keeps reconciling but doesn't add status fields to the UpgradeConfig CR:

{"level":"info","ts":1626093892.7293298,"msg":"Using local CR as the upgrade config provider"}
{"level":"info","ts":1626093892.731769,"logger":"upgradeconfig-localprovider","msg":"Read the upgrade config from the cluster directly"}
{"level":"info","ts":1626094079.5829825,"logger":"controller_machineconfigpool","msg":"Reconciling MachineConfigPool","Request.Namespace":"","Request.Name":"worker"}
{"level":"info","ts":1626094174.783578,"msg":"Using local CR as the upgrade config provider"}
{"level":"info","ts":1626094174.78903,"logger":"upgradeconfig-localprovider","msg":"Read the upgrade config from the cluster directly"}
{"level":"info","ts":1626094375.0481997,"logger":"controller_machineconfigpool","msg":"Reconciling MachineConfigPool","Request.Namespace":"","Request.Name":"worker"}

UpgradeConfig CR:

apiVersion: upgrade.managed.openshift.io/v1alpha1
kind: UpgradeConfig
metadata:
  name: managed-upgrade-config
  namespace: openshift-managed-upgrade-operator
spec:
  type: "OSD"
  upgradeAt: "2021-07-06T10:00:00Z"
  PDBForceDrainTimeout: 60
  capacityReservation: true
  desired:
    channel: "stable-4.7"
    version: "4.7.18"

Config:

apiVersion: v1
kind: ConfigMap
metadata:
  name: managed-upgrade-operator-config
  namespace: openshift-managed-upgrade-operator
data:
  config.yaml: |
    configManager:
      source: LOCAL
      localConfigName: managed-upgrade-config
      watchInterval: 5
    maintenance:
      controlPlaneTime: 90
      ignoredAlerts:
        controlPlaneCriticals:
        - ClusterOperatorDown
        - ClusterOperatorDegraded
    scale:
      timeOut: 30
    upgradeWindow:
      delayTrigger: 30
      timeOut: 120
    nodeDrain:
      timeOut: 45
      expectedNodeDrainTime: 8
    healthCheck:
      ignoredCriticals:
      - DNSErrors05MinSRE
      - MetricsClientSendFailingSRE
      - UpgradeNodeScalingFailedSRE
      - UpgradeClusterCheckFailedSRE
      - PrometheusRuleFailures
      - CannotRetrieveUpdates
      - FluentdNodeDown
      ignoredNamespaces:
      - openshift-logging
      - openshift-redhat-marketplace
      - openshift-operators
      - openshift-customer-monitoring
      - openshift-route-monitoring-operator
      - openshift-user-workload-monitoring
      - openshift-pipelines
    extDependencyAvailabilityChecks: {}
    verification:
      ignoredNamespaces:
      - openshift-logging
      namespacePrefixesToCheck:
      - openshift
      - kube
      - default
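
When comparing the operator log with the CR, note that the JSON log lines carry `ts` as epoch seconds while `upgradeAt` is an RFC 3339 timestamp. The sketch below converts both to UTC and prints the gap; the `log_time` and `parse_rfc3339` helpers are hypothetical. For the first log line above, the gap comes out at roughly six days after `upgradeAt`. Whether that is related to the missing status fields depends on how the operator handles its upgrade window, which this issue does not establish.

```python
import json
from datetime import datetime, timezone


def log_time(line):
    """Return the UTC time of a JSON log line whose "ts" field is epoch seconds."""
    return datetime.fromtimestamp(json.loads(line)["ts"], tz=timezone.utc)


def parse_rfc3339(value):
    """Parse an RFC 3339 timestamp such as the CR's upgradeAt."""
    # Python before 3.11 does not parse a trailing "Z", so normalize it first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


# First log line and upgradeAt value, copied from the issue above.
first_line = '{"level":"info","ts":1626093892.7293298,"msg":"Using local CR as the upgrade config provider"}'
upgrade_at = parse_rfc3339("2021-07-06T10:00:00Z")

logged = log_time(first_line)
print("first log line:", logged.isoformat())
print("time since upgradeAt:", logged - upgrade_at)
```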
