keng-operator's Introduction

KENG Operator

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

A Kubernetes Operator builds on basic Kubernetes resource and controller concepts and adds application-specific knowledge to automate common tasks such as creating, configuring, and managing instances on behalf of a Kubernetes user. It extends the functionality of the Kubernetes API and is used to package, deploy, and manage a Kubernetes application.

KENG Operator defines a CRD for the KENG-specific network device (IxiaTG) and can be used to build network topologies that include network devices from other vendors. Network interconnects between the topology nodes can be set up with various Container Network Interface (CNI) plugins for Kubernetes that attach multiple network interfaces to the nodes.

This process happens in two phases. In the first phase, the operator identifies the network interconnects that need to be set up externally. In the second phase, once the network interconnects are set up, the operator deploys the containers and services. KNE simplifies this entire process: it automates the steps and makes it possible to set up network topologies in Kubernetes. KNE uses the Meshnet CNI to set up the network interconnects. KENG Operator watches for IxiaTG CRD instances to be created in the Kubernetes environment and initiates Ixia-specific resource management accordingly.

The versions of the various KENG components to deploy are derived from the KENG release version specified in the IxiaTG config. These component mappings are captured in ixiatg-configmap.yaml for each KENG release. The configmap, as shown in the snippet below, comprises the KENG release version ("release") and the list of qualified component versions for that release. KENG Operator first tries to fetch these details from the Keysight published releases; if it is unable to do so, it tries to locate them in the Kubernetes configmap. This allows users to have the operator load images from private repositories by updating the configmap entries. For a deployment with custom images, download the release-specific ixiatg-configmap.yaml from the published releases, update the "path" / "tag" fields of the relevant container images in the configmap, and change "release" to a custom name. Start the operator first, as specified in the deployment section below, and then apply the configmap locally. After this, the operator can be used to deploy the containers and services.

  "release": "v0.1",
  "images": [
      {
          "name": "controller",
          "path": "ghcr.io/open-traffic-generator/keng-controller",
          "tag": "0.1.0-3"
      },
      {
          "name": "gnmi-server",
          "path": "ghcr.io/open-traffic-generator/otg-gnmi-server",
          "tag": "1.13.0"
      },
      {
          "name": "traffic-engine",
          "path": "ghcr.io/open-traffic-generator/ixia-c-traffic-engine",
          "tag": "1.6.0.85"
      },
      {
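
The custom-image workflow described above can be sketched in Python. This is a minimal illustration rather than part of the operator: the helper name customize_release, the registry "registry.example.com", and the release name "my-custom" are hypothetical, and only the image entries visible in the snippet are reproduced.

```python
import copy
import json


def customize_release(config, release_name, overrides):
    """Return a copy of a release config with a new release name and image overrides.

    `overrides` maps an image "name" to a dict with optional "path" and "tag" keys.
    The input config is left untouched.
    """
    custom = copy.deepcopy(config)
    custom["release"] = release_name
    for image in custom["images"]:
        override = overrides.get(image["name"], {})
        for field in ("path", "tag"):
            if field in override:
                image[field] = override[field]
    return custom


# Values copied from the snippet above (first two entries only).
published = {
    "release": "v0.1",
    "images": [
        {"name": "controller",
         "path": "ghcr.io/open-traffic-generator/keng-controller",
         "tag": "0.1.0-3"},
        {"name": "gnmi-server",
         "path": "ghcr.io/open-traffic-generator/otg-gnmi-server",
         "tag": "1.13.0"},
    ],
}

# Hypothetical private registry and custom release name.
custom = customize_release(
    published,
    "my-custom",
    {"controller": {"path": "registry.example.com/keng-controller"}},
)
print(json.dumps(custom, indent=2))
```

The resulting JSON would then replace the published content in ixiatg-configmap.yaml before the configmap is applied.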

The operator deploys a single Controller pod with Ixia-c and gNMI containers for user control, management, and statistics reporting of KENG-specific network devices. It also deploys KENG network device nodes for the control and data planes. The versions of the deployed KENG resources are determined by the KENG release defined in the KNE config file.

The KENG Controller can be deployed with licensing or, by default, without it.

  • Community: Default deployment with no licensing; functionality is restricted to a subset of features
  • NEM: License enforcement based on the number of concurrent test runs; uses VM-based licensing



IxiaTG CRD

The IxiaTG CRD instance specifies the list of Ixia components to be deployed. These deployment details are captured in the CRD "spec", which comprises the following fields.

  • Release - KENG release whose component versions are deployed
  • Desired State - phase of deployment, either INITIATED or DEPLOYED
  • Api Endpoint Map - service endpoints for control and management of all KENG nodes in the topology
  • Interfaces - list of KENG interfaces and groups in the topology

In the first phase of deployment (desired state set to INITIATED), the operator determines the names of the pods, and their interfaces, that it will deploy in the second phase. It records these details in the "status" component of the CRD instance and sets the "state" to the desired state given in the "spec". The CRD instance "status" comprises the following fields.

  • State - status of the operation, either as specified in desired state or FAILED
  • Reason - error message on failure
  • Api Endpoint - generated service names for reference
  • Interfaces - list of interface mappings with pod name and interface name

Based on these details, once the mesh of interconnects is set up, the IxiaTG CRD instance is updated with the "spec" desired state set to DEPLOYED, which triggers the pod and service deployment phase in the operator. On successful deployment, the operator updates the "status" state to DEPLOYED. On failure, the state is set to FAILED and the reason is updated with the error message. Below is an example of a CRD instance.

spec:
  api_endpoint_map:
    gnmi:
      in: 50051
    http:
      in: 8443
  desired_state: DEPLOYED
  interfaces:
  - name: eth1
  - group: lag
    name: eth2
  - group: lag
    name: eth3
  release: local-latest
status:
  api_endpoint:
    pod_name: otg-controller
    service_names:
    - service-gnmi-otg-controller
    - service-http-otg-controller
  interfaces:
  - interface: eth1
    name: eth1
    pod_name: otg-port-eth1
  - interface: eth2
    name: eth2
    pod_name: otg-port-group-lag
  - interface: eth3
    name: eth3
    pod_name: otg-port-group-lag
  state: DEPLOYED
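
To make the relationship between "spec" interfaces and "status" interfaces concrete, here is a small Python sketch that reproduces the pod names seen in the example above. The naming rule (grouped interfaces share "otg-port-group-<group>", ungrouped ones get "otg-port-<name>") is inferred from that single example and is an assumption, not taken from the operator source; plan_interfaces is a hypothetical helper.

```python
def plan_interfaces(spec_interfaces):
    """Map each spec interface to a status entry with a pod name.

    Assumed rule, inferred from the example CRD instance only:
    interfaces with a "group" share one pod named after the group,
    all others get a pod named after the interface.
    """
    status = []
    for intf in spec_interfaces:
        group = intf.get("group")
        if group:
            pod_name = f"otg-port-group-{group}"
        else:
            pod_name = f"otg-port-{intf['name']}"
        status.append({
            "interface": intf["name"],
            "name": intf["name"],
            "pod_name": pod_name,
        })
    return status


# Interfaces copied from the "spec" section of the example.
spec_interfaces = [
    {"name": "eth1"},
    {"name": "eth2", "group": "lag"},
    {"name": "eth3", "group": "lag"},
]

for entry in plan_interfaces(spec_interfaces):
    print(entry)
```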

Note: For each component, the operator sets the minimum CPU and memory requirements to default values that depend on the port configuration, based on the data captured here.

Deployment

KENG Components

The following KENG release components are deployed by the operator.

  • keng-controller
  • otg-gnmi-server
  • ixia-c-protocol-engine
  • ixia-c-traffic-engine

Deployment Steps

Please make sure that the setup meets the Deployment Prerequisites.

  • Available Releases https://github.com/open-traffic-generator/keng-operator/releases

  • Download Deployment yaml

    curl -kLO "https://github.com/open-traffic-generator/keng-operator/releases/download/v0.3.13/ixiatg-operator.yaml"
  • Load Image

    docker pull ghcr.io/open-traffic-generator/keng-operator:0.3.13
  • Running as K8S Pod

    kubectl apply -f ixiatg-operator.yaml
  • Enable licensing (optional)

    kubectl create secret -n ixiatg-op-system generic license-server --from-literal=addresses="<space separated IP addresses>"

    Note: for operator upgrades, the previous secret, if any, must be deleted first.

    kubectl delete secret/license-server -n ixiatg-op-system

    License servers can also be configured by adding a controller environment variable 'LICENSE_SERVERS' to ixiatg-configmap.yaml and applying the configmap.

       "release": "v0.1",
       "images": [
           {
               "name": "controller",
               "path": "ghcr.io/open-traffic-generator/keng-controller",
               "tag": "0.1.0-3",
               "env": {
                  "LICENSE_SERVERS": "<space separated IP addresses>"
               }
           },
           {

    kubectl apply -f ixiatg-configmap.yaml
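
As an illustration of the configmap route, the following Python sketch adds the 'LICENSE_SERVERS' variable to the controller entry of a release config held as a dictionary. The helper name set_license_servers is hypothetical, and the addresses are documentation-range placeholders, not real license servers.

```python
def set_license_servers(config, addresses):
    """Set LICENSE_SERVERS on the controller image entry, in place.

    `addresses` is a list of IP address strings; they are joined with
    spaces, matching the "<space separated IP addresses>" placeholder.
    """
    for image in config["images"]:
        if image["name"] == "controller":
            image.setdefault("env", {})["LICENSE_SERVERS"] = " ".join(addresses)
            return config
    raise ValueError("release config has no controller image entry")


# Controller entry copied from the snippet above.
config = {
    "release": "v0.1",
    "images": [
        {"name": "controller",
         "path": "ghcr.io/open-traffic-generator/keng-controller",
         "tag": "0.1.0-3"},
    ],
}

# 192.0.2.0/24 is reserved for documentation; replace with real addresses.
set_license_servers(config, ["192.0.2.10", "192.0.2.11"])
print(config["images"][0]["env"])
```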

Deployment Prerequisites

  • Please make sure you have a Kubernetes cluster up and running in your setup.

Build

  • Clone this project

    git clone https://github.com/open-traffic-generator/keng-operator.git
    cd keng-operator/
  • For Production

    export VERSION=latest
    export IMAGE_TAG_BASE=keng-operator
    
    # Generating keng-operator deployment yaml using Makefile
    make yaml
    # Generating docker build with name & tag (keng-operator:latest) using Makefile
    make docker-build
  • For Development

    # after cloning the repo, some dependencies need to get installed for further development
    chmod u+x ./do.sh
    ./do.sh deps

Quick Tour

do.sh automates most of the tasks that would otherwise be done manually. To extend it, define a function (e.g. install_deps()) and call it like so: ./do.sh install_deps.

# install dependencies
./do.sh deps
# build production docker image
./do.sh build
# generate production yaml for operator deployment
./do.sh yaml

Test Changes

TBD

keng-operator's Issues

Allow Ixia-c Community Edition deployments

We should allow deploying Ixia-c Community Edition (Traffic Engine only). Right now, if there is no Protocol Engine in the configmap, the deployment fails with:

Error: Error in cpdp node deploy Failed to find protocol engine image for release local-latest

Tested with configmap:

apiVersion: v1
kind: ConfigMap
metadata:
    name: ixiatg-release-config
    namespace: ixiatg-op-system
data:
    versions: |
        {
          "release": "local-latest",
          "images": [
                {
                    "name": "controller",
                    "path": "ghcr.io/open-traffic-generator/ixia-c-controller",
                    "tag": "0.0.1-3662"
                },
                {
                    "name": "gnmi-server",
                    "path": "ghcr.io/open-traffic-generator/ixia-c-gnmi-server",
                    "tag": "1.9.9"
                },
                {
                    "name": "traffic-engine",
                    "path": "ghcr.io/open-traffic-generator/ixia-c-traffic-engine",
                    "tag": "1.6.0.19"
                }
            ]
        }

And a deployment spec:

{
    "metadata": {
        "name": "otg",
        "namespace": "ixia-c"
    },
    "spec": {
        "api_endpoint_map": {
            "https": {
                "in": 443,
                "out": 31001
            },
            "grpc": {
                "in": 40051,
                "out": 31002
            },
            "gnmi": {
                "in": 50051,
                "out": 31003
            }
        },
        "interfaces": [
            {
                "name": "eth1",
                "peer": "localhost",
                "peer_interface": "veth0"
            },
            {
                "name": "eth2",
                "peer": "localhost",
                "peer_interface": "veth1"
            }
        ],
        "release": "local-latest"
    }
}
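
The requested behaviour can be sketched as a lookup that treats the protocol engine as optional instead of failing. This is only an illustration of the proposal, not the operator's code: the split into required and optional components, the config name "protocol-engine", and the helper check_release are all assumptions.

```python
# Assumed component names; "protocol-engine" follows the naming pattern of
# the other entries and is not confirmed by the configmap above.
REQUIRED = ("controller", "gnmi-server", "traffic-engine")
OPTIONAL = ("protocol-engine",)


def check_release(config, required=REQUIRED, optional=OPTIONAL):
    """Return (missing_required, missing_optional) image names for a release config."""
    names = {image["name"] for image in config["images"]}
    missing_required = [n for n in required if n not in names]
    missing_optional = [n for n in optional if n not in names]
    return missing_required, missing_optional


# Image names from the configmap tested in this issue.
community = {
    "release": "local-latest",
    "images": [
        {"name": "controller"},
        {"name": "gnmi-server"},
        {"name": "traffic-engine"},
    ],
}

missing_required, missing_optional = check_release(community)
if missing_required:
    print("cannot deploy, missing:", missing_required)
else:
    print("deploying without:", missing_optional)
```

With this split, a configmap lacking a protocol engine would lead to a traffic-engine-only deployment rather than an error.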

otg-controller pod containers do not support arbitrary user IDs

Both containers ixia-c and gnmi of the otg-controller pod fail to start due to permission denied errors when trying to run the operator on OpenShift. This is most likely due to the usage of arbitrary UIDs as part of the OpenShift multi layer security strategy as described here.

panic: Logger init failed: mkdir /home/keysight/ixia-c/controller/logs: permission denied
goroutine 1 [running]:
keysight/athena/controller/config.init.0()
/home/keysight/athena/controller/config/init.go:102 +0x1b7
panic: Logger init failed: mkdir /home/keysight/ixia-c-gnmi-server/logs: permission denied
goroutine 1 [running]:
github.com/open-traffic-generator/ixia-c-gnmi-server/config.init.0()
/home/keysight/ixia-c-gnmi-server/config/init.go:76 +0x173

To support using this operator on OpenShift, files and directories should be readable and writable by GID=0 (a container is always a member of the root group). Commands invoked by the Entrypoint will be executed with an unprivileged UID and GID=0 pair. That means it is an unprivileged user executing the commands, and the UID that will be used during execution is not known in advance. From a technical design perspective, directories and files that may be written to by processes in the container should be owned by the root group and be readable and writable by GID=0. Files to be executed should also have group execute permissions.
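
The permission rule above can be expressed as a small check, sketched here in Python with the standard stat module. The helper names are hypothetical. Directories are also required to carry the group execute bit, since creating files inside a directory needs search permission on it.

```python
import os
import stat


def group_root_accessible(mode, gid, executable=False):
    """Check the GID=0 rule for one file or directory.

    `mode` and `gid` are the st_mode and st_gid values from os.stat().
    Group read and write are always required; group execute is required
    for directories and for files that are meant to be executed.
    """
    needed = stat.S_IRGRP | stat.S_IWGRP
    if executable or stat.S_ISDIR(mode):
        needed |= stat.S_IXGRP
    return gid == 0 and (mode & needed) == needed


def check_path(path, executable=False):
    """Apply the rule to a path on disk."""
    st = os.stat(path)
    return group_root_accessible(st.st_mode, st.st_gid, executable)


# A log directory owned by group root with mode 0775 passes the check,
# while the same directory with mode 0755 (no group write) does not.
print(group_root_accessible(stat.S_IFDIR | 0o775, 0))
print(group_root_accessible(stat.S_IFDIR | 0o755, 0))
```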

If you could point me in the right direction, I could contribute the required changes myself.

Support deploying Ixia-C pods for topologies involving OTG SW ports and DUT HW ports in docker environment

Here’s the summary:

  • FeatureProfiles has already been enhanced to support static binding in order to run OTG tests (earlier only ATE tests were supported).
    • OTG tests will now run without issues against both Ixia-C S/W ports and Ixia-C H/W ports
    • The PR is still under review by a team at Google
  • From test execution POV, The OTG and DUT endpoint addresses need to be specified in static binding file (no change here).
  • From deployment POV,
    • ixia-c-operator will need to be deployed on host node, mapped to docker socket
    • ixia-c controller and port containers shall be deployed using docker API on same host based on a declarative YAML
    • the YAML will need to be pushed using curl. e.g. curl -k -X POST https://localhost:6443/ -d @deploy.yaml
    • the YAML would specify list of port containers to deploy - i.e. each item consisting of the name of interfaces inside port container and name of corresponding real interface on host it’ll bind to (real interface shall be connected to DUT H/W ports)
    • ixia-c-operator will automate network plumbing (e.g. use MacVLAN to bind the port container interface to the host interface)
  • Deployment is limited to single node and KNE is not needed.

Transient container pull from remote repos causes flakes with topology creation in KNE

Hello @anjan-keysight @biplamal, seeing some flakes when bringing up KNE topos with IxiaTG:

creating topology: failed to create topology: Node "otg": Status FAILED Reason got failure in ixia CRD status: Container ixia-c failed - rpc error: code = Unknown desc = failed to pull and unpack image "us-west1-docker.pkg.dev/.../ixia-c-controller:0.0.1-4013": failed to copy: read tcp 172.18.0.2:47650->74.125.132.82:443: read: connection reset by peer

This happens rarely (less than 1%) but is still affecting our KNE test runs. It appears to me that the Ixia operator treats an image pull failure as a FAILED state (https://github.com/open-traffic-generator/ixia-c-operator/blob/a6bc34d9bc987a7d01869cfbfae670c7294862b7/README.md#ixiatg-crd); however, in cases where there is a flake in the pull (interrupted, etc.), a retry should be allowed before returning FAILED. If the image is not found and that's what led to the failure, then FAILED makes sense, but for transient pull errors FAILED is too harsh; instead, INITIATED should be returned for a certain number of pull failures before declaring FAILED.

K8s retries ErrImagePull failures automatically; however, for ixiatg we poll the status from the operator, and that's what's causing the error.

This is specifically when there is a transient error with Kubernetes pulling the image from a remote repo (in this case read: connection reset by peer). Normally Kubernetes silently retries these errors and will hang in a backoff loop indefinitely. The ixia-c operator treats these transient errors as unrecoverable failures.
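
A rough Python sketch of the proposed behaviour follows. It is not the operator's logic: the function next_state, the marker list, and the retry limit of 5 are assumptions chosen for illustration. Only the "connection reset by peer" marker comes from the error reported above.

```python
# Substrings that mark a pull error as transient. Only the first entry is
# taken from the reported error; further markers could be added as needed.
TRANSIENT_MARKERS = ("connection reset by peer",)


def next_state(reason, pull_failures, max_retries=5):
    """Return the state to report after a container failure.

    Transient pull errors keep the state at INITIATED until more than
    `max_retries` failures have been seen; anything else is FAILED.
    """
    transient = any(marker in reason for marker in TRANSIENT_MARKERS)
    if transient and pull_failures <= max_retries:
        return "INITIATED"
    return "FAILED"


reason = ("failed to pull and unpack image: failed to copy: "
          "read: connection reset by peer")
print(next_state(reason, pull_failures=1))
print(next_state(reason, pull_failures=6))
print(next_state("image not found", pull_failures=1))
```

A client polling the CRD status would then keep waiting while the state is INITIATED and only give up once FAILED is reported.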

otg-port pods require extended privileges but still use the default service account

When deploying a topology with IXIA to a cluster a bunch of otg-port pods are created. Since they are not using a specific service account on OpenShift only minimal privileges are used to run the container. This causes logs in the controller manager such as this entry:

time="2022-10-11T07:44:07Z" level=error msg="Failed to create pod for otg in 3-node-ceos-with-traffic - pods \"otg-port-eth1\" is forbidden: unable to validate against any security context constraint: [provider \"anyuid\": Forbidden: not usable by user or serviceaccount, spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed, provider \"nonroot\": Forbidden: not usable by user or serviceaccount, provider \"hostmount-anyuid\": Forbidden: not usable by user or serviceaccount, provider \"machine-api-termination-handler\": Forbidden: not usable by user or serviceaccount, provider \"hostnetwork\": Forbidden: not usable by user or serviceaccount, provider \"hostaccess\": Forbidden: not usable by user or serviceaccount, provider \"node-exporter\": Forbidden: not usable by user or serviceaccount, provider \"meshnet\": Forbidden: not usable by user or serviceaccount, provider \"privileged\": Forbidden: not usable by user or serviceaccount]"

For more details please check the attached log file.
ixiatg-op-controller-manager-66d9845cd9-27v25-manager.log

This seems to be fixable by extending the privileges of the default service account as shown below, but in general this is not a recommended practice, as other pods that do not specify another service account will also inherit these privileges.

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ixiatg-role
rules:
  - apiGroups:
      - security.openshift.io
    resourceNames:
      - privileged
    resources:
      - securitycontextconstraints
    verbs:
      - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ixiatg-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ixiatg-role
subjects:
  - kind: ServiceAccount
    name: default

A better solution would be to use a dedicated service account for pods created by the controller, so that extending privileges is limited to a specific set of applications running in this namespace.

Missing readme.md

Please describe:

  • How to build production image
  • How to setup dev environment
  • How to ensure the changes I've made are validated / tested

otg-controller uses privileged port 443

When creating the otg-controller pod (ixia-c container) there is no way to change the desired port on which the HTTPS server should be started

https://github.com/open-traffic-generator/ixia-c-operator/blob/372d38785bd5210586b9d256ab2cd070bbd63674/controllers/ixiatg_controller.go#L1064

Port 443 is a privileged port, so the application requires additional user privileges (root) to run properly. Either this port should be configurable or it should be set to a non-privileged port (1024 or above).
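
If the port were configurable, a deployment spec could be checked for privileged ports before it is applied. The Python sketch below assumes that the "in" value of each api_endpoint_map entry is the port the server binds to inside the container; that reading, and the helper name privileged_ports, are assumptions.

```python
def privileged_ports(api_endpoint_map):
    """Return the names of endpoints whose "in" port is below 1024."""
    return sorted(
        name for name, ports in api_endpoint_map.items() if ports["in"] < 1024
    )


# Hypothetical endpoint map with the default HTTPS port and a gNMI port.
api_endpoint_map = {
    "https": {"in": 443},
    "gnmi": {"in": 50051},
}

print(privileged_ports(api_endpoint_map))
```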

OOM killed while trying to locate the configmap

I have installed both, the operator and the configmap to my cluster:

kubectl apply -f https://github.com/open-traffic-generator/ixia-c-operator/releases/download/v0.2.2/ixiatg-operator.yaml
kubectl apply -f https://github.com/open-traffic-generator/ixia-c/releases/download/v0.0.1-3423/ixia-configmap.yaml
kne create ...
I1007 11:39:19.682752 1 request.go:601] Waited for 1.046900147s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/migration.k8s.io/v1alpha1?timeout=32s
{"level":"info","ts":1665142761.2360508,"logger":"controller-runtime.metrics","msg":"Metrics server is starting to listen","addr":"127.0.0.1:8080"}
{"level":"info","ts":1665142761.2364528,"logger":"setup","msg":"starting manager - version 0.2.1\n"}
{"level":"info","ts":1665142761.2368538,"msg":"Starting server","path":"/metrics","kind":"metrics","addr":"127.0.0.1:8080"}
{"level":"info","ts":1665142761.236937,"msg":"Starting server","kind":"health probe","addr":"[::]:8081"}
I1007 11:39:21.237000 1 leaderelection.go:248] attempting to acquire leader lease ixiatg-op-system/b867187a.keysight.com...
I1007 11:39:36.934777 1 leaderelection.go:258] successfully acquired lease ixiatg-op-system/b867187a.keysight.com
{"level":"info","ts":1665142776.936252,"logger":"controller.ixiatg","msg":"Starting EventSource","reconciler group":"network.keysight.com","reconciler kind":"IxiaTG","source":"kind source: *v1beta1.IxiaTG"}
{"level":"info","ts":1665142776.936296,"logger":"controller.ixiatg","msg":"Starting Controller","reconciler group":"network.keysight.com","reconciler kind":"IxiaTG"}
{"level":"info","ts":1665142777.0372126,"logger":"controller.ixiatg","msg":"Starting workers","reconciler group":"network.keysight.com","reconciler kind":"IxiaTG","worker count":1}
time="2022-10-07T11:39:37Z" level=info msg="Reconcile: otg (Desired State: INITIATED), Namespace: 3-node-ceos-withtraffic"
time="2022-10-07T11:39:37Z" level=info msg="Checking for finalizer"
time="2022-10-07T11:39:37Z" level=info msg="IXIA DS INITIATED CS "
time="2022-10-07T11:39:37Z" level=info msg="Contacting Ixia server for release dependency info - https://github.com/open-traffic-generator/ixia-c/releases/download/v0.0.1-9999/ixia-configmap.yaml"
time="2022-10-07T11:39:37Z" level=error msg="Failed to download release config file - Got http response 404"
time="2022-10-07T11:39:37Z" level=info msg="Try locating in ConfigMap..."

Additionally, it is not able to locate the configmap, which I created before creating any topology, resulting in OOM kills and finally in a CrashLoopBackOff.

Review the dependencies needed to setup operator

As of now, I noticed that the following dependencies are installed, which may not necessarily be needed:

curl git openssh-server vim unzip tar make bash wget sshpass build-essential

Let's review and see if we really need openssh-server and build-essential at least.
