seagate / cortx-k8s
CORTX Kubernetes Orchestration Repository
Home Page: https://github.com/Seagate/cortx
License: Apache License 2.0
We currently have a guide for deployment in AWS: https://github.com/Seagate/cortx-k8s/blob/integration/doc/cortx-aws-k8s-installation.md
It would be good to have a comparable deployment guide for GCP.
Hi,
I've been following the cortx-aws-k8s-installation guide, but ran into trouble at section 4.1.
After successfully authenticating with the CORTX credentials, I received the expected "Bearer bf7axxx" token. However, when using it to send a create-account request, I got a "404 Not Found" error from the S3 API:
[root@master cc]# curl -H 'Authorization: Bearer bf7a24a8aac14a8387177f548b34781f' -d '{ "account_name": "gts3account", "account_email": "[email protected]", "password": "Account1!", "access_key": "gregoryaccesskey", "secret_key": "gregorysecretkey" }' https://$CSM_IP:8081/api/v2/s3_accounts --insecure
404: Not Found
Here is how I requested the "Bearer token":
[root@master cc]# curl -v -d '{"username": "cortxadmin", "password": "Cortxadmin@123"}' https://$CSM_IP:8081/api/v2/login --insecure
* About to connect() to 10.107.201.208 port 8081 (#0)
* Trying 10.107.201.208...
* Connected to 10.107.201.208 (10.107.201.208) port 8081 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
* subject: CN=seagate.com,O=Seagate Tech,L=Pune,C=IN
* start date: Feb 18 11:58:25 2021 GMT
* expire date: Feb 16 11:58:25 2031 GMT
* common name: seagate.com
* issuer: CN=seagate.com,O=Seagate Tech,L=Pune,C=IN
> POST /api/v2/login HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 10.107.201.208:8081
> Accept: */*
> Content-Length: 56
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 56 out of 56 bytes
< HTTP/1.1 200 OK
< Authorization: Bearer bf7a24a8aac14a8387177f548b34781f
< Content-Type: application/json
< Content-Length: 25
< Server: NULL
< Strict-Transport-Security: max-age=63072000; includeSubdomains
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< X-Content-Type-Options: nosniff
< Content-Security-Policy: script-src 'self'; object-src 'self'
< Referrer-Policy: no-referrer, strict-origin-when-cross-origin
< Pragma: no-cache
< Expires: 0
< Cache-control: no-cache, no-store, must-revalidate, max-age=0
< Date: Mon, 21 Mar 2022 04:04:14 GMT
<
* Connection #0 to host 10.107.201.208 left intact
{"reset_password": false}[root@master cc]#
The Kubernetes cluster consists of one master and one worker (CentOS 8). CORTX is deployed from the latest main branch. The bare-metal machines are not from AWS but from Chameleon.
pods
[root@master cc]# kubectl get pods
NAME READY STATUS RESTARTS AGE
consul-client-rv52r 1/1 Running 0 2d13h
consul-server-0 1/1 Running 0 2d13h
cortx-control-5dc5f7b6-ttbsk 1/1 Running 0 2d13h
cortx-data-node-1-6949c7c88b-8lwlw 3/3 Running 0 2d13h
cortx-ha-679b57d66b-j6vg8 3/3 Running 0 2d13h
cortx-server-node-1-5464b57b76-f2ttc 2/2 Running 0 2d13h
kafka-0 1/1 Running 0 2d13h
openldap-0 1/1 Running 0 2d13h
zookeeper-0 1/1 Running 0 2d13h
services
[root@master cc]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
consul-dns ClusterIP 10.97.150.134 <none> 53/TCP,53/UDP 2d13h
consul-server ClusterIP None <none> 8500/TCP,8301/TCP,8301/UDP,8302/TCP,8302/UDP,8300/TCP,8600/TCP,8600/UDP 2d13h
cortx-control-loadbal-svc NodePort 10.107.201.208 <none> 8081:32239/TCP 2d13h
cortx-data-clusterip-svc-node-1 ClusterIP 10.106.110.22 <none> 22003/TCP,29001/TCP,29000/TCP 2d13h
cortx-data-headless-svc-node-1 ClusterIP None <none> <none> 2d13h
cortx-ha-headless-svc ClusterIP None <none> <none> 2d13h
cortx-hax-svc ClusterIP 10.97.236.246 <none> 22003/TCP 2d13h
cortx-io-svc-0 NodePort 10.97.118.40 <none> 8000:32262/TCP,8443:30626/TCP 2d13h
cortx-server-clusterip-svc-node-1 ClusterIP 10.100.128.209 <none> 22003/TCP 2d13h
cortx-server-headless-svc-node-1 ClusterIP None <none> <none> 2d13h
cortx-server-loadbal-svc-node-1 NodePort 10.109.192.150 <none> 8000:32280/TCP,8443:31222/TCP 2d13h
kafka ClusterIP 10.108.103.229 <none> 9092/TCP 2d13h
kafka-headless ClusterIP None <none> 9092/TCP,9093/TCP 2d13h
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 2d15h
openldap-svc ClusterIP 10.106.219.194 <none> 389/TCP 2d13h
zookeeper ClusterIP 10.109.117.186 <none> 2181/TCP,2888/TCP,3888/TCP 2d13h
zookeeper-headless ClusterIP None <none> 2181/TCP,2888/TCP,3888/TCP 2d13h
Sorry, I'm a little new to CORTX and have only been trying for a few hours, so I would appreciate any suggestions!
Many thanks,
Faradawn
At this location (https://github.com/Seagate/cortx-k8s#using-cortx-on-kubernetes), there is a TODO item which looks like:
Using CORTX on Kubernetes
TODO Port https://seagate-systems.atlassian.net/wiki/spaces/PUB/pages/754155622/CORTX+Kubernetes+N-Pod+Deployment+and+Upgrade+Document+using+Services+Framework#5.-Understanding-Management-and-S3-Endpoints-and-configuring-External-Load-balancer-service(Optional) here or into a linked doc/readme file.
This action needs to be done and the PR doing it needs to refer back to this Issue.
Hello @john-a-fletcher. The other CORTX repos have an integration set up such that new issues created in GitHub get automatically mirrored into Jira, and future updates on the Jira side get mirrored back to GitHub. Note that new issues created in Jira are NOT mirrored to GitHub (i.e., we do it safely so that community items are mirrored but internal items are not).
Do you want us to set this up for this repo as well?
Update the https://github.com/Seagate/cortx-k8s/tree/main#cortx-on-kubernetes-prerequisites section to include the minimal disk requirements for deploying CORTX on Kubernetes: when running the prereq-deploy-cortx-cloud.sh script, it expects an entire disk that it can use.
As we are able to accommodate the solution.yaml changes in our current development cycle, we need to complete the https://github.com/Seagate/cortx-k8s/tree/integration#solution-yaml-overview table for comprehensive documentation of the solution.yaml format and structure.
The current cortx-aws-k8s-installation.md documentation does not specify which version of the https://github.com/Seagate/cortx-k8s/releases codebase is used for deployment. This causes issues since there has been a rearchitecture of the Data & Server components, and certain commands need to be updated to reflect which component the end user is expected to interact with.
We need to include a specific tagged version in Step 3.1, as well as the associated tested and released cortx-all
container image for default user interaction and deployment, and update the existing instructions to match the current architecture.
After deploying CORTX v0.3.0, I tried to log in to CSM (the control service) with the username cortxadmin and the password V1Acv%V8$2qL!JIP, using the following command:
curl -d '{"username": "cortxadmin", "password": "$control_password"}' https://$CSM_IP:8081/api/v2/login -k -i
But I got a 401 Unauthorized error:
HTTP/1.1 401 Unauthorized
Content-Type: application/json; charset=utf-8
Server: NULL
Strict-Transport-Security: max-age=63072000; includeSubdomains
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Content-Security-Policy: script-src 'self'; object-src 'self'
Referrer-Policy: no-referrer, strict-origin-when-cross-origin
I used to be able to log in to CSM on CORTX v0.2.0 with the above commands. Did I miss something?
The CSM API should return a bearer token instead of a 401 error.
1 - Deploy Kubernetes and CORTX
deploy kubernetes
git clone https://github.com/Seagate/cortx-k8s;
./prereq-deploy-cortx-cloud.sh -d /dev/sdb -s solution.example.yaml
kubectl taint node master node-role.kubernetes.io/master:NoSchedule-
./deploy-cortx-cloud.sh solution.example.yaml
2 - Try to log in to CSM
control_password=$(kubectl get secrets/cortx-secret --namespace default --template={{.data.csm_mgmt_admin_secret}} | base64 -d)
export CSM_IP=`kubectl get svc cortx-control-loadbal-svc -ojsonpath='{.spec.clusterIP}'`
curl -d '{"username": "cortxadmin", "password": "$control_password"}' https://$CSM_IP:8081/api/v2/login -k -i
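One detail worth noting about the login command above (an observation, not a confirmed root cause from this thread): the JSON body is wrapped in single quotes, so the shell never expands $control_password and the literal text $control_password is sent as the password, which by itself would produce a 401. A hedged variant that lets the shell substitute the variable:
curl -d "{\"username\": \"cortxadmin\", \"password\": \"$control_password\"}" https://$CSM_IP:8081/api/v2/login -k -i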
v0.3.0 on integration
Kubernetes version: v1.23.5
kubectl version: latest
Cluster and Client OS: CentOS 7.8
Provider: Chameleon
Setup: four-node Kubernetes cluster, with master untainted (allowed scheduling).
# same as solution.example.yaml at integration branch
solution:
namespace: default
deployment_type: standard
secrets:
name: cortx-secret
content:
kafka_admin_secret: null
consul_admin_secret: null
common_admin_secret: null
s3_auth_admin_secret: null
csm_auth_admin_secret: null
csm_mgmt_admin_secret: null
images:
cortxcontrol: ghcr.io/seagate/cortx-all:2.0.0-725
cortxdata: ghcr.io/seagate/cortx-all:2.0.0-725
cortxserver: ghcr.io/seagate/cortx-rgw:2.0.0-725
cortxha: ghcr.io/seagate/cortx-all:2.0.0-725
cortxclient: ghcr.io/seagate/cortx-all:2.0.0-725
consul: ghcr.io/seagate/consul:1.11.4
kafka: ghcr.io/seagate/kafka:3.0.0-debian-10-r7
zookeeper: ghcr.io/seagate/zookeeper:3.8.0-debian-10-r9
rancher: ghcr.io/seagate/local-path-provisioner:v0.0.20
busybox: ghcr.io/seagate/busybox:latest
common:
storage_provisioner_path: /mnt/fs-local-volume
container_path:
local: /etc/cortx
log: /etc/cortx/log
s3:
default_iam_users:
auth_admin: "sgiamadmin"
auth_user: "user_name"
#auth_secret defined above in solution.secrets.content.s3_auth_admin_secret
max_start_timeout: 240
extra_configuration: ""
motr:
num_client_inst: 0
start_port_num: 29000
extra_configuration: ""
hax:
protocol: https
service_name: cortx-hax-svc
port_num: 22003
storage_sets:
name: storage-set-1
durability:
sns: 1+0+0
dix: 1+0+0
external_services:
s3:
type: NodePort
count: 1
ports:
http: 80
https: 443
nodePorts:
http: null
https: null
control:
type: NodePort
ports:
https: 8081
nodePorts:
https: null
resource_allocation:
consul:
server:
storage: 10Gi
resources:
requests:
memory: 100Mi
cpu: 100m
limits:
memory: 300Mi
cpu: 100m
client:
resources:
requests:
memory: 100Mi
cpu: 100m
limits:
memory: 300Mi
cpu: 100m
zookeeper:
storage_request_size: 8Gi
data_log_dir_request_size: 8Gi
resources:
requests:
memory: 256Mi
cpu: 250m
limits:
memory: 512Mi
cpu: 500m
kafka:
storage_request_size: 8Gi
log_persistence_request_size: 8Gi
resources:
requests:
memory: 1Gi
cpu: 250m
limits:
memory: 2Gi
cpu: 1
storage:
cvg1:
name: cvg-01
type: ios
devices:
metadata:
device: /dev/sdc
size: 5Gi
data:
d1:
device: /dev/sdd
size: 5Gi
d2:
device: /dev/sde
size: 5Gi
nodes:
node1:
name: master
node2:
name: node-1
node3:
name: node-2
node4:
name: node-3
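As an aside, the node names listed under nodes must match the actual Kubernetes node names; the deploy script aborts with "List of nodes don't exist in the cluster" when a listed node is not found, as seen elsewhere on this page. A quick hedged way to compare the names above against the cluster (plain kubectl, for illustration only):
kubectl get nodes --no-headers -o custom-columns=NAME:.metadata.name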
Per feedback, we should rename the following elements in the root README:
- the Quick Starts section to Getting Started
- the Quick Install Guide to Quick Starts Guide
[Edit: solution at the end of the thread]
To Whom It May Concern,
When running the deploy-cortx-cloud.sh script, I kept getting the error that the Kafka installation failed: "timed out waiting for the condition."
[root@master-node k8_cortx_cloud]# ./deploy-cortx-cloud.sh solution.yaml
Validate solution file result: success
Number of worker nodes detected: 1
W0302 14:56:09.990541 9500 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0302 14:56:10.007528 9500 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
NAME: cortx-platform
LAST DEPLOYED: Wed Mar 2 14:56:09 2022
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
"hashicorp" has been added to your repositories
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "hashicorp" chart repository
Update Complete. ⎈Happy Helming!⎈
Install Rancher Local Path Provisioner
namespace/local-path-storage created
serviceaccount/local-path-provisioner-service-account created
clusterrole.rbac.authorization.k8s.io/local-path-provisioner-role created
clusterrolebinding.rbac.authorization.k8s.io/local-path-provisioner-bind created
deployment.apps/local-path-provisioner created
storageclass.storage.k8s.io/local-path created
configmap/local-path-config created
######################################################
# Deploy Consul
######################################################
NAME: consul
LAST DEPLOYED: Wed Mar 2 14:56:11 2022
NAMESPACE: default
STATUS: deployed
REVISION: 1
NOTES:
Thank you for installing HashiCorp Consul!
Your release is named consul.
To learn more about the release, run:
$ helm status consul
$ helm get all consul
Consul on Kubernetes Documentation:
https://www.consul.io/docs/platform/k8s
Consul on Kubernetes CLI Reference:
https://www.consul.io/docs/k8s/k8s-cli
serviceaccount/consul-client patched
serviceaccount/consul-server patched
statefulset.apps/consul-server restarted
daemonset.apps/consul-client restarted
######################################################
# Deploy openLDAP
######################################################
NAME: openldap
LAST DEPLOYED: Wed Mar 2 14:56:36 2022
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
Wait for openLDAP PODs to be ready..............
===========================================================
Setup OpenLDAP replication
===========================================================
######################################################
# Deploy Zookeeper
######################################################
"bitnami" has been added to your repositories
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "bitnami" chart repository
Update Complete. ⎈Happy Helming!⎈
Registry: ghcr.io
Repository: seagate/zookeeper
Tag: 3.7.0-debian-10-r182
NAME: zookeeper
LAST DEPLOYED: Wed Mar 2 14:56:57 2022
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
CHART NAME: zookeeper
CHART VERSION: 8.1.1
APP VERSION: 3.7.0
** Please be patient while the chart is being deployed **
ZooKeeper can be accessed via port 2181 on the following DNS name from within your cluster:
zookeeper.default.svc.cluster.local
To connect to your ZooKeeper server run the following commands:
export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=zookeeper,app.kubernetes.io/instance=zookeeper,app.kubernetes.io/component=zookeeper" -o jsonpath="{.items[0].metadata.name}")
kubectl exec -it $POD_NAME -- zkCli.sh
To connect to your ZooKeeper server from outside the cluster execute the following commands:
kubectl port-forward --namespace default svc/zookeeper 2181: &
zkCli.sh 127.0.0.1:2181
Wait for Zookeeper to be ready before starting kafka
######################################################
# Deploy Kafka
######################################################
Registry: ghcr.io
Repository: seagate/kafka
Tag: 3.0.0-debian-10-r7
Error: INSTALLATION FAILED: timed out waiting for the condition
Wait for CORTX 3rd party to be ready.....................................................
Here is a description of the crashed Kafka pod:
[root@master-node cc]# kubectl get pod
NAME READY STATUS RESTARTS AGE
consul-client-cs7b7 0/1 Running 0 3h8m
consul-server-0 1/1 Running 0 3h8m
kafka-0 0/1 CrashLoopBackOff 40 (44s ago) 3h7m
openldap-0 1/1 Running 0 3h8m
zookeeper-0 1/1 Running 0 3h7m
[root@master-node cc]# kubectl describe pod kafka
Name: kafka-0
Namespace: default
Priority: 0
Node: worker-node-1/10.52.1.106
Start Time: Wed, 02 Mar 2022 14:57:30 +0000
Labels: app.kubernetes.io/component=kafka
app.kubernetes.io/instance=kafka
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=kafka
controller-revision-hash=kafka-866fd78b49
helm.sh/chart=kafka-15.3.4
statefulset.kubernetes.io/pod-name=kafka-0
Annotations: <none>
Status: Running
IP: 10.32.0.7
IPs:
IP: 10.32.0.7
Controlled By: StatefulSet/kafka
Containers:
kafka:
Container ID: docker://fdd090e633af20142df15e3d69869c38317e654d37081b3c349e729e076c8563
Image: ghcr.io/seagate/kafka:3.0.0-debian-10-r7
Image ID: docker-pullable://ghcr.io/seagate/kafka@sha256:91155a01d7dc9de2e3909002b3c9fa308c8124d525de88e2acd55f1b95a8341d
Ports: 9092/TCP, 9093/TCP
Host Ports: 0/TCP, 0/TCP
Command:
/scripts/setup.sh
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 02 Mar 2022 18:03:58 +0000
Finished: Wed, 02 Mar 2022 18:04:08 +0000
Ready: False
Restart Count: 40
Limits:
cpu: 1
memory: 2Gi
Requests:
cpu: 250m
memory: 1Gi
Liveness: tcp-socket :kafka-client delay=10s timeout=5s period=10s #success=1 #failure=3
Readiness: tcp-socket :kafka-client delay=5s timeout=5s period=10s #success=1 #failure=6
Environment:
BITNAMI_DEBUG: false
MY_POD_IP: (v1:status.podIP)
MY_POD_NAME: kafka-0 (v1:metadata.name)
KAFKA_CFG_ZOOKEEPER_CONNECT: zookeeper.default.svc.cluster.local
KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL
KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL:PLAINTEXT,CLIENT:PLAINTEXT
KAFKA_CFG_LISTENERS: INTERNAL://:9093,CLIENT://:9092
KAFKA_CFG_ADVERTISED_LISTENERS: INTERNAL://$(MY_POD_NAME).kafka-headless.default.svc.cluster.local:9093,CLIENT://$(MY_POD_NAME).kafka-headless.default.svc.cluster.local:9092
ALLOW_PLAINTEXT_LISTENER: yes
KAFKA_VOLUME_DIR: /bitnami/kafka
KAFKA_LOG_DIR: /opt/bitnami/kafka/logs
KAFKA_CFG_DELETE_TOPIC_ENABLE: true
KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE: true
KAFKA_HEAP_OPTS: -Xmx1024m -Xms1024m
KAFKA_CFG_LOG_FLUSH_INTERVAL_MESSAGES: 10000
KAFKA_CFG_LOG_FLUSH_INTERVAL_MS: 1000
KAFKA_CFG_LOG_RETENTION_BYTES: 1073741824
KAFKA_CFG_LOG_RETENTION_CHECK_INTERVALS_MS: 300000
KAFKA_CFG_LOG_RETENTION_HOURS: 168
KAFKA_CFG_MESSAGE_MAX_BYTES: 1000012
KAFKA_CFG_LOG_SEGMENT_BYTES: 1073741824
KAFKA_CFG_LOG_DIRS: /bitnami/kafka/data
KAFKA_CFG_DEFAULT_REPLICATION_FACTOR: 1
KAFKA_CFG_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
KAFKA_CFG_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
KAFKA_CFG_TRANSACTION_STATE_LOG_MIN_ISR: 2
KAFKA_CFG_NUM_IO_THREADS: 8
KAFKA_CFG_NUM_NETWORK_THREADS: 3
KAFKA_CFG_NUM_PARTITIONS: 1
KAFKA_CFG_NUM_RECOVERY_THREADS_PER_DATA_DIR: 1
KAFKA_CFG_SOCKET_RECEIVE_BUFFER_BYTES: 102400
KAFKA_CFG_SOCKET_REQUEST_MAX_BYTES: 104857600
KAFKA_CFG_SOCKET_SEND_BUFFER_BYTES: 102400
KAFKA_CFG_ZOOKEEPER_CONNECTION_TIMEOUT_MS: 6000
KAFKA_CFG_AUTHORIZER_CLASS_NAME:
KAFKA_CFG_ALLOW_EVERYONE_IF_NO_ACL_FOUND: true
KAFKA_CFG_SUPER_USERS: User:admin
KAFKA_CFG_LOG_SEGMENT_DELETE_DELAY_MS: 1000
KAFKA_CFG_LOG_FLUSH_OFFSET_CHECKPOINT_INTERVAL_MS: 1000
KAFKA_CFG_LOG_RETENTION_CHECK_INTERVAL_MS: 1000
Mounts:
/bitnami/kafka from data (rw)
/opt/bitnami/kafka/logs from logs (rw)
/scripts/setup.sh from scripts (rw,path="setup.sh")
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-kafka-0
ReadOnly: false
scripts:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kafka-scripts
Optional: false
logs:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 29m (x35 over 3h8m) kubelet Container image "ghcr.io/seagate/kafka:3.0.0-debian-10-r7" already present on machine
Warning BackOff 4m19s (x847 over 3h8m) kubelet Back-off restarting failed container
I repartitioned the disks and rebooted the server many times, but still couldn't get past the Kafka deployment issue. May I ask for some help on what the issue might be?
Below is my disk layout. I ran ./prereq-deploy-cortx-cloud.sh /dev/sdb1 with the disk parameter set to /dev/sdb1.
[root@master-node cc]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 1.8T 0 disk
sdb 8:16 0 1.8T 0 disk
└─sdb1 8:17 0 1.8T 0 part
sdc 8:32 0 1.8T 0 disk
└─sdc1 8:33 0 1.8T 0 part
sdd 8:48 0 1.8T 0 disk
└─sdd1 8:49 0 1.8T 0 part
sde 8:64 0 1.8T 0 disk
└─sde1 8:65 0 1.8T 0 part
sdf 8:80 0 1.8T 0 disk
sdg 8:96 0 1.8T 0 disk
sdh 8:112 0 1.8T 0 disk
sdi 8:128 0 1.8T 0 disk
sdj 8:144 0 1.8T 0 disk
sdk 8:160 0 1.8T 0 disk
sdl 8:176 0 1.8T 0 disk
sdm 8:192 0 1.8T 0 disk
sdn 8:208 0 1.8T 0 disk
sdo 8:224 0 1.8T 0 disk
sdp 8:240 0 1.8T 0 disk
sdq 65:0 0 372.6G 0 disk
└─sdq1 65:1 0 372.6G 0 part /
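Given the note earlier on this page that prereq-deploy-cortx-cloud.sh expects an entire disk it can use, a quick read-only check (an illustration, not part of the script) of whether an argument names a whole disk or a partition:
lsblk -ndo NAME,TYPE /dev/sdb    # expected to report type "disk"
lsblk -ndo NAME,TYPE /dev/sdb1   # expected to report type "part"
If the script really does want a whole, unpartitioned device, /dev/sdb1 above would be a partition rather than a disk.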
solution:
namespace: default
secrets:
name: cortx-secret
content:
openldap_admin_secret: seagate1
kafka_admin_secret: Seagate@123
consul_admin_secret: Seagate@123
common_admin_secret: Seagate@123
s3_auth_admin_secret: ldapadmin
csm_auth_admin_secret: seagate2
csm_mgmt_admin_secret: Cortxadmin@123
images:
cortxcontrol: cortx-docker.colo.seagate.com/seagate/cortx-all:2.0.0-2192-custom-ci
cortxdata: cortx-docker.colo.seagate.com/seagate/cortx-all:2.0.0-2192-custom-ci
cortxserver: cortx-docker.colo.seagate.com/seagate/cortx-rgw:2.0.0-120-custom-ci
cortxha: cortx-docker.colo.seagate.com/seagate/cortx-all:2.0.0-2192-custom-ci
cortxclient: cortx-docker.colo.seagate.com/seagate/cortx-all:2.0.0-2192-custom-ci
openldap: ghcr.io/seagate/symas-openldap:2.4.58
consul: ghcr.io/seagate/consul:1.10.0
kafka: ghcr.io/seagate/kafka:3.0.0-debian-10-r7
zookeeper: ghcr.io/seagate/zookeeper:3.7.0-debian-10-r182
rancher: ghcr.io/seagate/local-path-provisioner:v0.0.20
busybox: ghcr.io/seagate/busybox:latest
common:
setup_size: large
storage_provisioner_path: /mnt/fs-local-volume
container_path:
local: /etc/cortx
shared: /share
log: /etc/cortx/log
s3:
default_iam_users:
auth_admin: "sgiamadmin"
auth_user: "user_name"
#auth_secret defined above in solution.secrets.content.s3_auth_admin_secret
num_inst: 2
start_port_num: 28051
max_start_timeout: 240
motr:
num_client_inst: 0
start_port_num: 29000
hax:
protocol: https
service_name: cortx-hax-svc
port_num: 22003
storage_sets:
name: storage-set-1
durability:
sns: 1+0+0
dix: 1+0+0
external_services:
type: LoadBalancer
resource_allocation:
consul:
server:
storage: 10Gi
resources:
requests:
memory: 100Mi
cpu: 100m
limits:
memory: 300Mi
cpu: 100m
client:
resources:
requests:
memory: 100Mi
cpu: 100m
limits:
memory: 300Mi
cpu: 100m
openldap:
resources:
requests:
memory: 1Gi
cpu: 2
limits:
memory: 1Gi
cpu: 2
zookeeper:
storage_request_size: 8Gi
data_log_dir_request_size: 8Gi
resources:
requests:
memory: 256Mi
cpu: 250m
limits:
memory: 512Mi
cpu: 500m
kafka:
storage_request_size: 8Gi
log_persistence_request_size: 8Gi
resources:
requests:
memory: 1Gi
cpu: 250m
limits:
memory: 2Gi
cpu: 1
storage:
cvg1:
name: cvg-01
type: ios
devices:
metadata:
device: /dev/sdh
size: 5Gi
data:
d1:
device: /dev/sdi
size: 5Gi
nodes:
node1:
name: worker-node-1
Sorry, I'm a little new to this and have been trying for a few days. Any suggestion would help!
Thanks in advance!
I'm using the Kubernetes on AWS QSG while working on Seagate/cortx#1381, but I'm running into trouble with the CORTX deployment. There were a couple of minor issues:
- generate-cvg-yaml.sh depends on yq; I just grabbed the latest release from GitHub
- the guide references a stable branch that doesn't exist, so I just used main
The more serious issue is during deploy-cortx-cloud.sh. I keep getting a timeout and failure around the "Deploy CORTX Server" phase, and I can see the data and server pods crashing and restarting. Not sure if this is due to changes in the deployment scripts or mismatched component versions. Any help getting this working would be greatly appreciated!
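A hedged set of plain kubectl commands (not specific to the deployment scripts) that usually surfaces why the data and server pods are crash-looping; the pod and container names are placeholders to be taken from the first command's output:
kubectl get pods -o wide
kubectl describe pod <crashing-pod-name>
kubectl logs <crashing-pod-name> --all-containers
kubectl logs <crashing-pod-name> -c <container-name> --previous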
As per the instructions in the README, when I tried to execute deploy-cortx-cloud.sh I got a connection refused error, as shown below:
[root@dora k8_cortx_cloud]# ./deploy-cortx-cloud.sh
Validate solution file result: success
The connection to the server localhost:8080 was refused - did you specify the right host or port?
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Number of worker nodes detected: 1
The connection to the server localhost:8080 was refused - did you specify the right host or port?
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Can't deploy CORTX cloud.
List of nodes don't exist in the cluster:
- dora
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Error: INSTALLATION FAILED: Kubernetes cluster unreachable: Get "http://localhost:8080/version": dial tcp [::1]:8080: connect: connection refused
"hashicorp" already exists with the same configuration, skipping
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "hashicorp" chart repository
Update Complete. ⎈Happy Helming!⎈
Install Rancher Local Path Provisioner
The connection to the server localhost:8080 was refused - did you specify the right host or port?
######################################################
# Deploy Consul
######################################################
Error: INSTALLATION FAILED: Kubernetes cluster unreachable: Get "http://localhost:8080/version": dial tcp [::1]:8080: connect: connection refused
The connection to the server localhost:8080 was refused - did you specify the right host or port?
The connection to the server localhost:8080 was refused - did you specify the right host or port?
The connection to the server localhost:8080 was refused - did you specify the right host or port?
The connection to the server localhost:8080 was refused - did you specify the right host or port?
######################################################
# Deploy openLDAP
######################################################
Error: INSTALLATION FAILED: Kubernetes cluster unreachable: Get "http://localhost:8080/version": dial tcp [::1]:8080: connect: connection refused
Wait for openLDAP PODs to be ready
The connection to the server localhost:8080 was refused - did you specify the right host or port?
.The connection to the server localhost:8080 was refused - did you specify the right host or port?
.The connection to the server localhost:8080 was refused - did you specify the right host or port?
firewalld was off and SELinux was in permissive mode, as shown below:
[root@dora k8_cortx_cloud]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
Active: inactive (dead) since Fri 2022-01-28 14:13:20 IST; 50min ago
Docs: man:firewalld(1)
Main PID: 853 (code=exited, status=0/SUCCESS)
Jan 28 12:27:39 dora firewalld[853]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t filter -X DOCKER' failed: iptables: No chain/target/match by that name.
Jan 28 12:27:39 dora firewalld[853]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t filter -F DOCKER-ISOLATION-STAGE-1' failed: iptables: No chain/targe... that name.
Jan 28 12:27:39 dora firewalld[853]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t filter -X DOCKER-ISOLATION-STAGE-1' failed: iptables: No chain/targe... that name.
Jan 28 12:27:39 dora firewalld[853]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t filter -F DOCKER-ISOLATION-STAGE-2' failed: iptables: No chain/targe... that name.
Jan 28 12:27:39 dora firewalld[853]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t filter -X DOCKER-ISOLATION-STAGE-2' failed: iptables: No chain/targe... that name.
Jan 28 12:27:39 dora firewalld[853]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t filter -F DOCKER-ISOLATION' failed: iptables: No chain/target/match by that name.
Jan 28 12:27:39 dora firewalld[853]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t filter -X DOCKER-ISOLATION' failed: iptables: No chain/target/match by that name.
Jan 28 12:27:40 dora firewalld[853]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -D FORWARD -i docker0 -o docker0 -j DROP' failed: iptables: Bad rule (d...at chain?).
Jan 28 14:13:17 dora systemd[1]: Stopping firewalld - dynamic firewall daemon...
Jan 28 14:13:20 dora systemd[1]: Stopped firewalld - dynamic firewall daemon.
Hint: Some lines were ellipsized, use -l to show in full.
[root@dora k8_cortx_cloud]#
[root@dora k8_cortx_cloud]#
[root@dora k8_cortx_cloud]#
[root@dora k8_cortx_cloud]# getenforce
Permissive
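For what it's worth, the repeated "connection to the server localhost:8080 was refused" messages usually mean kubectl on this host has no kubeconfig for the cluster (so it falls back to the localhost:8080 default), rather than a firewalld or SELinux problem. A hedged check, assuming a kubeadm-style setup where the admin config lives at /etc/kubernetes/admin.conf:
kubectl config view --minify
export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl get nodes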
Release v0.3.0 introduced some changes that affect the current AWS deployment guide. Here are some known cases:
- changes to the solution.common.setup_size solution.yaml setting
- changes to the solution.images.openldap solution.yaml setting
- replacement of solution.yaml with solution.example.yaml
- changes to the prereq-deploy-cortx-cloud.sh script (script args changed)
There may be other changes needed, such as interaction with CSM and IAM usage. The guide and templates refer to v0.0.22. It is likely out of date after the switch to RGW S3 (v0.1.0). The guide and templates should be audited and updated as needed.
v0.3.0
AWS CloudFormation
Related to:
README.md currently points to https://github.com/Seagate/cortx-k8s/blob/UDX-6683_move_documentation_to_readme_md/doc/cortx-aws-k8s-installation.md
It should point to https://github.com/Seagate/cortx-k8s/blob/main/doc/cortx-aws-k8s-installation.md instead
While trying to follow the CORTX Kubernetes N-Pod Deployment and Upgrade Document (using Services Framework), I encountered an unreachable link.
Unable to form a Kubernetes cluster, as the link below is inaccessible:
http://eos-jenkins.mero.colo.seagate.com/job/Cortx-kubernetes/job/setup-kubernetes-cluster/
During the deployment of CORTX, Kafka pods kept crashing and looping.
Should be able to deploy CORTX successfully, as Rick did it once with CRI-O.
Can run the following script directly on CentOS 7:
source <(curl -s https://raw.githubusercontent.com/faradawn/tutorials/main/linux/cortx/kube.sh)
The link to the deployment script is here.
Thanks again for taking a look!
v0.6.0
Kubernetes version: v1.23.0
kubectl version: v1.23.0
Attached below and here is a summary:
- only had node-1 and node-2
- master node is node-1, which is untainted
- storage only had sdc, sdd, sde
[root@node-1 cc]# kc get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
calico-apiserver calico-apiserver-68444c48d5-9f7hl 1/1 Running 0 7h38m
calico-apiserver calico-apiserver-68444c48d5-nhrbb 1/1 Running 0 7h38m
calico-system calico-kube-controllers-69cfd64db4-gvswf 1/1 Running 0 7h39m
calico-system calico-node-dfqfj 1/1 Running 0 7h39m
calico-system calico-node-llqxx 1/1 Running 0 7h39m
calico-system calico-typha-7c59c5d99c-flv5m 1/1 Running 0 7h39m
default cortx-consul-client-7r776 1/1 Running 0 7h36m
default cortx-consul-client-x6j8w 0/1 Running 0 7h36m
default cortx-consul-server-0 1/1 Running 0 7h36m
default cortx-consul-server-1 1/1 Running 0 7h36m
default cortx-kafka-0 0/1 CrashLoopBackOff 162 (28s ago) 7h36m
default cortx-kafka-1 0/1 CrashLoopBackOff 99 (4m3s ago) 7h36m
default cortx-zookeeper-0 1/1 Running 0 7h36m
default cortx-zookeeper-1 1/1 Running 0 7h36m
kube-system coredns-64897985d-9qn5m 1/1 Running 0 7h40m
kube-system coredns-64897985d-z8t5b 1/1 Running 0 7h40m
kube-system etcd-node-1 1/1 Running 0 7h40m
kube-system kube-apiserver-node-1 1/1 Running 0 7h40m
kube-system kube-controller-manager-node-1 1/1 Running 0 7h40m
kube-system kube-proxy-7hpgl 1/1 Running 0 7h40m
kube-system kube-proxy-m4fcz 1/1 Running 0 7h39m
kube-system kube-scheduler-node-1 1/1 Running 0 7h40m
local-path-storage local-path-provisioner-756898894-bgxgk 1/1 Running 0 7h36m
tigera-operator tigera-operator-7d8c9d4f67-69rlk 1/1 Running 0 7h40m
[root@node-1 k8_cortx_cloud]# kc logs cortx-kafka-0
kafka 12:13:50.75
kafka 12:13:50.75 Welcome to the Bitnami kafka container
kafka 12:13:50.76 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-kafka
kafka 12:13:50.76 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-kafka/issues
kafka 12:13:50.76
kafka 12:13:50.76 INFO ==> ** Starting Kafka setup **
kafka 12:13:50.82 WARN ==> You set the environment variable ALLOW_PLAINTEXT_LISTENER=yes. For safety reasons, do not use this flag in a production environment.
kafka 12:13:50.83 INFO ==> Initializing Kafka...
kafka 12:13:50.84 INFO ==> No injected configuration files found, creating default config files
kafka 12:13:51.13 INFO ==> Configuring Kafka for inter-broker communications with PLAINTEXT authentication.
kafka 12:13:51.14 WARN ==> Inter-broker communications are configured as PLAINTEXT. This is not safe for production environments.
kafka 12:13:51.14 INFO ==> Configuring Kafka for client communications with PLAINTEXT authentication.
kafka 12:13:51.14 WARN ==> Client communications are configured using PLAINTEXT listeners. For safety reasons, do not use this in a production environment.
kafka 12:13:51.16 INFO ==> ** Kafka setup finished! **
kafka 12:13:51.19 INFO ==> ** Starting Kafka **
[2022-06-04 12:13:52,698] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
[2022-06-04 12:13:53,334] INFO Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation (org.apache.zookeeper.common.X509Util)
[2022-06-04 12:13:53,524] INFO Registered signal handlers for TERM, INT, HUP (org.apache.kafka.common.utils.LoggingSignalHandler)
[2022-06-04 12:13:53,526] INFO starting (kafka.server.KafkaServer)
[2022-06-04 12:13:53,527] INFO Connecting to zookeeper on cortx-zookeeper (kafka.server.KafkaServer)
[2022-06-04 12:13:53,541] INFO [ZooKeeperClient Kafka server] Initializing a new session to cortx-zookeeper. (kafka.zookeeper.ZooKeeperClient)
[2022-06-04 12:13:53,545] INFO Client environment:zookeeper.version=3.6.3--6401e4ad2087061bc6b9f80dec2d69f2e3c8660a, built on 04/08/2021 16:35 GMT (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,545] INFO Client environment:host.name=cortx-kafka-0.cortx-kafka-headless.default.svc.cluster.local (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,545] INFO Client environment:java.version=11.0.14 (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,545] INFO Client environment:java.vendor=BellSoft (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,545] INFO Client environment:java.home=/opt/bitnami/java (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,545] INFO Client environment:java.class.path=/opt/bitnami/kafka/bin/../libs/activation-1.1.1.jar:/opt/bitnami/kafka/bin/../libs/aopalliance-repackaged-2.6.1.jar:/opt/bitnami/kafka/bin/../libs/argparse4j-0.7.0.jar:/opt/bitnami/kafka/bin/../libs/audience-annotations-0.5.0.jar:/opt/bitnami/kafka/bin/../libs/commons-cli-1.4.jar:/opt/bitnami/kafka/bin/../libs/commons-lang3-3.8.1.jar:/opt/bitnami/kafka/bin/../libs/connect-api-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/connect-basic-auth-extension-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/connect-file-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/connect-json-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/connect-mirror-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/connect-mirror-client-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/connect-runtime-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/connect-transforms-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/hk2-api-2.6.1.jar:/opt/bitnami/kafka/bin/../libs/hk2-locator-2.6.1.jar:/opt/bitnami/kafka/bin/../libs/hk2-utils-2.6.1.jar:/opt/bitnami/kafka/bin/../libs/jackson-annotations-2.12.3.jar:/opt/bitnami/kafka/bin/../libs/jackson-core-2.12.3.jar:/opt/bitnami/kafka/bin/../libs/jackson-databind-2.12.3.jar:/opt/bitnami/kafka/bin/../libs/jackson-dataformat-csv-2.12.3.jar:/opt/bitnami/kafka/bin/../libs/jackson-datatype-jdk8-2.12.3.jar:/opt/bitnami/kafka/bin/../libs/jackson-jaxrs-base-2.12.3.jar:/opt/bitnami/kafka/bin/../libs/jackson-jaxrs-json-provider-2.12.3.jar:/opt/bitnami/kafka/bin/../libs/jackson-module-jaxb-annotations-2.12.3.jar:/opt/bitnami/kafka/bin/../libs/jackson-module-scala_2.12-2.12.3.jar:/opt/bitnami/kafka/bin/../libs/jakarta.activation-api-1.2.1.jar:/opt/bitnami/kafka/bin/../libs/jakarta.annotation-api-1.3.5.jar:/opt/bitnami/kafka/bin/../libs/jakarta.inject-2.6.1.jar:/opt/bitnami/kafka/bin/../libs/jakarta.validation-api-2.0.2.jar:/opt/bitnami/kafka/bin/../libs/jakarta.ws.rs-api-2.1.6.jar:/opt/bitnami/kafka/bin/../libs/jakarta.xml.bind-api-2.3.2.jar:/opt/bitnami/kafka/bin/../libs/javassist-3.27.0-GA.jar:/opt/bitnami/kafka/bin/../libs/javax.servlet-api-3.1.0.jar:/opt/bitnami/kafka/bin/../libs/javax.ws.rs-api-2.1.1.jar:/opt/bitnami/kafka/bin/../libs/jaxb-api-2.3.0.jar:/opt/bitnami/kafka/bin/../libs/jersey-client-2.34.jar:/opt/bitnami/kafka/bin/../libs/jersey-common-2.34.jar:/opt/bitnami/kafka/bin/../libs/jersey-container-servlet-2.34.jar:/opt/bitnami/kafka/bin/../libs/jersey-container-servlet-core-2.34.jar:/opt/bitnami/kafka/bin/../libs/jersey-hk2-2.34.jar:/opt/bitnami/kafka/bin/../libs/jersey-server-2.34.jar:/opt/bitnami/kafka/bin/../libs/jetty-client-9.4.43.v20210629.jar:/opt/bitnami/kafka/bin/../libs/jetty-continuation-9.4.43.v20210629.jar:/opt/bitnami/kafka/bin/../libs/jetty-http-9.4.43.v20210629.jar:/opt/bitnami/kafka/bin/../libs/jetty-io-9.4.43.v20210629.jar:/opt/bitnami/kafka/bin/../libs/jetty-security-9.4.43.v20210629.jar:/opt/bitnami/kafka/bin/../libs/jetty-server-9.4.43.v20210629.jar:/opt/bitnami/kafka/bin/../libs/jetty-servlet-9.4.43.v20210629.jar:/opt/bitnami/kafka/bin/../libs/jetty-servlets-9.4.43.v20210629.jar:/opt/bitnami/kafka/bin/../libs/jetty-util-9.4.43.v20210629.jar:/opt/bitnami/kafka/bin/../libs/jetty-util-ajax-9.4.43.v20210629.jar:/opt/bitnami/kafka/bin/../libs/jline-3.12.1.jar:/opt/bitnami/kafka/bin/../libs/jopt-simple-5.0.4.jar:/opt/bitnami/kafka/bin/../libs/kafka-clients-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-log4j-appender-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-metadata-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-raft-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/kafk
a-server-common-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-shell-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-storage-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-storage-api-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-streams-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-streams-examples-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-streams-scala_2.12-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-streams-test-utils-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-tools-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/kafka_2.12-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/log4j-1.2.17.jar:/opt/bitnami/kafka/bin/../libs/lz4-java-1.7.1.jar:/opt/bitnami/kafka/bin/../libs/maven-artifact-3.8.1.jar:/opt/bitnami/kafka/bin/../libs/metrics-core-2.2.0.jar:/opt/bitnami/kafka/bin/../libs/metrics-core-4.1.12.1.jar:/opt/bitnami/kafka/bin/../libs/netty-buffer-4.1.62.Final.jar:/opt/bitnami/kafka/bin/../libs/netty-codec-4.1.62.Final.jar:/opt/bitnami/kafka/bin/../libs/netty-common-4.1.62.Final.jar:/opt/bitnami/kafka/bin/../libs/netty-handler-4.1.62.Final.jar:/opt/bitnami/kafka/bin/../libs/netty-resolver-4.1.62.Final.jar:/opt/bitnami/kafka/bin/../libs/netty-transport-4.1.62.Final.jar:/opt/bitnami/kafka/bin/../libs/netty-transport-native-epoll-4.1.62.Final.jar:/opt/bitnami/kafka/bin/../libs/netty-transport-native-unix-common-4.1.62.Final.jar:/opt/bitnami/kafka/bin/../libs/osgi-resource-locator-1.0.3.jar:/opt/bitnami/kafka/bin/../libs/paranamer-2.8.jar:/opt/bitnami/kafka/bin/../libs/plexus-utils-3.2.1.jar:/opt/bitnami/kafka/bin/../libs/reflections-0.9.12.jar:/opt/bitnami/kafka/bin/../libs/rocksdbjni-6.19.3.jar:/opt/bitnami/kafka/bin/../libs/scala-collection-compat_2.12-2.4.4.jar:/opt/bitnami/kafka/bin/../libs/scala-java8-compat_2.12-1.0.0.jar:/opt/bitnami/kafka/bin/../libs/scala-library-2.12.14.jar:/opt/bitnami/kafka/bin/../libs/scala-logging_2.12-3.9.3.jar:/opt/bitnami/kafka/bin/../libs/scala-reflect-2.12.14.jar:/opt/bitnami/kafka/bin/../libs/slf4j-api-1.7.30.jar:/opt/bitnami/kafka/bin/../libs/slf4j-log4j12-1.7.30.jar:/opt/bitnami/kafka/bin/../libs/snappy-java-1.1.8.1.jar:/opt/bitnami/kafka/bin/../libs/trogdor-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/zookeeper-3.6.3.jar:/opt/bitnami/kafka/bin/../libs/zookeeper-jute-3.6.3.jar:/opt/bitnami/kafka/bin/../libs/zstd-jni-1.5.0-2.jar (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,546] INFO Client environment:java.library.path=/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,546] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,546] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,546] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,546] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,546] INFO Client environment:os.version=3.10.0-1127.19.1.el7.x86_64 (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,546] INFO Client environment:user.name=1001 (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,546] INFO Client environment:user.home=/ (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,546] INFO Client environment:user.dir=/ (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,546] INFO Client environment:os.memory.free=1010MB (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,546] INFO Client environment:os.memory.max=1024MB (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,546] INFO Client environment:os.memory.total=1024MB (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,548] INFO Initiating client connection, connectString=cortx-zookeeper sessionTimeout=18000 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@51972dc7 (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,552] INFO jute.maxbuffer value is 4194304 Bytes (org.apache.zookeeper.ClientCnxnSocket)
[2022-06-04 12:13:53,557] INFO zookeeper.request.timeout value is 0. feature enabled=false (org.apache.zookeeper.ClientCnxn)
[2022-06-04 12:13:53,559] INFO [ZooKeeperClient Kafka server] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2022-06-04 12:13:59,560] INFO [ZooKeeperClient Kafka server] Closing. (kafka.zookeeper.ZooKeeperClient)
[2022-06-04 12:14:13,580] ERROR Unable to resolve address: cortx-zookeeper:2181 (org.apache.zookeeper.client.StaticHostProvider)
java.net.UnknownHostException: cortx-zookeeper: Temporary failure in name resolution
at java.base/java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:929)
at java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1519)
at java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:848)
at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1509)
at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1368)
at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1302)
at org.apache.zookeeper.client.StaticHostProvider$1.getAllByName(StaticHostProvider.java:88)
at org.apache.zookeeper.client.StaticHostProvider.resolve(StaticHostProvider.java:141)
at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:368)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1207)
[2022-06-04 12:14:13,588] WARN An exception was thrown while closing send thread for session 0x0. (org.apache.zookeeper.ClientCnxn)
java.lang.IllegalArgumentException: Unable to canonicalize address cortx-zookeeper:2181 because it's not resolvable
at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:78)
at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:41)
at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1161)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1210)
[2022-06-04 12:14:13,693] INFO Session: 0x0 closed (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:14:13,694] INFO EventThread shut down for session: 0x0 (org.apache.zookeeper.ClientCnxn)
[2022-06-04 12:14:13,695] INFO [ZooKeeperClient Kafka server] Closed. (kafka.zookeeper.ZooKeeperClient)
[2022-06-04 12:14:13,697] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
at kafka.zookeeper.ZooKeeperClient.$anonfun$waitUntilConnected$3(ZooKeeperClient.scala:254)
at kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:250)
at kafka.zookeeper.ZooKeeperClient.<init>(ZooKeeperClient.scala:108)
at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1981)
at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:457)
at kafka.server.KafkaServer.startup(KafkaServer.scala:196)
at kafka.Kafka$.main(Kafka.scala:109)
at kafka.Kafka.main(Kafka.scala)
[2022-06-04 12:14:13,698] INFO shutting down (kafka.server.KafkaServer)
[2022-06-04 12:14:13,703] INFO App info kafka.server for 0 unregistered (org.apache.kafka.common.utils.AppInfoParser)
[2022-06-04 12:14:13,703] INFO shut down completed (kafka.server.KafkaServer)
[2022-06-04 12:14:13,703] ERROR Exiting Kafka. (kafka.Kafka$)
[2022-06-04 12:14:13,704] INFO shutting down (kafka.server.KafkaServer)
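The fatal error above is a name-resolution failure ("Unable to resolve address: cortx-zookeeper:2181 ... Temporary failure in name resolution"), so the broker never reaches ZooKeeper. A hedged way to check in-cluster DNS (plain kubectl with a throwaway busybox pod; the service name cortx-zookeeper is taken from the log above):
kubectl get svc cortx-zookeeper
kubectl -n kube-system get pods -l k8s-app=kube-dns
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- nslookup cortx-zookeeper.default.svc.cluster.local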
Deployed Kubernetes v1.23 with CRI-O as the container runtime, but when deploying the latest CORTX, all the pods failed to be scheduled due to a "VolumeBinding" timeout error. Searched for a while but still could not resolve it. Any suggestion would be appreciated!
Should schedule the pods successfully.
CentOS 7; Kubernetes and CRI-O both v1.23; CORTX v0.5.0 (latest).
# install cri-o
OS=CentOS_7
VERSION=1.23
sudo curl -L -o /etc/yum.repos.d/devel:kubic:libcontainers:stable.repo https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/$OS/devel:kubic:libcontainers:stable.repo
sudo curl -L -o /etc/yum.repos.d/devel:kubic:libcontainers:stable:cri-o:$VERSION.repo https://download.opensuse.org/repositories/devel:kubic:libcontainers:stable:cri-o:$VERSION/$OS/devel:kubic:libcontainers:stable:cri-o:$VERSION.repo
sudo yum install cri-o -y
# Install Kubernetes, pinning the version to match CRI-O (1.23)
yum install -y kubelet-1.23.0-0 kubeadm-1.23.0-0 kubectl-1.23.0-0 --disableexcludes=kubernetes
The complete installation guide is here.
v0.5.0
Kubernetes version: 1.23
kubectl version: 1.23
solution:
namespace: default
deployment_type: standard
secrets:
name: cortx-secret
content:
kafka_admin_secret: null
consul_admin_secret: null
common_admin_secret: null
s3_auth_admin_secret: Cortx123!
csm_auth_admin_secret: Cortx123!
csm_mgmt_admin_secret: Cortx123!
images:
cortxcontrol: ghcr.io/seagate/cortx-all:2.0.0-756
cortxdata: ghcr.io/seagate/cortx-all:2.0.0-756
cortxserver: ghcr.io/seagate/cortx-rgw:2.0.0-756
cortxha: ghcr.io/seagate/cortx-all:2.0.0-756
cortxclient: ghcr.io/seagate/cortx-all:2.0.0-756
consul: ghcr.io/seagate/consul:1.11.4
kafka: ghcr.io/seagate/kafka:3.0.0-debian-10-r97
zookeeper: ghcr.io/seagate/zookeeper:3.8.0-debian-10-r9
rancher: ghcr.io/seagate/local-path-provisioner:v0.0.20
busybox: ghcr.io/seagate/busybox:latest
common:
storage_provisioner_path: /mnt/fs-local-volume
container_path:
local: /etc/cortx
log: /etc/cortx/log
s3:
default_iam_users:
auth_admin: "sgiamadmin"
auth_user: "user_name"
#auth_secret defined above in solution.secrets.content.s3_auth_admin_secret
max_start_timeout: 240
extra_configuration: ""
motr:
num_client_inst: 0
start_port_num: 29000
extra_configuration: ""
hax:
protocol: https
service_name: cortx-hax-svc
port_num: 22003
storage_sets:
name: storage-set-1
durability:
sns: 1+0+0
dix: 1+0+0
external_services:
s3:
type: NodePort
count: 1
ports:
http: 80
https: 443
nodePorts:
http: null
https: null
control:
type: NodePort
ports:
https: 8081
nodePorts:
https: null
resource_allocation:
consul:
server:
storage: 10Gi
resources:
requests:
memory: 100Mi
cpu: 100m
limits:
memory: 300Mi
cpu: 100m
client:
resources:
requests:
memory: 100Mi
cpu: 100m
limits:
memory: 300Mi
cpu: 100m
zookeeper:
storage_request_size: 8Gi
data_log_dir_request_size: 8Gi
resources:
requests:
memory: 256Mi
cpu: 250m
limits:
memory: 512Mi
cpu: 500m
kafka:
storage_request_size: 8Gi
resources:
requests:
memory: 1Gi
cpu: 250m
limits:
memory: 2Gi
cpu: 1
storage:
cvg1:
name: cvg-01
type: ios
devices:
metadata:
device: /dev/sdc
size: 5Gi
data:
d1:
device: /dev/sdd
size: 5Gi
d2:
device: /dev/sde
size: 5Gi
nodes:
node1:
name: node-1
node2:
name: node-2
None of the pods are ready:
[root@node-1 k8_cortx_cloud]# kubectl get pods
NAME READY STATUS RESTARTS AGE
cortx-consul-client-85gbv 0/1 Running 0 58m
cortx-consul-client-kx58q 0/1 Running 0 58m
cortx-consul-server-0 0/1 Pending 0 58m
cortx-consul-server-1 0/1 Pending 0 58m
cortx-kafka-0 0/1 Pending 0 58m
cortx-kafka-1 0/1 Pending 0 58m
cortx-zookeeper-0 0/1 Pending 0 58m
cortx-zookeeper-1 0/1 Pending 0 58m
Checking the events for the Consul pod, the error seemed to be VolumeBinding causing the scheduling to fail:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 3m25s (x5 over 43m) default-scheduler running PreBind plugin "VolumeBinding": binding volumes: timed out waiting for the condition
The same error appeared for the Kafka pod and others:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 3m25s (x5 over 43m) default-scheduler running PreBind plugin "VolumeBinding": binding volumes: timed out waiting for the condition
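The "running PreBind plugin VolumeBinding: binding volumes: timed out" events indicate the PersistentVolumeClaims are never getting bound, so the provisioner is usually a better place to look than the pods themselves. A hedged set of checks (the namespace and deployment name are taken from the deployment output earlier on this page, which installs the Rancher Local Path Provisioner into local-path-storage):
kubectl get pvc
kubectl get sc
kubectl -n local-path-storage get pods
kubectl -n local-path-storage logs deploy/local-path-provisioner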
I was following the cortx-k8s installation guide and encountered an error at step 3.1:
git clone -b stable https://github.com/Seagate/cortx-k8s.git
There is no stable branch in this repository.
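Since there is no stable branch, a hedged way to see which branches and tags the repository actually offers, and to clone a released tag instead (the tag below is only an example; pick one from the releases page):
git ls-remote --heads --tags https://github.com/Seagate/cortx-k8s.git
git clone -b v0.3.0 https://github.com/Seagate/cortx-k8s.git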
cortx-server-0 and cortx-data-g0-0 could not be initialized during deployment.
Should be able to deploy. [See the new comment below]
Can follow the commands from this guide https://github.com/faradawn/tutorials/blob/main/linux/cortx/README.md
main, v0.10.0
cortx_k8s commit id: 7a920e1
Date: Mon Aug 15 17:02:52 2022 -0700
Kubernetes version: v1.24
kubectl version: v1.24
solution:
namespace: default
deployment_type: standard
secrets:
name: cortx-secret
content:
kafka_admin_secret: null
consul_admin_secret: null
common_admin_secret: null
s3_auth_admin_secret: null
csm_auth_admin_secret: null
csm_mgmt_admin_secret: Cortx123!
images:
cortxcontrol: ghcr.io/seagate/cortx-control:2.0.0-895
cortxdata: ghcr.io/seagate/cortx-data:2.0.0-895
cortxserver: ghcr.io/seagate/cortx-rgw:2.0.0-895
cortxha: ghcr.io/seagate/cortx-control:2.0.0-895
cortxclient: ghcr.io/seagate/cortx-data:2.0.0-895
consul: ghcr.io/seagate/consul:1.11.4
kafka: ghcr.io/seagate/kafka:3.0.0-debian-10-r97
zookeeper: ghcr.io/seagate/zookeeper:3.8.0-debian-10-r9
rancher: ghcr.io/seagate/local-path-provisioner:v0.0.20
busybox: ghcr.io/seagate/busybox:latest
common:
storage_provisioner_path: /mnt/fs-local-volume
s3:
default_iam_users:
auth_admin: "sgiamadmin"
auth_user: "user_name"
#auth_secret defined above in solution.secrets.content.s3_auth_admin_secret
max_start_timeout: 240
instances_per_node: 1
extra_configuration: ""
motr:
num_client_inst: 0
extra_configuration: ""
hax:
protocol: https
port_num: 22003
external_services:
s3:
type: NodePort
count: 1
ports:
http: 80
https: 443
nodePorts:
http: null
https: null
control:
type: NodePort
ports:
https: 8081
nodePorts:
https: null
resource_allocation:
consul:
server:
storage: 10Gi
resources:
requests:
memory: 200Mi
cpu: 200m
limits:
memory: 500Mi
cpu: 500m
client:
resources:
requests:
memory: 200Mi
cpu: 200m
limits:
memory: 500Mi
cpu: 500m
zookeeper:
storage_request_size: 8Gi
data_log_dir_request_size: 8Gi
resources:
requests:
memory: 256Mi
cpu: 250m
limits:
memory: 512Mi
cpu: 500m
kafka:
storage_request_size: 8Gi
resources:
requests:
memory: 1Gi
cpu: 250m
limits:
memory: 2Gi
cpu: 1000m
hare:
hax:
resources:
requests:
memory: 128Mi
cpu: 250m
limits:
memory: 2Gi
cpu: 1000m
data:
motr:
resources:
requests:
memory: 1Gi
cpu: 250m
limits:
memory: 2Gi
cpu: 1000m
confd:
resources:
requests:
memory: 128Mi
cpu: 250m
limits:
memory: 512Mi
cpu: 500m
server:
rgw:
resources:
requests:
memory: 128Mi
cpu: 250m
limits:
memory: 2Gi
cpu: 2000m
control:
agent:
resources:
requests:
memory: 128Mi
cpu: 250m
limits:
memory: 256Mi
cpu: 500m
ha:
fault_tolerance:
resources:
requests:
memory: 128Mi
cpu: 250m
limits:
memory: 1Gi
cpu: 500m
health_monitor:
resources:
requests:
memory: 128Mi
cpu: 250m
limits:
memory: 1Gi
cpu: 500m
k8s_monitor:
resources:
requests:
memory: 128Mi
cpu: 250m
limits:
memory: 1Gi
cpu: 500m
storage_sets:
- name: storage-set-1
durability:
sns: 1+0+0
dix: 1+0+0
container_group_size: 1
nodes:
- sky-2.novalocal
storage:
- name: cvg-01
type: ios
devices:
metadata:
- path: /dev/loop1
size: 5Gi
data:
- path: /dev/loop2
size: 5Gi
Get all pods
[cc@sky-2 k8_cortx_cloud]$ kc get pods -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default cortx-consul-client-926b4 1/1 Running 0 14m 10.85.0.26 sky-2.novalocal <none> <none>
default cortx-consul-server-0 1/1 Running 0 14m 10.85.0.32 sky-2.novalocal <none> <none>
default cortx-control-f4b57d4dd-c486j 1/1 Running 0 14m 10.85.0.25 sky-2.novalocal <none> <none>
default cortx-data-g0-0 0/3 Init:0/2 0 14m <none> sky-2.novalocal <none> <none>
default cortx-ha-56fb4b495-ptrps 3/3 Running 0 14m 10.85.0.35 sky-2.novalocal <none> <none>
default cortx-kafka-0 1/1 Running 0 14m 10.85.0.37 sky-2.novalocal <none> <none>
default cortx-server-0 0/2 Init:0/1 0 14m 10.85.0.33 sky-2.novalocal <none> <none>
default cortx-zookeeper-0 1/1 Running 0 14m 10.85.0.36 sky-2.novalocal <none> <none>
kube-system coredns-5769f8787-l4vzb 1/1 Running 0 61s 10.85.0.38 sky-2.novalocal <none> <none>
kube-system coredns-5769f8787-rrqxn 1/1 Running 0 61s 10.85.0.39 sky-2.novalocal <none> <none>
kube-system etcd-sky-2.novalocal 1/1 Running 0 74m 10.52.2.232 sky-2.novalocal <none> <none>
kube-system kube-apiserver-sky-2.novalocal 1/1 Running 1 (27m ago) 74m 10.52.2.232 sky-2.novalocal <none> <none>
kube-system kube-controller-manager-sky-2.novalocal 1/1 Running 5 (16m ago) 74m 10.52.2.232 sky-2.novalocal <none> <none>
kube-system kube-proxy-ksjjr 1/1 Running 0 73m 10.52.2.232 sky-2.novalocal <none> <none>
kube-system kube-scheduler-sky-2.novalocal 1/1 Running 6 (16m ago) 73m 10.52.2.232 sky-2.novalocal <none> <none>
local-path-storage local-path-provisioner-7f45fdfb8-86rz6 1/1 Running 0 54m 10.85.0.4 sky-2.novalocal <none> <none>
Describe server-pod
[cc@sky-2 k8_cortx_cloud]$ kc describe pod cortx-server-0
Name: cortx-server-0
Namespace: default
Priority: 0
Node: sky-2.novalocal/10.52.2.232
Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-cortx-server-0
ReadOnly: false
cortx-configuration:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: cortx
Optional: false
cortx-ssl-cert:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: cortx-ssl-cert
Optional: false
configuration-secrets:
Type: Secret (a volume populated by a Secret)
SecretName: cortx-secret
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 14m default-scheduler Successfully assigned default/cortx-server-0 to sky-2.novalocal
Normal Pulled 14m kubelet Container image "ghcr.io/seagate/cortx-rgw:2.0.0-895" already present on machine
Normal Created 14m kubelet Created container cortx-setup
Normal Started 14m kubelet Started container cortx-setup
Tried to get logs:
[cc@sky-2 k8_cortx_cloud]$ kc logs cortx-server-0
Defaulted container "cortx-hax" out of: cortx-hax, cortx-rgw, cortx-setup (init)
Error from server (BadRequest): container "cortx-hax" in pod "cortx-server-0" is waiting to start: PodInitializing
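Because cortx-server-0 is stuck in Init:0/1 and kubectl logs defaults to the cortx-hax container (which has not started yet), a hedged way to read the init container instead, using the container names shown in the message above:
kubectl logs cortx-server-0 -c cortx-setup
kubectl describe pod cortx-data-g0-0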
The https://github.com/Seagate/cortx-k8s/tree/integration#using-cortx-on-kubernetes section needs to be completed to have a quick, high-level usage guide for interacting with CORTX once it is deployed. A deeper-dive can be linked off to another document in the repository for a comprehensive user guide once deployed, but we should have some level of a smoke test under the root README quick starts section.
We need to update the root README.md to include a "Contribution" section detailing the expectations on PRs and project coordination, taking guidance from existing CORTX repositories as available.
Tried to deploy CORTX with 12 data pods on an 8-node Kubernetes cluster, but encountered an HA-deployment timeout error. I think it deployed successfully two days ago, but now I have tried twice and hit the HA timeout error both times. May I ask for some help?
Should be able to deploy 12 data pods, as 15 disks are available besides the one for the system and the one for fs-local-volume. In addition, I think I deployed it successfully once.
Can follow this deployment script:
https://github.com/faradawn/tutorials/blob/main/linux/cortx/kube.sh
v0.6.0
Kubernetes version: v1.24.0
kubectl version: v1.24.0
Container runtime: CRI-O
solution:
namespace: default
deployment_type: standard
secrets:
name: cortx-secret
content:
kafka_admin_secret: null
consul_admin_secret: null
common_admin_secret: null
s3_auth_admin_secret: null
csm_auth_admin_secret: null
csm_mgmt_admin_secret: Cortx123!
images:
cortxcontrol: ghcr.io/seagate/cortx-control:2.0.0-803
cortxdata: ghcr.io/seagate/cortx-data:2.0.0-803
cortxserver: ghcr.io/seagate/cortx-rgw:2.0.0-803
cortxha: ghcr.io/seagate/cortx-control:2.0.0-803
cortxclient: ghcr.io/seagate/cortx-data:2.0.0-803
consul: ghcr.io/seagate/consul:1.11.4
kafka: ghcr.io/seagate/kafka:3.0.0-debian-10-r97
zookeeper: ghcr.io/seagate/zookeeper:3.8.0-debian-10-r9
rancher: ghcr.io/seagate/local-path-provisioner:v0.0.20
busybox: ghcr.io/seagate/busybox:latest
common:
storage_provisioner_path: /mnt/fs-local-volume
container_path:
local: /etc/cortx
log: /etc/cortx/log
s3:
default_iam_users:
auth_admin: "sgiamadmin"
auth_user: "user_name"
#auth_secret defined above in solution.secrets.content.s3_auth_admin_secret
max_start_timeout: 240
extra_configuration: ""
motr:
num_client_inst: 0
start_port_num: 29000
extra_configuration: ""
hax:
protocol: https
port_num: 22003
storage_sets:
name: storage-set-1
durability:
sns: 1+0+0
dix: 1+0+0
external_services:
s3:
type: NodePort
count: 1
ports:
http: 80
https: 443
nodePorts:
http: null
https: null
control:
type: NodePort
ports:
https: 8081
nodePorts:
https: null
resource_allocation:
consul:
server:
storage: 10Gi
resources:
requests:
memory: 100Mi
cpu: 100m
limits:
memory: 300Mi
cpu: 100m
client:
resources:
requests:
memory: 100Mi
cpu: 100m
limits:
memory: 300Mi
cpu: 100m
zookeeper:
storage_request_size: 8Gi
data_log_dir_request_size: 8Gi
resources:
requests:
memory: 256Mi
cpu: 250m
limits:
memory: 512Mi
cpu: 500m
kafka:
storage_request_size: 8Gi
resources:
requests:
memory: 1Gi
cpu: 250m
limits:
memory: 2Gi
cpu: 1
hare:
hax:
resources:
requests:
memory: 128Mi
cpu: 250m
limits:
memory: 2Gi
cpu: 1000m
data:
motr:
resources:
requests:
memory: 1Gi
cpu: 250m
limits:
memory: 2Gi
cpu: 1000m
confd:
resources:
requests:
memory: 128Mi
cpu: 250m
limits:
memory: 512Mi
cpu: 500m
server:
rgw:
resources:
requests:
memory: 128Mi
cpu: 250m
limits:
memory: 2Gi
cpu: 2000m
control:
agent:
resources:
requests:
memory: 128Mi
cpu: 250m
limits:
memory: 256Mi
cpu: 500m
ha:
fault_tolerance:
resources:
requests:
memory: 128Mi
cpu: 250m
limits:
memory: 1Gi
cpu: 500m
health_monitor:
resources:
requests:
memory: 128Mi
cpu: 250m
limits:
memory: 1Gi
cpu: 500m
k8s_monitor:
resources:
requests:
memory: 128Mi
cpu: 250m
limits:
memory: 1Gi
cpu: 500m
storage:
cvg1:
name: cvg-01
type: ios
devices:
metadata:
device: /dev/sdc
size: 64Gi
data:
d1:
device: /dev/sdd
size: 64Gi
d2:
device: /dev/sde
size: 64Gi
d3:
device: /dev/sdf
size: 64Gi
d4:
device: /dev/sdg
size: 64Gi
d5:
device: /dev/sdh
size: 64Gi
d6:
device: /dev/sdi
size: 64Gi
cvg2:
name: cvg-02
type: ios
devices:
metadata:
device: /dev/sdk
size: 64Gi
data:
d1:
device: /dev/sdl
size: 64Gi
d2:
device: /dev/sdm
size: 64Gi
d3:
device: /dev/sdn
size: 64Gi
d4:
device: /dev/sdo
size: 64Gi
d5:
device: /dev/sdp
size: 64Gi
d6:
device: /dev/sdj
size: 64Gi
nodes:
node1:
name: node-1
node2:
name: node-2
node3:
name: node-3
node4:
name: node-4
node5:
name: node-5
node6:
name: node-6
node7:
name: node-7
node8:
name: node-8
First, the HA pod seemed to be running fine. Here is the output of listing pods across all namespaces:
[root@node-1 cc]# all
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-apiserver calico-apiserver-7676694b58-8r8xf 1/1 Running 0 4d22h 192.168.247.2 node-2 <none> <none>
calico-apiserver calico-apiserver-7676694b58-sbtlk 1/1 Running 0 4d22h 192.168.247.1 node-2 <none> <none>
calico-system calico-kube-controllers-68884f975d-dkjtd 1/1 Running 0 4d22h 10.85.0.2 node-2 <none> <none>
calico-system calico-node-497jm 1/1 Running 0 4d22h 10.52.2.98 node-2 <none> <none>
calico-system calico-node-54chh 1/1 Running 0 4d22h 10.52.3.120 node-5 <none> <none>
calico-system calico-node-bhcww 1/1 Running 0 4d22h 10.52.3.25 node-6 <none> <none>
calico-system calico-node-fdhx6 1/1 Running 0 4d22h 10.52.3.71 node-3 <none> <none>
calico-system calico-node-h4kgm 1/1 Running 0 4d22h 10.52.3.226 node-1 <none> <none>
calico-system calico-node-k244b 1/1 Running 0 4d22h 10.52.2.217 node-4 <none> <none>
calico-system calico-node-ltpzg 1/1 Running 0 4d22h 10.52.2.200 node-8 <none> <none>
calico-system calico-node-wkt2h 1/1 Running 0 4d22h 10.52.0.72 node-7 <none> <none>
calico-system calico-typha-789b8bc756-4qtcr 1/1 Running 0 4d22h 10.52.3.71 node-3 <none> <none>
calico-system calico-typha-789b8bc756-hl57v 1/1 Running 0 4d22h 10.52.0.72 node-7 <none> <none>
calico-system calico-typha-789b8bc756-q6vvg 1/1 Running 0 4d22h 10.52.3.226 node-1 <none> <none>
default cortx-consul-client-6gjs7 1/1 Running 0 7h35m 192.168.49.211 node-6 <none> <none>
default cortx-consul-client-dtcb9 1/1 Running 0 7h35m 192.168.84.152 node-1 <none> <none>
default cortx-consul-client-lbdxt 1/1 Running 0 7h35m 192.168.217.98 node-4 <none> <none>
default cortx-consul-client-m2l7h 1/1 Running 0 7h35m 192.168.247.55 node-2 <none> <none>
default cortx-consul-client-pbs28 1/1 Running 0 7h36m 192.168.150.108 node-5 <none> <none>
default cortx-consul-client-q58gs 1/1 Running 0 7h36m 192.168.227.90 node-7 <none> <none>
default cortx-consul-client-sfhkk 1/1 Running 0 7h36m 192.168.139.109 node-3 <none> <none>
default cortx-consul-client-wvvg6 1/1 Running 0 7h35m 192.168.144.185 node-8 <none> <none>
default cortx-consul-server-0 1/1 Running 0 7h34m 192.168.217.104 node-4 <none> <none>
default cortx-consul-server-1 1/1 Running 0 7h35m 192.168.150.101 node-5 <none> <none>
default cortx-consul-server-2 1/1 Running 0 7h36m 192.168.139.96 node-3 <none> <none>
default cortx-control-5fd7bb76f7-8gcrm 1/1 Running 0 7h34m 192.168.144.132 node-8 <none> <none>
default cortx-data-node-1-84f75868fd-z76l7 4/4 Running 0 7h33m 192.168.84.149 node-1 <none> <none>
default cortx-data-node-2-7bd5bf54b7-nsvx2 4/4 Running 0 7h33m 192.168.247.54 node-2 <none> <none>
default cortx-data-node-3-599d7f746-llkg8 4/4 Running 0 7h33m 192.168.139.108 node-3 <none> <none>
default cortx-data-node-4-7b8c6bf545-gn62n 4/4 Running 0 7h33m 192.168.217.108 node-4 <none> <none>
default cortx-data-node-5-56fb948c74-25r9v 4/4 Running 0 7h33m 192.168.150.100 node-5 <none> <none>
default cortx-data-node-6-86c94c46f-nfdgc 4/4 Running 0 7h33m 192.168.49.216 node-6 <none> <none>
default cortx-data-node-7-59668fd6fd-5wp5r 4/4 Running 0 7h33m 192.168.227.98 node-7 <none> <none>
default cortx-data-node-8-5dd6b5c5ff-dmrqf 4/4 Running 0 7h33m 192.168.144.191 node-8 <none> <none>
default cortx-ha-775dcbd84b-7tqdv 3/3 Running 0 7h25m 192.168.144.182 node-8 <none> <none>
default cortx-kafka-0 1/1 Running 1 (7h37m ago) 7h37m 192.168.217.92 node-4 <none> <none>
default cortx-kafka-1 1/1 Running 0 7h37m 192.168.49.213 node-6 <none> <none>
default cortx-kafka-2 1/1 Running 0 7h37m 192.168.150.107 node-5 <none> <none>
default cortx-server-node-1-576c5d794c-xd5r6 2/2 Running 0 7h30m 192.168.84.150 node-1 <none> <none>
default cortx-server-node-2-6987744f59-96xdd 2/2 Running 0 7h30m 192.168.247.52 node-2 <none> <none>
default cortx-server-node-3-7bbdddd479-xdfqt 2/2 Running 0 7h30m 192.168.139.106 node-3 <none> <none>
default cortx-server-node-4-5c94fc889c-rl8jj 2/2 Running 0 7h30m 192.168.217.107 node-4 <none> <none>
default cortx-server-node-5-5b75d49b67-vjx8q 2/2 Running 0 7h30m 192.168.150.109 node-5 <none> <none>
default cortx-server-node-6-76c5dddc4c-d74bw 2/2 Running 0 7h30m 192.168.49.218 node-6 <none> <none>
default cortx-server-node-7-797df6dc67-9s4dv 2/2 Running 0 7h30m 192.168.227.96 node-7 <none> <none>
default cortx-server-node-8-78858c774f-mzhhl 2/2 Running 0 7h30m 192.168.144.189 node-8 <none> <none>
default cortx-zookeeper-0 1/1 Running 0 7h37m 192.168.144.171 node-8 <none> <none>
default cortx-zookeeper-1 1/1 Running 0 7h37m 192.168.139.98 node-3 <none> <none>
default cortx-zookeeper-2 1/1 Running 0 7h37m 192.168.217.106 node-4 <none> <none>
kube-system coredns-64455c7956-l2sbf 1/1 Running 0 4d17h 192.168.217.65 node-4 <none> <none>
kube-system coredns-64455c7956-zb5nl 1/1 Running 0 4d17h 192.168.150.66 node-5 <none> <none>
kube-system etcd-node-1 1/1 Running 0 4d22h 10.52.3.226 node-1 <none> <none>
kube-system kube-apiserver-node-1 1/1 Running 0 4d22h 10.52.3.226 node-1 <none> <none>
kube-system kube-controller-manager-node-1 1/1 Running 0 4d22h 10.52.3.226 node-1 <none> <none>
kube-system kube-proxy-6kfz7 1/1 Running 0 4d22h 10.52.0.72 node-7 <none> <none>
kube-system kube-proxy-f5b4h 1/1 Running 0 4d22h 10.52.2.217 node-4 <none> <none>
kube-system kube-proxy-jg5tz 1/1 Running 0 4d22h 10.52.3.120 node-5 <none> <none>
kube-system kube-proxy-qgdmg 1/1 Running 0 4d22h 10.52.3.226 node-1 <none> <none>
kube-system kube-proxy-qmgd2 1/1 Running 0 4d22h 10.52.2.98 node-2 <none> <none>
kube-system kube-proxy-skqk7 1/1 Running 0 4d22h 10.52.2.200 node-8 <none> <none>
kube-system kube-proxy-vm8xq 1/1 Running 0 4d22h 10.52.3.25 node-6 <none> <none>
kube-system kube-proxy-z8hst 1/1 Running 0 4d22h 10.52.3.71 node-3 <none> <none>
kube-system kube-scheduler-node-1 1/1 Running 0 4d22h 10.52.3.226 node-1 <none> <none>
local-path-storage local-path-provisioner-7f45fdfb8-r88fj 1/1 Running 0 4d17h 192.168.49.193 node-6 <none> <none>
tigera-operator tigera-operator-5fb55776df-fjhqz 1/1 Running 0 4d22h 10.52.3.226 node-1 <none> <none>
Second, here are all the deployments. The HA deployment also seemed fine.
[root@node-1 cc]# kc get deployment --all-namespaces
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
calico-apiserver calico-apiserver 2/2 2 2 4d22h
calico-system calico-kube-controllers 1/1 1 1 4d22h
calico-system calico-typha 3/3 3 3 4d22h
default cortx-control 1/1 1 1 7h21m
default cortx-data-node-1 1/1 1 1 7h21m
default cortx-data-node-2 1/1 1 1 7h21m
default cortx-data-node-3 1/1 1 1 7h21m
default cortx-data-node-4 1/1 1 1 7h21m
default cortx-data-node-5 1/1 1 1 7h21m
default cortx-data-node-6 1/1 1 1 7h21m
default cortx-data-node-7 1/1 1 1 7h21m
default cortx-data-node-8 1/1 1 1 7h21m
default cortx-ha 1/1 1 1 7h13m
default cortx-server-node-1 1/1 1 1 7h18m
default cortx-server-node-2 1/1 1 1 7h18m
default cortx-server-node-3 1/1 1 1 7h18m
default cortx-server-node-4 1/1 1 1 7h18m
default cortx-server-node-5 1/1 1 1 7h18m
default cortx-server-node-6 1/1 1 1 7h18m
default cortx-server-node-7 1/1 1 1 7h18m
default cortx-server-node-8 1/1 1 1 7h18m
kube-system coredns 2/2 2 2 4d22h
local-path-storage local-path-provisioner 1/1 1 1 4d17h
tigera-operator tigera-operator 1/1 1 1 4d22h
Finally, here is the error during deployment:
########################################################
# Deploy CORTX HA
########################################################
NAME: cortx-ha-default
LAST DEPLOYED: Mon Jun 13 06:52:07 2022
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
Wait for CORTX HA to be ready.............................error: timed out waiting for the condition on deployments/cortx-ha
Deployment CORTX HA timed out after 240 seconds
Failed. Exiting script.
Here is the disk layout:
[root@node-1 cc]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 1.8T 0 disk /mnt/fs-local-volume
sdb 8:16 0 1.8T 0 disk
sdc 8:32 0 1.8T 0 disk
sdd 8:48 0 1.8T 0 disk
sde 8:64 0 1.8T 0 disk
sdf 8:80 0 1.8T 0 disk
sdg 8:96 0 1.8T 0 disk
sdh 8:112 0 1.8T 0 disk
sdi 8:128 0 1.8T 0 disk
sdj 8:144 0 1.8T 0 disk
sdk 8:160 0 1.8T 0 disk
sdl 8:176 0 1.8T 0 disk
sdm 8:192 0 1.8T 0 disk
sdn 8:208 0 1.8T 0 disk
sdo 8:224 0 1.8T 0 disk
sdp 8:240 0 1.8T 0 disk
sdq 65:0 0 372.6G 0 disk
└─sdq1 65:1 0 372.6G 0 part /
loop0 7:0 0 1.8T 0 loop
loop1 7:1 0 1.8T 0 loop
loop2 7:2 0 1.8T 0 loop
loop3 7:3 0 1.8T 0 loop
loop4 7:4 0 1.8T 0 loop
loop5 7:5 0 1.8T 0 loop
loop6 7:6 0 1.8T 0 loop
loop7 7:7 0 1.8T 0 loop
loop8 7:8 0 1.8T 0 loop
loop9 7:9 0 1.8T 0 loop
loop10 7:10 0 1.8T 0 loop
loop11 7:11 0 1.8T 0 loop
loop12 7:12 0 1.8T 0 loop
loop13 7:13 0 1.8T 0 loop
Thanks in advance!
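For reference, the "timed out waiting for the condition" message above is the standard kubectl wait timeout, so the same check can be re-run by hand with a longer timeout, and the deployment's events usually say why replicas never became available. A sketch, assuming the default namespace used above:
kubectl wait --for=condition=available --timeout=600s -n default deployment/cortx-ha
kubectl describe deployment -n default cortx-ha     # Events section explains unavailable replicas
kubectl get pods -n default | grep cortx-ha         # then describe/log the cortx-ha pod directly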
Error while trying to follow the steps in the document below:
https://github.com/Seagate/cortx-k8s/blob/stable/doc/cortx-aws-k8s-installation.md
While trying to execute step 2.7, I am getting the following error:
# ssh $SSH_FLAGS centos@$ClusterControlPlaneIP "kubectl create -f https://docs.projectcalico.org/manifests/tigera-operator.yaml; kubectl create -f https://docs.projectcalico.org/manifests/custom-resources.yaml"
Warning: Permanently added 'XX.XXX.XX.XXX' (ED14408) to the list of known hosts.
error: error loading config file "/home/centos/.kube/config": open /home/centos/.kube/config: permission denied
error: error loading config file "/home/centos/.kube/config": open /home/centos/.kube/config: permission denied
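This error usually means /home/centos/.kube/config is owned by root (for example because it was copied with sudo), so kubectl run as the centos user cannot read it. One possible fix, assuming the standard kubeadm-style config copy:
ssh $SSH_FLAGS centos@$ClusterControlPlaneIP "sudo chown \$(id -u):\$(id -g) /home/centos/.kube/config"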
I was trying out the new v0.1.0 release with the AWS QSG (and CloudFormation for #119), and the deployment steps went smoothly.
In section 4.1, however, I can't get the CSM calls to work. The login
works as expected and gives me a bearer token, but the call to s3_accounts
just returns 404.
Did something with CSM change since the last release, or am I missing something simple?
@r-wambui , can you please work with @alfhad to make sure that the awesome google doc that @alfhad created becomes a fully working document and is contributed to our GitHub and linked appropriately from an existing page?
References:
Slack Discussion
Google Doc
Please reach out to @osowski if you have any questions about either how to get the document working or how to appropriately create documentation in this repo.
Through https://github.com/Seagate/cortx-k8s/tree/CORTX-29859_migrate_data_pods_statefulset, we are breaking the explicit assumption that the PVs and PVCs used in CORTX are mirror copies of one another. This allows users to create their own PVs ahead of time and have the CORTX Helm charts use those PVs instead.
This issue should cover the documentation (and minor code) changes required to expose this functionality to the user, which for the most part entails creating PVs with the appropriate StorageClass and the cortx.io/device-path Kubernetes label so the data pods' PVCs are matched correctly.
In this way, users can create more data pods than Kubernetes worker nodes, while ensuring the PVs used for each pod point to distinct underlying paths on the local node's filesystem.
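For illustration, a pre-created PV for one data device might look like the sketch below. The PV name, the StorageClass name, and the encoding of the device path in the label value are assumptions for this example, not the chart's confirmed conventions:
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: cortx-data-node-1-sdc          # illustrative name
  labels:
    cortx.io/device-path: dev-sdc      # assumed encoding of /dev/sdc; check the chart for the real format
spec:
  capacity:
    storage: 64Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  volumeMode: Block
  storageClassName: cortx-data         # assumed StorageClass name
  local:
    path: /dev/sdc
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-1
EOF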
n/a
v0.7.0
Kubernetes version: 1.22
kubectl version: 1.22
n/a
n/a
n/a
If you run generate-cvg-yaml.sh with the -v or --verbose flag, it will fail in some circumstances.
The script should work correctly with those flags.
Pass the -v or --verbose flag to the script. It fails unless it is the very last argument.
Bad: ./generate-cvg-yaml.sh -v -x nodes.txt -y devices.txt -c 2 -d 52 -s "solution9_106.yaml" -e "200Gi" -n "200Gi"
Good: ./generate-cvg-yaml.sh -x nodes.txt -y devices.txt -c 2 -d 52 -s "solution9_106.yaml" -e "200Gi" -n "200Gi" -v
N/A
No response
❯ ./generate-cvg-yaml.sh -v -x nodes.txt -y devices.txt -c 2 -d 52 -s "solution9_106.yaml" -e "200Gi" -n "200Gi"
./generate-cvg-yaml.sh: -- OPTIONS
./generate-cvg-yaml.sh: |
./generate-cvg-yaml.sh: | NUM_CVGS=1
./generate-cvg-yaml.sh: | NUM_DATA_DRIVES=1
./generate-cvg-yaml.sh: | SIZE_DATA_DRIVE=5Gi
./generate-cvg-yaml.sh: | NUM_METADATA_DRIVES=1
./generate-cvg-yaml.sh: | SIZE_METADATA_DRIVE=5Gi
./generate-cvg-yaml.sh: | SOLUTION_YAML=solution.yaml
./generate-cvg-yaml.sh: | NODE_LIST_FILE=UNSET
./generate-cvg-yaml.sh: | DEVICE_PATHS_FILE=UNSET
./generate-cvg-yaml.sh: |
./generate-cvg-yaml.sh: -- FLAGS
./generate-cvg-yaml.sh: |
./generate-cvg-yaml.sh: | _VERBOSE=1
./generate-cvg-yaml.sh: |
./generate-cvg-yaml.sh: -- PRE-REQS
./generate-cvg-yaml.sh: |
./generate-cvg-yaml.sh: | NODE_LIST_FILE="UNSET"
./generate-cvg-yaml.sh: NODE_LIST_FILE is a required parameter and is unset.
Notice how all the command-line arguments were ignored and the default values were printed; the arguments were clearly never processed.
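For illustration only (this is not the actual generate-cvg-yaml.sh code), one parsing pattern that produces exactly this symptom is a flag branch that breaks out of the option loop, so everything after -v on the command line is silently dropped:
# Sketch of the failure mode, not the real script.
while [ $# -gt 0 ]; do
  case "$1" in
    -x) NODE_LIST_FILE="$2"; shift 2 ;;
    -v|--verbose) _VERBOSE=1; break ;;   # bug: break ends option parsing, discarding remaining args
    *) shift ;;
  esac
done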
Good:
❯ ./generate-cvg-yaml.sh -x nodes.txt -y devices.txt -c 2 -d 52 -s "solution9_106.yaml" -e "200Gi" -n "200Gi" -v
./generate-cvg-yaml.sh: -- OPTIONS
./generate-cvg-yaml.sh: |
./generate-cvg-yaml.sh: | NUM_CVGS=2
./generate-cvg-yaml.sh: | NUM_DATA_DRIVES=52
./generate-cvg-yaml.sh: | SIZE_DATA_DRIVE=200Gi
./generate-cvg-yaml.sh: | NUM_METADATA_DRIVES=1
./generate-cvg-yaml.sh: | SIZE_METADATA_DRIVE=200Gi
./generate-cvg-yaml.sh: | SOLUTION_YAML=solution9_106.yaml
./generate-cvg-yaml.sh: | NODE_LIST_FILE=nodes.txt
./generate-cvg-yaml.sh: | DEVICE_PATHS_FILE=devices.txt
./generate-cvg-yaml.sh: |
./generate-cvg-yaml.sh: -- FLAGS
./generate-cvg-yaml.sh: |
./generate-cvg-yaml.sh: | _VERBOSE=1
./generate-cvg-yaml.sh: |
./generate-cvg-yaml.sh: -- PRE-REQS
./generate-cvg-yaml.sh: |
./generate-cvg-yaml.sh: | NODE_LIST_FILE="nodes.txt"
./generate-cvg-yaml.sh: | DEVICE_PATHS_FILE="devices.txt"
./generate-cvg-yaml.sh: | YQ_AVAILABLE="/home/kpine/bin/yq"
./generate-cvg-yaml.sh: -- PARSED PARAMETERS
./generate-cvg-yaml.sh: |
./generate-cvg-yaml.sh: | NODE_LIST_FILE:
./generate-cvg-yaml.sh: | node1
./generate-cvg-yaml.sh: | node2
./generate-cvg-yaml.sh: |
./generate-cvg-yaml.sh: | DEVICE_PATHS_FILE:
./generate-cvg-yaml.sh: | /dev/sda
./generate-cvg-yaml.sh: | /dev/sdb
No response