seagate / cortx-k8s
CORTX Kubernetes Orchestration Repository
Home Page: https://github.com/Seagate/cortx
License: Apache License 2.0
We currently have a guide for deployment in AWS: https://github.com/Seagate/cortx-k8s/blob/integration/doc/cortx-aws-k8s-installation.md
It would be good to have a comparable deployment guide for GCP.
Hi,
I've been following the cortx-aws-k8s-installation guide, but ran into trouble at section 4.1.
After successfully authenticating with the CORTX credentials, I received the expected "Bearer bf7axxx" token. However, when using it to send a create-account request, I got a "404 Not Found" error from the S3 API:
[root@master cc]# curl -H 'Authorization: Bearer bf7a24a8aac14a8387177f548b34781f' -d '{ "account_name": "gts3account", "account_email": "[email protected]", "password": "Account1!", "access_key": "gregoryaccesskey", "secret_key": "gregorysecretkey" }' https://$CSM_IP:8081/api/v2/s3_accounts --insecure
404: Not Found
Here is how I requested the "Bearer token":
[root@master cc]# curl -v -d '{"username": "cortxadmin", "password": "Cortxadmin@123"}' https://$CSM_IP:8081/api/v2/login --insecure
* About to connect() to 10.107.201.208 port 8081 (#0)
* Trying 10.107.201.208...
* Connected to 10.107.201.208 (10.107.201.208) port 8081 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
* subject: CN=seagate.com,O=Seagate Tech,L=Pune,C=IN
* start date: Feb 18 11:58:25 2021 GMT
* expire date: Feb 16 11:58:25 2031 GMT
* common name: seagate.com
* issuer: CN=seagate.com,O=Seagate Tech,L=Pune,C=IN
> POST /api/v2/login HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 10.107.201.208:8081
> Accept: */*
> Content-Length: 56
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 56 out of 56 bytes
< HTTP/1.1 200 OK
< Authorization: Bearer bf7a24a8aac14a8387177f548b34781f
< Content-Type: application/json
< Content-Length: 25
< Server: NULL
< Strict-Transport-Security: max-age=63072000; includeSubdomains
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< X-Content-Type-Options: nosniff
< Content-Security-Policy: script-src 'self'; object-src 'self'
< Referrer-Policy: no-referrer, strict-origin-when-cross-origin
< Pragma: no-cache
< Expires: 0
< Cache-control: no-cache, no-store, must-revalidate, max-age=0
< Date: Mon, 21 Mar 2022 04:04:14 GMT
<
* Connection #0 to host 10.107.201.208 left intact
{"reset_password": false}[root@master cc]#
The Kubernetes cluster consists of one master and one worker (CentOS 8). CORTX is deployed from the latest main branch. The bare-metal machines are not from AWS but from Chameleon.
pods
[root@master cc]# kubectl get pods
NAME READY STATUS RESTARTS AGE
consul-client-rv52r 1/1 Running 0 2d13h
consul-server-0 1/1 Running 0 2d13h
cortx-control-5dc5f7b6-ttbsk 1/1 Running 0 2d13h
cortx-data-node-1-6949c7c88b-8lwlw 3/3 Running 0 2d13h
cortx-ha-679b57d66b-j6vg8 3/3 Running 0 2d13h
cortx-server-node-1-5464b57b76-f2ttc 2/2 Running 0 2d13h
kafka-0 1/1 Running 0 2d13h
openldap-0 1/1 Running 0 2d13h
zookeeper-0 1/1 Running 0 2d13h
services
[root@master cc]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
consul-dns ClusterIP 10.97.150.134 <none> 53/TCP,53/UDP 2d13h
consul-server ClusterIP None <none> 8500/TCP,8301/TCP,8301/UDP,8302/TCP,8302/UDP,8300/TCP,8600/TCP,8600/UDP 2d13h
cortx-control-loadbal-svc NodePort 10.107.201.208 <none> 8081:32239/TCP 2d13h
cortx-data-clusterip-svc-node-1 ClusterIP 10.106.110.22 <none> 22003/TCP,29001/TCP,29000/TCP 2d13h
cortx-data-headless-svc-node-1 ClusterIP None <none> <none> 2d13h
cortx-ha-headless-svc ClusterIP None <none> <none> 2d13h
cortx-hax-svc ClusterIP 10.97.236.246 <none> 22003/TCP 2d13h
cortx-io-svc-0 NodePort 10.97.118.40 <none> 8000:32262/TCP,8443:30626/TCP 2d13h
cortx-server-clusterip-svc-node-1 ClusterIP 10.100.128.209 <none> 22003/TCP 2d13h
cortx-server-headless-svc-node-1 ClusterIP None <none> <none> 2d13h
cortx-server-loadbal-svc-node-1 NodePort 10.109.192.150 <none> 8000:32280/TCP,8443:31222/TCP 2d13h
kafka ClusterIP 10.108.103.229 <none> 9092/TCP 2d13h
kafka-headless ClusterIP None <none> 9092/TCP,9093/TCP 2d13h
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 2d15h
openldap-svc ClusterIP 10.106.219.194 <none> 389/TCP 2d13h
zookeeper ClusterIP 10.109.117.186 <none> 2181/TCP,2888/TCP,3888/TCP 2d13h
zookeeper-headless ClusterIP None <none> 2181/TCP,2888/TCP,3888/TCP 2d13h
Sorry, I'm a little new to CORTX and have only been trying for a few hours, so I would appreciate any suggestions!
Many thanks,
Faradawn
At this location (https://github.com/Seagate/cortx-k8s#using-cortx-on-kubernetes), there is a TODO item which looks like:
Using CORTX on Kubernetes
TODO Port https://seagate-systems.atlassian.net/wiki/spaces/PUB/pages/754155622/CORTX+Kubernetes+N-Pod+Deployment+and+Upgrade+Document+using+Services+Framework#5.-Understanding-Management-and-S3-Endpoints-and-configuring-External-Load-balancer-service(Optional) here or into a linked doc/readme file.
This action needs to be done and the PR doing it needs to refer back to this Issue.
Hello @john-a-fletcher. The other CORTX repos have an integration set up such that new issues created in GitHub get automatically mirrored into Jira, and future updates on the Jira side get mirrored back to GitHub. Note that new issues created in Jira are NOT mirrored to GitHub (i.e., we do it safely so that community items are mirrored but internal items are not).
Do you want us to set this up for this repo as well?
Update the https://github.com/Seagate/cortx-k8s/tree/main#cortx-on-kubernetes-prerequisites section to include the minimal disk requirements for deploying CORTX on Kubernetes: when running the prereq-deploy-cortx-cloud.sh script, it expects an entire disk that it can use.
As we are able to accommodate the solution.yaml changes in our current development cycle, we need to complete the https://github.com/Seagate/cortx-k8s/tree/integration#solution-yaml-overview table for comprehensive documentation of the solution.yaml format and structure.
The current cortx-aws-k8s-installation.md documentation does not specify which version of the https://github.com/Seagate/cortx-k8s/releases codebase is used for deployment. This causes issues since there has been a rearchitecture of the Data & Server components, and certain commands need to be updated to reflect which component the end user is expected to interact with.
We need to include a specific tagged version in Step 3.1, as well as the associated tested and released cortx-all
container image for default user interaction and deployment, and update the existing instructions to match the current architecture.
After deploying CORTX v0.3.0, I tried to log in to CSM (the control service) with the username cortxadmin and the password V1Acv%V8$2qL!JIP, using the following command:
curl -d '{"username": "cortxadmin", "password": "$control_password"}' https://$CSM_IP:8081/api/v2/login -k -i
But I got a 401 Unauthorized error:
HTTP/1.1 401 Unauthorized
Content-Type: application/json; charset=utf-8
Server: NULL
Strict-Transport-Security: max-age=63072000; includeSubdomains
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Content-Security-Policy: script-src 'self'; object-src 'self'
Referrer-Policy: no-referrer, strict-origin-when-cross-origin
I used to be able to log in to CSM on CORTX v0.2.0 with the above commands. Did I miss something?
The CSM API should return a bearer token instead of a 401 error.
1 - Deploy Kubernetes and CORTX
deploy kubernetes
git clone https://github.com/Seagate/cortx-k8s;
./prereq-deploy-cortx-cloud.sh -d /dev/sdb -s solution.example.yaml
kubectl taint node master node-role.kubernetes.io/master:NoSchedule-
./deploy-cortx-cloud.sh solution.example.yaml
2 - Try to log in to CSM
control_password=$(kubectl get secrets/cortx-secret --namespace default --template={{.data.csm_mgmt_admin_secret}} | base64 -d)
export CSM_IP=`kubectl get svc cortx-control-loadbal-svc -ojsonpath='{.spec.clusterIP}'`
curl -d '{"username": "cortxadmin", "password": "$control_password"}' https://$CSM_IP:8081/api/v2/login -k -i
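One detail worth noting about the login command above (an observation, not a confirmed root cause from this thread): the JSON body is wrapped in single quotes, so the shell never expands $control_password and the literal text $control_password is sent as the password, which by itself would produce a 401. A hedged variant that lets the shell substitute the variable:
curl -d "{\"username\": \"cortxadmin\", \"password\": \"$control_password\"}" https://$CSM_IP:8081/api/v2/login -k -i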
v0.3.0 on integration
Kubernetes version: v1.23.5
kubectl version: latest
Cluster and Client OS: CentOS 7.8
Provider: Chameleon
Setup: four-node Kubernetes cluster, with master untainted (allowed scheduling).
# same as solution.example.yaml at integration branch
solution:
namespace: default
deployment_type: standard
secrets:
name: cortx-secret
content:
kafka_admin_secret: null
consul_admin_secret: null
common_admin_secret: null
s3_auth_admin_secret: null
csm_auth_admin_secret: null
csm_mgmt_admin_secret: null
images:
cortxcontrol: ghcr.io/seagate/cortx-all:2.0.0-725
cortxdata: ghcr.io/seagate/cortx-all:2.0.0-725
cortxserver: ghcr.io/seagate/cortx-rgw:2.0.0-725
cortxha: ghcr.io/seagate/cortx-all:2.0.0-725
cortxclient: ghcr.io/seagate/cortx-all:2.0.0-725
consul: ghcr.io/seagate/consul:1.11.4
kafka: ghcr.io/seagate/kafka:3.0.0-debian-10-r7
zookeeper: ghcr.io/seagate/zookeeper:3.8.0-debian-10-r9
rancher: ghcr.io/seagate/local-path-provisioner:v0.0.20
busybox: ghcr.io/seagate/busybox:latest
common:
storage_provisioner_path: /mnt/fs-local-volume
container_path:
local: /etc/cortx
log: /etc/cortx/log
s3:
default_iam_users:
auth_admin: "sgiamadmin"
auth_user: "user_name"
#auth_secret defined above in solution.secrets.content.s3_auth_admin_secret
max_start_timeout: 240
extra_configuration: ""
motr:
num_client_inst: 0
start_port_num: 29000
extra_configuration: ""
hax:
protocol: https
service_name: cortx-hax-svc
port_num: 22003
storage_sets:
name: storage-set-1
durability:
sns: 1+0+0
dix: 1+0+0
external_services:
s3:
type: NodePort
count: 1
ports:
http: 80
https: 443
nodePorts:
http: null
https: null
control:
type: NodePort
ports:
https: 8081
nodePorts:
https: null
resource_allocation:
consul:
server:
storage: 10Gi
resources:
requests:
memory: 100Mi
cpu: 100m
limits:
memory: 300Mi
cpu: 100m
client:
resources:
requests:
memory: 100Mi
cpu: 100m
limits:
memory: 300Mi
cpu: 100m
zookeeper:
storage_request_size: 8Gi
data_log_dir_request_size: 8Gi
resources:
requests:
memory: 256Mi
cpu: 250m
limits:
memory: 512Mi
cpu: 500m
kafka:
storage_request_size: 8Gi
log_persistence_request_size: 8Gi
resources:
requests:
memory: 1Gi
cpu: 250m
limits:
memory: 2Gi
cpu: 1
storage:
cvg1:
name: cvg-01
type: ios
devices:
metadata:
device: /dev/sdc
size: 5Gi
data:
d1:
device: /dev/sdd
size: 5Gi
d2:
device: /dev/sde
size: 5Gi
nodes:
node1:
name: master
node2:
name: node-1
node3:
name: node-2
node4:
name: node-3
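As an aside, the node names listed under nodes must match the actual Kubernetes node names; the deploy script aborts with "List of nodes don't exist in the cluster" when a listed node is not found, as seen elsewhere on this page. A quick hedged way to compare the names above against the cluster (plain kubectl, for illustration only):
kubectl get nodes --no-headers -o custom-columns=NAME:.metadata.name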
Per feedback, we should rename the following elements in the root README:
- the Quick Starts section to Getting Started
- the Quick Install Guide to Quick Starts Guide
[Edit: solution at the end of the thread]
To Whom It May Concern,
When running the deploy-cortx-cloud.sh script, I kept getting the error that the Kafka installation failed: "timed out waiting for the condition."
[root@master-node k8_cortx_cloud]# ./deploy-cortx-cloud.sh solution.yaml
Validate solution file result: success
Number of worker nodes detected: 1
W0302 14:56:09.990541 9500 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0302 14:56:10.007528 9500 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
NAME: cortx-platform
LAST DEPLOYED: Wed Mar 2 14:56:09 2022
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
"hashicorp" has been added to your repositories
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "hashicorp" chart repository
Update Complete. ⎈Happy Helming!⎈
Install Rancher Local Path Provisioner
namespace/local-path-storage created
serviceaccount/local-path-provisioner-service-account created
clusterrole.rbac.authorization.k8s.io/local-path-provisioner-role created
clusterrolebinding.rbac.authorization.k8s.io/local-path-provisioner-bind created
deployment.apps/local-path-provisioner created
storageclass.storage.k8s.io/local-path created
configmap/local-path-config created
######################################################
# Deploy Consul
######################################################
NAME: consul
LAST DEPLOYED: Wed Mar 2 14:56:11 2022
NAMESPACE: default
STATUS: deployed
REVISION: 1
NOTES:
Thank you for installing HashiCorp Consul!
Your release is named consul.
To learn more about the release, run:
$ helm status consul
$ helm get all consul
Consul on Kubernetes Documentation:
https://www.consul.io/docs/platform/k8s
Consul on Kubernetes CLI Reference:
https://www.consul.io/docs/k8s/k8s-cli
serviceaccount/consul-client patched
serviceaccount/consul-server patched
statefulset.apps/consul-server restarted
daemonset.apps/consul-client restarted
######################################################
# Deploy openLDAP
######################################################
NAME: openldap
LAST DEPLOYED: Wed Mar 2 14:56:36 2022
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
Wait for openLDAP PODs to be ready..............
===========================================================
Setup OpenLDAP replication
===========================================================
######################################################
# Deploy Zookeeper
######################################################
"bitnami" has been added to your repositories
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "bitnami" chart repository
Update Complete. ⎈Happy Helming!⎈
Registry: ghcr.io
Repository: seagate/zookeeper
Tag: 3.7.0-debian-10-r182
NAME: zookeeper
LAST DEPLOYED: Wed Mar 2 14:56:57 2022
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
CHART NAME: zookeeper
CHART VERSION: 8.1.1
APP VERSION: 3.7.0
** Please be patient while the chart is being deployed **
ZooKeeper can be accessed via port 2181 on the following DNS name from within your cluster:
zookeeper.default.svc.cluster.local
To connect to your ZooKeeper server run the following commands:
export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=zookeeper,app.kubernetes.io/instance=zookeeper,app.kubernetes.io/component=zookeeper" -o jsonpath="{.items[0].metadata.name}")
kubectl exec -it $POD_NAME -- zkCli.sh
To connect to your ZooKeeper server from outside the cluster execute the following commands:
kubectl port-forward --namespace default svc/zookeeper 2181: &
zkCli.sh 127.0.0.1:2181
Wait for Zookeeper to be ready before starting kafka
######################################################
# Deploy Kafka
######################################################
Registry: ghcr.io
Repository: seagate/kafka
Tag: 3.0.0-debian-10-r7
Error: INSTALLATION FAILED: timed out waiting for the condition
Wait for CORTX 3rd party to be ready.....................................................
Here is a description of the crashed Kafka pod:
[root@master-node cc]# kubectl get pod
NAME READY STATUS RESTARTS AGE
consul-client-cs7b7 0/1 Running 0 3h8m
consul-server-0 1/1 Running 0 3h8m
kafka-0 0/1 CrashLoopBackOff 40 (44s ago) 3h7m
openldap-0 1/1 Running 0 3h8m
zookeeper-0 1/1 Running 0 3h7m
[root@master-node cc]# kubectl describe pod kafka
Name: kafka-0
Namespace: default
Priority: 0
Node: worker-node-1/10.52.1.106
Start Time: Wed, 02 Mar 2022 14:57:30 +0000
Labels: app.kubernetes.io/component=kafka
app.kubernetes.io/instance=kafka
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=kafka
controller-revision-hash=kafka-866fd78b49
helm.sh/chart=kafka-15.3.4
statefulset.kubernetes.io/pod-name=kafka-0
Annotations: <none>
Status: Running
IP: 10.32.0.7
IPs:
IP: 10.32.0.7
Controlled By: StatefulSet/kafka
Containers:
kafka:
Container ID: docker://fdd090e633af20142df15e3d69869c38317e654d37081b3c349e729e076c8563
Image: ghcr.io/seagate/kafka:3.0.0-debian-10-r7
Image ID: docker-pullable://ghcr.io/seagate/kafka@sha256:91155a01d7dc9de2e3909002b3c9fa308c8124d525de88e2acd55f1b95a8341d
Ports: 9092/TCP, 9093/TCP
Host Ports: 0/TCP, 0/TCP
Command:
/scripts/setup.sh
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 02 Mar 2022 18:03:58 +0000
Finished: Wed, 02 Mar 2022 18:04:08 +0000
Ready: False
Restart Count: 40
Limits:
cpu: 1
memory: 2Gi
Requests:
cpu: 250m
memory: 1Gi
Liveness: tcp-socket :kafka-client delay=10s timeout=5s period=10s #success=1 #failure=3
Readiness: tcp-socket :kafka-client delay=5s timeout=5s period=10s #success=1 #failure=6
Environment:
BITNAMI_DEBUG: false
MY_POD_IP: (v1:status.podIP)
MY_POD_NAME: kafka-0 (v1:metadata.name)
KAFKA_CFG_ZOOKEEPER_CONNECT: zookeeper.default.svc.cluster.local
KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL
KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL:PLAINTEXT,CLIENT:PLAINTEXT
KAFKA_CFG_LISTENERS: INTERNAL://:9093,CLIENT://:9092
KAFKA_CFG_ADVERTISED_LISTENERS: INTERNAL://$(MY_POD_NAME).kafka-headless.default.svc.cluster.local:9093,CLIENT://$(MY_POD_NAME).kafka-headless.default.svc.cluster.local:9092
ALLOW_PLAINTEXT_LISTENER: yes
KAFKA_VOLUME_DIR: /bitnami/kafka
KAFKA_LOG_DIR: /opt/bitnami/kafka/logs
KAFKA_CFG_DELETE_TOPIC_ENABLE: true
KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE: true
KAFKA_HEAP_OPTS: -Xmx1024m -Xms1024m
KAFKA_CFG_LOG_FLUSH_INTERVAL_MESSAGES: 10000
KAFKA_CFG_LOG_FLUSH_INTERVAL_MS: 1000
KAFKA_CFG_LOG_RETENTION_BYTES: 1073741824
KAFKA_CFG_LOG_RETENTION_CHECK_INTERVALS_MS: 300000
KAFKA_CFG_LOG_RETENTION_HOURS: 168
KAFKA_CFG_MESSAGE_MAX_BYTES: 1000012
KAFKA_CFG_LOG_SEGMENT_BYTES: 1073741824
KAFKA_CFG_LOG_DIRS: /bitnami/kafka/data
KAFKA_CFG_DEFAULT_REPLICATION_FACTOR: 1
KAFKA_CFG_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
KAFKA_CFG_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
KAFKA_CFG_TRANSACTION_STATE_LOG_MIN_ISR: 2
KAFKA_CFG_NUM_IO_THREADS: 8
KAFKA_CFG_NUM_NETWORK_THREADS: 3
KAFKA_CFG_NUM_PARTITIONS: 1
KAFKA_CFG_NUM_RECOVERY_THREADS_PER_DATA_DIR: 1
KAFKA_CFG_SOCKET_RECEIVE_BUFFER_BYTES: 102400
KAFKA_CFG_SOCKET_REQUEST_MAX_BYTES: 104857600
KAFKA_CFG_SOCKET_SEND_BUFFER_BYTES: 102400
KAFKA_CFG_ZOOKEEPER_CONNECTION_TIMEOUT_MS: 6000
KAFKA_CFG_AUTHORIZER_CLASS_NAME:
KAFKA_CFG_ALLOW_EVERYONE_IF_NO_ACL_FOUND: true
KAFKA_CFG_SUPER_USERS: User:admin
KAFKA_CFG_LOG_SEGMENT_DELETE_DELAY_MS: 1000
KAFKA_CFG_LOG_FLUSH_OFFSET_CHECKPOINT_INTERVAL_MS: 1000
KAFKA_CFG_LOG_RETENTION_CHECK_INTERVAL_MS: 1000
Mounts:
/bitnami/kafka from data (rw)
/opt/bitnami/kafka/logs from logs (rw)
/scripts/setup.sh from scripts (rw,path="setup.sh")
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-kafka-0
ReadOnly: false
scripts:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kafka-scripts
Optional: false
logs:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 29m (x35 over 3h8m) kubelet Container image "ghcr.io/seagate/kafka:3.0.0-debian-10-r7" already present on machine
Warning BackOff 4m19s (x847 over 3h8m) kubelet Back-off restarting failed container
I repartitioned the disks and rebooted the server many times, but still couldn't get past the Kafka deployment issue. May I ask for some help on what the issue might be?
Below is my disk layout. I ran ./prereq-deploy-cortx-cloud.sh /dev/sdb1 with the disk parameter set to /dev/sdb1.
[root@master-node cc]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 1.8T 0 disk
sdb 8:16 0 1.8T 0 disk
└─sdb1 8:17 0 1.8T 0 part
sdc 8:32 0 1.8T 0 disk
└─sdc1 8:33 0 1.8T 0 part
sdd 8:48 0 1.8T 0 disk
└─sdd1 8:49 0 1.8T 0 part
sde 8:64 0 1.8T 0 disk
└─sde1 8:65 0 1.8T 0 part
sdf 8:80 0 1.8T 0 disk
sdg 8:96 0 1.8T 0 disk
sdh 8:112 0 1.8T 0 disk
sdi 8:128 0 1.8T 0 disk
sdj 8:144 0 1.8T 0 disk
sdk 8:160 0 1.8T 0 disk
sdl 8:176 0 1.8T 0 disk
sdm 8:192 0 1.8T 0 disk
sdn 8:208 0 1.8T 0 disk
sdo 8:224 0 1.8T 0 disk
sdp 8:240 0 1.8T 0 disk
sdq 65:0 0 372.6G 0 disk
└─sdq1 65:1 0 372.6G 0 part /
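Given the note earlier on this page that prereq-deploy-cortx-cloud.sh expects an entire disk it can use, a quick read-only check (an illustration, not part of the script) of whether an argument names a whole disk or a partition:
lsblk -ndo NAME,TYPE /dev/sdb    # expected to report type "disk"
lsblk -ndo NAME,TYPE /dev/sdb1   # expected to report type "part"
If the script really does want a whole, unpartitioned device, /dev/sdb1 above would be a partition rather than a disk.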
solution:
namespace: default
secrets:
name: cortx-secret
content:
openldap_admin_secret: seagate1
kafka_admin_secret: Seagate@123
consul_admin_secret: Seagate@123
common_admin_secret: Seagate@123
s3_auth_admin_secret: ldapadmin
csm_auth_admin_secret: seagate2
csm_mgmt_admin_secret: Cortxadmin@123
images:
cortxcontrol: cortx-docker.colo.seagate.com/seagate/cortx-all:2.0.0-2192-custom-ci
cortxdata: cortx-docker.colo.seagate.com/seagate/cortx-all:2.0.0-2192-custom-ci
cortxserver: cortx-docker.colo.seagate.com/seagate/cortx-rgw:2.0.0-120-custom-ci
cortxha: cortx-docker.colo.seagate.com/seagate/cortx-all:2.0.0-2192-custom-ci
cortxclient: cortx-docker.colo.seagate.com/seagate/cortx-all:2.0.0-2192-custom-ci
openldap: ghcr.io/seagate/symas-openldap:2.4.58
consul: ghcr.io/seagate/consul:1.10.0
kafka: ghcr.io/seagate/kafka:3.0.0-debian-10-r7
zookeeper: ghcr.io/seagate/zookeeper:3.7.0-debian-10-r182
rancher: ghcr.io/seagate/local-path-provisioner:v0.0.20
busybox: ghcr.io/seagate/busybox:latest
common:
setup_size: large
storage_provisioner_path: /mnt/fs-local-volume
container_path:
local: /etc/cortx
shared: /share
log: /etc/cortx/log
s3:
default_iam_users:
auth_admin: "sgiamadmin"
auth_user: "user_name"
#auth_secret defined above in solution.secrets.content.s3_auth_admin_secret
num_inst: 2
start_port_num: 28051
max_start_timeout: 240
motr:
num_client_inst: 0
start_port_num: 29000
hax:
protocol: https
service_name: cortx-hax-svc
port_num: 22003
storage_sets:
name: storage-set-1
durability:
sns: 1+0+0
dix: 1+0+0
external_services:
type: LoadBalancer
resource_allocation:
consul:
server:
storage: 10Gi
resources:
requests:
memory: 100Mi
cpu: 100m
limits:
memory: 300Mi
cpu: 100m
client:
resources:
requests:
memory: 100Mi
cpu: 100m
limits:
memory: 300Mi
cpu: 100m
openldap:
resources:
requests:
memory: 1Gi
cpu: 2
limits:
memory: 1Gi
cpu: 2
zookeeper:
storage_request_size: 8Gi
data_log_dir_request_size: 8Gi
resources:
requests:
memory: 256Mi
cpu: 250m
limits:
memory: 512Mi
cpu: 500m
kafka:
storage_request_size: 8Gi
log_persistence_request_size: 8Gi
resources:
requests:
memory: 1Gi
cpu: 250m
limits:
memory: 2Gi
cpu: 1
storage:
cvg1:
name: cvg-01
type: ios
devices:
metadata:
device: /dev/sdh
size: 5Gi
data:
d1:
device: /dev/sdi
size: 5Gi
nodes:
node1:
name: worker-node-1
Sorry, I'm a little new to this and have been trying for a few days. Any suggestion would help!
Thanks in advance!
I'm using the Kubernetes on AWS QSG while working on Seagate/cortx#1381, but I'm running into trouble with the CORTX deployment. There were a couple of minor issues:
- generate-cvg-yaml.sh depends on yq; I just grabbed the latest release from GitHub
- the guide references a stable branch that doesn't exist, so I just used main
The more serious issue is during deploy-cortx-cloud.sh. I keep getting a timeout and failure around the "Deploy CORTX Server" phase, and I can see the data and server pods crashing and restarting. Not sure if this is due to changes in the deployment scripts or mismatched component versions. Any help getting this working would be greatly appreciated!
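A hedged set of plain kubectl commands (not specific to the deployment scripts) that usually surfaces why the data and server pods are crash-looping; the pod and container names are placeholders to be taken from the first command's output:
kubectl get pods -o wide
kubectl describe pod <crashing-pod-name>
kubectl logs <crashing-pod-name> --all-containers
kubectl logs <crashing-pod-name> -c <container-name> --previous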
As per the instructions in the README, when I tried to execute deploy-cortx-cloud.sh I got a connection refused error, as shown below:
[root@dora k8_cortx_cloud]# ./deploy-cortx-cloud.sh
Validate solution file result: success
The connection to the server localhost:8080 was refused - did you specify the right host or port?
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Number of worker nodes detected: 1
The connection to the server localhost:8080 was refused - did you specify the right host or port?
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Can't deploy CORTX cloud.
List of nodes don't exist in the cluster:
- dora
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Error: INSTALLATION FAILED: Kubernetes cluster unreachable: Get "http://localhost:8080/version": dial tcp [::1]:8080: connect: connection refused
"hashicorp" already exists with the same configuration, skipping
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "hashicorp" chart repository
Update Complete. ⎈Happy Helming!⎈
Install Rancher Local Path Provisioner
The connection to the server localhost:8080 was refused - did you specify the right host or port?
######################################################
# Deploy Consul
######################################################
Error: INSTALLATION FAILED: Kubernetes cluster unreachable: Get "http://localhost:8080/version": dial tcp [::1]:8080: connect: connection refused
The connection to the server localhost:8080 was refused - did you specify the right host or port?
The connection to the server localhost:8080 was refused - did you specify the right host or port?
The connection to the server localhost:8080 was refused - did you specify the right host or port?
The connection to the server localhost:8080 was refused - did you specify the right host or port?
######################################################
# Deploy openLDAP
######################################################
Error: INSTALLATION FAILED: Kubernetes cluster unreachable: Get "http://localhost:8080/version": dial tcp [::1]:8080: connect: connection refused
Wait for openLDAP PODs to be ready
The connection to the server localhost:8080 was refused - did you specify the right host or port?
.The connection to the server localhost:8080 was refused - did you specify the right host or port?
.The connection to the server localhost:8080 was refused - did you specify the right host or port?
firewalld was off and SELinux was in permissive mode, as shown below:
[root@dora k8_cortx_cloud]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
Active: inactive (dead) since Fri 2022-01-28 14:13:20 IST; 50min ago
Docs: man:firewalld(1)
Main PID: 853 (code=exited, status=0/SUCCESS)
Jan 28 12:27:39 dora firewalld[853]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t filter -X DOCKER' failed: iptables: No chain/target/match by that name.
Jan 28 12:27:39 dora firewalld[853]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t filter -F DOCKER-ISOLATION-STAGE-1' failed: iptables: No chain/targe... that name.
Jan 28 12:27:39 dora firewalld[853]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t filter -X DOCKER-ISOLATION-STAGE-1' failed: iptables: No chain/targe... that name.
Jan 28 12:27:39 dora firewalld[853]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t filter -F DOCKER-ISOLATION-STAGE-2' failed: iptables: No chain/targe... that name.
Jan 28 12:27:39 dora firewalld[853]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t filter -X DOCKER-ISOLATION-STAGE-2' failed: iptables: No chain/targe... that name.
Jan 28 12:27:39 dora firewalld[853]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t filter -F DOCKER-ISOLATION' failed: iptables: No chain/target/match by that name.
Jan 28 12:27:39 dora firewalld[853]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t filter -X DOCKER-ISOLATION' failed: iptables: No chain/target/match by that name.
Jan 28 12:27:40 dora firewalld[853]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -D FORWARD -i docker0 -o docker0 -j DROP' failed: iptables: Bad rule (d...at chain?).
Jan 28 14:13:17 dora systemd[1]: Stopping firewalld - dynamic firewall daemon...
Jan 28 14:13:20 dora systemd[1]: Stopped firewalld - dynamic firewall daemon.
Hint: Some lines were ellipsized, use -l to show in full.
[root@dora k8_cortx_cloud]#
[root@dora k8_cortx_cloud]#
[root@dora k8_cortx_cloud]#
[root@dora k8_cortx_cloud]# getenforce
Permissive
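For what it's worth, the repeated "connection to the server localhost:8080 was refused" messages usually mean kubectl on this host has no kubeconfig for the cluster (so it falls back to the localhost:8080 default), rather than a firewalld or SELinux problem. A hedged check, assuming a kubeadm-style setup where the admin config lives at /etc/kubernetes/admin.conf:
kubectl config view --minify
export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl get nodes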
Release v0.3.0 introduced some changes that affect the current AWS deployment guide. Here are some known cases:
- changes to the solution.common.setup_size solution.yaml setting
- changes to the solution.images.openldap solution.yaml setting
- replacement of solution.yaml with solution.example.yaml
- changes to the prereq-deploy-cortx-cloud.sh script (script args changed)
There may be other changes needed, such as interaction with CSM and IAM usage. The guide and templates refer to v0.0.22. It is likely out of date after the switch to RGW S3 (v0.1.0). The guide and templates should be audited and updated as needed.
v0.3.0
AWS CloudFormation
Related to:
README.md currently points to https://github.com/Seagate/cortx-k8s/blob/UDX-6683_move_documentation_to_readme_md/doc/cortx-aws-k8s-installation.md
It should point to https://github.com/Seagate/cortx-k8s/blob/main/doc/cortx-aws-k8s-installation.md instead
While trying to follow the CORTX Kubernetes N-Pod Deployment and Upgrade Document (using Services Framework), I encountered an unreachable link.
Unable to form a Kubernetes cluster, as the link below is inaccessible:
http://eos-jenkins.mero.colo.seagate.com/job/Cortx-kubernetes/job/setup-kubernetes-cluster/
During the deployment of CORTX, Kafka pods kept crashing and looping.
Should be able to deploy CORTX successfully, as Rick did it once with CRI-O.
Can run the following script directly on CentOS 7:
source <(curl -s https://raw.githubusercontent.com/faradawn/tutorials/main/linux/cortx/kube.sh)
The link to the deployment script is here.
Thanks again for taking a look!
v0.6.0
Kubernetes version: v1.23.0
kubectl version: v1.23.0
Attached below and here is a summary:
- only had node-1 and node-2
- master node is node-1, which is untainted
- storage only had sdc, sdd, sde
[root@node-1 cc]# kc get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
calico-apiserver calico-apiserver-68444c48d5-9f7hl 1/1 Running 0 7h38m
calico-apiserver calico-apiserver-68444c48d5-nhrbb 1/1 Running 0 7h38m
calico-system calico-kube-controllers-69cfd64db4-gvswf 1/1 Running 0 7h39m
calico-system calico-node-dfqfj 1/1 Running 0 7h39m
calico-system calico-node-llqxx 1/1 Running 0 7h39m
calico-system calico-typha-7c59c5d99c-flv5m 1/1 Running 0 7h39m
default cortx-consul-client-7r776 1/1 Running 0 7h36m
default cortx-consul-client-x6j8w 0/1 Running 0 7h36m
default cortx-consul-server-0 1/1 Running 0 7h36m
default cortx-consul-server-1 1/1 Running 0 7h36m
default cortx-kafka-0 0/1 CrashLoopBackOff 162 (28s ago) 7h36m
default cortx-kafka-1 0/1 CrashLoopBackOff 99 (4m3s ago) 7h36m
default cortx-zookeeper-0 1/1 Running 0 7h36m
default cortx-zookeeper-1 1/1 Running 0 7h36m
kube-system coredns-64897985d-9qn5m 1/1 Running 0 7h40m
kube-system coredns-64897985d-z8t5b 1/1 Running 0 7h40m
kube-system etcd-node-1 1/1 Running 0 7h40m
kube-system kube-apiserver-node-1 1/1 Running 0 7h40m
kube-system kube-controller-manager-node-1 1/1 Running 0 7h40m
kube-system kube-proxy-7hpgl 1/1 Running 0 7h40m
kube-system kube-proxy-m4fcz 1/1 Running 0 7h39m
kube-system kube-scheduler-node-1 1/1 Running 0 7h40m
local-path-storage local-path-provisioner-756898894-bgxgk 1/1 Running 0 7h36m
tigera-operator tigera-operator-7d8c9d4f67-69rlk 1/1 Running 0 7h40m
[root@node-1 k8_cortx_cloud]# kc logs cortx-kafka-0
kafka 12:13:50.75
kafka 12:13:50.75 Welcome to the Bitnami kafka container
kafka 12:13:50.76 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-kafka
kafka 12:13:50.76 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-kafka/issues
kafka 12:13:50.76
kafka 12:13:50.76 INFO ==> ** Starting Kafka setup **
kafka 12:13:50.82 WARN ==> You set the environment variable ALLOW_PLAINTEXT_LISTENER=yes. For safety reasons, do not use this flag in a production environment.
kafka 12:13:50.83 INFO ==> Initializing Kafka...
kafka 12:13:50.84 INFO ==> No injected configuration files found, creating default config files
kafka 12:13:51.13 INFO ==> Configuring Kafka for inter-broker communications with PLAINTEXT authentication.
kafka 12:13:51.14 WARN ==> Inter-broker communications are configured as PLAINTEXT. This is not safe for production environments.
kafka 12:13:51.14 INFO ==> Configuring Kafka for client communications with PLAINTEXT authentication.
kafka 12:13:51.14 WARN ==> Client communications are configured using PLAINTEXT listeners. For safety reasons, do not use this in a production environment.
kafka 12:13:51.16 INFO ==> ** Kafka setup finished! **
kafka 12:13:51.19 INFO ==> ** Starting Kafka **
[2022-06-04 12:13:52,698] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
[2022-06-04 12:13:53,334] INFO Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation (org.apache.zookeeper.common.X509Util)
[2022-06-04 12:13:53,524] INFO Registered signal handlers for TERM, INT, HUP (org.apache.kafka.common.utils.LoggingSignalHandler)
[2022-06-04 12:13:53,526] INFO starting (kafka.server.KafkaServer)
[2022-06-04 12:13:53,527] INFO Connecting to zookeeper on cortx-zookeeper (kafka.server.KafkaServer)
[2022-06-04 12:13:53,541] INFO [ZooKeeperClient Kafka server] Initializing a new session to cortx-zookeeper. (kafka.zookeeper.ZooKeeperClient)
[2022-06-04 12:13:53,545] INFO Client environment:zookeeper.version=3.6.3--6401e4ad2087061bc6b9f80dec2d69f2e3c8660a, built on 04/08/2021 16:35 GMT (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,545] INFO Client environment:host.name=cortx-kafka-0.cortx-kafka-headless.default.svc.cluster.local (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,545] INFO Client environment:java.version=11.0.14 (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,545] INFO Client environment:java.vendor=BellSoft (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,545] INFO Client environment:java.home=/opt/bitnami/java (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,545] INFO Client environment:java.class.path=/opt/bitnami/kafka/bin/../libs/activation-1.1.1.jar:/opt/bitnami/kafka/bin/../libs/aopalliance-repackaged-2.6.1.jar:/opt/bitnami/kafka/bin/../libs/argparse4j-0.7.0.jar:/opt/bitnami/kafka/bin/../libs/audience-annotations-0.5.0.jar:/opt/bitnami/kafka/bin/../libs/commons-cli-1.4.jar:/opt/bitnami/kafka/bin/../libs/commons-lang3-3.8.1.jar:/opt/bitnami/kafka/bin/../libs/connect-api-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/connect-basic-auth-extension-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/connect-file-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/connect-json-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/connect-mirror-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/connect-mirror-client-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/connect-runtime-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/connect-transforms-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/hk2-api-2.6.1.jar:/opt/bitnami/kafka/bin/../libs/hk2-locator-2.6.1.jar:/opt/bitnami/kafka/bin/../libs/hk2-utils-2.6.1.jar:/opt/bitnami/kafka/bin/../libs/jackson-annotations-2.12.3.jar:/opt/bitnami/kafka/bin/../libs/jackson-core-2.12.3.jar:/opt/bitnami/kafka/bin/../libs/jackson-databind-2.12.3.jar:/opt/bitnami/kafka/bin/../libs/jackson-dataformat-csv-2.12.3.jar:/opt/bitnami/kafka/bin/../libs/jackson-datatype-jdk8-2.12.3.jar:/opt/bitnami/kafka/bin/../libs/jackson-jaxrs-base-2.12.3.jar:/opt/bitnami/kafka/bin/../libs/jackson-jaxrs-json-provider-2.12.3.jar:/opt/bitnami/kafka/bin/../libs/jackson-module-jaxb-annotations-2.12.3.jar:/opt/bitnami/kafka/bin/../libs/jackson-module-scala_2.12-2.12.3.jar:/opt/bitnami/kafka/bin/../libs/jakarta.activation-api-1.2.1.jar:/opt/bitnami/kafka/bin/../libs/jakarta.annotation-api-1.3.5.jar:/opt/bitnami/kafka/bin/../libs/jakarta.inject-2.6.1.jar:/opt/bitnami/kafka/bin/../libs/jakarta.validation-api-2.0.2.jar:/opt/bitnami/kafka/bin/../libs/jakarta.ws.rs-api-2.1.6.jar:/opt/bitnami/kafka/bin/../libs/jakarta.xml.bind-api-2.3.2.jar:/opt/bitnami/kafka/bin/../libs/javassist-3.27.0-GA.jar:/opt/bitnami/kafka/bin/../libs/javax.servlet-api-3.1.0.jar:/opt/bitnami/kafka/bin/../libs/javax.ws.rs-api-2.1.1.jar:/opt/bitnami/kafka/bin/../libs/jaxb-api-2.3.0.jar:/opt/bitnami/kafka/bin/../libs/jersey-client-2.34.jar:/opt/bitnami/kafka/bin/../libs/jersey-common-2.34.jar:/opt/bitnami/kafka/bin/../libs/jersey-container-servlet-2.34.jar:/opt/bitnami/kafka/bin/../libs/jersey-container-servlet-core-2.34.jar:/opt/bitnami/kafka/bin/../libs/jersey-hk2-2.34.jar:/opt/bitnami/kafka/bin/../libs/jersey-server-2.34.jar:/opt/bitnami/kafka/bin/../libs/jetty-client-9.4.43.v20210629.jar:/opt/bitnami/kafka/bin/../libs/jetty-continuation-9.4.43.v20210629.jar:/opt/bitnami/kafka/bin/../libs/jetty-http-9.4.43.v20210629.jar:/opt/bitnami/kafka/bin/../libs/jetty-io-9.4.43.v20210629.jar:/opt/bitnami/kafka/bin/../libs/jetty-security-9.4.43.v20210629.jar:/opt/bitnami/kafka/bin/../libs/jetty-server-9.4.43.v20210629.jar:/opt/bitnami/kafka/bin/../libs/jetty-servlet-9.4.43.v20210629.jar:/opt/bitnami/kafka/bin/../libs/jetty-servlets-9.4.43.v20210629.jar:/opt/bitnami/kafka/bin/../libs/jetty-util-9.4.43.v20210629.jar:/opt/bitnami/kafka/bin/../libs/jetty-util-ajax-9.4.43.v20210629.jar:/opt/bitnami/kafka/bin/../libs/jline-3.12.1.jar:/opt/bitnami/kafka/bin/../libs/jopt-simple-5.0.4.jar:/opt/bitnami/kafka/bin/../libs/kafka-clients-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-log4j-appender-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-metadata-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-raft-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/kafk
a-server-common-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-shell-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-storage-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-storage-api-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-streams-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-streams-examples-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-streams-scala_2.12-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-streams-test-utils-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-tools-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/kafka_2.12-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/log4j-1.2.17.jar:/opt/bitnami/kafka/bin/../libs/lz4-java-1.7.1.jar:/opt/bitnami/kafka/bin/../libs/maven-artifact-3.8.1.jar:/opt/bitnami/kafka/bin/../libs/metrics-core-2.2.0.jar:/opt/bitnami/kafka/bin/../libs/metrics-core-4.1.12.1.jar:/opt/bitnami/kafka/bin/../libs/netty-buffer-4.1.62.Final.jar:/opt/bitnami/kafka/bin/../libs/netty-codec-4.1.62.Final.jar:/opt/bitnami/kafka/bin/../libs/netty-common-4.1.62.Final.jar:/opt/bitnami/kafka/bin/../libs/netty-handler-4.1.62.Final.jar:/opt/bitnami/kafka/bin/../libs/netty-resolver-4.1.62.Final.jar:/opt/bitnami/kafka/bin/../libs/netty-transport-4.1.62.Final.jar:/opt/bitnami/kafka/bin/../libs/netty-transport-native-epoll-4.1.62.Final.jar:/opt/bitnami/kafka/bin/../libs/netty-transport-native-unix-common-4.1.62.Final.jar:/opt/bitnami/kafka/bin/../libs/osgi-resource-locator-1.0.3.jar:/opt/bitnami/kafka/bin/../libs/paranamer-2.8.jar:/opt/bitnami/kafka/bin/../libs/plexus-utils-3.2.1.jar:/opt/bitnami/kafka/bin/../libs/reflections-0.9.12.jar:/opt/bitnami/kafka/bin/../libs/rocksdbjni-6.19.3.jar:/opt/bitnami/kafka/bin/../libs/scala-collection-compat_2.12-2.4.4.jar:/opt/bitnami/kafka/bin/../libs/scala-java8-compat_2.12-1.0.0.jar:/opt/bitnami/kafka/bin/../libs/scala-library-2.12.14.jar:/opt/bitnami/kafka/bin/../libs/scala-logging_2.12-3.9.3.jar:/opt/bitnami/kafka/bin/../libs/scala-reflect-2.12.14.jar:/opt/bitnami/kafka/bin/../libs/slf4j-api-1.7.30.jar:/opt/bitnami/kafka/bin/../libs/slf4j-log4j12-1.7.30.jar:/opt/bitnami/kafka/bin/../libs/snappy-java-1.1.8.1.jar:/opt/bitnami/kafka/bin/../libs/trogdor-3.0.0.jar:/opt/bitnami/kafka/bin/../libs/zookeeper-3.6.3.jar:/opt/bitnami/kafka/bin/../libs/zookeeper-jute-3.6.3.jar:/opt/bitnami/kafka/bin/../libs/zstd-jni-1.5.0-2.jar (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,546] INFO Client environment:java.library.path=/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,546] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,546] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,546] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,546] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,546] INFO Client environment:os.version=3.10.0-1127.19.1.el7.x86_64 (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,546] INFO Client environment:user.name=1001 (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,546] INFO Client environment:user.home=/ (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,546] INFO Client environment:user.dir=/ (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,546] INFO Client environment:os.memory.free=1010MB (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,546] INFO Client environment:os.memory.max=1024MB (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,546] INFO Client environment:os.memory.total=1024MB (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,548] INFO Initiating client connection, connectString=cortx-zookeeper sessionTimeout=18000 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@51972dc7 (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:13:53,552] INFO jute.maxbuffer value is 4194304 Bytes (org.apache.zookeeper.ClientCnxnSocket)
[2022-06-04 12:13:53,557] INFO zookeeper.request.timeout value is 0. feature enabled=false (org.apache.zookeeper.ClientCnxn)
[2022-06-04 12:13:53,559] INFO [ZooKeeperClient Kafka server] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2022-06-04 12:13:59,560] INFO [ZooKeeperClient Kafka server] Closing. (kafka.zookeeper.ZooKeeperClient)
[2022-06-04 12:14:13,580] ERROR Unable to resolve address: cortx-zookeeper:2181 (org.apache.zookeeper.client.StaticHostProvider)
java.net.UnknownHostException: cortx-zookeeper: Temporary failure in name resolution
at java.base/java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:929)
at java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1519)
at java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:848)
at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1509)
at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1368)
at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1302)
at org.apache.zookeeper.client.StaticHostProvider$1.getAllByName(StaticHostProvider.java:88)
at org.apache.zookeeper.client.StaticHostProvider.resolve(StaticHostProvider.java:141)
at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:368)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1207)
[2022-06-04 12:14:13,588] WARN An exception was thrown while closing send thread for session 0x0. (org.apache.zookeeper.ClientCnxn)
java.lang.IllegalArgumentException: Unable to canonicalize address cortx-zookeeper:2181 because it's not resolvable
at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:78)
at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:41)
at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1161)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1210)
[2022-06-04 12:14:13,693] INFO Session: 0x0 closed (org.apache.zookeeper.ZooKeeper)
[2022-06-04 12:14:13,694] INFO EventThread shut down for session: 0x0 (org.apache.zookeeper.ClientCnxn)
[2022-06-04 12:14:13,695] INFO [ZooKeeperClient Kafka server] Closed. (kafka.zookeeper.ZooKeeperClient)
[2022-06-04 12:14:13,697] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
at kafka.zookeeper.ZooKeeperClient.$anonfun$waitUntilConnected$3(ZooKeeperClient.scala:254)
at kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:250)
at kafka.zookeeper.ZooKeeperClient.<init>(ZooKeeperClient.scala:108)
at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1981)
at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:457)
at kafka.server.KafkaServer.startup(KafkaServer.scala:196)
at kafka.Kafka$.main(Kafka.scala:109)
at kafka.Kafka.main(Kafka.scala)
[2022-06-04 12:14:13,698] INFO shutting down (kafka.server.KafkaServer)
[2022-06-04 12:14:13,703] INFO App info kafka.server for 0 unregistered (org.apache.kafka.common.utils.AppInfoParser)
[2022-06-04 12:14:13,703] INFO shut down completed (kafka.server.KafkaServer)
[2022-06-04 12:14:13,703] ERROR Exiting Kafka. (kafka.Kafka$)
[2022-06-04 12:14:13,704] INFO shutting down (kafka.server.KafkaServer)
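The fatal error above is a name-resolution failure ("Unable to resolve address: cortx-zookeeper:2181 ... Temporary failure in name resolution"), so the broker never reaches ZooKeeper. A hedged way to check in-cluster DNS (plain kubectl with a throwaway busybox pod; the service name cortx-zookeeper is taken from the log above):
kubectl get svc cortx-zookeeper
kubectl -n kube-system get pods -l k8s-app=kube-dns
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- nslookup cortx-zookeeper.default.svc.cluster.local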
Deployed Kubernetes v1.23 with CRI-O as the container runtime, but when deploying the latest CORTX, all the pods failed to be scheduled due to a "VolumeBinding" timeout error. Searched for a while but still could not resolve it. Any suggestion would be appreciated!
Should schedule the pods successfully.
CentOS 7; Kubernetes and CRI-O both v1.23; CORTX v0.5.0 (latest).
# install cri-o
OS=CentOS_7
VERSION=1.23
sudo curl -L -o /etc/yum.repos.d/devel:kubic:libcontainers:stable.repo https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/$OS/devel:kubic:libcontainers:stable.repo
sudo curl -L -o /etc/yum.repos.d/devel:kubic:libcontainers:stable:cri-o:$VERSION.repo https://download.opensuse.org/repositories/devel:kubic:libcontainers:stable:cri-o:$VERSION/$OS/devel:kubic:libcontainers:stable:cri-o:$VERSION.repo
sudo yum install cri-o -y
# Install Kubernetes, pinning the version to match CRI-O (1.23)
yum install -y kubelet-1.23.0-0 kubeadm-1.23.0-0 kubectl-1.23.0-0 --disableexcludes=kubernetes
The complete installation guide is here.
v0.5.0
Kubernetes version: 1.23
kubectl version: 1.23
solution:
namespace: default
deployment_type: standard
secrets:
name: cortx-secret
content:
kafka_admin_secret: null
consul_admin_secret: null
common_admin_secret: null
s3_auth_admin_secret: Cortx123!
csm_auth_admin_secret: Cortx123!
csm_mgmt_admin_secret: Cortx123!
images:
cortxcontrol: ghcr.io/seagate/cortx-all:2.0.0-756
cortxdata: ghcr.io/seagate/cortx-all:2.0.0-756
cortxserver: ghcr.io/seagate/cortx-rgw:2.0.0-756
cortxha: ghcr.io/seagate/cortx-all:2.0.0-756
cortxclient: ghcr.io/seagate/cortx-all:2.0.0-756
consul: ghcr.io/seagate/consul:1.11.4
kafka: ghcr.io/seagate/kafka:3.0.0-debian-10-r97
zookeeper: ghcr.io/seagate/zookeeper:3.8.0-debian-10-r9
rancher: ghcr.io/seagate/local-path-provisioner:v0.0.20
busybox: ghcr.io/seagate/busybox:latest
common:
storage_provisioner_path: /mnt/fs-local-volume
container_path:
local: /etc/cortx
log: /etc/cortx/log
s3:
default_iam_users:
auth_admin: "sgiamadmin"
auth_user: "user_name"
#auth_secret defined above in solution.secrets.content.s3_auth_admin_secret
max_start_timeout: 240
extra_configuration: ""
motr:
num_client_inst: 0
start_port_num: 29000
extra_configuration: ""
hax:
protocol: https
service_name: cortx-hax-svc
port_num: 22003
storage_sets:
name: storage-set-1
durability:
sns: 1+0+0
dix: 1+0+0
external_services:
s3:
type: NodePort
count: 1
ports:
http: 80
https: 443
nodePorts:
http: null
https: null
control:
type: NodePort
ports:
https: 8081
nodePorts:
https: null
resource_allocation:
consul:
server:
storage: 10Gi
resources:
requests:
memory: 100Mi
cpu: 100m
limits:
memory: 300Mi
cpu: 100m
client:
resources:
requests:
memory: 100Mi
cpu: 100m
limits:
memory: 300Mi
cpu: 100m
zookeeper:
storage_request_size: 8Gi
data_log_dir_request_size: 8Gi
resources:
requests:
memory: 256Mi
cpu: 250m
limits:
memory: 512Mi
cpu: 500m
kafka:
storage_request_size: 8Gi
resources:
requests:
memory: 1Gi
cpu: 250m
limits:
memory: 2Gi
cpu: 1
storage:
cvg1:
name: cvg-01
type: ios
devices:
metadata:
device: /dev/sdc
size: 5Gi
data:
d1:
device: /dev/sdd
size: 5Gi
d2:
device: /dev/sde
size: 5Gi
nodes:
node1:
name: node-1
node2:
name: node-2
None of the pods are ready:
[root@node-1 k8_cortx_cloud]# kubectl get pods
NAME READY STATUS RESTARTS AGE
cortx-consul-client-85gbv 0/1 Running 0 58m
cortx-consul-client-kx58q 0/1 Running 0 58m
cortx-consul-server-0 0/1 Pending 0 58m
cortx-consul-server-1 0/1 Pending 0 58m
cortx-kafka-0 0/1 Pending 0 58m
cortx-kafka-1 0/1 Pending 0 58m
cortx-zookeeper-0 0/1 Pending 0 58m
cortx-zookeeper-1 0/1 Pending 0 58m
Checking the events for the Consul pod, the error seemed to be VolumeBinding causing the scheduling to fail:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 3m25s (x5 over 43m) default-scheduler running PreBind plugin "VolumeBinding": binding volumes: timed out waiting for the condition
The same error appeared for the Kafka pod and others:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 3m25s (x5 over 43m) default-scheduler running PreBind plugin "VolumeBinding": binding volumes: timed out waiting for the condition
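The "running PreBind plugin VolumeBinding: binding volumes: timed out" events indicate the PersistentVolumeClaims are never getting bound, so the provisioner is usually a better place to look than the pods themselves. A hedged set of checks (the namespace and deployment name are taken from the deployment output earlier on this page, which installs the Rancher Local Path Provisioner into local-path-storage):
kubectl get pvc
kubectl get sc
kubectl -n local-path-storage get pods
kubectl -n local-path-storage logs deploy/local-path-provisioner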
I was following the cortx-k8s installation guide and encountered an error at step 3.1:
git clone -b stable https://github.com/Seagate/cortx-k8s.git
There is no stable branch in this repository.
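Since there is no stable branch, a hedged way to see which branches and tags the repository actually offers, and to clone a released tag instead (the tag below is only an example; pick one from the releases page):
git ls-remote --heads --tags https://github.com/Seagate/cortx-k8s.git
git clone -b v0.3.0 https://github.com/Seagate/cortx-k8s.git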
cortx-server-0 and cortx-data-g0-0 could not be initialized during deployment.
Should be able to deploy. [See the new comment below]
Can follow the commands from this guide https://github.com/faradawn/tutorials/blob/main/linux/cortx/README.md
main, v0.10.0
cortx_k8s commit id: 7a920e1
Date: Mon Aug 15 17:02:52 2022 -0700
Kubernetes version: v1.24
kubectl version: v1.24
solution:
namespace: default
deployment_type: standard
secrets:
name: cortx-secret
content:
kafka_admin_secret: null
consul_admin_secret: null
common_admin_secret: null
s3_auth_admin_secret: null
csm_auth_admin_secret: null
csm_mgmt_admin_secret: Cortx123!
images:
cortxcontrol: ghcr.io/seagate/cortx-control:2.0.0-895
cortxdata: ghcr.io/seagate/cortx-data:2.0.0-895
cortxserver: ghcr.io/seagate/cortx-rgw:2.0.0-895
cortxha: ghcr.io/seagate/cortx-control:2.0.0-895
cortxclient: ghcr.io/seagate/cortx-data:2.0.0-895
consul: ghcr.io/seagate/consul:1.11.4
kafka: ghcr.io/seagate/kafka:3.0.0-debian-10-r97
zookeeper: ghcr.io/seagate/zookeeper:3.8.0-debian-10-r9
rancher: ghcr.io/seagate/local-path-provisioner:v0.0.20
busybox: ghcr.io/seagate/busybox:latest
common:
storage_provisioner_path: /mnt/fs-local-volume
s3:
default_iam_users:
auth_admin: "sgiamadmin"
auth_user: "user_name"
#auth_secret defined above in solution.secrets.content.s3_auth_admin_secret
max_start_timeout: 240
instances_per_node: 1
extra_configuration: ""
motr:
num_client_inst: 0
extra_configuration: ""
hax:
protocol: https
port_num: 22003
external_services:
s3:
type: NodePort
count: 1
ports:
http: 80
https: 443
nodePorts:
http: null
https: null
control:
type: NodePort
ports:
https: 8081
nodePorts:
https: null
resource_allocation:
consul:
server:
storage: 10Gi
resources:
requests:
memory: 200Mi
cpu: 200m
limits:
memory: 500Mi
cpu: 500m
client:
resources:
requests:
memory: 200Mi
cpu: 200m
limits:
memory: 500Mi
cpu: 500m
zookeeper:
storage_request_size: 8Gi
data_log_dir_request_size: 8Gi
resources:
requests:
memory: 256Mi
cpu: 250m
limits:
memory: 512Mi
cpu: 500m
kafka:
storage_request_size: 8Gi
resources:
requests:
memory: 1Gi
cpu: 250m
limits:
memory: 2Gi
cpu: 1000m
hare:
hax:
resources:
requests:
memory: 128Mi
cpu: 250m
limits:
memory: 2Gi
cpu: 1000m
data:
motr:
resources:
requests:
memory: 1Gi
cpu: 250m
limits:
memory: 2Gi
cpu: 1000m
confd:
resources:
requests:
memory: 128Mi
cpu: 250m
limits:
memory: 512Mi
cpu: 500m
server:
rgw:
resources:
requests:
memory: 128Mi
cpu: 250m
limits:
memory: 2Gi
cpu: 2000m
control:
agent:
resources:
requests:
memory: 128Mi
cpu: 250m
limits:
memory: 256Mi
cpu: 500m
ha:
fault_tolerance:
resources:
requests:
memory: 128Mi
cpu: 250m
limits:
memory: 1Gi
cpu: 500m
health_monitor:
resources:
requests:
memory: 128Mi
cpu: 250m
limits:
memory: 1Gi
cpu: 500m
k8s_monitor:
resources:
requests:
memory: 128Mi
cpu: 250m
limits:
memory: 1Gi
cpu: 500m
storage_sets:
- name: storage-set-1
durability:
sns: 1+0+0
dix: 1+0+0
container_group_size: 1
nodes:
- sky-2.novalocal
storage:
- name: cvg-01
type: ios
devices:
metadata:
- path: /dev/loop1
size: 5Gi
data:
- path: /dev/loop2
size: 5Gi
Get all pods
[cc@sky-2 k8_cortx_cloud]$ kc get pods -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default cortx-consul-client-926b4 1/1 Running 0 14m 10.85.0.26 sky-2.novalocal <none> <none>
default cortx-consul-server-0 1/1 Running 0 14m 10.85.0.32 sky-2.novalocal <none> <none>
default cortx-control-f4b57d4dd-c486j 1/1 Running 0 14m 10.85.0.25 sky-2.novalocal <none> <none>
default cortx-data-g0-0 0/3 Init:0/2 0 14m <none> sky-2.novalocal <none> <none>
default cortx-ha-56fb4b495-ptrps 3/3 Running 0 14m 10.85.0.35 sky-2.novalocal <none> <none>
default cortx-kafka-0 1/1 Running 0 14m 10.85.0.37 sky-2.novalocal <none> <none>
default cortx-server-0 0/2 Init:0/1 0 14m 10.85.0.33 sky-2.novalocal <none> <none>
default cortx-zookeeper-0 1/1 Running 0 14m 10.85.0.36 sky-2.novalocal <none> <none>
kube-system coredns-5769f8787-l4vzb 1/1 Running 0 61s 10.85.0.38 sky-2.novalocal <none> <none>
kube-system coredns-5769f8787-rrqxn 1/1 Running 0 61s 10.85.0.39 sky-2.novalocal <none> <none>
kube-system etcd-sky-2.novalocal 1/1 Running 0 74m 10.52.2.232 sky-2.novalocal <none> <none>
kube-system kube-apiserver-sky-2.novalocal 1/1 Running 1 (27m ago) 74m 10.52.2.232 sky-2.novalocal <none> <none>
kube-system kube-controller-manager-sky-2.novalocal 1/1 Running 5 (16m ago) 74m 10.52.2.232 sky-2.novalocal <none> <none>
kube-system kube-proxy-ksjjr 1/1 Running 0 73m 10.52.2.232 sky-2.novalocal <none> <none>
kube-system kube-scheduler-sky-2.novalocal 1/1 Running 6 (16m ago) 73m 10.52.2.232 sky-2.novalocal <none> <none>
local-path-storage local-path-provisioner-7f45fdfb8-86rz6 1/1 Running 0 54m 10.85.0.4 sky-2.novalocal <none> <none>
Describe server-pod
[cc@sky-2 k8_cortx_cloud]$ kc describe pod cortx-server-0
Name: cortx-server-0
Namespace: default
Priority: 0
Node: sky-2.novalocal/10.52.2.232
Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-cortx-server-0
ReadOnly: false
cortx-configuration:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: cortx
Optional: false
cortx-ssl-cert:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: cortx-ssl-cert
Optional: false
configuration-secrets:
Type: Secret (a volume populated by a Secret)
SecretName: cortx-secret
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 14m default-scheduler Successfully assigned default/cortx-server-0 to sky-2.novalocal
Normal Pulled 14m kubelet Container image "ghcr.io/seagate/cortx-rgw:2.0.0-895" already present on machine
Normal Created 14m kubelet Created container cortx-setup
Normal Started 14m kubelet Started container cortx-setup
Tried to get logs:
[cc@sky-2 k8_cortx_cloud]$ kc logs cortx-server-0
Defaulted container "cortx-hax" out of: cortx-hax, cortx-rgw, cortx-setup (init)
Error from server (BadRequest): container "cortx-hax" in pod "cortx-server-0" is waiting to start: PodInitializing
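Because cortx-server-0 is stuck in Init:0/1 and kubectl logs defaults to the cortx-hax container (which has not started yet), a hedged way to read the init container instead, using the container names shown in the message above:
kubectl logs cortx-server-0 -c cortx-setup
kubectl describe pod cortx-data-g0-0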
The https://github.com/Seagate/cortx-k8s/tree/integration#using-cortx-on-kubernetes section needs to be completed to have a quick, high-level usage guide for interacting with CORTX once it is deployed. A deeper-dive can be linked off to another document in the repository for a comprehensive user guide once deployed, but we should have some level of a smoke test under the root README quick starts section.
We need to update the root README.md to include a "Contribution" section detailing the expectations on PRs and project coordination, taking guidance from existing CORTX repositories as available.
Tried to deploy CORTX with 12 data pods on an 8-node Kubernetes cluster, but encountered an HA-deployment timeout error. I think it deployed successfully two days ago, but now I have tried twice and hit the HA timeout error both times. May I ask for some help?
Should be able to deploy 12 data pods, as 15 disks are available besides the one for the system and the one for fs-local-volume. In addition, I think I deployed it successfully once.
Can follow this deployment script:
https://github.com/faradawn/tutorials/blob/main/linux/cortx/kube.sh
v0.6.0
Kubernetes version: v1.24.0
kubectl version: v1.24.0
Container runtime: CRI-O
solution:
namespace: default
deployment_type: standard
secrets:
name: cortx-secret
content:
kafka_admin_secret: null
consul_admin_secret: null
common_admin_secret: null
s3_auth_admin_secret: null
csm_auth_admin_secret: null
csm_mgmt_admin_secret: Cortx123!
images:
cortxcontrol: ghcr.io/seagate/cortx-control:2.0.0-803
cortxdata: ghcr.io/seagate/cortx-data:2.0.0-803
cortxserver: ghcr.io/seagate/cortx-rgw:2.0.0-803
cortxha: ghcr.io/seagate/cortx-control:2.0.0-803
cortxclient: ghcr.io/seagate/cortx-data:2.0.0-803
consul: ghcr.io/seagate/consul:1.11.4
kafka: ghcr.io/seagate/kafka:3.0.0-debian-10-r97
zookeeper: ghcr.io/seagate/zookeeper:3.8.0-debian-10-r9
rancher: ghcr.io/seagate/local-path-provisioner:v0.0.20
busybox: ghcr.io/seagate/busybox:latest
common:
storage_provisioner_path: /mnt/fs-local-volume
container_path:
local: /etc/cortx
log: /etc/cortx/log
s3:
default_iam_users:
auth_admin: "sgiamadmin"
auth_user: "user_name"
#auth_secret defined above in solution.secrets.content.s3_auth_admin_secret
max_start_timeout: 240
extra_configuration: ""
motr:
num_client_inst: 0
start_port_num: 29000
extra_configuration: ""
hax:
protocol: https
port_num: 22003
storage_sets:
name: storage-set-1
durability:
sns: 1+0+0
dix: 1+0+0
external_services:
s3:
type: NodePort
count: 1
ports:
http: 80
https: 443
nodePorts:
http: null
https: null
control:
type: NodePort
ports:
https: 8081
nodePorts:
https: null
resource_allocation:
consul:
server:
storage: 10Gi
resources:
requests:
memory: 100Mi
cpu: 100m
limits:
memory: 300Mi
cpu: 100m
client:
resources:
requests:
memory: 100Mi
cpu: 100m
limits:
memory: 300Mi
cpu: 100m
zookeeper:
storage_request_size: 8Gi
data_log_dir_request_size: 8Gi
resources:
requests:
memory: 256Mi
cpu: 250m
limits:
memory: 512Mi
cpu: 500m
kafka:
storage_request_size: 8Gi
resources:
requests:
memory: 1Gi
cpu: 250m
limits:
memory: 2Gi
cpu: 1
hare:
hax:
resources:
requests:
memory: 128Mi
cpu: 250m
limits:
memory: 2Gi
cpu: 1000m
data:
motr:
resources:
requests:
memory: 1Gi
cpu: 250m
limits:
memory: 2Gi
cpu: 1000m
confd:
resources:
requests:
memory: 128Mi
cpu: 250m
limits:
memory: 512Mi
cpu: 500m
server:
rgw:
resources:
requests:
memory: 128Mi
cpu: 250m
limits:
memory: 2Gi
cpu: 2000m
control:
agent:
resources:
requests:
memory: 128Mi
cpu: 250m
limits:
memory: 256Mi
cpu: 500m
ha:
fault_tolerance:
resources:
requests:
memory: 128Mi
cpu: 250m
limits:
memory: 1Gi
cpu: 500m
health_monitor:
resources:
requests:
memory: 128Mi
cpu: 250m
limits:
memory: 1Gi
cpu: 500m
k8s_monitor:
resources:
requests:
memory: 128Mi
cpu: 250m
limits:
memory: 1Gi
cpu: 500m
storage:
cvg1:
name: cvg-01
type: ios
devices:
metadata:
device: /dev/sdc
size: 64Gi
data:
d1:
device: /dev/sdd
size: 64Gi
d2:
device: /dev/sde
size: 64Gi
d3:
device: /dev/sdf
size: 64Gi
d4:
device: /dev/sdg
size: 64Gi
d5:
device: /dev/sdh
size: 64Gi
d6:
device: /dev/sdi
size: 64Gi
cvg2:
name: cvg-02
type: ios
devices:
metadata:
device: /dev/sdk
size: 64Gi
data:
d1:
device: /dev/sdl
size: 64Gi
d2:
device: /dev/sdm
size: 64Gi
d3:
device: /dev/sdn
size: 64Gi
d4:
device: /dev/sdo
size: 64Gi
d5:
device: /dev/sdp
size: 64Gi
d6:
device: /dev/sdj
size: 64Gi
nodes:
node1:
name: node-1
node2:
name: node-2
node3:
name: node-3
node4:
name: node-4
node5:
name: node-5
node6:
name: node-6
node7:
name: node-7
node8:
name: node-8
First, the HA pod seemed to be running fine. Here is the output of listing pods across all namespaces:
[root@node-1 cc]# all
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-apiserver calico-apiserver-7676694b58-8r8xf 1/1 Running 0 4d22h 192.168.247.2 node-2 <none> <none>
calico-apiserver calico-apiserver-7676694b58-sbtlk 1/1 Running 0 4d22h 192.168.247.1 node-2 <none> <none>
calico-system calico-kube-controllers-68884f975d-dkjtd 1/1 Running 0 4d22h 10.85.0.2 node-2 <none> <none>
calico-system calico-node-497jm 1/1 Running 0 4d22h 10.52.2.98 node-2 <none> <none>
calico-system calico-node-54chh 1/1 Running 0 4d22h 10.52.3.120 node-5 <none> <none>
calico-system calico-node-bhcww 1/1 Running 0 4d22h 10.52.3.25 node-6 <none> <none>
calico-system calico-node-fdhx6 1/1 Running 0 4d22h 10.52.3.71 node-3 <none> <none>
calico-system calico-node-h4kgm 1/1 Running 0 4d22h 10.52.3.226 node-1 <none> <none>
calico-system calico-node-k244b 1/1 Running 0 4d22h 10.52.2.217 node-4 <none> <none>
calico-system calico-node-ltpzg 1/1 Running 0 4d22h 10.52.2.200 node-8 <none> <none>
calico-system calico-node-wkt2h 1/1 Running 0 4d22h 10.52.0.72 node-7 <none> <none>
calico-system calico-typha-789b8bc756-4qtcr 1/1 Running 0 4d22h 10.52.3.71 node-3 <none> <none>
calico-system calico-typha-789b8bc756-hl57v 1/1 Running 0 4d22h 10.52.0.72 node-7 <none> <none>
calico-system calico-typha-789b8bc756-q6vvg 1/1 Running 0 4d22h 10.52.3.226 node-1 <none> <none>
default cortx-consul-client-6gjs7 1/1 Running 0 7h35m 192.168.49.211 node-6 <none> <none>
default cortx-consul-client-dtcb9 1/1 Running 0 7h35m 192.168.84.152 node-1 <none> <none>
default cortx-consul-client-lbdxt 1/1 Running 0 7h35m 192.168.217.98 node-4 <none> <none>
default cortx-consul-client-m2l7h 1/1 Running 0 7h35m 192.168.247.55 node-2 <none> <none>
default cortx-consul-client-pbs28 1/1 Running 0 7h36m 192.168.150.108 node-5 <none> <none>
default cortx-consul-client-q58gs 1/1 Running 0 7h36m 192.168.227.90 node-7 <none> <none>
default cortx-consul-client-sfhkk 1/1 Running 0 7h36m 192.168.139.109 node-3 <none> <none>
default cortx-consul-client-wvvg6 1/1 Running 0 7h35m 192.168.144.185 node-8 <none> <none>
default cortx-consul-server-0 1/1 Running 0 7h34m 192.168.217.104 node-4 <none> <none>
default cortx-consul-server-1 1/1 Running 0 7h35m 192.168.150.101 node-5 <none> <none>
default cortx-consul-server-2 1/1 Running 0 7h36m 192.168.139.96 node-3 <none> <none>
default cortx-control-5fd7bb76f7-8gcrm 1/1 Running 0 7h34m 192.168.144.132 node-8 <none> <none>
default cortx-data-node-1-84f75868fd-z76l7 4/4 Running 0 7h33m 192.168.84.149 node-1 <none> <none>
default cortx-data-node-2-7bd5bf54b7-nsvx2 4/4 Running 0 7h33m 192.168.247.54 node-2 <none> <none>
default cortx-data-node-3-599d7f746-llkg8 4/4 Running 0 7h33m 192.168.139.108 node-3 <none> <none>
default cortx-data-node-4-7b8c6bf545-gn62n 4/4 Running 0 7h33m 192.168.217.108 node-4 <none> <none>
default cortx-data-node-5-56fb948c74-25r9v 4/4 Running 0 7h33m 192.168.150.100 node-5 <none> <none>
default cortx-data-node-6-86c94c46f-nfdgc 4/4 Running 0 7h33m 192.168.49.216 node-6 <none> <none>
default cortx-data-node-7-59668fd6fd-5wp5r 4/4 Running 0 7h33m 192.168.227.98 node-7 <none> <none>
default cortx-data-node-8-5dd6b5c5ff-dmrqf 4/4 Running 0 7h33m 192.168.144.191 node-8 <none> <none>
default cortx-ha-775dcbd84b-7tqdv 3/3 Running 0 7h25m 192.168.144.182 node-8 <none> <none>
default cortx-kafka-0 1/1 Running 1 (7h37m ago) 7h37m 192.168.217.92 node-4 <none> <none>
default cortx-kafka-1 1/1 Running 0 7h37m 192.168.49.213 node-6 <none> <none>
default cortx-kafka-2 1/1 Running 0 7h37m 192.168.150.107 node-5 <none> <none>
default cortx-server-node-1-576c5d794c-xd5r6 2/2 Running 0 7h30m 192.168.84.150 node-1 <none> <none>
default cortx-server-node-2-6987744f59-96xdd 2/2 Running 0 7h30m 192.168.247.52 node-2 <none> <none>
default cortx-server-node-3-7bbdddd479-xdfqt 2/2 Running 0 7h30m 192.168.139.106 node-3 <none> <none>
default cortx-server-node-4-5c94fc889c-rl8jj 2/2 Running 0 7h30m 192.168.217.107 node-4 <none> <none>
default cortx-server-node-5-5b75d49b67-vjx8q 2/2 Running 0 7h30m 192.168.150.109 node-5 <none> <none>
default cortx-server-node-6-76c5dddc4c-d74bw 2/2 Running 0 7h30m 192.168.49.218 node-6 <none> <none>
default cortx-server-node-7-797df6dc67-9s4dv 2/2 Running 0 7h30m 192.168.227.96 node-7 <none> <none>
default cortx-server-node-8-78858c774f-mzhhl 2/2 Running 0 7h30m 192.168.144.189 node-8 <none> <none>
default cortx-zookeeper-0 1/1 Running 0 7h37m 192.168.144.171 node-8 <none> <none>
default cortx-zookeeper-1 1/1 Running 0 7h37m 192.168.139.98 node-3 <none> <none>
default cortx-zookeeper-2 1/1 Running 0 7h37m 192.168.217.106 node-4 <none> <none>
kube-system coredns-64455c7956-l2sbf 1/1 Running 0 4d17h 192.168.217.65 node-4 <none> <none>
kube-system coredns-64455c7956-zb5nl 1/1 Running 0 4d17h 192.168.150.66 node-5 <none> <none>
kube-system etcd-node-1 1/1 Running 0 4d22h 10.52.3.226 node-1 <none> <none>
kube-system kube-apiserver-node-1 1/1 Running 0 4d22h 10.52.3.226 node-1 <none> <none>
kube-system kube-controller-manager-node-1 1/1 Running 0 4d22h 10.52.3.226 node-1 <none> <none>
kube-system kube-proxy-6kfz7 1/1 Running 0 4d22h 10.52.0.72 node-7 <none> <none>
kube-system kube-proxy-f5b4h 1/1 Running 0 4d22h 10.52.2.217 node-4 <none> <none>
kube-system kube-proxy-jg5tz 1/1 Running 0 4d22h 10.52.3.120 node-5 <none> <none>
kube-system kube-proxy-qgdmg 1/1 Running 0 4d22h 10.52.3.226 node-1 <none> <none>
kube-system kube-proxy-qmgd2 1/1 Running 0 4d22h 10.52.2.98 node-2 <none> <none>
kube-system kube-proxy-skqk7 1/1 Running 0 4d22h 10.52.2.200 node-8 <none> <none>
kube-system kube-proxy-vm8xq 1/1 Running 0 4d22h 10.52.3.25 node-6 <none> <none>
kube-system kube-proxy-z8hst 1/1 Running 0 4d22h 10.52.3.71 node-3 <none> <none>
kube-system kube-scheduler-node-1 1/1 Running 0 4d22h 10.52.3.226 node-1 <none> <none>
local-path-storage local-path-provisioner-7f45fdfb8-r88fj 1/1 Running 0 4d17h 192.168.49.193 node-6 <none> <none>
tigera-operator tigera-operator-5fb55776df-fjhqz 1/1 Running 0 4d22h 10.52.3.226 node-1 <none> <none>
Second, here are all the deployments. The HA deployment also seemed fine.
[root@node-1 cc]# kc get deployment --all-namespaces
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
calico-apiserver calico-apiserver 2/2 2 2 4d22h
calico-system calico-kube-controllers 1/1 1 1 4d22h
calico-system calico-typha 3/3 3 3 4d22h
default cortx-control 1/1 1 1 7h21m
default cortx-data-node-1 1/1 1 1 7h21m
default cortx-data-node-2 1/1 1 1 7h21m
default cortx-data-node-3 1/1 1 1 7h21m
default cortx-data-node-4 1/1 1 1 7h21m
default cortx-data-node-5 1/1 1 1 7h21m
default cortx-data-node-6 1/1 1 1 7h21m
default cortx-data-node-7 1/1 1 1 7h21m
default cortx-data-node-8 1/1 1 1 7h21m
default cortx-ha 1/1 1 1 7h13m
default cortx-server-node-1 1/1 1 1 7h18m
default cortx-server-node-2 1/1 1 1 7h18m
default cortx-server-node-3 1/1 1 1 7h18m
default cortx-server-node-4 1/1 1 1 7h18m
default cortx-server-node-5 1/1 1 1 7h18m
default cortx-server-node-6 1/1 1 1 7h18m
default cortx-server-node-7 1/1 1 1 7h18m
default cortx-server-node-8 1/1 1 1 7h18m
kube-system coredns 2/2 2 2 4d22h
local-path-storage local-path-provisioner 1/1 1 1 4d17h
tigera-operator tigera-operator 1/1 1 1 4d22h
Finally, here is the error during deployment:
########################################################
# Deploy CORTX HA
########################################################
NAME: cortx-ha-default
LAST DEPLOYED: Mon Jun 13 06:52:07 2022
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
Wait for CORTX HA to be ready.............................error: timed out waiting for the condition on deployments/cortx-ha
Deployment CORTX HA timed out after 240 seconds
Failed. Exiting script.
Here is the disk layout:
[root@node-1 cc]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 1.8T 0 disk /mnt/fs-local-volume
sdb 8:16 0 1.8T 0 disk
sdc 8:32 0 1.8T 0 disk
sdd 8:48 0 1.8T 0 disk
sde 8:64 0 1.8T 0 disk
sdf 8:80 0 1.8T 0 disk
sdg 8:96 0 1.8T 0 disk
sdh 8:112 0 1.8T 0 disk
sdi 8:128 0 1.8T 0 disk
sdj 8:144 0 1.8T 0 disk
sdk 8:160 0 1.8T 0 disk
sdl 8:176 0 1.8T 0 disk
sdm 8:192 0 1.8T 0 disk
sdn 8:208 0 1.8T 0 disk
sdo 8:224 0 1.8T 0 disk
sdp 8:240 0 1.8T 0 disk
sdq 65:0 0 372.6G 0 disk
└─sdq1 65:1 0 372.6G 0 part /
loop0 7:0 0 1.8T 0 loop
loop1 7:1 0 1.8T 0 loop
loop2 7:2 0 1.8T 0 loop
loop3 7:3 0 1.8T 0 loop
loop4 7:4 0 1.8T 0 loop
loop5 7:5 0 1.8T 0 loop
loop6 7:6 0 1.8T 0 loop
loop7 7:7 0 1.8T 0 loop
loop8 7:8 0 1.8T 0 loop
loop9 7:9 0 1.8T 0 loop
loop10 7:10 0 1.8T 0 loop
loop11 7:11 0 1.8T 0 loop
loop12 7:12 0 1.8T 0 loop
loop13 7:13 0 1.8T 0 loop
Thanks in advance!
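For reference, the "timed out waiting for the condition" message above is the standard kubectl wait timeout, so the same check can be re-run by hand with a longer timeout, and the deployment's events usually say why replicas never became available. A sketch, assuming the default namespace used above:
kubectl wait --for=condition=available --timeout=600s -n default deployment/cortx-ha
kubectl describe deployment -n default cortx-ha     # Events section explains unavailable replicas
kubectl get pods -n default | grep cortx-ha         # then describe/log the cortx-ha pod directly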
Error while trying to follow the steps in the document below:
https://github.com/Seagate/cortx-k8s/blob/stable/doc/cortx-aws-k8s-installation.md
While trying to execute step 2.7, I am getting the following error:
# ssh $SSH_FLAGS centos@$ClusterControlPlaneIP "kubectl create -f https://docs.projectcalico.org/manifests/tigera-operator.yaml; kubectl create -f https://docs.projectcalico.org/manifests/custom-resources.yaml"
Warning: Permanently added 'XX.XXX.XX.XXX' (ED14408) to the list of known hosts.
error: error loading config file "/home/centos/.kube/config": open /home/centos/.kube/config: permission denied
error: error loading config file "/home/centos/.kube/config": open /home/centos/.kube/config: permission denied
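This error usually means /home/centos/.kube/config is owned by root (for example because it was copied with sudo), so kubectl run as the centos user cannot read it. One possible fix, assuming the standard kubeadm-style config copy:
ssh $SSH_FLAGS centos@$ClusterControlPlaneIP "sudo chown \$(id -u):\$(id -g) /home/centos/.kube/config"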
I was trying out the new v0.1.0 release with the AWS QSG (and CloudFormation for #119), and the deployment steps went smoothly.
In section 4.1, however, I can't get the CSM calls to work. The login
works as expected and gives me a bearer token, but the call to s3_accounts
just returns 404.
Did something with CSM change since the last release, or am I missing something simple?
@r-wambui , can you please work with @alfhad to make sure that the awesome google doc that @alfhad created becomes a fully working document and is contributed to our GitHub and linked appropriately from an existing page?
References:
Slack Discussion
Google Doc
Please reach out to @osowski if you have any questions about either how to get the document working or how to appropriately create documentation in this repo.
Through https://github.com/Seagate/cortx-k8s/tree/CORTX-29859_migrate_data_pods_statefulset, we are breaking the explicit assumption that the PVs and PVCs used in CORTX are mirror copies of one another. This allows users to create their own PVs ahead of time and have the CORTX Helm charts use those PVs instead.
This issue should cover the documentation (and minor code) changes required to expose this functionality to the user, which for the most part entails creating PVs with the appropriate StorageClass and the cortx.io/device-path Kubernetes label so the data pods' PVCs are matched correctly.
In this way, users can create more data pods than Kubernetes worker nodes, while ensuring the PVs used for each pod point to distinct underlying paths on the local node's filesystem.
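For illustration, a pre-created PV for one data device might look like the sketch below. The PV name, the StorageClass name, and the encoding of the device path in the label value are assumptions for this example, not the chart's confirmed conventions:
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: cortx-data-node-1-sdc          # illustrative name
  labels:
    cortx.io/device-path: dev-sdc      # assumed encoding of /dev/sdc; check the chart for the real format
spec:
  capacity:
    storage: 64Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  volumeMode: Block
  storageClassName: cortx-data         # assumed StorageClass name
  local:
    path: /dev/sdc
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-1
EOF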
n/a
v0.7.0
Kubernetes version: 1.22
kubectl version: 1.22
n/a
n/a
n/a
If you run generate-cvg-yaml.sh with the -v or --verbose flag, it will fail in some circumstances.
The script should work correctly with those flags.
Pass the -v or --verbose flag to the script. It fails unless it is the very last argument.
Bad: ./generate-cvg-yaml.sh -v -x nodes.txt -y devices.txt -c 2 -d 52 -s "solution9_106.yaml" -e "200Gi" -n "200Gi"
Good: ./generate-cvg-yaml.sh -x nodes.txt -y devices.txt -c 2 -d 52 -s "solution9_106.yaml" -e "200Gi" -n "200Gi" -v
N/A
No response
❯ ./generate-cvg-yaml.sh -v -x nodes.txt -y devices.txt -c 2 -d 52 -s "solution9_106.yaml" -e "200Gi" -n "200Gi"
./generate-cvg-yaml.sh: -- OPTIONS
./generate-cvg-yaml.sh: |
./generate-cvg-yaml.sh: | NUM_CVGS=1
./generate-cvg-yaml.sh: | NUM_DATA_DRIVES=1
./generate-cvg-yaml.sh: | SIZE_DATA_DRIVE=5Gi
./generate-cvg-yaml.sh: | NUM_METADATA_DRIVES=1
./generate-cvg-yaml.sh: | SIZE_METADATA_DRIVE=5Gi
./generate-cvg-yaml.sh: | SOLUTION_YAML=solution.yaml
./generate-cvg-yaml.sh: | NODE_LIST_FILE=UNSET
./generate-cvg-yaml.sh: | DEVICE_PATHS_FILE=UNSET
./generate-cvg-yaml.sh: |
./generate-cvg-yaml.sh: -- FLAGS
./generate-cvg-yaml.sh: |
./generate-cvg-yaml.sh: | _VERBOSE=1
./generate-cvg-yaml.sh: |
./generate-cvg-yaml.sh: -- PRE-REQS
./generate-cvg-yaml.sh: |
./generate-cvg-yaml.sh: | NODE_LIST_FILE="UNSET"
./generate-cvg-yaml.sh: NODE_LIST_FILE is a required parameter and is unset.
Notice how all the command-line arguments were ignored and the default values were printed; the arguments were clearly never processed.
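For illustration only (this is not the actual generate-cvg-yaml.sh code), one parsing pattern that produces exactly this symptom is a flag branch that breaks out of the option loop, so everything after -v on the command line is silently dropped:
# Sketch of the failure mode, not the real script.
while [ $# -gt 0 ]; do
  case "$1" in
    -x) NODE_LIST_FILE="$2"; shift 2 ;;
    -v|--verbose) _VERBOSE=1; break ;;   # bug: break ends option parsing, discarding remaining args
    *) shift ;;
  esac
done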
Good:
❯ ./generate-cvg-yaml.sh -x nodes.txt -y devices.txt -c 2 -d 52 -s "solution9_106.yaml" -e "200Gi" -n "200Gi" -v
./generate-cvg-yaml.sh: -- OPTIONS
./generate-cvg-yaml.sh: |
./generate-cvg-yaml.sh: | NUM_CVGS=2
./generate-cvg-yaml.sh: | NUM_DATA_DRIVES=52
./generate-cvg-yaml.sh: | SIZE_DATA_DRIVE=200Gi
./generate-cvg-yaml.sh: | NUM_METADATA_DRIVES=1
./generate-cvg-yaml.sh: | SIZE_METADATA_DRIVE=200Gi
./generate-cvg-yaml.sh: | SOLUTION_YAML=solution9_106.yaml
./generate-cvg-yaml.sh: | NODE_LIST_FILE=nodes.txt
./generate-cvg-yaml.sh: | DEVICE_PATHS_FILE=devices.txt
./generate-cvg-yaml.sh: |
./generate-cvg-yaml.sh: -- FLAGS
./generate-cvg-yaml.sh: |
./generate-cvg-yaml.sh: | _VERBOSE=1
./generate-cvg-yaml.sh: |
./generate-cvg-yaml.sh: -- PRE-REQS
./generate-cvg-yaml.sh: |
./generate-cvg-yaml.sh: | NODE_LIST_FILE="nodes.txt"
./generate-cvg-yaml.sh: | DEVICE_PATHS_FILE="devices.txt"
./generate-cvg-yaml.sh: | YQ_AVAILABLE="/home/kpine/bin/yq"
./generate-cvg-yaml.sh: -- PARSED PARAMETERS
./generate-cvg-yaml.sh: |
./generate-cvg-yaml.sh: | NODE_LIST_FILE:
./generate-cvg-yaml.sh: | node1
./generate-cvg-yaml.sh: | node2
./generate-cvg-yaml.sh: |
./generate-cvg-yaml.sh: | DEVICE_PATHS_FILE:
./generate-cvg-yaml.sh: | /dev/sda
./generate-cvg-yaml.sh: | /dev/sdb
No response