
hsbawa / icp-ce-on-linux-containers

Stars: 53 · Watchers: 8 · Forks: 21 · Size: 35.54 MB

Multi-node IBM Cloud Private Community Edition 3.2.x with Kubernetes 1.13.5 in a box. A Terraform-, Packer-, and Bash-based Infrastructure-as-Code script set creates a multi-node LXD cluster and installs ICP-CE and its CLIs on a bare-metal or VM Ubuntu 18.04 host.

Home Page: https://github.com/HSBawa/icp-ce-on-linux-containers

Languages: Shell 67.37%, HCL 32.63%
Topics: icp, ibm-cloud-private, hashicorp, packer, linux-containers, ibm, terraform, linux, kubernetes, kubernetes-cluster

icp-ce-on-linux-containers's People

Contributors

haribawa, hsbawa


icp-ce-on-linux-containers's Issues

Kubernetes in icp_master did not start properly

I set up an ICP cluster on Google Cloud Platform (8 CPUs, 30 GB RAM, 500 GB disk)
and ran the command ./create_cluster.sh -es=demo.
When it finished, I got the errors shown below.
How do I start Kubernetes properly? Thanks
(screenshot attached: image_2019_6_13)
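
The screenshot is not reproduced here. As a first pass at diagnosing a master that did not come up, the checks below are only a hedged sketch: the master container name is a placeholder (take it from lxc list), and they assume the ICP components run as Docker containers inside the LXD nodes.

# Confirm the LXD nodes are running, then look at the Kubernetes/ICP containers
# on the master node (container name is a placeholder).
lxc list
lxc exec <icp-master-container> -- docker ps --format '{{.Names}}\t{{.Status}}' | head -n 20
lxc exec <icp-master-container> -- docker ps -a | grep -i exited | head -n 10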

GlusterFS Server IP Address

Hi Hari,
I am running through another series of experiments. The GlusterFS servers and ICP are running, and I am able to ping the GlusterFS servers from the ICP worker nodes. All good. I wondered how the GlusterFS server IP addresses became 10.31.221.117 and 10.31.221.132. I expected them to inherit values from glusterfs-server-lxc.sh:

node_ip_pre=10.71.17.
start_ip=190

I wanted to understand the logic. Would appreciate guidance.
Thanks, Ravi
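
One hedged observation: 10.31.221.x matches the default lxdbr0 bridge that appears later in these issues (lxdbr0 at 10.31.221.1), which would suggest the GlusterFS containers are getting DHCP leases from lxdbr0 rather than the node_ip_pre/start_ip values in glusterfs-server-lxc.sh. A quick check, with the container name assumed:

# Which bridge are the GlusterFS containers attached to, and what subnet does it serve?
lxc network show lxdbr0                                    # look at ipv4.address
lxc config show glusterfs-server-1 --expanded | grep -A 6 'eth0:'
lxc list | grep glusterfs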

(local-exec): ls: cannot open directory '/root': Permission denied

I captured the console output of terraform apply and have attached the file, but I could not diagnose this:

module.container-master_worker.lxd_container.icp_ce_master (local-exec): ls: cannot open directory '/root': Permission denied

Suggestions? The profiles and network look okay. Five LXC containers are running, each with two interfaces (10.x and 172.x). I am able to exec into a container, and it has a /root directory. I could not determine where the ls /root command originated.

icp-install-error.txt

I overhauled my disk allocations and names (a complete rebuild of my Ubuntu 18.04 machine). The file system names are aligned with your repo (to minimize tweaks before I build).
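
To find where that ls /root originates, one hedged approach is to grep the Terraform sources for the local-exec provisioner (file patterns are assumptions):

# Locate the provisioner that runs the failing command.
grep -rn "local-exec" --include="*.tf" --include="*.tmpl" .
grep -rn "ls " --include="*.tf" --include="*.tmpl" .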

Error: cannot connect to Tiller

I logged into ICP using icp-login-3.1.2-ce.sh and thought I should be able to see Tiller:

ravi@toga:~/git/icp-ce-on-linux-containers$ helm init --client-only
$HELM_HOME has been configured at /home/ravi/.helm.
Not installing Tiller due to 'client-only' flag having been set
Happy Helming!
ravi@toga:~/git/icp-ce-on-linux-containers$ helm version
Client: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}
Error: cannot connect to Tiller

The response to kubectl get deployments --all-namespaces contains the line:

NAMESPACE      NAME                                          DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kube-system    tiller-deploy                                 1         1         1            1           4h12m

Or should I use helm init rather than helm init --client-only?
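
In ICP the Tiller endpoint is TLS-secured, so a plain helm version often fails even when the tiller-deploy deployment is healthy, as it is here. A hedged sketch, assuming the ICP login flow has already placed the Helm TLS certificates under $HELM_HOME:

# Talk to ICP's Tiller over TLS, using the certs the login flow puts in $HELM_HOME.
export HELM_HOME=~/.helm
helm version --tls
helm ls --tls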

GlusterFS PersistentVolume

Sorry to raise another issue, Hari. I thought I had run through these steps a couple of months ago, but I am not sure now. I was trying to compose the PersistentVolume YAML and went into a tailspin, so I tried the ICP Create PersistentVolume dialog. Does this look correct, especially path?

{
  "kind": "PersistentVolume",
  "apiVersion": "v1",
  "metadata": {
    "name": "gfs1-vol2",
    "labels": {}
  },
  "spec": {
    "accessModes": [
      "ReadWriteOnce"
    ],
    "persistentVolumeReclaimPolicy": "Retain",
    "glusterfs": {
      "endpoints": "gluster-cluster",
      "path": "/volume/pool2"
    },
    "capacity": {
      "storage": "10Gi"
    }
  }
}
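
Two hedged things worth checking against this spec: the endpoints value has to match an Endpoints object (here gluster-cluster) that exists in the namespace of the claim and lists the GlusterFS server IPs, and in the Kubernetes glusterfs volume source the path field is normally the Gluster volume name (from the df output below that would look like vol2) rather than the server-side mount point /volume/pool2.

# Does the referenced endpoints object exist, and what volume names does Gluster report?
kubectl get endpoints gluster-cluster -o yaml
# Run inside one of the GlusterFS server containers:
gluster volume list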

The GlusterFS server volumes seem out of balance: server 1 shows only five volumes.

root@glusterfs-server-1:~# df -h
Filesystem                 Size  Used Avail Use% Mounted on
/dev/sdb1                  280G   41G  236G  15% /
none                       492K     0  492K   0% /dev
udev                        16G     0   16G   0% /dev/fuse
tmpfs                      100K     0  100K   0% /dev/lxd
/dev/sdb2                   96G   60M   91G   1% /share
/dev/sdb8                   96G   62M   91G   1% /volume
tmpfs                      100K     0  100K   0% /dev/.lxd-mounts
tmpfs                       16G   12K   16G   1% /dev/shm
tmpfs                       16G  9.1M   16G   1% /run
tmpfs                      5.0M     0  5.0M   0% /run/lock
tmpfs                       16G     0   16G   0% /sys/fs/cgroup
glusterfs-server-1:/vol3    96G  1.1G   91G   2% /volume/pool3
glusterfs-server-1:/vol2    96G  1.1G   91G   2% /volume/pool2
glusterfs-server-1:/vol7    96G  1.1G   91G   2% /volume/pool7
glusterfs-server-1:/vol6    96G  1.1G   91G   2% /volume/pool6
glusterfs-server-1:/vol10   96G  1.1G   91G   2% /volume/pool10

Server 2 shows ten volumes.

root@glusterfs-server-2:~# df -h
Filesystem                 Size  Used Avail Use% Mounted on
/dev/sdb1                  280G   40G  236G  15% /
none                       492K     0  492K   0% /dev
udev                        16G     0   16G   0% /dev/fuse
tmpfs                      100K     0  100K   0% /dev/lxd
/dev/sdb2                   96G   60M   91G   1% /share
/dev/sdb9                   96G   62M   91G   1% /volume
tmpfs                      100K     0  100K   0% /dev/.lxd-mounts
tmpfs                       16G   12K   16G   1% /dev/shm
tmpfs                       16G  9.1M   16G   1% /run
tmpfs                      5.0M     0  5.0M   0% /run/lock
tmpfs                       16G     0   16G   0% /sys/fs/cgroup
glusterfs-server-1:/vol1    96G  1.1G   91G   2% /volume/pool1
glusterfs-server-1:/vol7    96G  1.1G   91G   2% /volume/pool7
glusterfs-server-1:/vol3    96G  1.1G   91G   2% /volume/pool3
glusterfs-server-1:/vol8    96G  1.1G   91G   2% /volume/pool8
glusterfs-server-1:/vol2    96G  1.1G   91G   2% /volume/pool2
glusterfs-server-1:/vol5    96G  1.1G   91G   2% /volume/pool5
glusterfs-server-1:/vol4    96G  1.1G   91G   2% /volume/pool4
glusterfs-server-1:/vol10   96G  1.1G   91G   2% /volume/pool10
glusterfs-server-1:/vol6    96G  1.1G   91G   2% /volume/pool6
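
df only shows what each container happens to have mounted, so it is not a reliable picture of how the bricks are spread across the two servers; the Gluster CLI gives the authoritative view. A hedged sketch, run inside either server container:

gluster peer status          # both servers should show as connected
gluster volume list          # every defined volume
gluster volume info vol2     # brick layout / replica count for one volume
gluster volume status        # which bricks are online, and on which server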

Full build fails -- proxy-mgmt nodes

I started with a clean sheet: synced the repo (of course), created a new branch, and set up lxcshare and LXD (via APT) with a storage pool. The full build failed at terraform init with:

Error: module 'container-proxy-mgmt': unknown variable referenced: 'lxd_image'; define it with a 'variable' block
Error: Variable 'remote': duplicate found. Variable names must be unique.
Error: resource 'lxd_container.icp_ce_mgmt' config: unknown variable referenced: 'lxd_image'; define it with a 'variable' block
Error: resource 'lxd_container.icp_ce_proxy' config: unknown variable referenced: 'lxd_image'; define it with a 'variable' block
Error: module "profile-proxy-mgmt": missing required argument "lxd"
Error: module "container-proxy-mgmt": "lxd_image" is not a valid argument
Error: module "container-proxy-mgmt": missing required argument "lxd"

I tweaked proxy-mgmt.tf.tmpl and container/proxy-mgmt/main.tf, changing lxd_image to lxd. That got me down to two pesky errors with terraform init:

Error: Variable 'remote': duplicate found. Variable names must be unique.
Error: module "profile-proxy-mgmt": missing required argument "lxd"

I am not sure whether I have gone forwards or backwards. The minimal build succeeds with the tweaked code base. Am I in the mud hole again?
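
For the two remaining errors, a hedged way to locate the duplicate 'remote' declaration and any places still passing lxd_image instead of lxd (file patterns are assumptions based on the files named above):

# Every declaration of the "remote" variable, and every remaining use of lxd_image.
grep -rn 'variable "remote"' --include="*.tf" --include="*.tmpl" .
grep -rn 'lxd_image' --include="*.tf" --include="*.tmpl" .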

packer build xenial fails

I was able to isolate the problem to the line that contains linux-image. I modified xenial-packer-lxd-image:

"echo [$(uname -r)]", "bash -c 'apt install -y ntp curl wget resolvconf &> /dev/null'", "echo [Updating packages 2]", "bash -c 'apt install -y linux-image-$(uname -r) linux-image-extra-$(uname -r) linux-image-extra-virtual &> /dev/null'", "echo [Updating packages 3]",

The output of packer build shows [Updating packages 2] but not [Updating packages 3]:

xenial-amd64-container-for-icp: [Diabling UFW]
xenial-amd64-container-for-icp: Waiting for container to settle down ...
xenial-amd64-container-for-icp: [Updating packages]
xenial-amd64-container-for-icp: [4.15.0-33-generic]
xenial-amd64-container-for-icp: [Updating packages 2]
==> xenial-amd64-container-for-icp: Unregistering and deleting deleting container...
Build 'xenial-amd64-container-for-icp' errored: Script exited with non-zero exit status: 100
--- Update ---
The image builds after I removed the line containing linux-image (above). I am able to launch the image and I see two network interfaces. Any ideas? Thanks, Ravi
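
Because the template redirects apt's output to /dev/null, the real reason for exit status 100 is hidden. One possible cause, offered only as a guess: uname -r inside the container reports the host's 4.15 kernel, and for 4.15-series kernels the extras package was renamed (linux-modules-extra-* instead of linux-image-extra-*), so the xenial archive may have no linux-image-extra-4.15.0-33-generic. A hedged way to reproduce the step by hand and see apt's actual error (the container name is made up):

# Re-run the failing provisioner command interactively so apt's error is visible.
lxc launch ubuntu:16.04 xenial-debug
lxc exec xenial-debug -- bash -c 'apt update && apt install -y linux-image-$(uname -r) linux-image-extra-$(uname -r) linux-image-extra-virtual'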

bionic container only one interface

Just FYI, not a problem; I will proceed with the rebuild on xenial.
stirred-lynx (bionic) has been running for 15 minutes; complete-donkey (xenial) has been up for a couple of minutes:

| complete-donkey | RUNNING | 172.17.0.1 (docker0) | fd42:8d02:8f0f:9205:216:3eff:fef6:67bd (eth0) | PERSISTENT | 0         |
|                 |         | 10.31.221.238 (eth0) |                                               |            |           |
+-----------------+---------+----------------------+-----------------------------------------------+------------+-----------+
| stirred-lynx    | RUNNING | 10.31.221.130 (eth0) | fd42:8d02:8f0f:9205:216:3eff:fea0:f262 (eth0) | PERSISTENT | 0 

cloudctl -- wonder what it does

No worries. Just FYI...

passwword

I noticed this when I ran icp-login-3.1.0-ce.sh:

FAILED
'passwword-rules' is not a registered command. See 'cloudctl pm help'.

grep shows ww in passwword:

./icp-login-3.1.0-ce.sh:cloudctl pm passwword-rules devicpcluster default
./scripts/prepare-boot-node.sh:            echo "$cliname pm passwword-rules $cluster_name $default_namespace" | tee -a $icp_login_sh_file

I ran the command manually with the correct spelling; it still fails. Does it work for you?
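
Assuming the doubled "ww" is simply a typo in the two files grep found, a hedged one-liner to patch both, followed by the corrected call (which, per the usage text quoted in a later issue, takes only the namespace):

# Fix the typo in both scripts reported by grep above.
sed -i 's/passwword-rules/password-rules/g' ./icp-login-3.1.0-ce.sh ./scripts/prepare-boot-node.sh
# Corrected invocation (namespace only):
cloudctl pm password-rules default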

Where is the management node?

ravi@toga:~/git/icp-ce-on-linux-containers$ cloudctl cm nodes
ID                 Type     Private IP     Public IP   Machine Type   State   
devicpcluster-m1   master   10.50.50.101   -           -              deployed   
devicpcluster-p1   proxy    10.50.50.121   -           -              deployed   
devicpcluster-w1   worker   10.50.50.201   -           -              deployed   
devicpcluster-w2   worker   10.50.50.202   -           -              deployed   

Here is the same cluster with kubectl. Do you know why the two listings would be different?

ravi@toga:~/git/icp-ce-on-linux-containers$ kubectl get no
NAME           STATUS   ROLES         AGE   VERSION
10.50.50.101   Ready    etcd,master   7h    v1.11.1+icp
10.50.50.121   Ready    proxy         7h    v1.11.1+icp
10.50.50.141   Ready    management    7h    v1.11.1+icp
10.50.50.201   Ready    worker        7h    v1.11.1+icp
10.50.50.202   Ready    worker        7h    v1.11.1+icp

How Dashboard URL IP can be changed?

Hi,

I have tried to change the default private IP for the Dashboard, with no success.
I want to use the public IP of the VPS, which is static.
The VPS I used is the one recommended by the author.

In which config file do I need to make this modification?
Can anyone help me?

Thank you and regards,
Manu
(screenshot attached)

[Q] Routing Metal LB

Just a question: I see HAProxy is installed, and I assume it is there to route traffic from the pods to the host so they can be accessed externally via the host IP. My question is, if I wish to use MetalLB to expose services via LoadBalancer, the IP range for layer 2 mode should come from the same subnet that my hosts are assigned. As an example:

My host is on 10.10.100.X and the LXC range is 10.50.50.X. Since I would like to use MetalLB to expose pods, I would want my LXC containers to be in the same range as the host, i.e. 10.10.100.X. I have tried this, but it looks like when I do so the containers cannot resolve the external registries to pull the containers.

Is there anything else I should consider? I did test MetalLB with the existing 10.50.50.x assignment and saw that my pods, when exposed via LoadBalancer, could be accessed, but only from the host, not externally. Ideally I would like to use MetalLB to reach any service I expose at a unique IP in the range described above. I am just looking for thoughts on why, when the LXC containers' range is within the same subnet as the host, they are unable to see or resolve the external registries to pull images.

Thanks
DB
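
For reference, a minimal MetalLB layer 2 pool in the ConfigMap format used by MetalLB v0.7/v0.8, carving an assumed slice out of the existing 10.50.50.x LXD subnet. Note that this subnet lives on a NATed bridge local to the host, which is consistent with the LoadBalancer IPs only being reachable from the host; reaching them from outside would still need a host-side route or port forward.

# Hedged sketch: the address range is an assumption; adjust to an unused slice of the bridge subnet.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 10.50.50.240-10.50.50.250
EOF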

Not able to destroy terraform network & profile modules

Hi Hari,
I don't think this issue is a problem in your code, but I have not been able to solve this riddle despite several hours of effort and repeated runs of terra-clean.sh and terraform destroy. I am using the latest version of the code. install-w-terra.sh and manual invocation of the commands produce the same errors. Did I miss the obvious?
Thanks, Ravi

Error: Error applying plan:

2 error(s) occurred:

* module.network.lxd_network.icp_ce_network: 1 error(s) occurred:

* lxd_network.icp_ce_network: The network already exists
* module.profile.lxd_profile.icp_ce: 1 error(s) occurred:

* lxd_profile.icp_ce: The profile already exists

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.
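
These two errors usually mean the LXD network and profile still exist on the host but are no longer tracked in the Terraform state, so the next apply tries to create them again. A hedged cleanup is to delete the leftovers by hand before re-applying; the names below are placeholders, take the real ones from the list commands:

# List what LXD still has, then remove the objects Terraform no longer tracks.
lxc network list
lxc profile list
lxc network delete <leftover-icp-network>
lxc profile delete <leftover-icp-profile>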

ssh failure between xenial nodes

Hi Hari,
All LXC nodes reach the running state and have two network interfaces, but the SSH setup seems awry. During one run I noticed SSH failure messages in the install log file on dev-boot. The first time, I had not installed kubectl and the IBM Cloud CLI. The second and third times (the LXD checks were all OK), the log file was empty. All three times the terraform apply command finished in two minutes. dev-boot had no running Docker containers, but it had the ibmcom/icp-inception 2.1.0.2 Docker image and the extracted /opt/icp-2.1.0.2-ce installer. There was no cfc folder. I am trying to build ICP-CE 2.1.0.2 and pulled a fresh copy from GitHub. I am using the same machine that successfully built ICP-CE 2.1.0.1. Have you seen this before?
Thanks, Ravi
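
A hedged way to test the boot-node-to-node SSH path by hand (the node IP is a placeholder for one of the cluster containers, and root-to-root SSH is an assumption about how the installer connects):

# Passwordless SSH from the boot node to a cluster node, and a peek at the installer tree.
lxc exec dev-boot -- ssh -o BatchMode=yes -o StrictHostKeyChecking=no root@<node-ip> hostname
lxc exec dev-boot -- ls -la /opt/icp-2.1.0.2-ce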

[Q] This is more of a question

I have a question, but will also report a minor bug: when I pass the --force option to create-cluster.sh, it throws an error reporting an extra '-' (something like '- --force') and errors out.

My real question: I have spun up the cluster as-is, but with a 780 GB storage pool and four workers, each with 100 GB.

I noticed that in the Dashboard ICP shows 80 GB shared. How does all this relate to the overall storage utilized or taken from the pool?

I suppose my end goal is to utilize the full 780 GB as well as possible. Perhaps each worker should only have 20-40 GB and the storage pool should be smaller, leaving the rest of the storage to mount as NFS on the host and inject into ICP.

Thoughts? Again, I think my best approach is for the workers to run workloads and to use PV allocations for storage, which I think is the norm, and thus perhaps I should not be allocating 100 GB to each worker.

Hari, can you provide some clarification and thoughts around this? Thanks so much.

DB-

nfs-pv.yaml - minor syntax issue

The k8s PVs vol8, vol9, and vol10 worked after I fixed nfs-pv.yaml. Is the following an error?

metadata:
   name: vol8
   labels:                                   <<---- missing for vol8, 9, 10
          type: app8
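
If re-editing the YAML is awkward, the same label can be applied or verified after the fact; a hedged sketch using the label shown in the snippet:

# Add the label the vol8/9/10 entries were missing, then confirm all three.
kubectl label pv vol8 type=app8 --overwrite
kubectl get pv vol8 vol9 vol10 --show-labels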

OpenResty?

Many times, after trying the install repeatedly, everything looks like it was a success, and then when you go to ICP you end up with this:

Welcome to OpenResty!

If you see this page, the OpenResty web platform is successfully installed and working. Further configuration is required.

For online documentation and support please refer to openresty.org.
Commercial support is available at openresty.com.

Thank you for flying OpenResty.

Thoughts? Curious as to what is truly failing.

Thanks
DB
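
The OpenResty placeholder page suggests the console/ingress pods behind the management endpoint are not healthy, even though the installer reported success. A hedged first look, assuming a working kubeconfig:

# Are the console/ingress components in kube-system actually running?
kubectl -n kube-system get pods -o wide | grep -Ei 'ingress|ui|auth'
kubectl -n kube-system get events --sort-by=.metadata.creationTimestamp | tail -n 20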

Small error in `icp-login-3.1.0-ce.sh`

Nice to have the login script. I tweaked it just a little and removed devicpcluster. This works (note: no cluster name); well, it does not throw an error, but there is very little in the output.

$ cloudctl pm password-rules default
Name   Description   Regex 

This throws an error:

$ cloudctl pm password-rules devicpcluster default
FAILED
Incorrect Usage.

NAME:
   password-rules - List the password rules for a namespace.

USAGE:
   cloudctl pm password-rules <namespace>

PARAMETERS:
   -s      Do not show the column headers in the output.
   --json  Display output in JSON format.

Support RHEL/Fedora/CentOS systems

I actually like the idea of this IaC, but it is limited to only one distro. I think it should be cross-distro; supporting Red Hat distros would also help us deploy ICP in LXD more easily.

Unable to create client for remote [local-https]

Vanilla trial run: I changed source = "/media/lxcshare" in main.tf and ran through the commands of Option 2 in "2.0 Create LXD Cluster and ICP install with Terraform" on the HSBawa/icp-ce-on-linux-containers wiki.
Ubuntu 18.04 host, image xenial-container-for-icp-lvm-bionic-host, Snap 3.01 installed via APT. After I saw the error, I installed Apache. Now http://127.0.0.1 works, but https://127.0.0.1:9443 fails. I ran through Option 2 from the top and got the same error. Suggestions?

2 error(s) occurred:

* module.network.lxd_network.icp_ce_network: 1 error(s) occurred:

* lxd_network.icp_ce_network: Unable to create client for remote [local-https]: Could not get remote certificate: Get https://127.0.0.1:9443: Unable to connect to: 127.0.0.1:9443
* module.profile-master-worker.lxd_profile.icp_ce: 1 error(s) occurred:

* lxd_profile.icp_ce: Unable to create client for remote [local-https]: Could not get remote certificate: Get https://127.0.0.1:9443: Unable to connect to: 127.0.0.1:9443
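
This error usually just means the LXD daemon is not listening for HTTPS on 127.0.0.1:9443; installing Apache does not help, because the Terraform provider needs the LXD API rather than a web server. A hedged check and fix, with the 9443 port taken from the error above:

# Is the LXD HTTPS API enabled, and on which address?
lxc config get core.https_address
# Enable it on the address/port the provider expects.
lxc config set core.https_address 127.0.0.1:9443
curl -sk https://127.0.0.1:9443/ >/dev/null && echo "LXD API reachable"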

Docker fails to run

The ICP installation failed because Docker failed to run on dev-boot. I tried docker run hello-world on testimage; it fails with errors (please see the attached file). I ran into the same problem with zfs and, later, with btrfs. The host machine is Ubuntu 17.10. Any ideas? I will plug away as time permits.
icp-docker-error.txt
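
Two hedged things to check for Docker-in-LXD: the container (or its profile) needs security.nesting enabled, and with zfs or btrfs storage pools Docker's default storage driver often cannot run on the container's root filesystem, which may be what the attached errors show. The repo's profile is presumably meant to set the first already:

# Does the boot container allow nested container runtimes?
lxc config show dev-boot --expanded | grep -E 'security\.(nesting|privileged)'
# If nesting is missing, enable it and restart the container.
lxc config set dev-boot security.nesting true
lxc restart dev-boot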

JSON file is missing a comma, so it fails to validate

No big deal, but check the end of this line in the bionic image JSON file:

"_comment": "Use this image only for Bionic Hosts"

When I came to validate the image file, I got:

packer validate ./util-scripts/lxd-setup/images/bionic-packer-lxd-image
Failed to parse template: Error parsing JSON: invalid character '"' after object key:value pair
At line 3, column 4 (offset 58):
    2:   "_comment": "Use this image only for Bionic Hosts"
    3:   "
         ^
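
The parser is pointing at a missing comma after the "_comment" line; assuming that is the only problem, a quick way to confirm the template parses again after adding it:

# After appending a comma to the "_comment" line, confirm the file is valid JSON
# and that packer accepts it.
python3 -m json.tool < ./util-scripts/lxd-setup/images/bionic-packer-lxd-image > /dev/null && echo "valid JSON"
packer validate ./util-scripts/lxd-setup/images/bionic-packer-lxd-image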

LXD/LXC Start Over

This is not really a problem with your code, Hari. I wanted to clean house and migrate to the LXD snap. I deleted containers and images before I migrated LXD from apt to snap (your instructions helped). I saw four LXC networks, but only one seemed to exist. I wanted to delete them all, but was able to delete only lxdbr0 (the only network that seemed to exist). Any tips on removing networks?

ravi@toga:~/git/icp-ce-on-linux-containers$ lxc network list
+---------+----------+---------+-------------+---------+
|  NAME   |   TYPE   | MANAGED | DESCRIPTION | USED BY |
+---------+----------+---------+-------------+---------+
| docker0 | bridge   | NO      |             | 0       |
+---------+----------+---------+-------------+---------+
| enp31s0 | physical | NO      |             | 0       |
+---------+----------+---------+-------------+---------+
| lxcbr0  | bridge   | NO      |             | 0       |
+---------+----------+---------+-------------+---------+
| lxdbr0  | bridge   | YES     |             | 0       |
+---------+----------+---------+-------------+---------+

But three do not seem to exist. Is this a problem?

ravi@toga:~/git/icp-ce-on-linux-containers$ lxc network info lxcbr0
Error: Interface 'lxcbr0' not found
ravi@toga:~/git/icp-ce-on-linux-containers$ lxc network info docker0 
Error: Interface 'docker0' not found
ravi@toga:~/git/icp-ce-on-linux-containers$ lxc network info enp31s0 
Error: Interface 'enp31s0' not found

Only one network seems to exist:

ravi@toga:~/git/icp-ce-on-linux-containers$ lxc network info lxdbr0 
Name: lxdbr0
MAC address: ba:b7:05:96:f5:89
MTU: 1500
State: up

Ips:
  inet	10.31.221.1
  inet6	fd42:8d02:8f0f:9205::1
  inet6	fe80::b8b7:5ff:fe96:f589

Network usage:
  Bytes received: 0B
  Bytes sent: 35.55kB
  Packets received: 0
  Packets sent: 208

I managed to delete lxdbr0, but the other three fail with Error: not found.

ravi@toga:~/git/icp-ce-on-linux-containers$ lxc network delete lxdbr0 
Network lxdbr0 deleted
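
The three networks that refuse to delete are listed with MANAGED = NO, i.e. LXD merely observes them as host interfaces it does not own, so lxc network delete only ever works on managed networks like lxdbr0. A hedged sketch of where each unmanaged entry typically comes from and how to confirm whether the interface still exists on the host:

# Unmanaged entries in `lxc network list` are host interfaces, not LXD objects.
ip link show docker0    # created by the Docker daemon; disappears when Docker is stopped/removed
ip link show lxcbr0     # created by the legacy lxc-net service from the apt "lxc" package
ip link show enp31s0    # the physical NIC -- never something to delete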

Failed - Create LXC container: LXD doesn't have a uid/gid allocation

I ran the command sudo ./create-cluster.sh -es=dev --host=pc. It creates ubuntu 18.04 LTS amd64 (release) (20190306) | x86_64 but fails to create bionic-image-for-icp-lvm.

==> bionic-image-for-icp-lvm: Creating container...
==> bionic-image-for-icp-lvm: Error creating container: LXD command error: Error: Failed container creation: Create LXC container: LXD doesn't have a uid/gid allocation. In this mode, only privileged containers are supported
==> bionic-image-for-icp-lvm: Unregistering and deleting deleting container...
==> bionic-image-for-icp-lvm: Error deleting container: LXD command error: Error: not found
Build 'bionic-image-for-icp-lvm' errored: Error creating container: LXD command error: Error: Failed container creation: Create LXC container: LXD doesn't have a uid/gid allocation. In this mode, only privileged containers are supported

Both /etc/subuid and /etc/subgid have the same entry: ravi:100000:65536. I have attached the console output.

I also tried the command ./create-cluster.sh -es=dev --host=pc -- same results.

create-cluster-2.log

Please see the attached lxd-state-after-2.txt. The existing storage pool default disappeared, but I am not sure when. Where does the ubuntu 18.04 LTS amd64 (release) (20190306) | x86_64 image live? I cannot see it in the storage pool icpce.

lxd-state-after-2.txt
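
The "LXD doesn't have a uid/gid allocation" message usually means the LXD daemon, which runs as root, has no entry of its own in /etc/subuid and /etc/subgid; the ravi:100000:65536 lines cover the user but not root. A hedged fix: the range below is an assumption (any unused block of 65536 ids works), and the restart command assumes the snap-packaged LXD.

# Give root (the LXD daemon) a subordinate id range, then restart LXD.
echo 'root:1000000:65536' | sudo tee -a /etc/subuid /etc/subgid
sudo systemctl restart snap.lxd.daemon     # for the deb package: sudo systemctl restart lxd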
