contiv / install


Contiv Installer

Home Page: https://contiv.github.io

License: Other

Makefile 7.31% Ruby 10.86% Shell 76.26% Python 5.57%
contiv docker-swarm kubeadm aci kubernetes

install's Introduction

Contiv Installation for Docker Swarm & Kubernetes 1.4+

Install Contiv on your Docker Swarm or Kubernetes cluster.

Docker Swarm Installation

Prerequisites

  • CentOS 7.x operating system.
  • Python installed on the master and worker nodes.
  • Docker installed on the host where you are running the installer.
  • A Docker Swarm cluster set up in either legacy swarm mode or native swarm mode (native swarm requires Docker Engine 17.03+, where the swarm functionality is built in). Alternatively, use the Contiv installer to set up Docker and the legacy swarm stack on the cluster nodes.

Contiv Installation with Legacy Swarm Mode

The Contiv Docker Swarm installer is launched from a host external to the cluster. All of the nodes must be accessible to the Ansible-based installer host through SSH. A complete worked example follows the steps below.

  • Download the installer bundle:
    curl -L -O https://github.com/contiv/install/releases/download/$VERSION/contiv-$VERSION.tgz
    If your access to the Internet is limited or slow and you want to download the full Contiv install, choose
    contiv-full-$VERSION.tgz instead.
    Note: The full bundle contains only the Contiv components; installing Docker Swarm itself still requires Internet connectivity.

  • Extract the install bundle:
    tar oxf contiv-$VERSION.tgz

  • Change directories to the extracted folder
    cd contiv-$VERSION

  • To install Contiv with Docker Legacy Swarm:
    ./install/ansible/install_swarm.sh -f cfg.yml -e <ssh key> -u <username> -i

  • To install Contiv with Docker Legacy Swarm and ACI:
    ./install/ansible/install_swarm.sh -f aci_cfg.yml -e <ssh key> -u <username> -i -m aci

  • Example host config files are available at install/ansible/cfg.yml and install/ansible/aci_cfg.yml

  • To see additional install options and examples, run
    ./install/ansible/install_swarm.sh -h
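For reference, here is the whole sequence collapsed into one shell session. The release version, key path, username, and config path are placeholders for illustration only; edit cfg.yml with your node details before running the installer.

    export VERSION=1.0.0-beta.2          # placeholder release; use the version you want
    curl -L -O https://github.com/contiv/install/releases/download/$VERSION/contiv-$VERSION.tgz
    tar oxf contiv-$VERSION.tgz
    cd contiv-$VERSION
    ./install/ansible/install_swarm.sh -f install/ansible/cfg.yml -e ~/.ssh/id_rsa -u centos -i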

Contiv Installation with Native Swarm Mode

A Docker Swarm cluster must already be set up (see details). The installer only sets up the Contiv v2plugin and its dependencies. The Contiv installer can be run from a host in the cluster itself. An example invocation follows the steps below.

  • Download the installer bundle:
    curl -L -O https://github.com/contiv/install/releases/download/$VERSION/contiv-$VERSION.tgz
    If your access to the Internet is limited or slow and you want to download the full Contiv install, choose
    contiv-full-$VERSION.tgz instead.
    Note: The full bundle contains only the Contiv components.

  • Extract the install bundle:
    tar oxf contiv-$VERSION.tgz

  • Change directories to the extracted folder
    cd contiv-$VERSION

  • To install Contiv v2plugin:
    ./install/ansible/install_swarm.sh -f cfg.yml -e <ssh key> -u <username> -p

  • Example host config files are available at install/ansible/cfg.yml and install/ansible/aci_cfg.yml

  • To see additional install options and examples, such as adding arguments to ansible for verbose output and proxy settings, run
    ./install/ansible/install_swarm.sh -h
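The download and extraction steps are identical to the legacy swarm section above, so only the final command changes. A minimal sketch, again with placeholder key, username, and config path:

    ./install/ansible/install_swarm.sh -f install/ansible/cfg.yml -e ~/.ssh/id_rsa -u centos -p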

Removing Contiv

If you need to remove Contiv from Docker Swarm and return to your original state, you can uninstall it with the following commands (a combined example follows the list):

  • To uninstall Contiv and Docker Legacy Swarm:
    ./install/ansible/uninstall_swarm.sh -f cfg.yml -e <ssh key> -u <username> -i
  • To uninstall Contiv and Docker Legacy Swarm with ACI support:
    ./install/ansible/uninstall_swarm.sh -f aci_cfg.yml -e <ssh key> -u <username> -i -m aci
  • To uninstall Contiv but leave Docker Legacy Swarm in place:
    ./install/ansible/uninstall_swarm.sh -f cfg.yml -e <ssh key> -u <username>
  • To uninstall Contiv v2plugin:
    ./install/ansible/uninstall_swarm.sh -f cfg.yml -e <ssh key> -u <username> -p
  • Note: Adding the -r flag also cleans up any Contiv state.
  • To see additional uninstall options and examples, such as adding arguments to ansible for verbose output and proxy settings, run
    ./install/ansible/uninstall_swarm.sh -h
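For example, tearing down a plain legacy-swarm install and also wiping the Contiv state might look like this; the key path and username are placeholders, and -r removes Contiv network state:

    ./install/ansible/uninstall_swarm.sh -f install/ansible/cfg.yml -e ~/.ssh/id_rsa -u centos -i -r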

Kubernetes Installation

Prerequisites

  • Kubernetes 1.6.2+ and 1.8.4+ are supported by the following instructions.
  • CentOS 7.x operating system
  • Install Kubernetes:
    1. kubeadm installs the latest Kubernetes version. For Kubernetes 1.6, see an example script here. For Kubernetes 1.8, see an example script here.
    2. Replace step (3/4) of the kubeadm install guide with the Contiv installation instructions below. The Contiv installation can also be done after completing step (4/4).
    3. Instructions to install Kubernetes are available here.

Contiv Installation

  • Run the following commands on the Kubernetes master host; a worked VXLAN example follows this list.
  • Use curl to get the installer bundle:
    curl -L -O https://github.com/contiv/install/releases/download/$VERSION/contiv-$VERSION.tgz
  • Extract the install bundle:
    tar oxf contiv-$VERSION.tgz
  • Change directories to the extracted folder
    cd contiv-$VERSION
  • To install Contiv with VXLAN:
    sudo ./install/k8s/install.sh -n $CONTIV_MASTER
  • NOTE: Use the same IP for CONTIV_MASTER as you use for --api-advertise-addresses in kubeadm init.
  • To install Contiv specifying a data plane interface for VLAN:
    sudo ./install/k8s/install.sh -n $CONTIV_MASTER -v <data plane interface like eth1>
  • NOTE: Ensure that the data plane interface is the same on all the worker nodes.
  • To install Contiv with ACI:
    ./install/k8s/install.sh -n $CONTIV_MASTER -a <APIC URL> -u <APIC User> -p <APIC Password> -l <Leaf Nodes> -d <Physical Domain> -e <EPG Bridge domain> -m <APIC contracts unrestricted mode>
    For example:
    ./install/k8s/install.sh -n <netmaster DNS/IP> -a https://apic_host:443 -u apic_user -p apic_password -l topology/pod-xxx/node-xxx -d phys_domain -e not_specified -m no
    where $CONTIV_MASTER is the Contiv proxy or netmaster IP.
  • To install Contiv with a custom infra network and gateway:
    ./install/k8s/install.sh -n <netmaster DNS/IP> -g <GATEWAY IP> -i <SUBNET>
  • To see additional install options, run
    ./install/k8s/install.sh
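Putting the VXLAN case together, a minimal sketch run on the master node. The version and master IP below are placeholders; per the note above, the IP should match the address advertised in kubeadm init.

    export VERSION=1.0.0-beta.2          # placeholder release
    curl -L -O https://github.com/contiv/install/releases/download/$VERSION/contiv-$VERSION.tgz
    tar oxf contiv-$VERSION.tgz
    cd contiv-$VERSION
    sudo ./install/k8s/install.sh -n 192.0.2.10   # use your kubeadm API advertise address here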

Removing Contiv

  • To uninstall Contiv, retaining the etcd state, run:
    sudo ./install/k8s/uninstall.sh
  • To uninstall Contiv, cleaning up the etcd state, run:
    sudo ./install/k8s/uninstall.sh etcd-cleanup
    Use this option to clean up all of the Contiv network state.
  • To stop Contiv, go to the install folder contiv-$VERSION and run:
    kubectl delete -f .contiv.yaml
  • To start Contiv, go to the install folder contiv-$VERSION and run:
    kubectl apply -f .contiv.yaml
  • To remove etcd state when Contiv is stopped, run:
    rm -rf /var/etcd/contiv-data


install's Issues

contiv-compose installation error

With latest beta2 version


TASK [docker : check docker-compose version] ***********************************
fatal: [node2]: FAILED! => {"changed": true, "cmd": "docker-compose --version", "delta": "0:00:00.002427", "end": "2017-02-16 02:56:07.196436", "failed": true, "rc": 127, "start": "2017-02-16 02:56:07.194009", "stderr": "/bin/sh: docker-compose: command not found", "stdout": "", "stdout_lines": [], "warnings": []}
...ignoring
fatal: [node3]: FAILED! => {"changed": true, "cmd": "docker-compose --version", "delta": "0:00:00.004267", "end": "2017-02-16 02:56:07.331148", "failed": true, "rc": 127, "start": "2017-02-16 02:56:07.326881", "stderr": "/bin/sh: docker-compose: command not found", "stdout": "", "stdout_lines": [], "warnings": []}
...ignoring
fatal: [node1]: FAILED! => {"changed": true, "cmd": "docker-compose --version", "delta": "0:00:00.004044", "end": "2017-02-16 02:56:07.334328", "failed": true, "rc": 127, "start": "2017-02-16 02:56:07.330284", "stderr": "/bin/sh: docker-compose: command not found", "stdout": "", "stdout_lines": [], "warnings": []}
...ignoring

TASK [docker : download and install docker-compose] ****************************
changed: [node2]
changed: [node1]
changed: [node3]

TASK [docker : check contiv-compose version] ***********************************
fatal: [node2]: FAILED! => {"changed": true, "cmd": "contiv-compose --version", "delta": "0:00:00.002615", "end": "2017-02-16 02:56:15.713991", "failed": true, "rc": 127, "start": "2017-02-16 02:56:15.711376", "stderr": "/bin/sh: contiv-compose: command not found", "stdout": "", "stdout_lines": [], "warnings": []}
...ignoring
fatal: [node1]: FAILED! => {"changed": true, "cmd": "contiv-compose --version", "delta": "0:00:00.003675", "end": "2017-02-16 02:56:18.832096", "failed": true, "rc": 127, "start": "2017-02-16 02:56:18.828421", "stderr": "/bin/sh: contiv-compose: command not found", "stdout": "", "stdout_lines": [], "warnings": []}
...ignoring
fatal: [node3]: FAILED! => {"changed": true, "cmd": "contiv-compose --version", "delta": "0:00:00.003446", "end": "2017-02-16 02:56:18.818676", "failed": true, "rc": 127, "start": "2017-02-16 02:56:18.815230", "stderr": "/bin/sh: contiv-compose: command not found", "stdout": "", "stdout_lines": [], "warnings": []}
...ignoring

Is this expected? If we are not using contiv-compose, we should not be installing it via Ansible, IMO.

Installer for Docker-Swarm needs to exit with status code != 0 on all failures

The installer doesn't always exit with a non-zero status code on a failure.

Shell scripts succeed (0) or fail (non-zero) based on the last command issued. If that command is an echo, it succeeds and the script exits with 0.

The problem with this is that Jenkins uses the exit codes of commands and scripts to determine whether a step succeeded. So printing an error statement and exiting with 0 means "success."

The last error the installer encountered was that the netmaster didn't come up after 10 tries, so it printed an error message and exited... with 0.
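For reference, a minimal sketch of the pattern being asked for, assuming a bash retry loop like the one described above; the function and variable names here are hypothetical, not the installer's actual code:

    #!/bin/bash
    # Fail the whole script on the first unhandled error instead of
    # falling through to a final echo that masks the failure.
    set -euo pipefail

    wait_for_netmaster() {
        local retries=10
        for ((i = 1; i <= retries; i++)); do
            # hypothetical health check; the netmaster API listens on port 9999
            if curl -sf "http://localhost:9999/version" > /dev/null; then
                return 0
            fi
            sleep 5
        done
        echo "Netmaster did not come up after $retries tries" >&2
        return 1                      # propagate the failure
    }

    wait_for_netmaster || exit 1      # explicit non-zero exit for Jenkins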

contiv-compose not working with netmaster API

After installing 1.0.0-beta.2, contiv-compose does not seem to work and gets lots of "Page not found" errors when attempting to build contiv policies:

[cgascoig@aci-docker-1 test]$ contiv-compose up -d
WARN[0000] Note: This is an experimental alternate implementation of the Compose CLI (https://github.com/docker/compose) 
ERRO[0000] Unable to create endpoint group. Tenant 'default' Network 'private' Epg 'test_web'. Error Page not found! 
ERRO[0000] Unable to add epg for service 'web'. Error Page not found! 
ERRO[0000] Unable to apply policies for unspecified tiers. Error Page not found! 
ERRO[0000] Unable to apply links based policy. Error: Page not found! 
INFO[0000] Applying labels based policies               
INFO[0000] Project [test]: Starting project             
INFO[0000] [0/2] [redis]: Starting                      
ERRO[0001] Failed Starting web : 500 Internal Server Error: Error response from daemon: network test_web.private not found
 
ERRO[0002] Failed Starting redis : 500 Internal Server Error: Error response from daemon: network test_redis.private not found
 
ERRO[0002] Failed to start: redis : 500 Internal Server Error: Error response from daemon: network test_redis.private not found
 
ERRO[0002] Failed to start: web : 500 Internal Server Error: Error response from daemon: network test_web.private not found
 
FATA[0002] 500 Internal Server Error: Error response from daemon: network test_redis.private not found

It looks like contiv-compose is not using the correct netmaster API - using strace I can see that it is using API calls like: POST /api/endpointGroups/default which I believe should be POST /api/v1/endpointGroups/default

Contiv k8s install.sh needs to exit with a non-zero status on failure

Jenkins tried running the K8s install.sh but did it in the wrong directory and didn't do it as root.

This caused many errors, but the script still thought it succeeded, and printed out how to get to the Contiv UI.

If an error is encountered (not root, missing file, etc.) print an error message and exit with status = 1.

etcd not installed when the script is run without -i

Hi,

If Docker is already installed but etcd is not, the script logic has a bug and won't install etcd.

You first do this:

    if [ "$cluster_store" == "" ]; then
        cluster_store="etcd://$service_vip:2379"
    fi

and then you check:

    if [ "$cluster_store" = "" ];
        INSTALL ETCD

This will clearly never work
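One possible fix, sketched here as an assumption about the intent rather than the actual patch: decide whether etcd is needed before defaulting cluster_store.

    # Sketch: remember whether a cluster store was supplied before defaulting it
    if [ "$cluster_store" == "" ]; then
        install_etcd=true
        cluster_store="etcd://$service_vip:2379"
    else
        install_etcd=false
    fi

    if [ "$install_etcd" == "true" ]; then
        # ... install etcd here ...
        :
    fi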

uninstaller does not clean up docker containers

The Docker Swarm + Contiv uninstaller does not clean up containers.

Uninstall should do the exact reverse of all the tasks the installer performs, so that we can get our VMs back to their original shape.

Swarm + Contiv installation

Can we display a success message at the end of installation?
The current script just displays the standard Ansible output.

The net_demo_installer script displays a message like this:

echo "Install complete."
echo "========================================================="
echo " "
echo "Please export DOCKER_HOST=${DOCKER_HOST} in your shell before proceeding"
echo " "
echo "========================================================="

Contiv.yml

Hi,

This README needs a note about having to update the contiv.yml file; otherwise the uplink interfaces are never configured. See below for the logs I'm seeing from my worker nodes.

time="Feb 16 15:56:12.527659444" level=error msg="OfnetBgp currently supports only one uplink interface. Num uplinks configured: 0"
time="Feb 16 15:56:12.527757579" level=info msg="Waiting for OVS switch(vlan) to connect.."
time="Feb 16 15:56:12.527921909" level=info msg="Listening for connections on :6634"

Process for Installing Contiv with K8s

I feel there should be a process for installing Contiv on top of a working installation of K8s. If you look at other products such as Calico or Flannel, they have this well documented. The instructions on this page fall short because they depend on kubeadm, and you can't use kubeadm in production as it doesn't support a multi-master deployment.

My recommendation is to document the process for installing Contiv with a working installation of K8s.

have runtime parameter to specify aci-gw version

The current installer does not take a custom aci-gw container image as a command-line argument. Can we please make that a runtime parameter so that I can pass it to the script?

aci-gw-image="contiv/aci-gw:"

uninstaller failed


TASK [stop docker] *************************************************************
changed: [node1]
changed: [node3]
changed: [node2]

TASK [stop docker tcp socket] **************************************************
fatal: [node1]: FAILED! => {"changed": false, "failed": true, "msg": "Could not find the requested service \"'docker-tcp.socket'\": "}
...ignoring
fatal: [node2]: FAILED! => {"changed": false, "failed": true, "msg": "Could not find the requested service \"'docker-tcp.socket'\": "}
...ignoring
fatal: [node3]: FAILED! => {"changed": false, "failed": true, "msg": "Could not find the requested service \"'docker-tcp.socket'\": "}
...ignoring

TASK [cleanup iptables for docker] *********************************************
 [WARNING]: The loop variable 'item' is already in use. You should set the
`loop_var` value in the `loop_control` option for the task to something else to
avoid variable collisions and unexpected behavior.
 [WARNING]: The loop variable 'item' is already in use. You should set the
`loop_var` value in the `loop_control` option for the task to something else to
avoid variable collisions and unexpected behavior.
 [WARNING]: The loop variable 'item' is already in use. You should set the
`loop_var` value in the `loop_control` option for the task to something else to
avoid variable collisions and unexpected behavior.
changed: [node1] => (item=2385)
changed: [node2] => (item=2385)
changed: [node3] => (item=2385)

PLAY RECAP *********************************************************************
node1                      : ok=36   changed=13   unreachable=0    failed=0
node2                      : ok=31   changed=7    unreachable=0    failed=0
node3                      : ok=31   changed=7    unreachable=0    failed=0

Uninstallation failed
=========================================================
 Please check contiv_uninstall.log for errors.
=========================================================

Installer used : https://github.com/contiv/install/releases/download/1.0.0-beta.2/contiv-1.0.0-beta.2.tgz

command I tried :

./install/ansible/uninstall_swarm.sh -e -u centos -f install/ansible/cfg.yml -i -n

Documentation for swarm + contiv installation

There is no documentation for swarm + contiv installation here.
It used to be in netplugin/install earlier.
1: Should we copy that over?
2: Should we just maintain one repo, contiv/netplugin, for all installers?

script to check whether required software is installed before trying out the tutorial

Before we run make demo-*, a single script should check the following (a rough sketch follows the list):

1: Check whether vagrant is installed (version newer than some basic working version of Vagrant, say 1.8.x).
2: Check whether VirtualBox is installed.
3: Check whether docker-engine is installed and running.
4: Check whether git is present.
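A minimal sketch of such a pre-check, assuming the tools are looked up on PATH; this is illustrative only, not the project's actual script:

    #!/bin/bash
    # Hypothetical pre-flight check before running make demo-*
    fail=0

    for tool in vagrant VBoxManage docker git; do
        if ! command -v "$tool" > /dev/null 2>&1; then
            echo "missing: $tool" >&2
            fail=1
        fi
    done

    # Vagrant should be at least a known-good version (1.8.x per the issue)
    command -v vagrant > /dev/null 2>&1 && vagrant --version

    # The docker engine must not only be installed but also running
    if command -v docker > /dev/null 2>&1; then
        docker info > /dev/null 2>&1 || { echo "docker engine is not running" >&2; fail=1; }
    fi

    exit "$fail"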

installer failed to install ovs

PLAY [netplugin-master] ********************************************************

TASK [setup] *******************************************************************
ok: [node1]

TASK [contiv_network : get openstack kilo repo] ********************************
fatal: [node1]: FAILED! => {"changed": false, "dest": "/tmp/rdo-release-kilo-2.noarch.rpm", "failed": true, "msg": "Request failed", "response": "HTTP Error 404: Not Found", "state": "absent", "status_code": 404, "url": "https://repos.fedorapeople.org/repos/openstack/openstack-kilo/rdo-release-kilo-2.noarch.rpm"}
	to retry, use: --limit @/ansible/install_plays.retry

PLAY RECAP *********************************************************************
node1                      : ok=30   changed=10   unreachable=0    failed=1
node2                      : ok=29   changed=10   unreachable=0    failed=0
node3                      : ok=29   changed=10   unreachable=0    failed=0

Installation failed
=========================================================
 Please check ./config/contiv_install_04-12-2017.22-52-28.UTC.log for errors.
=========================================================

version : beta6

Uninstaller does not delete Docker containers which the user has created

Command used : ./install/ansible/uninstall_swarm.sh -f install/ansible/cfg.yml -e ~/.ssh/id_rsa.test -u admin -i -r

Output from one of the nodes:

[admin@gaurav-vm-3 ~]$ docker ps
CONTAINER ID        IMAGE                        COMMAND             CREATED             STATUS              PORTS               NAMES
4340a2aba546        contiv/alpine                "sh"                29 minutes ago      Up 29 minutes                           gaurav-vm-2/Bcont
36ab6bc37409        contiv/alpine                "sh"                29 minutes ago      Up 12 minutes                           gaurav-vm-1/Acont
c3eb3c7166b2        quay.io/coreos/etcd:v2.3.8   "/etcd"             41 minutes ago      Up 41 minutes                           gaurav-vm-1/etcd
e372f638a59b        quay.io/coreos/etcd:v2.3.8   "/etcd"             41 minutes ago      Up 41 minutes                           gaurav-vm-2/etcd
083324440ee8        quay.io/coreos/etcd:v2.3.8   "/etcd"             41 minutes ago      Up 41 minutes                           gaurav-vm-3/etcd


[admin@gaurav-vm-3 ~]$ sudo service netmaster status
Redirecting to /bin/systemctl status  netmaster.service
โ— netmaster.service - Netmaster
   Loaded: loaded (/etc/systemd/system/netmaster.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Wed 2017-04-12 15:24:38 PDT; 3min 5s ago
 Main PID: 9953 (code=killed, signal=TERM)

Apr 12 15:15:26 gaurav-vm-3 netmaster[9953]: time="Apr 12 15:15:26.506916287" level=info msg="Received CreateEndpointRequest: {TenantName:TestTenant NetworkName:TestNet ServiceName:epgA ...
Apr 12 15:15:26 gaurav-vm-3 netmaster[9953]: time="Apr 12 15:15:26.835729752" level=info msg="Received Endpoint CReate from Remote netplugin"
Apr 12 15:15:26 gaurav-vm-3 netmaster[9953]: time="Apr 12 15:15:26.835810090" level=info msg="Sending endpoint: &{EndpointID:10.1.1.1:TestTenant EndpointType:2 EndpointGroup:1 IpAddr:10....
Apr 12 15:15:26 gaurav-vm-3 netmaster[9953]: time="Apr 12 15:15:26.836805998" level=info msg="Sending endpoint: &{EndpointID:10.1.1.1:TestTenant EndpointType:2 EndpointGroup:1 IpAddr:10....
Apr 12 15:15:26 gaurav-vm-3 netmaster[9953]: time="Apr 12 15:15:26.837422275" level=info msg="Sending endpoint: &{EndpointID:10.1.1.1:TestTenant EndpointType:2 EndpointGroup:1 IpAddr:10....
Apr 12 15:15:26 gaurav-vm-3 netmaster[9953]: time="Apr 12 15:15:26.837931836" level=info msg="Sending endpoint: &{EndpointID:10.1.1.1:TestTenant EndpointType:2 EndpointGroup:1 IpAddr:10....
Apr 12 15:15:27 gaurav-vm-3 netmaster[9953]: time="Apr 12 15:15:27.033650566" level=info msg="Received EndpointUpdateRequest {{IPAddress:10.1.1.1 ContainerID:36ab6bc37409b83aeb2be9f00b6b...
Apr 12 15:24:35 gaurav-vm-3 netmaster[9953]: time="Apr 12 15:24:35.365104776" level=info msg="Received EndpointUpdateRequest {{IPAddress: ContainerID:5765eb645adb4b2c8edc15a0...mmonName:}}"
Apr 12 15:24:38 gaurav-vm-3 systemd[1]: Stopping Netmaster...
Apr 12 15:24:38 gaurav-vm-3 systemd[1]: Stopped Netmaster.
Hint: Some lines were ellipsized, use -l to show in full.
[admin@gaurav-vm-3 ~]$

You can see that Acont and Bcont are still running.
Installer version: beta6

While doing make demo-swarm, I get this error

git clone latest install code
cd install
export BUILD_VERSION=1.0.0-beta.4
make demo-swarm

==> contiv-node4: Complete!
BUILD_VERSION= make install-test-swarm
~/Desktop/INSTALL/install/cluster ~/Desktop/INSTALL/install
~/Desktop/INSTALL/install
tar: Unrecognized archive format
tar: Error exit delayed from previous errors.
make[1]: *** [install-test-swarm] Error 1
make: *** [demo-swarm] Error 2

swarm installer sometimes gets stuck here


TASK [base : install/upgrade base packages (redhat)] ***************************
ok: [node1] => (item=[u'yum-utils', u'ntp', u'unzip', u'bzip2', u'curl', u'python-requests', u'bash-completion', u'libselinux-python', u'e2fsprogs', u'openssh-server'])

more info regarding installer

The following things should be added to the documentation. Please don't add them as help text in the script; let's aim to add them to a README.

1: Please explain why we need contiv/install. This is the most confusing part according to other people.

2: For public cloud, the list of ports that need to be opened.

3: Minimum node requirements (master and worker). Will the installer work with only 1 master node? Will it work with only one worker node? Do we always need 1 master and one or more worker nodes?

4: For public cloud, do we need both eth0 and eth1 interfaces, or will just eth0 work? Why do we need two interfaces for public cloud if we only support VXLAN mode there?

Update make demo* documentation

Creating this to track doc updates to make demo* scripts

  1. Set the build version to use for the demo
    export BUILD_VERSION=1.0.0-beta.3
  2. Document how to access the Vagrant VMs
    cd cluster && CONTIV_KUBEADM=1 vagrant ssh contiv-node1 for k8s
    and
    cd cluster && vagrant ssh contiv-node3 for swarm

Documentation for kubernetes install

The document says to continue to the Kubernetes install getting-started page. Per @neelimamukiri, step 3, labeled (3/4) Installing a pod network, should be skipped; the Contiv install is done instead of step 3. Also, it could be mentioned that step 4 (Joining nodes) of the Kubernetes install can be done either before or after step 3 (i.e., running the Contiv install).

make demo-k8s does not work on RHEL

bash:master$ make demo-k8s
BUILD_VERSION=1.0.0-rc1 CONTIV_KUBEADM=1 make cluster
cd cluster && vagrant destroy -f
==> contiv-node2: VM not created. Moving on...
==> contiv-node1: VM not created. Moving on...
cd cluster && vagrant up
Bringing machine 'contiv-node1' up with 'virtualbox' provider...
Bringing machine 'contiv-node2' up with 'virtualbox' provider...
==> contiv-node1: Importing base box 'rhel7'...
==> contiv-node1: Matching MAC address for NAT networking...
==> contiv-node1: Setting the name of the VM: cluster_contiv-node1_1492208304112_19835
==> contiv-node1: Clearing any previously set network interfaces...
==> contiv-node1: Preparing network interfaces based on configuration...
    contiv-node1: Adapter 1: nat
    contiv-node1: Adapter 2: hostonly
    contiv-node1: Adapter 3: intnet
==> contiv-node1: Forwarding ports...
    contiv-node1: 10000 (guest) => 10055 (host) (adapter 1)
    contiv-node1: 22 (guest) => 2222 (host) (adapter 1)
==> contiv-node1: Running 'pre-boot' VM customizations...
==> contiv-node1: Booting VM...
==> contiv-node1: Waiting for machine to boot. This may take a few minutes...
    contiv-node1: SSH address: 127.0.0.1:2222
    contiv-node1: SSH username: vagrant
    contiv-node1: SSH auth method: private key
    contiv-node1: Warning: Remote connection disconnect. Retrying...
==> contiv-node1: Machine booted and ready!
==> contiv-node1: Registering box with vagrant-registration...




Network error, unable to connect to server. Please see /var/log/rhsm/rhsm.log for more information.
Registering to: subscription.rhn.redhat.com:443/subscription The system has been registered with ID: 8ef721ba-22ed-4dc4-b9d0-3453e2970b17
make[1]: *** [cluster] Error 1
make: *** [demo-k8s] Error 2

Move net_demo_installer to contiv/install

We seem to have a requirement to run the install from a node in the cluster.
So allow installing Ansible on one of the nodes and running the installer from there, rather than as a container on an external host.

Kubeadm 1.6.x based install - pod scheduling

It doesn't seem to be matching the specified taints: the API proxy is being scheduled on non-master nodes. We need to check whether the taints have changed and update the YAML files accordingly.

The symptom is that make demo-k8s fails with:


    Retry login to Contiv
    Retry login to Contiv
    Retry login to Contiv
    ...
    Retry login to Contiv
    Install FAILED
    make[1]: *** [install-test-kubeadm] Error 1

Installer needs to exit with a non-zero status if it fails

There's a problem with the repo that is preventing Installer from getting a certain RPM.

The installer prints "Install Failed", but since that echo command succeeds, the script exits with status = 0, which means the entire script succeeded.

Unfortunately this means that for automation and other integration systems, the installer will always "succeed", even when it failed.

Exiting with status code=1 is sufficient in the case of a failure.

/run out of free space while using v2plugin:0.1

Hi there,

I just realized that on my 3 Docker Swarm nodes, v2plugin:0.1 has filled the /run partition to 100%.

[root@node1 ~]# df -h 
Filesystem                      Size  Used Avail Use% Mounted on
/dev/mapper/cl_inslnxcl02-root   17G  3.6G   14G  21% /
devtmpfs                        1.8G     0  1.8G   0% /dev
tmpfs                           1.9G     0  1.9G   0% /dev/shm
tmpfs                           1.9G  1.9G     0 100% /run
tmpfs                           1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/sda1                      1014M  166M  849M  17% /boot
shm                              64M     0   64M   0% /var/lib/docker/containers/84ef582039226d0daab4e81a0e527108b4a89fb5069a4c0c030345d75c454815/shm
[root@node1 ~]# du -hs /run/contiv/log/*
480M	/run/contiv/log/netmaster.log
793M	/run/contiv/log/netplugin.log
379M	/run/contiv/log/ovs-db.log
3.8M	/run/contiv/log/ovs-vswitchd.log
48K	/run/contiv/log/plugin_bootup.log
[root@node1 ~]# ls -lah /run/contiv/log/*
-rw-r--r--. 1 root root 480M Apr 20 11:11 /run/contiv/log/netmaster.log
-rw-r--r--. 1 root root 793M Apr 20 11:11 /run/contiv/log/netplugin.log
-rw-r--r--. 1 root root 379M Apr 20 11:11 /run/contiv/log/ovs-db.log
-rw-r--r--. 1 root root 3.8M Apr 20 09:28 /run/contiv/log/ovs-vswitchd.log
-rw-r--r--. 1 root root  47K Apr 13 20:25 /run/contiv/log/plugin_bootup.log

Why do you place logs in /run and not /var/log? I also think a provided logrotate would be nice. Is there a way to lower the log verbosity to crit, warn or something like this?

Example log entries:
/run/contiv/log/netplugin.log:

time="Apr 18 14:19:03.963236637" level=error msg="Error client: etcd cluster is unavailable or misconfigured during watch" 
time="Apr 18 14:19:03.963324076" level=error msg="Error client: etcd cluster is unavailable or misconfigured during watch" 
time="Apr 18 14:19:03.964007128" level=error msg="Error client: etcd cluster is unavailable or misconfigured during watch" 
time="Apr 18 14:19:03.964079151" level=error msg="Error client: etcd cluster is unavailable or misconfigured during watch" 
time="Apr 18 14:19:03.964973818" level=error msg="Error client: etcd cluster is unavailable or misconfigured during watch" 
time="Apr 18 14:19:03.965059934" level=error msg="Error client: etcd cluster is unavailable or misconfigured during watch" 
time="Apr 18 14:19:03.965590669" level=error msg="Error client: etcd cluster is unavailable or misconfigured during watch"

/run/contiv/log/netmaster.log:

time="Apr 18 14:19:07.717891842" level=debug msg="Refreshing key: /contiv.io/service/netmaster.rpc/192.168.X.X:9001" 
time="Apr 18 14:19:07.718227654" level=debug msg="Refreshing key: /contiv.io/service/netmaster/192.168.X.X:9999" 
time="Apr 18 14:19:07.718403614" level=error msg="Error setting key /contiv.io/service/netmaster.rpc/192.168.X.X:9001, Err: client: etcd cluster is unavailable or misconfigured" 
time="Apr 18 14:19:07.718471773" level=error msg="Error setting key /contiv.io/service/netmaster/192.168.X.X:9999, Err: client: etcd cluster is unavailable or misconfigured" 

Thanks.

Regards,
Philip
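As an aside, a logrotate sketch along the lines being requested, assuming the /run/contiv/log paths shown above; this is illustrative only, not something the plugin ships:

    # Hypothetical logrotate policy for the Contiv logs under /run/contiv/log
    sudo tee /etc/logrotate.d/contiv > /dev/null <<'EOF'
    /run/contiv/log/*.log {
        daily
        rotate 5
        size 50M
        missingok
        notifempty
        copytruncate
        compress
    }
    EOF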

make demo-swarm is asking for additional python modules.

git clone latest installer code
cd install
export BUILD_VERSION=1.0.0-beta.4
make demo-swarm

make demo-swarm
Traceback (most recent call last):
  File "./scripts/get_latest_release.py", line 4, in <module>
    import requests
ImportError: No module named requests
Traceback (most recent call last):
  File "./scripts/get_latest_release.py", line 4, in <module>
    import requests
ImportError: No module named requests
BUILD_VERSION= make cluster

Container 101 tutorial not working anymore

http://contiv.github.io/documents/tutorials/container-101.html

[vagrant@tutorial-node1 ~]$ etcdctl cluster-health
cluster may be unhealthy: failed to list members
Error: client: etcd cluster is unavailable or misconfigured
error #0: dial tcp 127.0.0.1:2379: getsockopt: connection refused
error #1: dial tcp 127.0.0.1:4001: getsockopt: connection refused

[vagrant@tutorial-node1 ~]$ netctl version
ERRO[0030] Get http://netmaster:9999/version: dial tcp 192.168.2.10:9999: i/o timeout
[vagrant@tutorial-node1 ~]$ logout
Connection to 127.0.0.1 closed.

Installer fails saying hosts are not accessible even though passwordless SSH is set up properly

Executing this from a Mac:

./install/ansible/install_swarm.sh -f install/ansible/cfg.yml -e ~/.ssh/ssh-key -u admin -i
Starting the installer container
Generating Ansible configuration
Using 15.29.37.216 as the master node
Verifying ansible reachability
WARNING Some of the hosts are not accessible via passwordless SSH

This means either the host is unreachable or passwordless SSH is not
set up for it. Please resolve this before proceeding.

The same host is accessible via SSH from that machine:

ssh admin@15.29.37.216 -i ~/.ssh/ssh-key
Warning: Permanently added '15.29.37.216' (ECDSA) to the list of known hosts.
Last login: Fri Apr 14 16:46:26 2017 from pfsense-contiv.cisco.com
[admin@user-vm-1 ~]$

make demo-swarm fails


==> contiv-node4:   Verifying  : python-IPy-0.75-6.el7.noarch                                4/11
==> contiv-node4:
==> contiv-node4:   Verifying  : libcgroup-0.41-11.el7.x86_64                                5/11
==> contiv-node4:
==> contiv-node4:   Verifying  : policycoreutils-python-2.5-11.el7_3.x86_64                  6/11
==> contiv-node4:
==> contiv-node4:   Verifying  : checkpolicy-2.5-4.el7.x86_64                                7/11
==> contiv-node4:
==> contiv-node4:   Verifying  : libsemanage-python-2.5-5.1.el7_3.x86_64                     8/11
==> contiv-node4:
==> contiv-node4:   Verifying  : setools-libs-3.3.8-1.1.el7.x86_64                           9/11
==> contiv-node4:
==> contiv-node4:   Verifying  : libsemanage-2.5-4.el7.x86_64                               10/11
==> contiv-node4:   Verifying  : policycoreutils-2.5-9.el7.x86_64                           11/11
==> contiv-node4:
==> contiv-node4:
==> contiv-node4: Installed:
==> contiv-node4:   policycoreutils-python.x86_64 0:2.5-11.el7_3
==> contiv-node4:
==> contiv-node4: Dependency Installed:
==> contiv-node4:   audit-libs-python.x86_64 0:2.6.5-3.el7
==> contiv-node4:   checkpolicy.x86_64 0:2.5-4.el7
==> contiv-node4:   libcgroup.x86_64 0:0.41-11.el7
==> contiv-node4:   libsemanage-python.x86_64 0:2.5-5.1.el7_3
==> contiv-node4:   python-IPy.noarch 0:0.75-6.el7
==> contiv-node4:   setools-libs.x86_64 0:3.3.8-1.1.el7
==> contiv-node4:
==> contiv-node4: Dependency Updated:
==> contiv-node4:   libsemanage.x86_64 0:2.5-5.1.el7_3    policycoreutils.x86_64 0:2.5-11.el7_3
==> contiv-node4:
==> contiv-node4: Complete!
BUILD_VERSION=1.0.0-beta.2 make install-test-swarm
~/Desktop/INSTALL/install/cluster ~/Desktop/INSTALL/install
~/Desktop/INSTALL/install
./scripts/swarm_test.sh: line 17: cd: release: No such file or directory
make[1]: *** [install-test-swarm] Error 1
make: *** [demo-swarm] Error 2
bash:master$

installer failed on aws


TASK [contiv_network : setup hostname alias] ***********************************
changed: [node1] => (item={u'regexp': u'^127\\.0\\.0\\.1', u'line': u'127.0.0.1 localhost'})
changed: [node3] => (item={u'regexp': u'^127\\.0\\.0\\.1', u'line': u'127.0.0.1 localhost'})
changed: [node2] => (item={u'regexp': u'^127\\.0\\.0\\.1', u'line': u'127.0.0.1 localhost'})
changed: [node1] => (item={u'regexp': u' ip-172-31-2-165$', u'line': u'172.31.2.165 ip-172-31-2-165'})
changed: [node3] => (item={u'regexp': u' ip-172-31-14-168$', u'line': u'172.31.14.168 ip-172-31-14-168'})
changed: [node2] => (item={u'regexp': u' ip-172-31-14-177$', u'line': u'172.31.14.177 ip-172-31-14-177'})

TASK [contiv_network : copy environment file for netmaster] ********************
changed: [node3]
changed: [node1]
changed: [node2]

TASK [contiv_network : copy systemd units for netmaster] ***********************
changed: [node3]
changed: [node1]
changed: [node2]

TASK [contiv_network : start netmaster] ****************************************
fatal: [node2]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to start service netmaster: Failed to start netmaster.service: Transaction is destructive.\nSee system logs and 'systemctl status netmaster.service' for details.\n"}
fatal: [node1]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to start service netmaster: Failed to start netmaster.service: Transaction is destructive.\nSee system logs and 'systemctl status netmaster.service' for details.\n"}
fatal: [node3]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to start service netmaster: Failed to start netmaster.service: Transaction is destructive.\nSee system logs and 'systemctl status netmaster.service' for details.\n"}
	to retry, use: --limit @/ansible/install_plays.retry

PLAY RECAP *********************************************************************
node1                      : ok=62   changed=49   unreachable=0    failed=1
node2                      : ok=62   changed=49   unreachable=0    failed=1
node3                      : ok=62   changed=49   unreachable=0    failed=1

Installation failed
=========================================================
 Please check contiv_install.log for errors.
=========================================================
[admin@contiv222 contiv-1.0.0-beta.2]$

Permission denied (publickey) error when connecting through SSH on AWS after Contiv installation

There is an issue that is most likely caused by Contiv.
Immediately after installing Contiv on an AWS instance, the user's home directory changes ownership: the new owner becomes "staff".
So on an instance running Ubuntu, the owner of /home/ubuntu is replaced by "staff", making the instance no longer accessible over SSH. The workaround is to create a new instance, stop both the old and new instances, detach the volume from the old instance, attach it to the new instance, start the new instance, and change the ownership of the home folder. Then stop the new instance, detach the volume from it, attach it back to the old instance, and finally the old instance is accessible again via SSH.

Ansible options hardcoded in install_swarm.sh and uninstall_swarm.sh cause issues in non-Vagrant environments

install_swarm.sh and uninstall_swarm.sh have the Ansible options hardcoded for a Vagrant-like environment.

    # Ansible options. By default, this specifies a private key to be used and the vagrant user
    ans_opts=""
    ans_user="vagrant"
    ans_key=$src_conf_path/insecure_private_key
    install_scheduler=""

These should either be made accessible through the CLI as parameters, or the documentation should mention that the values need to be modified for the target environment.

If they are not modified and a non-Vagrant environment is used, this causes login problems on the other nodes.
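For what it's worth, the -e and -u flags documented in the README above already override the key and user at invocation time, so a non-Vagrant run might look like the sketch below (paths and username are placeholders); the remaining hardcoded value is ans_opts.

    ./install/ansible/install_swarm.sh -f install/ansible/cfg.yml \
        -e /home/centos/.ssh/id_rsa -u centos -i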
