hashicorp / consul-helm

Helm chart to install Consul and other associated components.

License: Mozilla Public License 2.0

Smarty 0.46% Shell 74.01% HCL 1.39% Makefile 0.04% Dockerfile 0.29% Go 23.82%

consul-helm's Introduction

Consul Helm Chart

⚠️ The Consul Helm chart has been moved to hashicorp/consul-k8s under the charts/consul directory. ⚠️

Please direct all pull requests and issues to that repository.

Why We Moved consul-helm

For users, the separate repositories lead to difficulty with new releases and confusion around versioning. Most new releases that include changes to consul-k8s also change consul-helm, but separate repositories mean separate GitHub PRs and added confusion when opening new GitHub issues. In addition, we maintain separate versions of the consul-k8s binary and the Consul Helm chart, even though in most cases the two are tightly coupled through dependencies. This versioning strategy has also led to confusion about which Helm charts are compatible with which versions of consul-k8s.

consul-helm's People

Contributors

adilyse, alvin-huang, anubhavmishra, david-yu, hamishforbes, ilpianista, ishustava, jhandguy, kschoche, lawliet89, liviudm, ljupchokotev, lkysow, lord-y, m-yosefpor, maskshell, michaelgeorgeattard, milk, mitchellh, mmisztal1980, msiedlarek, ndhanushkodi, peterklijn, pviniciusfm, s3than, schristoff, tehmoon, thisisnotashwin, tomwganem, zinref

consul-helm's Issues

Talking to agent with downward API fails when hosts have multiple IPs (e.g. EKS w/ aws-cni)

Hello,

I have an EKS cluster with 3 workers, and deployed Consul and Rabbitmq via Helm. Both services are running, and I am able to access the Consul UI.

Each eks-worker in AWS has a private IP, as well as a few secondary IPs, which are used for Pods.

I am attempting to get syncCatalog working, but I am getting connection refused.

2018-10-17T23:52:23.572Z [WARN ] to-consul/sink: error registering service: node-name=ip-10-20-132-50.ec2.internal service-name=halting-penguin-rabbitmq err="Put http://10.20.62.49:8500/v1/catalog/register: dial tcp 10.20.62.49:8500: connect: connection refused"

10.20.62.49 is the primary IP of one of my eks-workers; however, the pod that runs the Consul server (port 8500) is actually running on a secondary IP, 10.20.61.120, on that same eks-worker.

kubectl describe pod consul-server-2
Name:           consul-server-2
Namespace:      default
Node:           ip-10-20-62-49.ec2.internal/10.20.62.49
Start Time:     Wed, 17 Oct 2018 16:12:01 -0700
Labels:         app=consul
                chart=consul-0.1.0
                component=server
                controller-revision-hash=consul-server-66c8b8459c
                hasDNS=true
                release=consul
                statefulset.kubernetes.io/pod-name=consul-server-2
Annotations:    consul.hashicorp.com/connect-inject: false
Status:         Running
IP:             10.20.61.120

How can I register services with the Consul cluster via the pod IP address, which is on my eks-worker's secondary IP, rather than connecting to the primary IP?

Please let me know if I need to provide more information

Thank you
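
For context, here is a minimal sketch of the two downward API fields involved. It assumes the sync process is pointed at the node address (status.hostIP) while the Consul pods themselves are addressed by status.podIP, as the log line and pod description above suggest. The container and variable names are illustrative and not taken from the chart's templates.

```yaml
# Fragment of a pod spec (container name and env names are hypothetical)
containers:
  - name: example
    env:
      # Primary IP of the node the pod is scheduled on
      # (e.g. 10.20.62.49 in the report above)
      - name: HOST_IP
        valueFrom:
          fieldRef:
            fieldPath: status.hostIP
      # IP assigned to the pod itself (a secondary IP under aws-cni)
      - name: POD_IP
        valueFrom:
          fieldRef:
            fieldPath: status.podIP
```

A request sent to HOST_IP:8500 only succeeds if something is listening on that port on the node's primary address, which is consistent with the "connection refused" error reported here.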

Please provide an example of extraConfig so TLS can work

consul-helm/values.yaml

Lines 70 to 73 in ec0de41

# extraConfig is a raw string of extra configuration to set with the
# server. This should be JSON.
extraConfig: |
  {}

How exactly should client.extraConfig and client.extraVolumes be set so that we can have TLS enabled from Consul client to a Consul cluster outside of kubernetes?

Could you provide examples?

Trial and error with formatting this 'raw JSON payload' via "helm --set client.extraConfig" isn't working out.

We would rather not customize values.yaml by hand and just use the chart as is.
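
One way to avoid escaping JSON on the command line is a small override file. The following is a sketch under stated assumptions: the secret name consul-client-tls and its key names are hypothetical, and it relies on the chart mounting each extraVolumes entry under /consul/userconfig/<name>/, as the comments in the chart's values.yaml describe. The JSON keys are standard Consul agent TLS options.

```yaml
# values-tls.yaml (hypothetical file name)
client:
  # Mount a pre-created Kubernetes secret holding the TLS material.
  # The secret name and the file names inside it are assumptions.
  extraVolumes:
    - type: secret
      name: consul-client-tls
      load: false   # the mounted files are certificates, not Consul config
  # Raw JSON merged into the client agent configuration.
  extraConfig: |
    {
      "ca_file": "/consul/userconfig/consul-client-tls/ca.pem",
      "cert_file": "/consul/userconfig/consul-client-tls/client.pem",
      "key_file": "/consul/userconfig/consul-client-tls/client-key.pem",
      "verify_outgoing": true,
      "verify_server_hostname": true
    }
```

Passing the file with helm install -f values-tls.yaml keeps the chart itself untouched while sidestepping the quoting problems of --set.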

consul-sync-catalog doesn't start when toK8s is false

We don't want, for now, to register Consul services as K8s services (we are just reaching those using service_name.service.domain), so we are using this config:

syncCatalog:
  # True if you want to enable the catalog sync. "-" for default.
  enabled: true
  image: null

  # toConsul and toK8S control whether syncing is enabled to Consul or K8S
  # as a destination. If both of these are disabled, the sync will do nothing.
  toConsul: true
  toK8S: false

However, setting toK8S: false prevents consul-sync-catalog from starting:

consul-sync-catalog-644884767d-b97bn   0/1       Error     2         19s
consul-sync-catalog-644884767d-b97bn   0/1       CrashLoopBackOff   2         20s
consul-sync-catalog-644884767d-b97bn   0/1       Error     3         48s
consul-sync-catalog-644884767d-b97bn   0/1       CrashLoopBackOff   3         1m
consul-sync-catalog-644884767d-b97bn   0/1       Error     4         1m
consul-sync-catalog-644884767d-b97bn   0/1       CrashLoopBackOff   4         1m
consul-sync-catalog-644884767d-b97bn   1/1       Running   5         2m
consul-sync-catalog-644884767d-b97bn   0/1       Error     5         2m
consul-sync-catalog-644884767d-b97bn   0/1       CrashLoopBackOff   5         3m

The logs for that pod are:

➜  ~ kubectl logs -f  consul-sync-catalog-644884767d-k6th9                                                                                                                      
2018-10-11T05:38:31.376Z [INFO ] to-consul/sink: ConsulSyncer quitting
ERROR: logging before flag.Parse: E1011 05:38:31.377307       6 controller.go:115] Error syncing cache                                                                          
2018-10-11T05:38:31.378Z [INFO ] to-consul/source: starting runner for endpoints
ERROR: logging before flag.Parse: E1011 05:38:31.378587       6 controller.go:115] Error syncing cache                                                                          

With toK8S: true, K8s services are registered in Consul correctly, but as I explained, we don't want Consul services synced to K8s. We are not using CoreDNS for now, and it seems to be required for toK8S.

Consul fails to start on Minikube

May be a duplicate of other issues (#9, #10), but they are closed without solution.

Helm: v2.11.0
Minikube: v0.30.0
Consul-Helm: checked out at v0.3.0

Installation process:

$ git clone https://github.com/hashicorp/consul-helm.git
$ cd consul-helm
$ git checkout v0.3.0
$ helm install --name consul ./

After installing via Helm, the pods either never become Ready or sit in the Pending state:

$ kubectl get pods
NAME              READY   STATUS    RESTARTS   AGE
consul-mw5pz      0/1     Running   0          2m
consul-server-0   0/1     Running   0          2m
consul-server-1   0/1     Pending   0          2m
consul-server-2   0/1     Pending   0          2m

Logs from the consul pod are an endless loop of:

2018/11/08 09:04:12 [ERR] agent: failed to sync remote state: rpc error making call: No cluster leader
2018/11/08 09:04:28 [ERR] consul: "Coordinate.Update" RPC failed to server 172.17.0.7:8300: rpc error making call: No cluster leader
2018/11/08 09:04:28 [ERR] agent: Coordinate update error: rpc error making call: No cluster leader
2018/11/08 09:04:34 [ERR] consul: "Catalog.NodeServices" RPC failed to server 172.17.0.7:8300: rpc error making call: No cluster leader

Logs from the first consul-server pod repeat:

2018/11/08 09:05:16 [ERR] agent: Coordinate update error: No cluster leader
2018/11/08 09:05:27 [ERR] agent: failed to sync remote state: No cluster leader

The failure is straightforward, I think: none of the servers is started in bootstrap mode, and the first one refuses to self-elect. The second and third servers are never started because the first is never ready.

Is there some missing information from the readme?
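
One possible workaround for a single-node cluster such as Minikube, offered as a sketch rather than a confirmed fix, is to run a single server and let it bootstrap alone. The server.replicas and server.bootstrapExpect keys appear in the chart's values.yaml; the file name is hypothetical.

```yaml
# minikube-values.yaml (hypothetical file name)
server:
  # A single node cannot schedule three servers if they must run on
  # separate nodes, so run only one.
  replicas: 1
  # Allow that one server to elect itself leader.
  bootstrapExpect: 1
```

It could be applied with helm install --name consul -f minikube-values.yaml ./ from the chart checkout.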

Ability to provide self-signed public CA cert for Vault?

I opened pretty much this same ticket on consul-k8s, but I thought it was worth a shot to open one here too, so please forgive the copy/paste.

I'm trying to use Vault as CA. I'm running Vault with a self-signed cert. I can add the public CA cert for it to the trusted store on my containers, including the Consul server stateful set pods (modifying the stateful set template a little bit), so I'm able to configure Vault as the CA. However, when the Envoy sidecar injected by consul-k8s is spinning up the proxy, I get this:

[2018-12-18 23:29:52.516][1][info][upstream] source/common/upstream/cluster_manager_impl.cc:494] add/update cluster local_app during init
[2018-12-18 23:29:52.516][1][warning][upstream] source/common/config/grpc_mux_impl.cc:226] gRPC config for type.googleapis.com/envoy.api.v2.Cluster update rejected: Failed to load trusted CA certificates from <inline>
[2018-12-18 23:29:52.516][1][warning][config] bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_mux_subscription_lib/common/config/grpc_mux_subscription_impl.h:70] gRPC config for type.googleapis.com/envoy.api.v2.Cluster rejected: Failed to load trusted CA certificates from <inline>
[2018-12-18 23:29:52.516][1][info][upstream] source/common/upstream/cluster_manager_impl.cc:135] cm init: all clusters initialized
[2018-12-18 23:29:52.516][1][info][main] source/server/server.cc:421] all clusters initialized. initializing init manager
[2018-12-18 23:29:52.518][1][warning][upstream] source/common/config/grpc_mux_impl.cc:226] gRPC config for type.googleapis.com/envoy.api.v2.Listener update rejected: Error adding/updating listener public_listener:100.126.123.255:20000: Failed to load trusted CA certificates from <inline>
[2018-12-18 23:29:52.518][1][warning][config] bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_mux_subscription_lib/common/config/grpc_mux_subscription_impl.h:70] gRPC config for type.googleapis.com/envoy.api.v2.Listener rejected: Error adding/updating listener public_listener:100.126.123.255:20000: Failed to load trusted CA certificates from <inline>

The line Failed to load trusted CA certificates from <inline> makes me think that it's not able to talk to Vault because of the lack of trust. Indeed if I exec into the sidecar and curl my Vault endpoint, I get the standard "Can't verify server identity" error that goes away with -k.

Is there a way to either pass/inject the public CA cert for the Envoy sidecar to use, or at least set some environment variable to ignore cert verification?

For what it's worth, I'm not entirely sure if this is an issue with Envoy, Consul, this chart, or consul-k8s, but I thought this was a good starting point. Also, it seems to me like something similar is being addressed on the Consul side, but I was wondering if something can be done in the meantime.

Add support for RBAC for consul-k8s in chart

Currently, when running the Helm chart against a Kubernetes cluster that has RBAC fully enabled, no RBAC policies exist for consul-k8s, which prevents it from communicating with the Kubernetes API.

The required policy to enable consul-k8s to function also does not exist anywhere in the documentation.

Note: this will also be needed when the connect auto-injection is complete as well
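
As an illustration of what such a policy might look like, here is a sketch of a ClusterRole and binding for the catalog sync. The resource names, the service account, the namespace, and the exact set of verbs are assumptions, not the policy the chart or consul-k8s ships.

```yaml
# All names below are hypothetical; adjust the namespace to your release.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: consul-sync-catalog
rules:
  # Read services and endpoints to sync them to Consul; the write verbs
  # are only needed when syncing from Consul back into Kubernetes.
  - apiGroups: [""]
    resources: ["services", "endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  # Node lookups, assumed to be needed for NodePort services.
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: consul-sync-catalog
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: consul-sync-catalog
subjects:
  - kind: ServiceAccount
    name: consul-sync-catalog
    namespace: default
```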

Persistent Volume Claim is Pending

I installed the chart with Helm.

kubectl describe pvc

Events:
  Type       Reason         Age              From                         Message
  ----       ------         ----             ----                         -------
  Normal     FailedBinding  7s (x2 over 7s)  persistentvolume-controller  no persistent volumes available for this claim and no storage class is set

My cluster has no persistent volumes and no storage class.

Which PV and StorageClass do I need to add to my cluster?

Thanks
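
For a test cluster without dynamic provisioning, one option is to create a PersistentVolume by hand for each server replica. This is a sketch only: the volume name and host path are hypothetical, the 10Gi size assumes the chart's default server.storage value, and hostPath volumes are generally suitable only for single-node or throwaway clusters.

```yaml
# One PersistentVolume per Consul server replica (repeat with new names/paths)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: consul-data-0            # hypothetical name
spec:
  capacity:
    storage: 10Gi                # must be at least the size the claim requests
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /mnt/consul/data-0     # hypothetical directory on the node
```

A claim with no storage class set can bind to a volume that also has no storage class, provided the capacity and access mode match.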

setting resources doesn't seem to work

I don't really understand change #62. The resources block is an object, but the template now passes it to the tpl function, which expects a string. What value is expected in order to apply something like the following?

resources:
  requests:
    memory: "10Gi"
  limits:
    memory: "10Gi"
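
If the template does pass the value through tpl, as described above, then one form that would satisfy it is a multi-line string containing the same YAML. This is a sketch based on that assumption, and placing it under server is also an assumption since the issue does not say which component is affected.

```yaml
server:
  # A block scalar (note the "|") makes the value a string that tpl can render,
  # while the rendered text is still valid YAML for the container spec.
  resources: |
    requests:
      memory: "10Gi"
    limits:
      memory: "10Gi"
```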

Consul client is not exposing the 8500 port as mentioned in guide

The guide mentions that the Consul client exposes port 8500 on the host machine, but after deploying Consul with the Helm chart, the client does not expose any port on the host machine.

kubectl describe pod consul-df8js
Name:           consul-df8js
Namespace:      default
Node:           minikube/10.0.2.15
Start Time:     Fri, 05 Oct 2018 19:56:18 +0530
Labels:         app=consul
                chart=consul-0.1.0
                component=client
                controller-revision-hash=53542314
                hasDNS=true
                pod-template-generation=1
                release=consul
Annotations:    consul.hashicorp.com/connect-inject=false
Status:         Running
IP:             172.17.0.10
Controlled By:  DaemonSet/consul
Containers:
  consul:
    Container ID:  docker://6688a53c6d651d3226bfde66a2cdf77ea193721987b2b81ca6c46c8ac0e26bf3
    Image:         consul:1.2.2
    Image ID:      docker-pullable://consul@sha256:8603f0d1b2278364ecb7c11068a477b1ea648df735eda8791362063aba99656a
    Ports:         8500/TCP, 8301/TCP, 8302/TCP, 8300/TCP, 8600/TCP, 8600/UDP
    Command:
      /bin/sh
      -ec
      CONSUL_FULLNAME="consul"

exec /bin/consul agent \
  -advertise="${POD_IP}" \
  -bind=0.0.0.0 \
  -client=0.0.0.0 \
  -config-dir=/consul/config \
  -datacenter=dc1 \
  -data-dir=/consul/data \
  -retry-join=${CONSUL_FULLNAME}-server-0.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
  -retry-join=${CONSUL_FULLNAME}-server-1.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
  -retry-join=${CONSUL_FULLNAME}-server-2.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
  -domain=consul

    State:          Running
      Started:      Fri, 05 Oct 2018 19:56:54 +0530
    Ready:          True
    Restart Count:  0
    Readiness:      exec [/bin/sh -ec curl http://127.0.0.1:8500/v1/status/leader 2>/dev/null | \
grep -E '".+"'
] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_IP:      (v1:status.podIP)
      NAMESPACE:  default (v1:metadata.namespace)
    Mounts:
      /consul/config from config (rw)
      /consul/data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-px6mr (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          True
  PodScheduled   True
Volumes:
  data:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      consul-client-config
    Optional:  false
  default-token-px6mr:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-px6mr
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/unreachable:NoExecute
Events:          <none>

I would suggest running this in host mode so that the client ports are available to clients outside K8s that need to connect to Consul running on K8s.

I tried creating a NodePort service so that my external Consul client could connect through the Consul client running as a DaemonSet in K8s, but no luck.

apiVersion: v1
kind: Service
metadata:
  name: consulclientsvc
  labels:
    run: consulclientsvc
spec:
  type: NodePort
  ports:
  - port: 8500
    targetPort: 8500
    protocol: TCP
    name: consulport
  selector:
    app: consul
    component: client      

kubectl get svc consulclientsvc
NAME              TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
consulclientsvc   NodePort   10.100.249.49   <none>        8500:31664/TCP   11s

kubectl get ep consulclientsvc
NAME              ENDPOINTS          AGE
consulclientsvc   172.17.0.10:8500   20s

But when the external Consul client tries to register with the Consul running in K8s, it fails with the following error.

docker@consul1:~$ docker run -d --rm --net=host consul agent --retry-join=192.168.99.100:31664 -bind=192.168.99.101
80d24b20ea64d3443c9e89912d2cf2b98787bbb1a1d44b0c8aa93896af7aecc2
docker@consul1:~$ docker logs 80d24b20ea64d3443c9e89912d2cf2b98787bbb1a1d44b0c8aa93896af7aecc2
==> Starting Consul agent...
==> Consul agent running!
           Version: 'v1.2.3'
           Node ID: '8a7a2c00-1147-dab0-ad5b-306b4273e869'
         Node name: 'consul1'
        Datacenter: 'dc1' (Segment: '')
            Server: false (Bootstrap: false)
       Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, DNS: 8600)
      Cluster Addr: 192.168.99.101 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false

==> Log data will now stream in as it occurs:

    2018/10/06 08:14:13 [INFO] serf: EventMemberJoin: consul1 192.168.99.101
    2018/10/06 08:14:13 [INFO] agent: Started DNS server 127.0.0.1:8600 (udp)
    2018/10/06 08:14:13 [INFO] agent: Started DNS server 127.0.0.1:8600 (tcp)
    2018/10/06 08:14:13 [INFO] agent: Started HTTP server on 127.0.0.1:8500 (tcp)
    2018/10/06 08:14:13 [INFO] agent: started state syncer
    2018/10/06 08:14:13 [INFO] agent: Retry join LAN is supported for: aliyun aws azure digitalocean gce k8s os packet scaleway softlayer triton vsphere
    2018/10/06 08:14:13 [INFO] agent: Joining LAN cluster...
    2018/10/06 08:14:13 [INFO] agent: (LAN) joining: [192.168.99.100:31664]
    2018/10/06 08:14:13 [WARN] manager: No servers available
    2018/10/06 08:14:13 [ERR] agent: failed to sync remote state: No known Consul servers

Inconsistent usage of metadata.namespace in chart templates

Ran into this today when using helm template to generate YAML to be subsequently applied with kubectl.

Today only connect-inject-serviceaccount.yaml and sync-catalog-service-account.yaml appear to be setting metadata.namespace via {{ .Release.Namespace }}

Ideally, helm would do this automatically when running template --namespace=<x> to mirror the behavior of helm install --namespace=<x>, but it does not. See: helm/helm#3553 for a longer discussion.

Other projects, such as Istio, have worked around this limitation by including metadata.namespace on all resource templates. See: istio/istio#4606
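
The workaround amounts to one extra line per template. A minimal sketch, using a hypothetical resource name rather than one of the chart's actual templates:

```yaml
# templates/example-serviceaccount.yaml (hypothetical template)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: {{ .Release.Name }}-example
  # Rendered by both helm install and helm template, so the generated
  # YAML carries the namespace when it is later applied with kubectl.
  namespace: {{ .Release.Namespace }}
```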

Surprising behavior on uninstall && re-install

I was struggling with getting even the most basic examples from da-connect-demo to work. I posted an issue here, but eventually concluded that the issue was with the chart and not the examples...

I had been observing that service registration was failing and seemed to be the source of all my problems. As to why it was failing...

I had installed / deleted / re-installed the Consul Helm chart a number of times in the consul-system namespace. As noted, on every attempt to make the examples work, service registrations failed... silently.

I eventually stumbled upon a realization that things do work when I install the chart in a new namespace. Anytime I re-install to a namespace that previously had the chart installed, things would go back to not working. Clearly, there was some state being left behind that wasn't removed by helm delete consul --purge...

What I had initially failed to realize was that when helm delete consul --purge deletes the StatefulSet for the Consul server, it does not cascade that delete down to the PVCs...

So, in any case where I deleted and re-installed in a namespace that already contained those PVCs, I was effectively launching a new cluster of Consul servers backed by old data. I'm not clear on why that doesn't work (one would think it should), but it doesn't.

If I take care to manually delete PVCs (or the entire namespace) after helm delete consul --purge, it gets me back on track for the next time I re-install into the same namespace.

So...

I'm reporting one of the two following issues. Either:

  1. Re-installation of the chart to a namespace where the PVCs already exist should result in a working Consul cluster

or

  2. If there's a valid technical reason that no. 1 isn't the case, the need to manually delete PVCs after helm delete consul --purge should be documented to help others avoid the surprises I was encountering.

Catalog sync uses too much CPU

I deployed the Helm chart with everything disabled except the sync from K8s to Consul (K8s is running locally in the Docker CE app). Everything works, but CPU consumption shoots up the moment the chart is deployed.

The moment the Helm chart is undeployed, CPU utilization goes back to normal.

[question] Multi/External Datacenter concept

I would like to describe my understanding and the current state of the project on this topic. It would be nice to discuss how best to use Hashicorp Consul with Kubernetes across multiple Datacenters.
Unfortunately at the moment there are very few guides on the topic "Multi Datacenter Consul Setup on Kubernetes", so I will try to describe the problems that I discovered and would like to discuss them and find the right solutions.

Federation Join

The first problem I see is "Federation with the WAN Gossip Pool".
Imagine we have two independent networks, each with a k8s cluster using the same CIDR for pods (10.0.0.0/14). The k8s nodes of the first network are created in the 172.100.0/24 range and the nodes of the second network in 174.200.0/24.
For both clusters we use the current helm chart to bootstrap Consul cluster.

Joining these clusters is impossible, because the Consul servers are not reachable from outside and each Consul server has only an internal pod IP address:

screenshot 2018-12-10 at 10 48 05

A possible solution to this problem might be #27 (feat: enable consul-servers to be accessed externally), but what if we don't want the Consul servers to be accessible via an external IP address? Then we have to open the Consul server ports on the host machine (k8s node), on ports that do not overlap with the Consul client ports already open on the host. I created a PR for this: #84
In addition, we also need to allow connections between the networks (i.e. firewall rules) and configure custom iptables rules for the target network on each cluster: https://github.com/bowei/k8s-custom-iptables

I'm not sure that k8s-custom-iptables should be added to the current chart.

If we apply these changes, we can join two clusters.

screenshot 2018-12-10 at 10 50 55

consul-k8s tool

Catalog Sync

The consul-k8s tool synchronises k8s services with Consul. The problem here is that it is intended only for a single cluster and writes only internal pod addresses into the Consul service catalog. Also, if we create a NodePort service, only the endpoints of this service are written to the catalog. So we cannot reach a service from another datacenter or from any non-k8s VM on the same network.

Consul Connect - Enterprise

The consul-k8s tool also inserts a Consul Connect sidecar proxy into a pod and provides a secure connection to the service. The problem here is the same: it is intended only for a single cluster, since the Connect proxy service is registered with a private k8s address (pod CIDR range) in the current datacenter.

screenshot 2018-12-10 at 09 10 39

A possible solution for this problem would be to register the proxy on a random host port or create a k8s NodePort-service for this proxy and sync it to the catalog with an external IP address:

screenshot 2018-12-10 at 14 50 36

P.S.
We really would like to combine several datacenters into one service mesh and use all consul features across these datacenters, but at this stage of the project it is simply impossible.

connect: connection refused when syncCatalog is enabled

2018-12-05T17:28:53.592Z [INFO ] to-consul/sink: ConsulSyncer quitting
ERROR: logging before flag.Parse: E1205 17:28:53.592459 7 controller.go:115] Error syncing cache
2018-12-05T17:28:53.592Z [INFO ] to-consul/source: starting runner for endpoints
ERROR: logging before flag.Parse: E1205 17:28:53.592565 7 controller.go:115] Error syncing cache
2018-12-05T17:28:53.593Z [WARN ] to-consul/sink: error querying services, will retry: err="Get http://192.168.43.4:8500/v1/catalog/services?index=1&stale=&wait=60000ms: dial tcp 192.168.43.4:8500: connect: connection refused"
2018-12-05T17:28:53.593Z [WARN ] to-consul/sink: error querying services, will retry: err="Get http://192.168.43.4:8500/v1/catalog/services?index=1&stale=&wait=60000ms: dial tcp 192.168.43.4:8500: connect: connection refused"
2018-12-05T17:28:53.593Z [WARN ] to-consul/sink: error querying services, will retry: err="Get http://192.168.43.4:8500/v1/catalog/services?index=1&stale=&wait=60000ms: dial tcp 192.168.43.4:8500: connect: connection refused"

Deleting the release produces errors: objects not found

With the following command, I always get an error on an AKS cluster:
helm delete consul --tiller-namespace helm

Error:
Error: deletion completed with 8 error(s): object not found, skipping delete; object not found, skipping delete; object not found, skipping delete; object not found, skipping delete; object not found, skipping delete; object not found, skipping delete; object not found, skipping delete; object not found, skipping delete

Additional:
What also sometimes happens is that a PVC does not de-provision correctly, and I then need to remove it manually. I'm not sure whether this is caused by the combination of Helm, Consul, and AKS, or by AKS alone.

Cloud Auto-joining not working with AWS provider

Hi guys!

I tried to join a Consul cluster using cloud auto-join in Kubernetes.

I'm using the Helm chart for Consul with the configuration below:

First attempt

client:
  enabled: true
  image: null
  join: ["provider=aws", "tag_key=consul-staging", "tag_value=auto-join"]

Second attempt

join:
- "provider=aws"
- "tag_key=consul-staging"
- "tag_value=auto-join"

Both ways, I got the same error:

Failed to resolve tag_key=consul-staging: lookup tag_key=consul-staging: no such host
* Failed to resolve tag_value=auto-join: lookup tag_value=auto-join: no such host
    2018/11/09 15:21:35 [WARN] agent: Join LAN failed: <nil>, retrying in 30s
    2018/11/09 15:21:37 [WARN] manager: No servers available
    2018/11/09 15:21:37 [ERR] http: Request GET /v1/status/leader, error: No known Consul servers from=127.0.0.1:59748
    2018/11/09 15:21:47 [WARN] manager: No servers available
    2018/11/09 15:21:47 [ERR] http: Request GET /v1/status/leader, error: No known Consul servers from=127.0.0.1:59818
    2018/11/09 15:21:53 [WARN] manager: No servers available
    2018/11/09 15:21:53 [ERR] agent: failed to sync remote state: No known Consul servers
    2018/11/09 15:21:57 [WARN] manager: No servers available
    2018/11/09 15:21:57 [ERR] http: Request GET /v1/status/leader, error: No known Consul servers from=127.0.0.1:59874
    2018/11/09 15:22:05 [INFO] discover-aws: Address type  is not supported. Valid values are {private_v4,public_v4,public_v6}. Falling back to 'private_v4'
    2018/11/09 15:22:05 [INFO] discover-aws: Region not provided. Looking up region in metadata...
    2018/11/09 15:22:05 [INFO] discover-aws: Region is us-east-1
    2018/11/09 15:22:05 [INFO] discover-aws: Filter instances with =
    2018/11/09 15:22:05 [INFO] agent: Discovered LAN servers:
    2018/11/09 15:22:05 [INFO] agent: (LAN) joining: [tag_key=consul-staging tag_value=auto-join]
    2018/11/09 15:22:05 [WARN] memberlist: Failed to resolve tag_key=consul-staging: lookup tag_key=consul-staging: no such host
    2018/11/09 15:22:05 [WARN] memberlist: Failed to resolve tag_value=auto-join: lookup tag_value=auto-join: no such host
    2018/11/09 15:22:05 [INFO] agent: (LAN) joined: 0 Err: 2 error(s) occurred:

Just for testing, I changed the join attribute from the tag_key and tag_value form to the Consul cluster IP addresses:

client:
  enabled: true
  image: null
  join: ["10.29.20.137", "10.29.20.148", "10.29.20.60"] # Fake IPs

With this configuration, passing IP addresses, the client joined the cluster successfully.

For troubleshooting:

  • I've checked the role permissions attached to the Kubernetes node - OK.

  • I've checked communication between the nodes and consul-server - OK for ALL ports.

  • I have other Consul clients using cloud discovery, and those clients work perfectly.

  • I've followed the instructions from issue #16, but without success.

Can you help me?

If you need more detailed information, please just let me know.
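
The log above shows each list element being treated as a separate address ("joining: [tag_key=consul-staging tag_value=auto-join]"), which suggests the provider settings were split apart. Consul's cloud auto-join expects the whole provider specification as one space-separated string, so a sketch of the values that might work, assuming the chart passes each join entry through as a single retry-join argument, is:

```yaml
client:
  enabled: true
  image: null
  # One list element containing the full cloud auto-join specification.
  join:
    - "provider=aws tag_key=consul-staging tag_value=auto-join"
```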

Thanks,

Registering a service from outside Kubernetes

I have syncCatalog enabled. I registered an external service on a node using the Consul agent REST API. A Kubernetes service (my-service) was created automatically from the Consul catalog.

How can I access my-service from Kubernetes using a DNS name?

my-service.default returns the pod IP of the consul-agent (DaemonSet), but I need the IP of the node where the consul-agent runs.

Thanks

consul-sync-catalog pod crashes when only syncing to consul catalog

I'm trying to integrate consul in our kubernetes cluster and have it connect to an existing consul cluster. That part works fine. What's not working is enabling "syncCatalog". For now, I just want kubernetes services in the consul catalog. So I've disabled sync to kubernetes. But the consul-sync-catalog pod immediately crashes. The error message is not entirely helpful.

$ kubectl -n consul logs -f consul-sync-catalog-d888cf85d-dpvjz
2018-10-05T06:25:18.706Z [INFO ] to-consul/source: starting runner for endpoints
2018-10-05T06:25:18.707Z [INFO ] to-consul/sink: ConsulSyncer quitting
ERROR: logging before flag.Parse: E1005 06:25:18.707409       7 controller.go:115] Error syncing cache
ERROR: logging before flag.Parse: E1005 06:25:18.707434       7 controller.go:115] Error syncing cache

Here's my values.yaml file.

---
global:
  enabled: true
  domain: consul
  image: consul:1.2.3
  imageK8S: hashicorp/consul-k8s:0.1.0
  datacenter: aoc-devtest
server:
  enabled: "-"
  image:
  replicas: 3
  bootstrapExpect: 3
  storage: 10Gi
  storageClass: ibmc-file-bronze
  connect: true
  resources: {}
  updatePartition: 0
  disruptionBudget:
    enabled: true
    maxUnavailable:
  extraConfig: |
    {
      "encrypt": "secretkey",
      "retry_join_wan": [
        "10.115.173.171",
        "10.115.173.188",
        "10.115.173.176"
      ]
    }
  extraVolumes: []
client:
  enabled: "-"
  image:
  join:
  resources: {}
  extraConfig: |
    {
      "encrypt": "secretkey"
    }
  extraVolumes: []
dns:
  enabled: "-"
ui:
  enabled: "-"
  service:
    enabled: true
    type:
syncCatalog:
  enabled: "-"
  image:
  toConsul: true
  toK8S: false
  k8sPrefix:
connectInject:
  enabled: false
  image: TODO
  default: false
  caBundle: ''
  namespaceSelector:
  certs:
    secretName:
    caBundle: ''
    certName: tls.crt
    keyName: tls.key

Does not install on AWS

I attempted to set up Consul via Helm using this repo and the directions provided in the blog post.
The pods fail with the following.

Pod logs:
`==> Starting Consul agent...
==> Consul agent running!
Version: 'v1.2.3'
Node ID: 'ff2962b2-4429-b113-1ae5-c40f78fb7fb6'
Node name: 'giggly-mouse-consul-4p9zd'
Datacenter: 'dc1' (Segment: '')
Server: false (Bootstrap: false)
Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, DNS: 8600)
Cluster Addr: 100.110.0.1 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false

==> Log data will now stream in as it occurs:

2018/10/03 17:50:05 [INFO] serf: EventMemberJoin: giggly-mouse-consul-4p9zd 100.110.0.1
2018/10/03 17:50:05 [WARN] agent/proxy: running as root, will not start managed proxies
2018/10/03 17:50:05 [INFO] agent: Started DNS server 0.0.0.0:8600 (udp)
2018/10/03 17:50:05 [INFO] agent: Started DNS server 0.0.0.0:8600 (tcp)
2018/10/03 17:50:05 [INFO] agent: Started HTTP server on [::]:8500 (tcp)
2018/10/03 17:50:05 [INFO] agent: started state syncer
2018/10/03 17:50:05 [INFO] agent: Retry join LAN is supported for: aliyun aws azure digitalocean gce k8s os packet scaleway softlayer triton vsphere
2018/10/03 17:50:05 [INFO] agent: Joining LAN cluster...
2018/10/03 17:50:05 [INFO] agent: (LAN) joining: [giggly-mouse-consul-server-0.giggly-mouse-consul-server.default.svc giggly-mouse-consul-server-1.giggly-mouse-consul-server.default.svc giggly-mouse-consul-server-2.giggly-mouse-consul-server.default.svc]
2018/10/03 17:50:05 [WARN] manager: No servers available
2018/10/03 17:50:05 [ERR] agent: failed to sync remote state: No known Consul servers
2018/10/03 17:50:05 [WARN] memberlist: Failed to resolve giggly-mouse-consul-server-0.giggly-mouse-consul-server.default.svc: lookup giggly-mouse-consul-server-0.giggly-mouse-consul-server.default.svc on 100.64.0.10:53: no such host
2018/10/03 17:50:05 [WARN] memberlist: Failed to resolve giggly-mouse-consul-server-1.giggly-mouse-consul-server.default.svc: lookup giggly-mouse-consul-server-1.giggly-mouse-consul-server.default.svc on 100.64.0.10:53: no such host
2018/10/03 17:50:05 [WARN] memberlist: Failed to resolve giggly-mouse-consul-server-2.giggly-mouse-consul-server.default.svc: lookup giggly-mouse-consul-server-2.giggly-mouse-consul-server.default.svc on 100.64.0.10:53: no such host
2018/10/03 17:50:05 [INFO] agent: (LAN) joined: 0 Err: 3 error(s) occurred:
  • Failed to resolve giggly-mouse-consul-server-0.giggly-mouse-consul-server.default.svc: lookup giggly-mouse-consul-server-0.giggly-mouse-consul-server.default.svc on 100.64.0.10:53: no such host
  • Failed to resolve giggly-mouse-consul-server-1.giggly-mouse-consul-server.default.svc: lookup giggly-mouse-consul-server-1.giggly-mouse-consul-server.default.svc on 100.64.0.10:53: no such host
  • Failed to resolve giggly-mouse-consul-server-2.giggly-mouse-consul-server.default.svc: lookup giggly-mouse-consul-server-2.giggly-mouse-consul-server.default.svc on 100.64.0.10:53: no such host
    2018/10/03 17:50:45 [ERR] agent: failed to sync remote state: rpc error making call: No cluster leader
    2018/10/03 17:50:53 [ERR] consul: "Catalog.NodeServices" RPC failed to server 100.104.0.10:8300: rpc error making call: No cluster leader`

I was, and still am, able to use the chart at https://github.com/helm/charts/tree/master/stable/consul to run Traefik with Consul.

Also, it is a bit confusing that one chart exists in the Helm charts repository and another at HashiCorp.

When using 'extraVolumes' to mount secrets, Consul errors on start with 'data_dir is empty'

When using the extraVolumes configuration option in the values.yaml, the Helm chart fails to start Consul correctly with the error data_dir is empty.

All secrets exist, and when I test by setting them to load: false, Consul starts correctly.

Upon further investigation the issue appears to be a missing \ in the loop that adds the extra config directories for the statefulset and the daemonset:

https://github.com/hashicorp/consul-helm/blob/master/templates/server-statefulset.yaml#L91
https://github.com/hashicorp/consul-helm/blob/master/templates/client-daemonset.yaml#L77
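
To illustrate why a single missing \ could plausibly produce data_dir is empty, here is a minimal Python sketch of how a shell joins continuation lines. The script text is a simplified, hypothetical stand-in for the chart's start command (including the /consul/userconfig/my-secret path), not a copy of the template.

```python
def logical_commands(script):
    """Join lines ending in a backslash, roughly as a POSIX shell would."""
    commands, current = [], ""
    for line in script.splitlines():
        stripped = line.rstrip()
        if stripped.endswith("\\"):
            # Continuation: drop the backslash and keep accumulating.
            current += stripped[:-1].strip() + " "
        else:
            current += stripped.strip()
            if current:
                commands.append(current)
            current = ""
    if current.strip():
        commands.append(current.strip())
    return commands


# Hypothetical start command. The extra -config-dir line (as a loop over
# extraVolumes might emit it) is missing its trailing backslash.
BROKEN = """\
exec /bin/consul agent \\
  -config-dir=/consul/config \\
  -config-dir=/consul/userconfig/my-secret
  -data-dir=/consul/data \\
  -server
"""

# Same script with the backslash restored.
FIXED = BROKEN.replace("my-secret\n", "my-secret \\\n")

broken_cmds = logical_commands(BROKEN)
fixed_cmds = logical_commands(FIXED)

for cmd in broken_cmds:
    print("broken:", cmd)
for cmd in fixed_cmds:
    print("fixed: ", cmd)
```

In the broken variant the agent command ends before -data-dir is reached, so the agent would start without a data directory; with the backslash restored, all flags belong to one command.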

I would submit a Pull Request but the company I work for currently doesn't have a policy in place for contributing to open source software.

Cloud Auto-joining not working with gce

Setting the cloud auto-join values in values.yaml results in an error:

join:
- "provider=gce project_name=test-0 tag_value=consul-server"

Error -

==> config: Unknown extra arguments: [project_name=test-0 tag_value=consul-server -domain=consul]

Is this a valid error, or is something wrong with my configuration?
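
One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the join string ends up on the agent command line without quotes, the shell splits it at the spaces, and everything after the first fragment is treated as extra arguments. The Python sketch below uses shlex.split to mimic shell word splitting. Both command lines are hypothetical reconstructions, not the chart's actual template output.

```python
import shlex

JOIN = "provider=gce project_name=test-0 tag_value=consul-server"

# Hypothetical command lines; only the quoting around the join value differs.
unquoted = f"consul agent -retry-join={JOIN} -domain=consul"
quoted = f'consul agent -retry-join="{JOIN}" -domain=consul'

unquoted_args = shlex.split(unquoted)
quoted_args = shlex.split(quoted)

print(unquoted_args)
print(quoted_args)
```

In the unquoted form, the fragments that follow -retry-join=provider=gce line up with the list in the Unknown extra arguments error above. In the quoted form, the whole provider string stays in one argument.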

All tests fail on Linux

I installed helm, kubectl, and bats on an Ubuntu system and ran the unit test suite bats ./test/unit and all tests failed.

Failed tests
 ✗ client/ConfigMap: enabled by default
   (in test file test/unit/client-configmap.bats, line 11)
     `[ "${actual}" = "true" ]' failed
 ✗ client/ConfigMap: enable with global.enabled false
   (in test file test/unit/client-configmap.bats, line 22)
     `[ "${actual}" = "true" ]' failed
 ✗ client/ConfigMap: disable with client.enabled
   (in test file test/unit/client-configmap.bats, line 32)
     `[ "${actual}" = "false" ]' failed
 ✗ client/ConfigMap: disable with global.enabled
   (in test file test/unit/client-configmap.bats, line 42)
     `[ "${actual}" = "false" ]' failed
 ✗ client/ConfigMap: extraConfig is set
   (in test file test/unit/client-configmap.bats, line 52)
     `[ ! -z "${actual}" ]' failed
 ✗ connectInject/Deployment: disabled by default
   (in test file test/unit/connect-inject-deployment.bats, line 11)
     `[ "${actual}" = "false" ]' failed
 ✗ connectInject/Deployment: enable with global.enabled false
   (in test file test/unit/connect-inject-deployment.bats, line 22)
     `[ "${actual}" = "true" ]' failed
 ✗ connectInject/Deployment: disable with connectInject.enabled
   (in test file test/unit/connect-inject-deployment.bats, line 32)
     `[ "${actual}" = "false" ]' failed
 ✗ connectInject/Deployment: disable with global.enabled
   (in test file test/unit/connect-inject-deployment.bats, line 42)
     `[ "${actual}" = "false" ]' failed
 ✗ connectInject/Deployment: no secretName: no tls-{cert,key}-file set
   (in test file test/unit/connect-inject-deployment.bats, line 52)
     `[ "${actual}" = "false" ]' failed
 ✗ connectInject/Deployment: with secretName: tls-{cert,key}-file set
   (in test file test/unit/connect-inject-deployment.bats, line 77)
     `[ "${actual}" = "true" ]' failed
 ✗ dns/Service: enabled by default
   (in test file test/unit/dns-service.bats, line 11)
     `[ "${actual}" = "true" ]' failed
   /tmp/bats.153734.src: line 11: yq: command not found
 ✗ dns/Service: enable with global.enabled false
   (in test file test/unit/dns-service.bats, line 22)
     `[ "${actual}" = "true" ]' failed
 ✗ dns/Service: disable with dns.enabled
   (in test file test/unit/dns-service.bats, line 32)
     `[ "${actual}" = "false" ]' failed
 ✗ dns/Service: disable with global.enabled
   (in test file test/unit/dns-service.bats, line 42)
     `[ "${actual}" = "false" ]' failed
 ✗ server/ConfigMap: enabled by default
   (in test file test/unit/server-configmap.bats, line 11)
     `[ "${actual}" = "true" ]' failed
 ✗ server/ConfigMap: enable with global.enabled false
   (in test file test/unit/server-configmap.bats, line 22)
     `[ "${actual}" = "true" ]' failed
 ✗ server/ConfigMap: disable with server.enabled
   (in test file test/unit/server-configmap.bats, line 32)
     `[ "${actual}" = "false" ]' failed
 ✗ server/ConfigMap: disable with global.enabled
   (in test file test/unit/server-configmap.bats, line 42)
     `[ "${actual}" = "false" ]' failed
 ✗ server/ConfigMap: extraConfig is set
   (in test file test/unit/server-configmap.bats, line 52)
     `[ ! -z "${actual}" ]' failed
 ✗ server/DisruptionBudget: enabled by default
   (in test file test/unit/server-disruptionbudget.bats, line 11)
     `[ "${actual}" = "true" ]' failed
 ✗ server/DisruptionBudget: enable with global.enabled false
   (in test file test/unit/server-disruptionbudget.bats, line 22)
     `[ "${actual}" = "true" ]' failed
 ✗ server/DisruptionBudget: disable with server.enabled
   (in test file test/unit/server-disruptionbudget.bats, line 32)
     `[ "${actual}" = "false" ]' failed
 ✗ server/DisruptionBudget: disable with server.disruptionBudget.enabled
   (in test file test/unit/server-disruptionbudget.bats, line 42)
     `[ "${actual}" = "false" ]' failed
 ✗ server/DisruptionBudget: disable with global.enabled
   (in test file test/unit/server-disruptionbudget.bats, line 52)
     `[ "${actual}" = "false" ]' failed
 ✗ server/DisruptionBudget: correct maxUnavailable with n=3
   (in test file test/unit/server-disruptionbudget.bats, line 62)
     `[ "${actual}" = "0" ]' failed
 ✗ server/Service: enabled by default
   (in test file test/unit/server-service.bats, line 11)
     `[ "${actual}" = "true" ]' failed
 ✗ server/Service: enable with global.enabled false
   (in test file test/unit/server-service.bats, line 22)
     `[ "${actual}" = "true" ]' failed
 ✗ server/Service: disable with server.enabled
   (in test file test/unit/server-service.bats, line 32)
     `[ "${actual}" = "false" ]' failed
 ✗ server/Service: disable with global.enabled
   (in test file test/unit/server-service.bats, line 42)
     `[ "${actual}" = "false" ]' failed
 ✗ server/Service: tolerates unready endpoints
   (in test file test/unit/server-service.bats, line 53)
     `[ "${actual}" = "true" ]' failed
 ✗ server/StatefulSet: enabled by default
   (in test file test/unit/server-statefulset.bats, line 11)
     `[ "${actual}" = "true" ]' failed
 ✗ server/StatefulSet: enable with global.enabled false
   (in test file test/unit/server-statefulset.bats, line 22)
     `[ "${actual}" = "true" ]' failed
 ✗ server/StatefulSet: disable with server.enabled
   (in test file test/unit/server-statefulset.bats, line 32)
     `[ "${actual}" = "false" ]' failed
 ✗ server/StatefulSet: disable with global.enabled
   (in test file test/unit/server-statefulset.bats, line 42)
     `[ "${actual}" = "false" ]' failed
   /tmp/bats.155386.src: line 42: yq: command not found
 ✗ server/StatefulSet: image defaults to global.image
   (in test file test/unit/server-statefulset.bats, line 52)
     `[ "${actual}" = "foo" ]' failed
 ✗ server/StatefulSet: image can be overridden with server.image
   (in test file test/unit/server-statefulset.bats, line 63)
     `[ "${actual}" = "bar" ]' failed
 ✗ server/StatefulSet: no updateStrategy when not updating
   (in test file test/unit/server-statefulset.bats, line 75)
     `[ "${actual}" = "null" ]' failed
 ✗ server/StatefulSet: updateStrategy during update
   (in test file test/unit/server-statefulset.bats, line 85)
     `[ "${actual}" = "RollingUpdate" ]' failed
 ✗ server/StatefulSet: adds extra volume
   (in test file test/unit/server-statefulset.bats, line 111)
     `[ "${actual}" = "foo" ]' failed
 ✗ server/StatefulSet: adds extra secret volume
   (in test file test/unit/server-statefulset.bats, line 156)
     `[ "${actual}" = "null" ]' failed
 ✗ server/StatefulSet: adds loadable volume
   (in test file test/unit/server-statefulset.bats, line 197)
     `[ "${actual}" = "1" ]' failed
 ✗ ui/Service: enabled by default
   (in test file test/unit/ui-service.bats, line 11)
     `[ "${actual}" = "true" ]' failed
 ✗ ui/Service: enable with global.enabled false
   (in test file test/unit/ui-service.bats, line 23)
     `[ "${actual}" = "true" ]' failed
 ✗ ui/Service: disable with server.enabled
   (in test file test/unit/ui-service.bats, line 33)
     `[ "${actual}" = "false" ]' failed
 ✗ ui/Service: disable with ui.enabled
   (in test file test/unit/ui-service.bats, line 43)
     `[ "${actual}" = "false" ]' failed
 ✗ ui/Service: disable with ui.service.enabled
   (in test file test/unit/ui-service.bats, line 53)
     `[ "${actual}" = "false" ]' failed
 ✗ ui/Service: disable with global.enabled
   (in test file test/unit/ui-service.bats, line 63)
     `[ "${actual}" = "false" ]' failed
 ✗ ui/Service: disable with global.enabled and server.enabled on
   (in test file test/unit/ui-service.bats, line 74)
     `[ "${actual}" = "false" ]' failed
 ✗ ui/Service: no type by default
   (in test file test/unit/ui-service.bats, line 83)
     `[ "${actual}" = "null" ]' failed
 ✗ ui/Service: specified type
   (in test file test/unit/ui-service.bats, line 93)
     `[ "${actual}" = "LoadBalancer" ]' failed

I see yq being used in the unit test suite but not documented anywhere in the README. After installing yq I reran the tests and they still fail.
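
Since the suite shells out to several tools, a small preflight check can make a missing dependency obvious before running bats. This is a sketch: the tool list is taken from this report (helm, kubectl, bats, yq) and may not be complete.

```python
import shutil

# Tools named in this report; treat the list as an assumption.
REQUIRED_TOOLS = ["helm", "kubectl", "bats", "yq"]


def missing_tools(tools):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All required tools found.")
```

Being on PATH is only a first check. More than one unrelated tool is distributed under the name yq, with different command syntax, so it may be worth confirming which variant the tests expect.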

PodDisruptionBudget calculation is not working?

I have a 3-replica setup, and for some reason the PDB was set to 0.

ceil (sub (div (int .Values.server.replicas) 2) 1) looks good to me, but is it possible the calculation is off in Go templates?
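
The arithmetic can be checked by hand. Assuming div in the template is integer division (as it is in the Sprig function library that Helm uses), the expression reduces to (replicas / 2) - 1 with the division truncated. A Python sketch of the same calculation:

```python
import math


def max_unavailable(replicas):
    """Mirror ceil (sub (div (int replicas) 2) 1) using integer division."""
    # `//` matches truncating division here because replicas is non-negative.
    return math.ceil((int(replicas) // 2) - 1)


for n in (1, 3, 5):
    print(n, max_unavailable(n))
```

Under that assumption, 3 replicas gives 3 / 2 = 1 and 1 - 1 = 0, so 0 is what the expression itself yields. The ceil has no effect because the value is already an integer by the time it is applied.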

Consul does not bootstrap on AKS using Terraform Provider

AKS cluster spun up using Terraform in an existing Azure Subnet (hard-coded variables substituted in to clarify configuration)

resource "azurerm_kubernetes_cluster" "aks" {
  name                = "${local.aks_name}"
  location            = "eastus"
  resource_group_name = "${var.resource_group_name}"
  dns_prefix          = "${local.dns_prefix}"
  kubernetes_version  = "1.10.8"  # Results in same errors with the AKS default 1.9.9

  agent_pool_profile {
    name            = "default"
    count           = "5"
    vm_size         = "Standard_DS2"
    os_type         = "Linux"
    os_disk_size_gb = 30
    vnet_subnet_id  = "${var.subnet_id}"
  }

  linux_profile {
    admin_username = "localadmin"

    ssh_key {
      key_data = "${var.ssh_key}"
    }
  }

  service_principal {
    client_id     = "${var.spn_client}"
    client_secret = "${var.spn_secret}"
  }

  tags {
    environment = "${var.environment}"
  }
}

Used the helm chart to install with commit 8b57bed

helm install --name az1 --namespace consul .

After a few minutes logs from consul-consul-server-0

bootstrap_expect > 0: expecting 3 servers
==> Starting Consul agent...
==> Consul agent running!
           Version: 'v1.2.3'
           Node ID: '441321d9-cf1c-9ea4-08d1-063d3aacb69c'
         Node name: 'az1-consul-server-0'
        Datacenter: 'dc1' (Segment: '<all>')
            Server: true (Bootstrap: false)
       Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, DNS: 8600)
     Cluster Addr: 10.244.9.4 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false
 
==> Log data will now stream in as it occurs:
 
    2018/10/17 01:42:00 [INFO] raft: Initial configuration (index=0): []
    2018/10/17 01:42:00 [INFO] raft: Node at 10.244.9.4:8300 [Follower] entering Follower state (Leader: "")
    2018/10/17 01:42:00 [INFO] serf: EventMemberJoin: az1-consul-server-0.dc1 10.244.9.4
    2018/10/17 01:42:00 [INFO] serf: EventMemberJoin: az1-consul-server-0 10.244.9.4
    2018/10/17 01:42:00 [INFO] consul: Handled member-join event for server "az1-consul-server-0.dc1" in area "wan"
    2018/10/17 01:42:00 [INFO] consul: Adding LAN server az1-consul-server-0 (Addr: tcp/10.244.9.4:8300) (DC: dc1)
    2018/10/17 01:42:00 [INFO] agent: Started DNS server 0.0.0.0:8600 (udp)
    2018/10/17 01:42:00 [WARN] agent/proxy: running as root, will not start managed proxies
    2018/10/17 01:42:00 [INFO] agent: Started DNS server 0.0.0.0:8600 (tcp)
    2018/10/17 01:42:00 [INFO] agent: Started HTTP server on [::]:8500 (tcp)
    2018/10/17 01:42:00 [INFO] agent: started state syncer
    2018/10/17 01:42:00 [INFO] agent: Retry join LAN is supported for: aliyun aws azure digitalocean gce k8s os packet scaleway softlayer triton vsphere
    2018/10/17 01:42:00 [INFO] agent: Joining LAN cluster...
    2018/10/17 01:42:00 [INFO] agent: (LAN) joining: [az1-consul-server-0.az1-consul-server.consul.svc az1-consul-server-1.az1-consul-server.consul.svc az1-consul-server-2.az1-consul-server.consul.svc]
    2018/10/17 01:42:05 [WARN] raft: no known peers, aborting election
    2018/10/17 01:42:07 [ERR] agent: failed to sync remote state: No cluster leader
==> Failed to check for updates: Get https://checkpoint-api.hashicorp.com/v1/check/consul?arch=amd64&os=linux&signature=a447a0f9-f6b7-3a33-a136-81045c4b26d6&version=1.2.3: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    2018/10/17 01:42:28 [ERR] agent: Coordinate update error: No cluster leader
    2018/10/17 01:42:38 [ERR] agent: failed to sync remote state: No cluster leader
    2018/10/17 01:42:52 [ERR] agent: Coordinate update error: No cluster leader
    2018/10/17 01:43:06 [INFO] agent: (LAN) joined: 1 Err: <nil>
    2018/10/17 01:43:06 [INFO] agent: Join LAN completed. Synced with 1 initial agents
    2018/10/17 01:43:06 [ERR] agent: failed to sync remote state: No cluster leader
    2018/10/17 01:43:19 [ERR] agent: Coordinate update error: No cluster leader
    2018/10/17 01:43:29 [INFO] serf: EventMemberJoin: az1-consul-6lw7d 10.244.9.3
    2018/10/17 01:43:41 [ERR] agent: failed to sync remote state: No cluster leader
    2018/10/17 01:43:52 [ERR] agent: Coordinate update error: No cluster leader
    2018/10/17 01:44:17 [ERR] agent: failed to sync remote state: No cluster leader
    2018/10/17 01:44:22 [ERR] agent: Coordinate update error: No cluster leader
    2018/10/17 01:44:42 [ERR] agent: failed to sync remote state: No cluster leader
    2018/10/17 01:44:49 [ERR] agent: Coordinate update error: No cluster leader
    2018/10/17 01:45:15 [ERR] agent: failed to sync remote state: No cluster leader
    2018/10/17 01:45:18 [ERR] agent: Coordinate update error: No cluster leader
    2018/10/17 01:45:47 [ERR] agent: failed to sync remote state: No cluster leader
    2018/10/17 01:45:51 [ERR] agent: Coordinate update error: No cluster leader
    2018/10/17 01:46:18 [ERR] agent: failed to sync remote state: No cluster leader
    2018/10/17 01:46:26 [ERR] agent: Coordinate update error: No cluster leader
    2018/10/17 01:46:44 [ERR] agent: failed to sync remote state: No cluster leader
    2018/10/17 01:46:55 [ERR] agent: Coordinate update error: No cluster leader
    2018/10/17 01:47:07 [ERR] agent: failed to sync remote state: No cluster leader
    2018/10/17 01:47:28 [ERR] agent: Coordinate update error: No cluster leader
    2018/10/17 01:47:38 [ERR] agent: failed to sync remote state: No cluster leader
    2018/10/17 01:47:58 [ERR] agent: Coordinate update error: No cluster leader
    2018/10/17 01:48:03 [ERR] agent: failed to sync remote state: No cluster leader
    2018/10/17 01:48:26 [ERR] agent: failed to sync remote state: No cluster leader
    2018/10/17 01:48:35 [ERR] agent: Coordinate update error: No cluster leader
    2018/10/17 01:49:00 [ERR] agent: Coordinate update error: No cluster leader
    2018/10/17 01:49:01 [ERR] agent: failed to sync remote state: No cluster leader

(The logs in the Kubernetes UI repeat the same [ERR] reports for the last few pages, now that it has been up for over 12 hours.)

Kubectl exec into consul-consul-server-0

/ # consul members
Node                 Address          Status  Type    Build  Protocol  DC   Segment
az1-consul-server-0  10.244.9.4:8301  alive   server  1.2.3  2         dc1  <all>
az1-consul-6lw7d     10.244.9.3:8301  alive   client  1.2.3  2         dc1  <default>

Kubectl exec into consul-consul-server-2 (consul members had equivalent information on server-1)

> kubectl exec az1-consul-server-2 -n consul -it -- /bin/sh
/ # consul members
Node                 Address          Status  Type    Build  Protocol  DC   Segment
az1-consul-server-2  10.244.8.4:8301  alive   server  1.2.3  2         dc1  <all>
/ # consul join az1-consul-server-0
Error joining address 'az1-consul-server-0': Unexpected response code: 500 (1 error(s) occurred:
 
* Failed to resolve az1-consul-server-0: lookup az1-consul-server-0 on 10.0.0.10:53: read udp 10.244.8.4:56376->10.0.0.10:53: i/o timeout)
Failed to join any nodes.

Consul on AKS does not properly set POD_IP

When testing Consul using the default Helm chart (and also after changing a few settings to match the local environment), it never manages to elect a leader.

Looking at the pod statuses, the POD_IP variable is not set. I tracked the status in the Kubernetes UI, and the IP field does not populate until several seconds after the container is created (at which point it is already showing the environment variables). This leads to a loop of log errors where the pod cannot resolve the other pods and so cannot elect a leader.

This is on a brand new AKS cluster.

consul-connect-injector-webhook-deployment crash-loop

consul-connect-injector-webhook-deployment pod crash-loops with error in the logs

flag provided but not defined: -consul-image

In connect-inject-deployment.yaml, line 47 contains -consul-image="{{ default .Values.global.image .Values.connectInject.imageConsul }}" \

Removing this line fixes the error.

I have submitted PR #63 for this

feature request: helm chart repository

The README clearly states:

For now, we do not host a chart repository.

Because a Helm provider is now managed by HashiCorp, it would be very pleasant if a Helm chart repository could also be managed by HashiCorp, for example by using GitHub Pages.

Then we don't have to first clone this repo or to download and unpack the chart, and we can very easily change the version of the Helm chart to be installed.

And then we could do everything in Terraform configuration, for example:

resource "helm_repository" "main" {
  name = "hashicorp-consul"
  url  = "https://hashicorp.github.io/consul-helm/"
}

resource "helm_release" "main" {
  name       = "consul-westeurope"
  repository = "${helm_repository.main.metadata.0.name}"
  chart      = "consul"
  version    = "0.1.0"

  set {
    name  = "global.datacenter"
    value = "westeurope"
  }

}

Please vote on this issue by adding a 👍 reaction.

Deploy Consul helm chart on minikube

Deploying the default Helm chart onto a blank minikube system runs into DNS issues.
minikube version: v0.28.2
kubernetes 1.10
helm version 2.10.0 (client and server)
macOS 10.13.6

Log output of first consul-server after launching it:
$ kubectl logs kissable-duck-consul-server-0
bootstrap_expect > 0: expecting 3 servers
==> Starting Consul agent...
==> Consul agent running!
Version: 'v1.2.3'
Node ID: '7a2aeb77-2b45-9c3b-0409-2b6612e949d1'
Node name: 'kissable-duck-consul-server-0'
Datacenter: 'dc1' (Segment: '')
Server: true (Bootstrap: false)
Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, DNS: 8600)
Cluster Addr: 172.17.0.8 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false

==> Log data will now stream in as it occurs:

2018/09/25 00:03:39 [INFO] raft: Initial configuration (index=0): []
2018/09/25 00:03:39 [INFO] raft: Node at 172.17.0.8:8300 [Follower] entering Follower state (Leader: "")
2018/09/25 00:03:39 [INFO] serf: EventMemberJoin: kissable-duck-consul-server-0.dc1 172.17.0.8
2018/09/25 00:03:39 [INFO] serf: EventMemberJoin: kissable-duck-consul-server-0 172.17.0.8
2018/09/25 00:03:39 [INFO] agent: Started DNS server 0.0.0.0:8600 (udp)
2018/09/25 00:03:39 [INFO] consul: Adding LAN server kissable-duck-consul-server-0 (Addr: tcp/172.17.0.8:8300) (DC: dc1)
2018/09/25 00:03:39 [INFO] consul: Handled member-join event for server "kissable-duck-consul-server-0.dc1" in area "wan"
2018/09/25 00:03:39 [WARN] agent/proxy: running as root, will not start managed proxies
2018/09/25 00:03:39 [INFO] agent: Started DNS server 0.0.0.0:8600 (tcp)
2018/09/25 00:03:39 [INFO] agent: Started HTTP server on [::]:8500 (tcp)
2018/09/25 00:03:39 [INFO] agent: started state syncer
2018/09/25 00:03:39 [INFO] agent: Retry join LAN is supported for: aliyun aws azure digitalocean gce k8s os packet scaleway softlayer triton vsphere
2018/09/25 00:03:39 [INFO] agent: Joining LAN cluster...
2018/09/25 00:03:39 [INFO] agent: (LAN) joining: [kissable-duck-consul-server-0.kissable-duck-consul-server.default.svc kissable-duck-consul-server-1.kissable-duck-consul-server.default.svc kissable-duck-consul-server-2.kissable-duck-consul-server.default.svc]
2018/09/25 00:03:39 [WARN] memberlist: Failed to resolve kissable-duck-consul-server-0.kissable-duck-consul-server.default.svc: lookup kissable-duck-consul-server-0.kissable-duck-consul-server.default.svc on 10.96.0.10:53: no such host
2018/09/25 00:03:39 [WARN] memberlist: Failed to resolve kissable-duck-consul-server-1.kissable-duck-consul-server.default.svc: lookup kissable-duck-consul-server-1.kissable-duck-consul-server.default.svc on 10.96.0.10:53: no such host
2018/09/25 00:03:39 [WARN] memberlist: Failed to resolve kissable-duck-consul-server-2.kissable-duck-consul-server.default.svc: lookup kissable-duck-consul-server-2.kissable-duck-consul-server.default.svc on 10.96.0.10:53: no such host
2018/09/25 00:03:39 [INFO] agent: (LAN) joined: 0 Err: 3 error(s) occurred:
  • Failed to resolve kissable-duck-consul-server-0.kissable-duck-consul-server.default.svc: lookup kissable-duck-consul-server-0.kissable-duck-consul-server.default.svc on 10.96.0.10:53: no such host
  • Failed to resolve kissable-duck-consul-server-1.kissable-duck-consul-server.default.svc: lookup kissable-duck-consul-server-1.kissable-duck-consul-server.default.svc on 10.96.0.10:53: no such host
  • Failed to resolve kissable-duck-consul-server-2.kissable-duck-consul-server.default.svc: lookup kissable-duck-consul-server-2.kissable-duck-consul-server.default.svc on 10.96.0.10:53: no such host
    2018/09/25 00:03:39 [WARN] agent: Join LAN failed: , retrying in 30s
    2018/09/25 00:03:44 [WARN] raft: no known peers, aborting election
    2018/09/25 00:03:46 [ERR] agent: failed to sync remote state: No cluster leader
    2018/09/25 00:04:07 [INFO] serf: EventMemberJoin: kissable-duck-consul-bqctk 172.17.0.7
    2018/09/25 00:04:09 [INFO] agent: (LAN) joining: [kissable-duck-consul-server-0.kissable-duck-consul-server.default.svc kissable-duck-consul-server-1.kissable-duck-consul-server.default.svc kissable-duck-consul-server-2.kissable-duck-consul-server.default.svc]
    2018/09/25 00:04:09 [WARN] memberlist: Failed to resolve kissable-duck-consul-server-1.kissable-duck-consul-server.default.svc: lookup kissable-duck-consul-server-1.kissable-duck-consul-server.default.svc on 10.96.0.10:53: no such host
    2018/09/25 00:04:09 [WARN] memberlist: Failed to resolve kissable-duck-consul-server-2.kissable-duck-consul-server.default.svc: lookup kissable-duck-consul-server-2.kissable-duck-consul-server.default.svc on 10.96.0.10:53: no such host
    2018/09/25 00:04:09 [INFO] agent: (LAN) joined: 1 Err:
    2018/09/25 00:04:09 [INFO] agent: Join LAN completed. Synced with 1 initial agents
    2018/09/25 00:04:13 [ERR] agent: Coordinate update error: No cluster leader
    2018/09/25 00:04:13 [ERR] agent: failed to sync remote state: No cluster leader
    2018/09/25 00:04:39 [ERR] agent: Coordinate update error: No cluster leader

Any ideas what is going wrong?
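
For reference when debugging the lookups, the names the agent tries to join follow the pattern <fullname>-server-<i>.<fullname>-server.<namespace>.svc, as seen in the log above. The Python sketch below rebuilds that list and offers a helper to test resolution from inside the cluster. The function and parameter names are illustrative and not part of the chart.

```python
import socket


def retry_join_addresses(fullname, namespace, replicas):
    """Build the per-pod DNS names of the server StatefulSet."""
    service = f"{fullname}-server"
    return [f"{service}-{i}.{service}.{namespace}.svc" for i in range(replicas)]


def resolves(hostname):
    """Return True if the name resolves with the local resolver."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


# Values taken from the log above: release "kissable-duck", namespace "default".
names = retry_join_addresses("kissable-duck-consul", "default", 3)
for name in names:
    print(name)
```

Calling resolves(name) for each entry from a pod in the same namespace shows which records the cluster DNS is currently serving, which helps separate a DNS problem from pods that simply have not started yet.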

Services cleared from node when pod restarts

I registered our service in Consul using the REST API of the Consul client (DaemonSet). But after the pod restarts, all services are cleared from this node, although the service is still alive.

Is this right?

disruptionBudget maxUnavailable value is ignored if 0

What I did:

I set my pod disruption budget's maxUnavailable to 0 in my Helm values override file. I then installed the helm chart to a minikube cluster.

What I saw:

When running helm install, I saw an error that told me the maxUnavailable had been set to -1, not the 0 that I had used.

$ helm install -f helm-consul-values.yaml ./consul-helm

Error: release pouring-monkey failed: PodDisruptionBudget.policy "pouring-monkey-consul-server" is invalid: spec.maxUnavailable: Invalid value: -1: must be greater than or equal to 0

What I expected:

I expected that setting maxUnavailable to 0 would be interpreted by the chart as a valid value.

Other context:

If I set the value to 1, everything works as expected. This suggests that a piece of logic is interpreting 0 as false and fails to use the value. Values 1 and greater are interpreted correctly.

My Helm values file has (this is the config that causes the above error):

server:
  replicas: 1
  bootstrapExpect: 1
  disruptionBudget:
    enabled: true
    maxUnavailable: 0

The issue may be related to code in _helpers.tpl which checks if .Values.server.disruptionBudget.maxUnavailable but should look for if .Values.server.disruptionBudget.enabled.
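
The suspicion that 0 is being treated as false is consistent with how Go templates work: if, and Sprig's default, consider 0 to be empty. Python truthiness behaves the same way in this case, so it can illustrate the pattern. The helper names below are hypothetical, and the fallback formula (replicas / 2) - 1 is assumed from the expression quoted in the PodDisruptionBudget calculation report above.

```python
def computed_default(replicas):
    # Assumed fallback: (replicas / 2) - 1 with integer division.
    return (replicas // 2) - 1


def max_unavailable_truthy(configured, replicas):
    """Pattern that loses an explicit 0: 0 is falsy, so the fallback wins."""
    if configured:
        return configured
    return computed_default(replicas)


def max_unavailable_explicit(configured, replicas):
    """Only fall back when no value was provided at all."""
    if configured is not None:
        return configured
    return computed_default(replicas)


print(max_unavailable_truthy(0, 1))
print(max_unavailable_explicit(0, 1))
```

With replicas: 1 and maxUnavailable: 0, the first function returns -1, matching the rejected value in the error, while the second returns 0.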

Consul agent registers node with the name of the pod

When starting agent instances, the agents use the hostnames of the pods consul-{random} to register as nodes. IMHO this isn't desired and creates confusion for the operator of the consul cluster, b/c it's impossible to tell which node represents which physical k8s node.

Unable to install consul-helm with ConnectInject.enabled = true

Helm/tiller version: 2.11.0 with init & RBAC --service-account tiller set
Kubectl client/server version: 1.11.5
Target Provider/platform: AKS

When deploying consul-helm with value of connectInject.enabled=true, the below error pops up.

If I manually apply the connect-inject service account, role binding, mutatingwebhook and deployment it fails to inject connect even if annotation is added to pod manifest.

Error Message:
Error: release consul failed: clusterroles.rbac.authorization.k8s.io "consul-connect-injector-webhook" is forbidden: attempt to grant extra privileges: [{[get] [admissionregistration.k8s.io] [mutatingwebhookconfigurations] [] []} {[list] [admissionregistration.k8s.io] [mutatingwebhookconfigurations] [] []} {[watch] [admissionregistration.k8s.io] [mutatingwebhookconfigurations] [] []} {[patch] [admissionregistration.k8s.io] [mutatingwebhookconfigurations] [] []}] user=&{system:serviceaccount:kube-system:default 6aa4a5a7-f800-11e8-bc17-0a58ac1f0dfd [system:serviceaccounts system:serviceaccounts:kube-system system:authenticated] map[]} ownerrules=[] ruleResolutionErrors=[clusterroles.rbac.authorization.k8s.io "system:discovery" not found]
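
The user= field in the error identifies which identity attempted the install, which is useful when checking whether Tiller is really running under the intended service account. The Python sketch below pulls the service account out of such a message. The ERROR string is an abridged copy of the message above, and the regular expression is written for this message format only; other messages may differ.

```python
import re

# Abridged copy of the error above; "[...]" marks the omitted rule list.
ERROR = (
    "attempt to grant extra privileges: [...] "
    "user=&{system:serviceaccount:kube-system:default "
    "6aa4a5a7-f800-11e8-bc17-0a58ac1f0dfd [system:serviceaccounts] map[]} "
    "ownerrules=[]"
)


def service_account(message):
    """Return (namespace, name) of the service account in the error, or None."""
    match = re.search(r"user=&\{system:serviceaccount:([^:\s]+):([^\s}]+)", message)
    return (match.group(1), match.group(2)) if match else None


print(service_account(ERROR))
```

Here the result is the default service account in kube-system rather than an account named tiller, which may be worth comparing against the --service-account tiller setting mentioned above.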

Unable to run v0.2.0 in minikube. "pod has unbound PersistentVolumeClaims"

Running helm install --name consul --namespace=pkr -f dev-consul.yaml ./consul-helm with these custom values:

syncCatalog:
  enabled: true
server:
  storage: "8Gi"

It completes successfully, but when running kubectl get pods -n pkr, I see that consul-server-1 & consul-server-2 are Pending.

NAME                                   READY     STATUS    RESTARTS   AGE
consul-4vstb                           0/1       Running   0          15m
consul-server-0                        0/1       Running   0          15m
consul-server-1                        0/1       Pending   0          15m
consul-server-2                        0/1       Pending   0          15m
consul-sync-catalog-587b6859f6-dpj5v   1/1       Running   0          15m

Closer inspection on pods consul-server-0:

kubectl describe pods consul-server-0 -n pkr
Name:           consul-server-0
Namespace:      pkr
Node:           minikube/10.0.2.15
Start Time:     Sat, 29 Sep 2018 22:39:09 +0300
Labels:         app=consul
                chart=consul-0.1.0
                component=server
                controller-revision-hash=consul-server-66479c5df5
                hasDNS=true
                release=consul
                statefulset.kubernetes.io/pod-name=consul-server-0
Annotations:    consul.hashicorp.com/connect-inject=false
Status:         Running
IP:             172.17.0.9
Controlled By:  StatefulSet/consul-server
Containers:
  consul:
    Container ID:  docker://7de44c6027bdb78ba4b7bc73643701aa9e0bbb55abce8ce2c7b8e12e2adf82b0
    Image:         consul:1.2.3
    Image ID:      docker-pullable://consul@sha256:ea66d17d8c8c1f1afb2138528d62a917093fcd2e3b3a7b216a52c253189ea980
    Ports:         8500/TCP, 8301/TCP, 8302/TCP, 8300/TCP, 8600/TCP, 8600/UDP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/UDP
    Command:
      /bin/sh
      -ec
      CONSUL_FULLNAME="consul"

exec /bin/consul agent \
  -advertise="${POD_IP}" \
  -bind=0.0.0.0 \
  -bootstrap-expect=3 \
  -client=0.0.0.0 \
  -config-dir=/consul/config \
  -datacenter=dc1 \
  -data-dir=/consul/data \
  -domain=consul \
  -hcl="connect { enabled = true }" \
  -ui \
  -retry-join=${CONSUL_FULLNAME}-server-0.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
  -retry-join=${CONSUL_FULLNAME}-server-1.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
  -retry-join=${CONSUL_FULLNAME}-server-2.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
  -server

    State:          Running
      Started:      Sat, 29 Sep 2018 22:39:10 +0300
    Ready:          False
    Restart Count:  0
    Readiness:      exec [/bin/sh -ec curl http://127.0.0.1:8500/v1/status/leader 2>/dev/null | \
grep -E '".+"'
] delay=5s timeout=5s period=3s #success=1 #failure=2
    Environment:
      POD_IP:      (v1:status.podIP)
      NAMESPACE:  pkr (v1:metadata.namespace)
    Mounts:
      /consul/config from config (rw)
      /consul/data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-fd6r9 (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-consul-server-0
    ReadOnly:   false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      consul-server-config
    Optional:  false
  default-token-fd6r9:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-fd6r9
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                 Age                  From               Message
  ----     ------                 ----                 ----               -------
  Warning  FailedScheduling       31m (x2 over 31m)    default-scheduler  pod has unbound PersistentVolumeClaims
  Normal   Scheduled              31m                  default-scheduler  Successfully assigned consul-server-0 to minikube
  Normal   SuccessfulMountVolume  31m                  kubelet, minikube  MountVolume.SetUp succeeded for volume "pvc-55188bbe-c41f-11e8-b65d-080027750557"
  Normal   SuccessfulMountVolume  31m                  kubelet, minikube  MountVolume.SetUp succeeded for volume "config"
  Normal   SuccessfulMountVolume  31m                  kubelet, minikube  MountVolume.SetUp succeeded for volume "default-token-fd6r9"
  Normal   Pulled                 31m                  kubelet, minikube  Container image "consul:1.2.3" already present on machine
  Normal   Created                31m                  kubelet, minikube  Created container
  Normal   Started                31m                  kubelet, minikube  Started container
  Warning  Unhealthy              16m (x299 over 30m)  kubelet, minikube  Readiness probe failed:

This "pod has unbound PersistentVolumeClaims" error is the same for all consul servers. Yet when running kubectl get pvc & kubectl get pv, the persistent volumes look fine:

kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM                           STORAGECLASS   REASON    AGE
pvc-55188bbe-c41f-11e8-b65d-080027750557   8Gi        RWO            Delete           Bound     pkr/data-consul-server-0   standard                 37m
pvc-55245df6-c41f-11e8-b65d-080027750557   8Gi        RWO            Delete           Bound     pkr/data-consul-server-1   standard                 37m
pvc-5533ebc2-c41f-11e8-b65d-080027750557   8Gi        RWO            Delete           Bound     pkr/data-consul-server-2   standard                 37m

kubectl get pvc -n pkr
NAME                   STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-consul-server-0   Bound     pvc-55188bbe-c41f-11e8-b65d-080027750557   8Gi        RWO            standard       38m
data-consul-server-1   Bound     pvc-55245df6-c41f-11e8-b65d-080027750557   8Gi        RWO            standard       38m
data-consul-server-2   Bound     pvc-5533ebc2-c41f-11e8-b65d-080027750557   8Gi        RWO            standard       38m
postgresql             Bound     pvc-56367c46-c41f-11e8-b65d-080027750557   8Gi        RWO            standard       38m

So, I don't understand the error, since pv & pvc outputs seem fine to me.
How should I debug this further? I've tried deleting the minikube cluster and starting over from scratch, but I get this result every time.

kubectl version

Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.3", GitCommit:"a4529464e4629c21224b3d52edfe0ea91b072862", GitTreeState:"clean", BuildDate:"2018-09-10T11:44:36Z", GoVersion:"go1.11", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.0", GitCommit:"fc32d2f3698e36b93322a3465f63a14e9f0eaead", GitTreeState:"clean", BuildDate:"2018-03-26T16:44:10Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

helm version
Client: &version.Version{SemVer:"v2.11.0", GitCommit:"2e55dbe1fdb5fdb96b75ff144a339489417b146b", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.11.0", GitCommit:"2e55dbe1fdb5fdb96b75ff144a339489417b146b", GitTreeState:"clean"}
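
On the empty "Readiness probe failed:" message in the events above: the probe pipes the response of /v1/status/leader through grep -E '".+"' and discards curl's stderr, so a failing probe prints nothing. The sketch below reproduces only that check in shell. It assumes (not confirmed in this thread) that the endpoint returns an empty JSON string "" while no leader is elected; the function name leader_ready is hypothetical.

```shell
# Hypothetical helper: applies the same test as the chart's readiness probe
# to a response body passed as the first argument.
leader_ready() {
  # The probe succeeds only if the body contains a non-empty quoted string.
  printf '%s\n' "$1" | grep -Eq '".+"'
}

# Live use inside a server pod would mirror the probe itself:
#   leader_ready "$(curl -s http://127.0.0.1:8500/v1/status/leader)"
```

With a leader address in the body the function returns 0; with "" it returns non-zero, which the kubelet reports as a failed probe. Since the command above sets -bootstrap-expect=3 and consul-server-1 and consul-server-2 are still Pending, a missing leader would be consistent with what is shown here.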

Unable to access consul servers from external datacenter

If I want to connect a kubernetes consul cluster with another consul cluster in another datacenter over WAN, the communication only works one way, because this helm chart does not expose a route to the consul servers. So within the kubernetes cluster, I can resolve services that are running in the external datacenter. But if I'm in the external datacenter, I'm unable to resolve services that are running in the kubernetes cluster. The error given is 500 (rpc error: No path to datacenter)

Consul fails to start

In my cluster I have rook running. When provisioning the consul cluster, my helm values file looks like this:

global:
  enabled: true
  domain: consul
  image: "consul:1.2.3"
  datacenter: dc1

server:
  enabled: "-"
  image: null
  replicas: 3
  bootstrapExpect: 3
  storage: 10Gi
  storageClass: rook-ceph-block
  connect: true
  resources: {}
  updatePartition: 0
  disruptionBudget:
    enabled: true
    maxUnavailable: null
  extraConfig: |
    {}
  extraVolumes: []
client:
  enabled: "-"
  image: null
  join: null
  resources: {}
  extraConfig: |
    {}
  extraVolumes: []
dns:
  enabled: "-"

ui:
  enabled: "-"
  service:
    enabled: true
    type: null

connectInject:
  enabled: false # "-" disable this by default for now until the image is public
  image: "TODO"
  default: false # true will inject by default, otherwise requires annotation
  caBundle: "" # empty will auto generate the bundle
  namespaceSelector: null

  certs:
    secretName: null
    caBundle: ""
    certName: tls.crt
    keyName: tls.key

To install the chart, I use the following command line:

helm install -f ./helm/values.digitalocean.yaml --name consul --namespace service-discovery ./helm
NAME:   consul
LAST DEPLOYED: Wed Sep 26 07:46:32 2018
NAMESPACE: service-discovery
STATUS: DEPLOYED

RESOURCES:
==> v1beta1/PodDisruptionBudget
NAME           MIN AVAILABLE  MAX UNAVAILABLE  ALLOWED DISRUPTIONS  AGE
consul-server  N/A            0                0                    1s

==> v1/Pod(related)
NAME             READY  STATUS             RESTARTS  AGE
consul-56lvs     0/1    ContainerCreating  0         1s
consul-jttwp     0/1    ContainerCreating  0         1s
consul-qpgdn     0/1    ContainerCreating  0         1s
consul-server-0  0/1    Pending            0         1s
consul-server-1  0/1    Pending            0         1s
consul-server-2  0/1    Pending            0         1s

==> v1/ConfigMap
NAME                  DATA  AGE
consul-client-config  1     1s
consul-server-config  1     1s

==> v1/Service
NAME           TYPE       CLUSTER-IP    EXTERNAL-IP  PORT(S)                                                                  AGE
consul-dns     ClusterIP  10.3.169.187  <none>       53/TCP,53/UDP                                                            1s
consul-server  ClusterIP  None          <none>       8500/TCP,8301/TCP,8301/UDP,8302/TCP,8302/UDP,8300/TCP,8600/TCP,8600/UDP  1s
consul-ui      ClusterIP  10.3.20.200   <none>       80/TCP                                                                   1s

==> v1/DaemonSet
NAME    DESIRED  CURRENT  READY  UP-TO-DATE  AVAILABLE  NODE SELECTOR  AGE
consul  3        3        0      3           0          <none>         1s

==> v1/StatefulSet
NAME           DESIRED  CURRENT  AGE
consul-server  3        3        1s

A quick check of the PVCs indicates that they have bound successfully. Note that I've observed it can take up to 20s for the PVCs to bind when running under rook:

kubectl get pvc --all-namespaces
NAMESPACE           NAME                   STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
service-discovery   data-consul-server-0   Bound     pvc-85de967e-c14f-11e8-8c17-c6926976ea12   10Gi       RWO            rook-ceph-block   13s
service-discovery   data-consul-server-1   Bound     pvc-85e4d3d0-c14f-11e8-8c17-c6926976ea12   10Gi       RWO            rook-ceph-block   13s
service-discovery   data-consul-server-2   Bound     pvc-85ea1341-c14f-11e8-8c17-c6926976ea12   10Gi       RWO            rook-ceph-block   13s

However the consul-server pods have failed to start:

 kubectl -n service-discovery get pods
NAME              READY     STATUS              RESTARTS   AGE
consul-56lvs      0/1       Running             0          36s
consul-jttwp      0/1       Running             0          36s
consul-qpgdn      0/1       Running             0          36s
consul-server-0   0/1       ContainerCreating   0          36s
consul-server-1   0/1       ContainerCreating   0          36s
consul-server-2   0/1       Running             0          36s

An examination of the pod indicates that the readiness probe has failed:

kubectl -n service-discovery describe pod consul-server-2
Name:           consul-server-2
Namespace:      service-discovery
Node:           k8s-node-1.cloud-technologies.net/142.93.131.205
Start Time:     Wed, 26 Sep 2018 07:47:42 +0200
Labels:         app=consul
                chart=consul-0.1.0
                component=server
                controller-revision-hash=consul-server-66479c5df5
                hasDNS=true
                release=consul
                statefulset.kubernetes.io/pod-name=consul-server-2
Annotations:    consul.hashicorp.com/connect-inject=false
Status:         Running
IP:             10.2.1.13
Controlled By:  StatefulSet/consul-server
Containers:
  consul:
    Container ID:  docker://09641e9f57faf0304b9e74818ee99f2a7dd23a4f1bc44fa6e426ad4be2d72578
    Image:         consul:1.2.3
    Image ID:      docker-pullable://consul@sha256:ea66d17d8c8c1f1afb2138528d62a917093fcd2e3b3a7b216a52c253189ea980
    Ports:         8500/TCP, 8301/TCP, 8302/TCP, 8300/TCP, 8600/TCP, 8600/UDP
    Command:
      /bin/sh
      -ec
      CONSUL_FULLNAME="consul"

exec /bin/consul agent \
  -advertise="${POD_IP}" \
  -bind=0.0.0.0 \
  -bootstrap-expect=3 \
  -client=0.0.0.0 \
  -config-dir=/consul/config \
  -datacenter=dc1 \
  -data-dir=/consul/data \
  -domain=consul \
  -hcl="connect { enabled = true }" \
  -ui \
  -retry-join=${CONSUL_FULLNAME}-server-0.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
  -retry-join=${CONSUL_FULLNAME}-server-1.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
  -retry-join=${CONSUL_FULLNAME}-server-2.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
  -server

    State:          Running
      Started:      Wed, 26 Sep 2018 07:48:17 +0200
    Ready:          True
    Restart Count:  0
    Readiness:      exec [/bin/sh -ec curl http://127.0.0.1:8500/v1/status/leader 2>/dev/null | \
grep -E '".+"'
] delay=5s timeout=5s period=3s #success=1 #failure=2
    Environment:
      POD_IP:      (v1:status.podIP)
      NAMESPACE:  service-discovery (v1:metadata.namespace)
    Mounts:
      /consul/config from config (rw)
      /consul/data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-js48j (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          True 
  PodScheduled   True 
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-consul-server-2
    ReadOnly:   false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      consul-server-config
    Optional:  false
  default-token-js48j:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-js48j
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                 Age               From                                        Message
  ----     ------                 ----              ----                                        -------
  Normal   Scheduled              10m               default-scheduler                           Successfully assigned consul-server-2 to k8s-node-1.cloud-technologies.net
  Normal   SuccessfulMountVolume  10m               kubelet, k8s-node-1.cloud-technologies.net  MountVolume.SetUp succeeded for volume "config"
  Normal   SuccessfulMountVolume  10m               kubelet, k8s-node-1.cloud-technologies.net  MountVolume.SetUp succeeded for volume "default-token-js48j"
  Warning  FailedMount            10m               kubelet, k8s-node-1.cloud-technologies.net  MountVolume.SetUp failed for volume "pvc-85ea1341-c14f-11e8-8c17-c6926976ea12" : invalid character '-' after top-level value
  Normal   SuccessfulMountVolume  10m               kubelet, k8s-node-1.cloud-technologies.net  MountVolume.SetUp succeeded for volume "pvc-85ea1341-c14f-11e8-8c17-c6926976ea12"
  Normal   Pulled                 10m               kubelet, k8s-node-1.cloud-technologies.net  Container image "consul:1.2.3" already present on machine
  Normal   Created                10m               kubelet, k8s-node-1.cloud-technologies.net  Created container
  Normal   Started                9m                kubelet, k8s-node-1.cloud-technologies.net  Started container
  Warning  Unhealthy              6m (x12 over 9m)  kubelet, k8s-node-1.cloud-technologies.net  Readiness probe failed:

An examination of the server logs indicates that it has failed to form the cluster:

 kubectl -n service-discovery logs consul-server-2
==> Starting Consul agent...
bootstrap_expect > 0: expecting 3 servers
==> Consul agent running!
           Version: 'v1.2.3'
           Node ID: '0a5797a9-fee3-d61c-61d8-01b500a9e3c8'
         Node name: 'consul-server-2'
        Datacenter: 'dc1' (Segment: '<all>')
            Server: true (Bootstrap: false)
       Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, DNS: 8600)
      Cluster Addr: 10.2.1.13 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false

==> Log data will now stream in as it occurs:

    2018/09/26 05:48:17 [INFO] raft: Initial configuration (index=0): []
    2018/09/26 05:48:17 [INFO] raft: Node at 10.2.1.13:8300 [Follower] entering Follower state (Leader: "")
    2018/09/26 05:48:17 [INFO] serf: EventMemberJoin: consul-server-2.dc1 10.2.1.13
    2018/09/26 05:48:17 [INFO] serf: EventMemberJoin: consul-server-2 10.2.1.13
    2018/09/26 05:48:17 [INFO] agent: Started DNS server 0.0.0.0:8600 (udp)
    2018/09/26 05:48:17 [INFO] consul: Adding LAN server consul-server-2 (Addr: tcp/10.2.1.13:8300) (DC: dc1)
    2018/09/26 05:48:17 [INFO] consul: Handled member-join event for server "consul-server-2.dc1" in area "wan"
    2018/09/26 05:48:17 [WARN] agent/proxy: running as root, will not start managed proxies
    2018/09/26 05:48:17 [INFO] agent: Started DNS server 0.0.0.0:8600 (tcp)
    2018/09/26 05:48:17 [INFO] agent: Started HTTP server on [::]:8500 (tcp)
    2018/09/26 05:48:17 [INFO] agent: started state syncer
    2018/09/26 05:48:17 [INFO] agent: Retry join LAN is supported for: aliyun aws azure digitalocean gce k8s os packet scaleway softlayer triton vsphere
    2018/09/26 05:48:17 [INFO] agent: Joining LAN cluster...
    2018/09/26 05:48:17 [INFO] agent: (LAN) joining: [consul-server-0.consul-server.service-discovery.svc consul-server-1.consul-server.service-discovery.svc consul-server-2.consul-server.service-discovery.svc]
    2018/09/26 05:48:17 [INFO] serf: EventMemberJoin: consul-56lvs 10.2.3.10
    2018/09/26 05:48:17 [INFO] serf: EventMemberJoin: consul-jttwp 10.2.1.11
    2018/09/26 05:48:17 [INFO] serf: EventMemberJoin: consul-server-1 10.2.3.11
    2018/09/26 05:48:17 [INFO] consul: Adding LAN server consul-server-1 (Addr: tcp/10.2.3.11:8300) (DC: dc1)
    2018/09/26 05:48:17 [WARN] memberlist: Refuting a suspect message (from: consul-server-2.dc1)
    2018/09/26 05:48:17 [INFO] serf: EventMemberJoin: consul-server-1.dc1 10.2.3.11
    2018/09/26 05:48:17 [INFO] consul: Handled member-join event for server "consul-server-1.dc1" in area "wan"
    2018/09/26 05:48:17 [WARN] memberlist: Failed to resolve consul-server-2.consul-server.service-discovery.svc: lookup consul-server-2.consul-server.service-discovery.svc on 10.3.0.10:53: no such host
    2018/09/26 05:48:17 [INFO] agent: (LAN) joined: 1 Err: <nil>
    2018/09/26 05:48:17 [INFO] agent: Join LAN completed. Synced with 1 initial agents
    2018/09/26 05:48:24 [WARN] raft: no known peers, aborting election
    2018/09/26 05:48:25 [ERR] agent: failed to sync remote state: No cluster leader

Any hints what may be wrong?
I've noticed that the probe's initialDelaySeconds default value is 5, so I'm guessing it may have failed before the pvcs have been bound? Perhaps it'd make sense to have this value configurable?
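
One way to read the log above: with -bootstrap-expect=3 the servers wait until three of them know about each other before electing a leader, and this log only shows consul-server-2 and consul-server-1 being added as LAN servers. A minimal sketch for counting them from the log follows. count_lan_servers is a hypothetical helper name, and the match string is taken from the log lines above.

```shell
# Hypothetical helper: reads Consul server log text on stdin and prints the
# number of distinct servers announced by "Adding LAN server" lines.
count_lan_servers() {
  grep -o 'Adding LAN server [^ ]*' | awk '{print $4}' | sort -u | wc -l | tr -d ' '
}

# Example against a live pod (names as used in this issue):
#   kubectl -n service-discovery logs consul-server-2 | count_lan_servers
```

For the log above this prints 2, one short of the 3 expected, which would be consistent with the "raft: no known peers, aborting election" warning.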

Unable to upgrade due to `spec.selector.matchLabels` changing

I am attempting to upgrade our consul install from chart 0.1.0 to 0.4.0 and am receiving the following error messages:

Error: UPGRADE FAILED: DaemonSet.apps "consul" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app":"consul", "chart":"consul-0.4.0", "component":"client", "hasDNS":"true", "release":"consul"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable 
&& StatefulSet.apps "consul-server" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden.

It looks like the chart label is being updated from consul-0.1.0 to consul-0.4.0, which seems to be the norm. However, this changes spec.selector.matchLabels.chart in the StatefulSet and DaemonSet, and that field is immutable. I was able to get around the issue by changing the chart version back to 0.1.0 in the Chart.yaml file.

I think we should remove the chart label from any label selectors as this value will change with chart upgrades.
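
To illustrate which selector label is affected, the sketch below compares two selectors given as comma-separated key=value lists and prints the keys whose values differ. The helper name changed_labels is hypothetical. The 0.4.0 label values are copied from the error message above, and the 0.1.0 set assumes that only the chart label differs.

```shell
# Hypothetical helper: $1 is the old selector, $2 the new one, both written
# as comma-separated key=value pairs. Prints the key of every pair in the new
# selector that is not present verbatim in the old one.
changed_labels() {
  old_labels=$(printf '%s\n' "$1" | tr ',' '\n')
  printf '%s\n' "$2" | tr ',' '\n' | while IFS= read -r label; do
    if ! printf '%s\n' "$old_labels" | grep -Fxq -- "$label"; then
      printf '%s\n' "${label%%=*}"
    fi
  done
}

# Example:
#   changed_labels \
#     "app=consul,chart=consul-0.1.0,component=client,hasDNS=true,release=consul" \
#     "app=consul,chart=consul-0.4.0,component=client,hasDNS=true,release=consul"
```

Any key printed here is one that would have to stay out of spec.selector.matchLabels for an upgrade to leave the selector unchanged.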

helm install error

When I do helm install . I get the following error:

Error: parse error in "consul/templates/_helpers.tpl": template: consul/templates/_helpers.tpl:1: function "ceil" not defined

Any idea what I am missing?

helm version
Client: &version.Version{SemVer:"v2.10.0", GitCommit:"9ad53aac42165a5fadc6c87be0dea6b115f93090", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.6.1+unreleased", GitCommit:"c98726f110e2149ba6780f88bc9a3cff22c37923", GitTreeState:"clean"}

Thanks!
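
One detail visible in the output above is that the Helm client and Tiller report different versions. Whether that explains the missing ceil function is an assumption and is not confirmed in this thread. In Helm 2, though, templates are rendered by Tiller, so an older Tiller may lack template functions that a newer chart uses. A small sketch for pulling both versions out of helm version output (helm_semvers is a hypothetical helper name):

```shell
# Hypothetical helper: reads Helm 2 `helm version` output on stdin and prints
# one "Client <semver>" or "Server <semver>" line per component found.
helm_semvers() {
  sed -nE 's/^(Client|Server): .*SemVer:"([^"]*)".*/\1 \2/p'
}

# Example:
#   helm version | helm_semvers
```

If the two lines differ, aligning Tiller with the client version is a reasonable first thing to try before looking at the chart itself.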

Add Service Type option to Server and Client sections

We want to use Consul in GKE. Our current architecture has applications in multiple clusters, depending on their context. We'd like to be able to use the helm chart to install the server and clients, but the server would reside in its own cluster. There doesn't currently seem to be a way to change the ServiceType for the server and/or client. I believe that we'd need to use something other than ClusterIP for our use case (please correct me if I'm wrong, I'd love to see a POC).

Services not showing up in intentions tab in the UI

If I access the UI, I can see my services (with the -proxy suffix) plus the consul service, and I can also see the services via the CLI (kubectl exec ... consul catalog services). But when I go to the Intentions tab, the dropdown menu only shows * (All Services) and consul. If I try to manually enter a service name, it lets me create it, but I get Use a future Consul Service called '<servicename>' when entering the name.

The strange thing is that if I create an intention anyway, it does take effect, but only if I create it without the -proxy suffix (i.e. the name of the services based on their annotation and/or the name of the first container). The behavior is the same when creating the intention via the CLI.

Is this the intended Consul behavior? Or is that something specific to consul-helm/consul-k8s?

I'm using consul-helm 0.3.0 (installed with Helm 2.10), consul-k8s 0.2.1, and Kubernetes 1.9.4, running on EC2 (not EKS). I have the sync catalog enabled but defaulted to false and both toConsul and toK8S are set to false.
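
Based only on the behavior described above (intentions take effect when created without the -proxy suffix), deriving the intention name from a catalog entry might be sketched like this. The helper name intention_name is hypothetical, and web-proxy and db-proxy in the usage comment are made-up service names.

```shell
# Hypothetical helper: prints the name to use in an intention for a catalog
# entry, stripping one trailing "-proxy" if present.
intention_name() {
  printf '%s\n' "${1%-proxy}"
}

# Example (made-up service names):
#   consul intention create "$(intention_name web-proxy)" "$(intention_name db-proxy)"
```

Names without the suffix, such as consul, pass through unchanged.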
