
K8s controller implementing Multi-Cluster Services API based on AWS Cloud Map.

Home Page: https://aws.amazon.com/blogs/opensource/kubernetes-multi-cluster-service-discovery-using-open-source-aws-cloud-map-mcs-controller/

License: Apache License 2.0



AWS Cloud Map MCS Controller for K8s



Introduction

The AWS Cloud Map Multi-cluster Service Discovery Controller for Kubernetes (K8s) implements the Kubernetes KEP-1645: Multi-Cluster Services API and KEP-2149: ClusterId for ClusterSet identification, allowing services to communicate across multiple clusters. The implementation relies on AWS Cloud Map for cross-cluster service discovery. We have a detailed step-by-step setup guide!

⚠ NOTE: The current GitHub Release is in the Alpha phase and is NOT intended for production use. Support is limited to critical bug fixes.

Check out the Graduation Criteria for moving the project to the next phase.

Installation

Perform the following installation steps on each participating cluster.

  • For multi-cluster service discovery and consumption, the controller should be installed on a minimum of 2 EKS clusters.
  • Participating clusters should be provisioned into a single AWS account, within a single AWS region.

Dependencies

Network

The AWS Cloud Map MCS Controller for K8s provides service discovery and communication across multiple clusters; implementations therefore depend on end-to-end network connectivity between workloads provisioned within each participating cluster.

  • In deployment scenarios where participating clusters are provisioned into separate VPCs, connectivity depends on correctly configured VPC peering, inter-VPC routing, and security groups. The VPC Reachability Analyzer can be used to test and validate end-to-end connectivity between worker nodes in each cluster (an example CLI invocation follows this list).
  • Undefined behavior may occur if controllers are deployed without the required network connectivity between clusters.
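
As a hedged illustration (the resource IDs below are placeholders), reachability between two worker-node network interfaces can be checked with the Reachability Analyzer via the AWS CLI:

aws ec2 create-network-insights-path \
    --source eni-0123456789abcdef0 \
    --destination eni-0fedcba9876543210 \
    --protocol tcp --destination-port 443
aws ec2 start-network-insights-analysis \
    --network-insights-path-id nip-0123456789abcdef0

The analysis result reports whether traffic can flow and, if not, which routing or security-group component blocks it.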

Configure CoreDNS

Install the CoreDNS multicluster plugin into each participating cluster. The multicluster plugin enables CoreDNS to manage the lifecycle of DNS records for ServiceImport objects.

To install the plugin, run the following commands.

kubectl apply -k "github.com/aws/aws-cloud-map-mcs-controller-for-k8s/samples/coredns-clusterrole.yaml"
kubectl apply -k "github.com/aws/aws-cloud-map-mcs-controller-for-k8s/samples/coredns-configmap.yaml"
kubectl apply -k "github.com/aws/aws-cloud-map-mcs-controller-for-k8s/samples/coredns-deployment.yaml"

Install Controller

To install the latest release of the controller, run the following commands.

NOTE: The AWS region environment variable can optionally be set, e.g. export AWS_REGION=us-west-2. Otherwise the controller will infer the region in the following order: AWS_REGION environment variable, ~/.aws/config file, then EC2 instance metadata (for EKS environments).

kubectl apply -k "github.com/aws/aws-cloud-map-mcs-controller-for-k8s/config/controller_install_release"

πŸ“Œ See the Releases section for details on how to install other versions.

The controller must have sufficient IAM permissions to perform the required Cloud Map operations. Grant the AWSCloudMapFullAccess IAM policy to the controller Service Account so the controller can manage Cloud Map resources.
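
On EKS, one way to grant these rights is IAM Roles for Service Accounts (IRSA). A hedged example with eksctl (the cluster name is a placeholder, and the Service Account name is an assumption; check the manifests under config/ for the actual name):

eksctl create iamserviceaccount \
  --cluster my-cluster \
  --namespace cloud-map-mcs-system \
  --name cloud-map-mcs-controller-manager \
  --attach-policy-arn arn:aws:iam::aws:policy/AWSCloudMapFullAccess \
  --approve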

Usage

Configure cluster.clusterset.k8s.io and clusterset.k8s.io

cluster.clusterset.k8s.io is a unique identifier for the cluster.

clusterset.k8s.io is an identifier for the ClusterSet to which the cluster belongs.

apiVersion: about.k8s.io/v1alpha1
kind: ClusterProperty
metadata:
  name: cluster.clusterset.k8s.io
spec:
  value: [Your Cluster identifier]
---
apiVersion: about.k8s.io/v1alpha1
kind: ClusterProperty
metadata:
  name: clusterset.k8s.io
spec:
  value: [Your ClusterSet identifier]

Example:

apiVersion: about.k8s.io/v1alpha1
kind: ClusterProperty
metadata:
  name: cluster.clusterset.k8s.io
spec:
  value: my-first-cluster
---
apiVersion: about.k8s.io/v1alpha1
kind: ClusterProperty
metadata:
  name: clusterset.k8s.io
spec:
  value: my-clusterset

Export services

Assuming you already have a Service installed, apply a ServiceExport manifest in the cluster from which you want to export a service. Do this for each service you want to export.

kind: ServiceExport
apiVersion: multicluster.x-k8s.io/v1alpha1
metadata:
  namespace: [Your service namespace here]
  name: [Your service name]

Example: the following exports a service named my-amazing-service in the hello namespace.

kind: ServiceExport
apiVersion: multicluster.x-k8s.io/v1alpha1
metadata:
  namespace: hello
  name: my-amazing-service

See the samples directory for a set of example YAML files to set up a service and export it. To apply the sample files, run the following commands.

kubectl create namespace example
kubectl apply -f "https://raw.githubusercontent.com/aws/aws-cloud-map-mcs-controller-for-k8s/main/samples/example-deployment.yaml"
kubectl apply -f "https://raw.githubusercontent.com/aws/aws-cloud-map-mcs-controller-for-k8s/main/samples/example-service.yaml"
kubectl apply -f "https://raw.githubusercontent.com/aws/aws-cloud-map-mcs-controller-for-k8s/main/samples/example-serviceexport.yaml"

Import services

In your other cluster, the controller automatically syncs services registered in AWS Cloud Map by creating the corresponding ServiceImport objects. To list them all, run the following command.

kubectl get ServiceImport -A
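
Per KEP-1645, imported services are expected to resolve under the clusterset.local zone. Assuming the export example above, a quick resolution check from an ad-hoc pod could look like:

kubectl run -it --rm dns-test --image=busybox --restart=Never -- \
  nslookup my-amazing-service.hello.svc.clusterset.local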

Releases

AWS Cloud Map MCS Controller for K8s adheres to the SemVer specification. Each release updates the major version tag (e.g. vX), a major/minor version tag (e.g. vX.Y), and a major/minor/patch version tag (e.g. vX.Y.Z). To see a full list of all releases, refer to our GitHub releases page.

NOTE: The AWS region environment variable can optionally be set, e.g. export AWS_REGION=us-west-2. Otherwise the controller will infer the region in the following order: AWS_REGION environment variable, ~/.aws/config file, then EC2 instance metadata (for EKS environments).

The following command format is used to install from a particular release.

kubectl apply -k "github.com/aws/aws-cloud-map-mcs-controller-for-k8s/config/controller_install_release[?ref=*git version tag*]"

Run the following command to install the latest release.

kubectl apply -k "github.com/aws/aws-cloud-map-mcs-controller-for-k8s/config/controller_install_release"

The following example will install release v0.1.0.

kubectl apply -k "github.com/aws/aws-cloud-map-mcs-controller-for-k8s/config/controller_install_release?ref=v0.1.0"

We also maintain a latest tag, which is kept in line with the main branch. We do not recommend installing it on any production cluster, as new major versions on the main branch may introduce breaking changes.

To install from the latest tag, run the following command.

kubectl apply -k "github.com/aws/aws-cloud-map-mcs-controller-for-k8s/config/controller_install_latest"

Graduation Criteria

Alpha -> Beta Graduation

Beta -> GA Graduation

Slack community

We have an open Slack community where users can get integration support, discuss controller functionality, and provide input on our feature roadmap: https://awsappmesh.slack.com/#k8s-mcs-controller. Join the channel with this invite.

Contributing

aws-cloud-map-mcs-controller-for-k8s is an open source project. See CONTRIBUTING for details.

License

This project is distributed under the Apache License, Version 2.0, see LICENSE and NOTICE for more information.

aws-cloud-map-mcs-controller-for-k8s's People

Contributors

astaticvoid, bansal19, bendu, cameronsenese, curtisthe, dependabot[bot], fredjywang, hendoncr, matthewgoodman13, runakash, thalleslmf, vanekjar


aws-cloud-map-mcs-controller-for-k8s's Issues

Makefile Build Failure with Go version 1.18.x

Installing executables in module mode with go get is deprecated (see the Go deprecation notice), so under Go 1.18.x the Makefile never installs controller-gen and the build fails:

/Users/fjywang/aws-cloud-map-mcs-controller-for-k8s/bin/controller-gen "crd:trivialVersions=true,preserveUnknownFields=false" rbac:roleName=manager-role webhook paths="./..." output:crd:artifacts:config=config/crd/bases
bash: /Users/fjywang/aws-cloud-map-mcs-controller-for-k8s/bin/controller-gen: No such file or directory

Support log level configuration

Our logs are very noisy and only support the info and error levels. We should reduce log output for no-ops and consider other logging solutions for the warn and debug levels.
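
One possible direction (a sketch, not the issue's decided solution): controller-runtime's zap integration already supports flag-based level configuration, which would expose a --zap-log-level flag on the manager binary:

package main

import (
	"flag"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
)

func main() {
	// Binds --zap-log-level (debug, info, error), --zap-devel, etc.
	opts := zap.Options{}
	opts.BindFlags(flag.CommandLine)
	flag.Parse()

	ctrl.SetLogger(zap.New(zap.UseFlagOptions(&opts)))
	// ... construct and start the manager as usual
}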

Extract correct service port when exporting a service

The existing implementation ignores the virtual service port and extracts port information directly from the endpoints. Note that endpoints may be exposed on a different port than the service.

See the difference between Port and TargetPort in the Service spec. Both ports must be exported to Cloud Map, and the import controller must create the imported Service with Port and the imported EndpointSlice with TargetPort.

A detailed explanation of service port mapping is available at https://nigelpoulton.com/explained-kubernetes-service-ports/

type ServicePort struct {
	// The name of this port within the service. This must be a DNS_LABEL.
	// All ports within a ServiceSpec must have unique names. When considering
	// the endpoints for a Service, this must match the 'name' field in the
	// EndpointPort.
	Name string `json:"name,omitempty" protobuf:"bytes,1,opt,name=name"`

	// The IP protocol for this port. Supports "TCP", "UDP", and "SCTP".
	// Default is TCP.
	Protocol Protocol `json:"protocol,omitempty" protobuf:"bytes,2,opt,name=protocol,casttype=Protocol"`

	// The application protocol for this port.
	AppProtocol *string `json:"appProtocol,omitempty" protobuf:"bytes,6,opt,name=appProtocol"`

	// The port that will be exposed by this service.
	Port int32 `json:"port" protobuf:"varint,3,opt,name=port"`

	// Number or name of the port to access on the pods targeted by the service.
	TargetPort intstr.IntOrString `json:"targetPort,omitempty" protobuf:"bytes,4,opt,name=targetPort"`

	// The port on each node on which this service is exposed when type=NodePort or LoadBalancer.
	NodePort int32 `json:"nodePort,omitempty" protobuf:"varint,5,opt,name=nodePort"`
}

Implement end-to-end test suite

Implement a test suite that tests the controller end-to-end: spin up a new cluster, introduce a ServiceExport, and check that the ServiceImport and corresponding endpoints are created.

kubectl apply -k not working

When I run kubectl apply -k "github.com/aws/aws-cloud-map-mcs-controller-for-k8s/config/controller_install_release" I'm not able to install. It appears that the change from using "bases" to "resources" is not a drop-in change. resources does not support parent directories, even though bases is deprecated...

error: rawResources failed to read Resources: Load from path ../default failed: '../default' must be a file (got d='/private/var/folders/r9/7wj7vmd510qd9wh_7ws36g9n14zk4z/T/kustomize-184008480/config/default')

Cross AWS account support

See #105

To better support cross-account scenarios, we can add the ability for the controller to assume a cross-account role. Implementation-wise, it would be a credential provider that assumes the role specified by an environment variable set on the container.

Once the code change is in, the customer would take the following steps to enable it (a sketch of such a provider follows the list):

  1. Customer creates a role in the AWS account with permissions to Cloud Map. The role is configured to be assumable by the other account.
  2. Customer adds permissions for the EKS pod role to assume the role created in step 1.
  3. Customer sets an environment variable on the controller with the ARN of the role to assume.
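
A minimal sketch of such a provider with aws-sdk-go-v2 (the CLOUD_MAP_ROLE_ARN variable name is hypothetical, not from the codebase):

package cloudmap

import (
	"context"
	"os"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/credentials/stscreds"
	"github.com/aws/aws-sdk-go-v2/service/sts"
)

// newCrossAccountConfig assumes the role named in the hypothetical
// CLOUD_MAP_ROLE_ARN env var, if set, and returns an AWS config backed
// by the assumed-role credentials.
func newCrossAccountConfig(ctx context.Context) (aws.Config, error) {
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		return aws.Config{}, err
	}
	if roleArn := os.Getenv("CLOUD_MAP_ROLE_ARN"); roleArn != "" {
		provider := stscreds.NewAssumeRoleProvider(sts.NewFromConfig(cfg), roleArn)
		cfg.Credentials = aws.NewCredentialsCache(provider)
	}
	return cfg, nil
}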

ServiceExport controller should watch changes on Service resources

Currently, the ServiceExport controller watches only ServiceExport resources. When the underlying exported Service scales, the change is not reflected, and added or removed endpoints are not synchronized to Cloud Map.

Task (a watch sketch follows this list):

  • Watch changes on Service resources (if they're exported)
  • Update endpoints when Service scales
  • Remove ServiceExport and all Cloud Map endpoints when Service is deleted
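
A minimal sketch of wiring such a watch with controller-runtime (the API-types import path is assumed from the repository layout; the mapping assumes the ServiceExport shares the Service's namespace and name, as in the examples above):

package controllers

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
	"sigs.k8s.io/controller-runtime/pkg/source"

	multiclusterv1alpha1 "github.com/aws/aws-cloud-map-mcs-controller-for-k8s/pkg/apis/multicluster/v1alpha1"
)

func (r *ServiceExportReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&multiclusterv1alpha1.ServiceExport{}).
		// Re-queue the matching ServiceExport whenever the Service changes.
		Watches(&source.Kind{Type: &corev1.Service{}},
			handler.EnqueueRequestsFromMapFunc(func(obj client.Object) []reconcile.Request {
				return []reconcile.Request{{NamespacedName: types.NamespacedName{
					Namespace: obj.GetNamespace(),
					Name:      obj.GetName(),
				}}}
			})).
		Complete(r)
}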

Update /samples and README.md

Update /samples folder to include:

  • CoreDNS samples to implement CoreDNS multicluster plugin (clusterrole, configmap, and deployment)

Update README.md to:

  • Include minor grammatical updates
  • More clearly document the network dependencies for successful operation
  • Include instructions for installing the CoreDNS multicluster plugin
  • More clearly document the IAM access rights required to manage the lifecycle of Cloud Map resources

Sync EKS clusters in different AWS accounts

Hello there,

We have tried the aws-cloud-map-mcs-controller-for-k8s tool, but we would like to sync two EKS clusters that are running in different AWS accounts (not sure if that is possible). VPC peering is enabled and NACL rules are checked; there are no restrictions between the two accounts, so we can access services running outside EKS in account_A from account_B (for example). When we run the tool in both clusters and export a service in the cluster running in account_B, we can see the service's namespace added in AWS Cloud Map in that account, but the cluster in account_A cannot see any imported service:

# cluster in account_B:
$ kubectl get ServiceExport -A
NAMESPACE            NAME    AGE
cross-cluster-test   nginx   46h
===
# cluster in account_A:
$ kubectl get ServiceImport -A
No resources found

Is there any other config I need to do? Maybe sync the two Cloud Map registries between accounts? I just followed the steps in the Readme file, and I don't see any errors in the cloud-map-mcs-controller-manager pod.

Thank you

Kubernetes ServiceExport fails to create Cloud Map Namespace

Submitting a Kubernetes request for a new ServiceExport fails to create the corresponding Namespace in Cloud Map.

Assumption: the ServiceExport operation creates both Namespace and Service objects in Cloud Map when no pre-existing Namespace exists in Cloud Map.

After the user submits a Kubernetes ServiceExport request, the MCS Controller Manager does not appear to attempt Cloud Map Namespace creation, and reports the error: namespace <name> not found.

Steps to reproduce:

1/ Create EKS cluster.
2/ Install MCS Controller:

kubectl apply -k "github.com/aws/aws-cloud-map-mcs-controller-for-k8s/config/controller_install_release"

3/ Create Deployment, Service, and ServiceExport:

kubectl create namespace demo
kubectl apply -f https://raw.githubusercontent.com/aws/aws-cloud-map-mcs-controller-for-k8s/main/samples/demo-deployment.yaml
kubectl apply -f https://raw.githubusercontent.com/aws/aws-cloud-map-mcs-controller-for-k8s/main/samples/demo-service.yaml
kubectl apply -f https://raw.githubusercontent.com/aws/aws-cloud-map-mcs-controller-for-k8s/main/samples/demo-export.yaml

4/ MCS Controller Manager reports error:

2021-11-03T07:34:27.219Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "cloud-map-mcs-system"}
2021-11-03T07:34:27.235Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "demo"}
2021-11-03T07:34:27.253Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "default"}
2021-11-03T07:34:27.257Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "kube-node-lease"}
2021-11-03T07:34:27.260Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "kube-public"}
2021-11-03T07:34:27.276Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "kube-system"}
2021-11-03T07:34:28.273Z        INFO    controllers.ServiceExport       updating Cloud Map service      {"serviceexport": "demo/demo-service", "namespace": "demo", "name": "demo-service"}
2021-11-03T07:34:28.273Z        INFO    cloudmap        fetching a service      {"namespaceName": "demo", "serviceName": "demo-service"}
2021-11-03T07:34:28.291Z        INFO    cloudmap        creating a new service  {"namespace": "demo", "name": "demo-service"}
2021-11-03T07:34:29.220Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "cloud-map-mcs-system"}
2021-11-03T07:34:29.223Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "demo"}
2021-11-03T07:34:29.227Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "default"}
2021-11-03T07:34:29.231Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "kube-node-lease"}
2021-11-03T07:34:29.234Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "kube-public"}
2021-11-03T07:34:29.238Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "kube-system"}
2021-11-03T07:34:31.072Z        ERROR   controllers.ServiceExport       error when creating new service in Cloud Map    {"serviceexport": "demo/demo-service", "namespace": "demo", "name": "demo-service", "error": "namespace demo not found"}
github.com/go-logr/zapr.(*zapLogger).Error
        /go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:132
github.com/aws/aws-cloud-map-mcs-controller-for-k8s/pkg/controllers.(*ServiceExportReconciler).handleUpdate
        /workspace/pkg/controllers/serviceexport_controller.go:109
github.com/aws/aws-cloud-map-mcs-controller-for-k8s/pkg/controllers.(*ServiceExportReconciler).Reconcile
        /workspace/pkg/controllers/serviceexport_controller.go:65
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:235
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:198
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.UntilWithContext
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:99
2021-11-03T07:34:31.072Z        ERROR   controller-runtime.manager.controller.serviceexport     Reconciler error        {"reconciler group": "multicluster.x-k8s.io", "reconciler kind": "ServiceExport", "name": "demo-service", "namespace": "demo", "error": "namespace demo not found"}
github.com/go-logr/zapr.(*zapLogger).Error
        /go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:132
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:267
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:235
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:198
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.UntilWithContext
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:99

Note:
If the Cloud Map Namespace demo is manually pre-created then the operation succeeds, i.e. the Cloud Map Service and associated Service instances are created in the demo namespace:

2021-11-03T07:37:45.219Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "cloud-map-mcs-system"}
2021-11-03T07:37:45.223Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "demo"}
2021-11-03T07:37:45.256Z        INFO    controllers.ServiceExport       updating Cloud Map service      {"serviceexport": "demo/demo-service", "namespace": "demo", "name": "demo-service"}
2021-11-03T07:37:45.256Z        INFO    cloudmap        fetching a service      {"namespaceName": "demo", "serviceName": "demo-service"}
2021-11-03T07:37:45.289Z        INFO    cloudmap        creating a new service  {"namespace": "demo", "name": "demo-service"}
2021-11-03T07:37:45.291Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "default"}
2021-11-03T07:37:45.305Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "kube-node-lease"}
2021-11-03T07:37:45.322Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "kube-public"}
2021-11-03T07:37:45.326Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "kube-system"}
2021-11-03T07:37:45.355Z        INFO    cloudmap        skipping endpoint registration for empty endpoint list  {"serviceName": "demo-service"}
2021-11-03T07:37:45.355Z        INFO    cloudmap        registering endpoints   {"namespaceName": "demo", "serviceName": "demo-service", "endpoints": [{"Id":"192_168_71_177","IP":"192.168.71.177","Port":80,"Attributes":{"K8S_CONTROLLER":"aws-cloud-map-mcs-controller-for-k8s 2158517-dirty (2158517-dirty)"}},{"Id":"192_168_93_154","IP":"192.168.93.154","Port":80,"Attributes":{"K8S_CONTROLLER":"aws-cloud-map-mcs-controller-for-k8s 2158517-dirty (2158517-dirty)"}},{"Id":"192_168_81_96","IP":"192.168.81.96","Port":80,"Attributes":{"K8S_CONTROLLER":"aws-cloud-map-mcs-controller-for-k8s 2158517-dirty (2158517-dirty)"}},{"Id":"192_168_86_14","IP":"192.168.86.14","Port":80,"Attributes":{"K8S_CONTROLLER":"aws-cloud-map-mcs-controller-for-k8s 2158517-dirty (2158517-dirty)"}},{"Id":"192_168_81_238","IP":"192.168.81.238","Port":80,"Attributes":{"K8S_CONTROLLER":"aws-cloud-map-mcs-controller-for-k8s 2158517-dirty (2158517-dirty)"}}]}
2021-11-03T07:37:47.220Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "cloud-map-mcs-system"}
2021-11-03T07:37:47.224Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "demo"}
2021-11-03T07:37:47.266Z        INFO    controllers.CloudMap    syncing service {"namespace": "demo", "service": "demo-service"}
2021-11-03T07:37:47.294Z        INFO    controllers.CloudMap    created ServiceImport   {"namespace": "demo", "name": "demo-service"}
2021-11-03T07:37:47.294Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "default"}
2021-11-03T07:37:47.298Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "kube-node-lease"}
2021-11-03T07:37:47.302Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "kube-public"}
2021-11-03T07:37:47.305Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "kube-system"}
2021-11-03T07:37:48.523Z        INFO    cloudmap        polling operations      {"operations": ["3c7oyezgq7dcnw4frctwdhah6nzmf72r-k520xjs1", "53tofooa5anjr7msoa6f47pnmdj2kuft-63sup7c7", "u4pzpec3uqutvcw4s26rodq4hjne66mp-6pkyfmt1", "txqpqsdh5gahbwbq2aqq7b4nt63qj6b4-k6yuoeb2", "k5n7yqdfpapx6ulz36jv6ipiof2oe2ae-31ge95y8"]}
2021-11-03T07:37:48.544Z        INFO    cloudmap        operations completed successfully
2021-11-03T07:37:49.220Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "default"}
2021-11-03T07:37:49.245Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "kube-node-lease"}
2021-11-03T07:37:51.114Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "kube-public"}
2021-11-03T07:37:51.130Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "kube-system"}
2021-11-03T07:37:51.134Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "cloud-map-mcs-system"}
2021-11-03T07:37:51.138Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "demo"}
2021-11-03T07:37:51.156Z        INFO    controllers.CloudMap    syncing service {"namespace": "demo", "service": "demo-service"}
2021-11-03T07:37:51.186Z        INFO    controllers.CloudMap    created derived Service {"namespace": "demo", "name": "imported-9vbl979mtp"}
2021-11-03T07:37:51.219Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "demo"}
2021-11-03T07:37:51.234Z        INFO    controllers.CloudMap    syncing service {"namespace": "demo", "service": "demo-service"}
2021-11-03T07:37:51.249Z        INFO    controllers.CloudMap    updated ServiceImport   {"namespace": "demo", "name": "demo-service", "IP": ["10.100.8.76"], "ports": [{"protocol":"TCP","port":80}]}
W1103 07:37:51.259871       1 warnings.go:67] discovery.k8s.io/v1beta1 EndpointSlice is deprecated in v1.21+, unavailable in v1.25+; use discovery.k8s.io/v1 EndpointSlice
2021-11-03T07:37:51.260Z        INFO    controllers.CloudMap    created EndpointSlice   {"namespace": "demo", "name": "imported-9vbl979mtp-nbq5v"}
2021-11-03T07:37:51.260Z        INFO    controllers.CloudMap    syncing namespace       {"namespace": "default"}

Support same service running in multiple clusters (Cluster ID)

The current implementation assumes that the [namespace name, service name] pair uniquely identifies a running service in a given ClusterSet. When a service with the same name is exported from multiple clusters, this causes a conflict on the Cloud Map side (only one cluster can claim ownership of the Cloud Map service).

The goal is to enable the use case of the same service running across multiple clusters. Proposed solution:

  • Cloud Map instances must be identified by some form of cluster ID. That way the controller can claim ownership over individual endpoints and not the whole service.
  • Consider adopting existing Cluster ID proposal to uniquely identify clusters

Need to add about.k8s.io to CRD paths in suite_test.go

Need to add about.k8s.io for the clusterproperties CRD to the CRD paths in suite_test.go.
Currently, adding them causes an annotation error because the annotation patch is not applied before the CRD is installed when running local tests. There are no issues with the kind integration test. The error is attached at the end.

The kubernetes-sigs team deals with this using a USE_EXISTING_CLUSTER environment variable, as seen in the same suite_test.go in their repo. But are we using a local cluster while testing?

More research into issue and resolution needed.

Error:

Unexpected error:
      <*fmt.wrapError | 0xc000914320>: {
          msg: "unable to install CRDs onto control plane: unable to create CRD instances: unable to create CRD \"clusterproperties.about.k8s.io\": CustomResourceDefinition.apiextensions.k8s.io \"clusterproperties.about.k8s.io\" is invalid: metadata.annotations[api-approved.kubernetes.io]: Required value: protected groups must have approval annotation \"api-approved.kubernetes.io\", see https://github.com/kubernetes/enhancements/pull/1111",
          err: <*fmt.wrapError | 0xc000914300>{
              msg: "unable to create CRD instances: unable to create CRD \"clusterproperties.about.k8s.io\": CustomResourceDefinition.apiextensions.k8s.io \"clusterproperties.about.k8s.io\" is invalid: metadata.annotations[api-approved.kubernetes.io]: Required value: protected groups must have approval annotation \"api-approved.kubernetes.io\", see https://github.com/kubernetes/enhancements/pull/1111",
              err: <*fmt.wrapError | 0xc0009142e0>{
                  msg: "unable to create CRD \"clusterproperties.about.k8s.io\": CustomResourceDefinition.apiextensions.k8s.io \"clusterproperties.about.k8s.io\" is invalid: metadata.annotations[api-approved.kubernetes.io]: Required value: protected groups must have approval annotation \"api-approved.kubernetes.io\", see https://github.com/kubernetes/enhancements/pull/1111",
                  err: <*errors.StatusError | 0xc0001c6e60>{
                      ErrStatus: {
                          TypeMeta: {Kind: "", APIVersion: ""},
                          ListMeta: {
                              SelfLink: "",
                              ResourceVersion: "",
                              Continue: "",
                              RemainingItemCount: nil,
                          },
                          Status: "Failure",
                          Message: "CustomResourceDefinition.apiextensions.k8s.io \"clusterproperties.about.k8s.io\" is invalid: metadata.annotations[api-approved.kubernetes.io]: Required value: protected groups must have approval annotation \"api-approved.kubernetes.io\", see https://github.com/kubernetes/enhancements/pull/1111",
                          Reason: "Invalid",
                          Details: {
                              Name: "clusterproperties.about.k8s.io",
                              Group: "apiextensions.k8s.io",
                              Kind: "CustomResourceDefinition",
                              UID: "",
                              Causes: [
                                  {
                                      Type: "FieldValueRequired",
                                      Message: "Required value: protected groups must have approval annotation \"api-approved.kubernetes.io\", see https://github.com/kubernetes/enhancements/pull/1111",
                                      Field: "metadata.annotations[api-approved.kubernetes.io]",
                                  },
                              ],
                              RetryAfterSeconds: 0,
                          },
                          Code: 422,
                      },
                  },
              },
          },
      }
      unable to install CRDs onto control plane: unable to create CRD instances: unable to create CRD "clusterproperties.about.k8s.io": CustomResourceDefinition.apiextensions.k8s.io "clusterproperties.about.k8s.io" is invalid: metadata.annotations[api-approved.kubernetes.io]: Required value: protected groups must have approval annotation "api-approved.kubernetes.io", see https://github.com/kubernetes/enhancements/pull/1111
  occurred

Automatically create Cloud Map namespace if it doesn't exist

When a ServiceExport is created for a given service, the controller attempts to create the corresponding Cloud Map service record in an existing namespace with the same name. If the namespace is missing on the Cloud Map side, reconciliation fails.

Task: Create Cloud Map namespace on demand when it doesn't exist.
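
A hedged sketch of on-demand creation with aws-sdk-go-v2 (assuming an HTTP namespace is appropriate; CreateHttpNamespace is asynchronous, so the caller would poll the returned operation before registering services):

package cloudmap

import (
	"context"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/servicediscovery"
)

// ensureNamespace creates the Cloud Map namespace if it is missing and
// returns the operation ID to poll for completion.
func ensureNamespace(ctx context.Context, sd *servicediscovery.Client, name string) (string, error) {
	out, err := sd.CreateHttpNamespace(ctx, &servicediscovery.CreateHttpNamespaceInput{
		Name: aws.String(name),
	})
	if err != nil {
		return "", err
	}
	return aws.ToString(out.OperationId), nil
}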

Split Service and EndpointSlices reconciliation

The CloudMapReconciler reconciles Services, ServiceImports, and EndpointSlices every 2 seconds. Below is a proposal to re-architect the reconciler by breaking it into two components (a brief cadence sketch follows the benefits list).

  1. ServiceReconciler: Services and ServiceImports do not change often, so we can reconcile them every 30 seconds. The steps:
    • First, find the namespaces to reconcile (ListNamespaces API call).
    • Per namespace, find the desired state of the k8s Service (ListServices API call).
    • Create/update the k8s Service and ServiceImport to match the desired state.
    • Store the namespaceName and serviceName locally.
  2. InstanceReconciler: EndpointSlices can change often, so it can keep reconciling every 2 seconds. The steps:
    • Get the list of namespace-services to reconcile from the local store.
    • Discover instances for each namespace-service and compute the desired EndpointSlice state (DiscoverInstances API call).
    • Update the EndpointSlice state to match the desired state.

Benefits:

  • We have a clear separation of concerns.
  • Fewer API calls to Kubernetes.
  • Less complexity within the reconciler.
  • It follows the recommended architecture of the Operator SDK (one controller per reconciled kind, i.e. one per CRD).
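
A brief Go sketch of the slow-cadence half of this proposal (names, intervals, and the reconcile body are illustrative, not from the codebase):

package controllers

import (
	"context"
	"time"
)

// ServiceReconciler is the proposed slow-cadence reconciler.
type ServiceReconciler struct{}

func (r *ServiceReconciler) reconcileServices(ctx context.Context) error {
	// ListNamespaces -> per-namespace ListServices -> create/update the k8s
	// Service and ServiceImport, then cache namespaceName/serviceName locally.
	return nil
}

// Start runs reconciliation every 30 seconds until the context is cancelled.
func (r *ServiceReconciler) Start(ctx context.Context) error {
	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return nil
		case <-ticker.C:
			_ = r.reconcileServices(ctx) // errors would be logged, not fatal
		}
	}
}

An InstanceReconciler would run the same loop with a 2-second ticker around DiscoverInstances.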

Startup error handling

Add proper error handling during startup for non-recoverable states.

For example, if we fail to sign a request because of missing credentials, terminate the controller instead of retrying:

failed to sign request: failed to retrieve credentials: no EC2 IMDS role found

Connectivity through AWS Transit Gateway

In our organization we have a huge number of teams with lots of EKS clusters distributed across the organization, which makes VPC peering far more complicated to use than Transit Gateway. Will there be upcoming support for connectivity via Transit Gateway in the MCS Controller?

Add support to work multi-region

Note that this walk-through assumes throughout to operate in the `us-west-2` region.

I deployed this into us-east-1 and received the following error:

2021-10-19T23:07:39.211Z        ERROR   controllers.ServiceExport       error when creating new service in Cloud Map    {"serviceexport": "demo/demo-service", "namespace": "demo", "name": "demo-service", "error": "operation error ServiceDiscovery: ListNamespaces, failed to resolve service endpoint, an AWS region is required, but was not found"}

To resolve this, I added an environment variable to the deployment manifest named migration-controller-manager in the migration-system namespace. An example of what I used is below:

env:
  - name: AWS_REGION
    value: us-east-1

IPv6 addressType support

The initial implementation of the controller is focused purely on IPv4. Add support for IPv6.

RBAC Issue

Looks to be a permissions issue:
https://github.com/aws/aws-cloud-map-mcs-controller-for-k8s/tree/main/config/rbac

Command I ran to retrieve logs:
kubectl logs -n migration-system -c manager

2021-10-19T20:39:30.791Z DEBUG controller-runtime.manager.events Normal {"object": {"kind":"Lease","namespace":"migration-system","name":"db692913.x-k8s.io","uid":"9c3b0ba1-6894-4e8c-80cd-78063754f700","apiVersion":"coordination.k8s.io/v1","resourceVersion":"233208"}, "reason": "LeaderElection", "message": "migration-controller-manager-587f969689-nvsl2_c6be2365-0c9f-4e5e-afb2-268eccce65e7 became leader"}

E1019 20:39:30.802354 1 reflector.go:127] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:156: Failed to watch *v1.Namespace: failed to list *v1.Namespace: namespaces is forbidden: User "system:serviceaccount:migration-system:migration-controller-manager" cannot list resource "namespaces" in API group "" at the cluster scope

2021-10-19T20:39:30.992Z INFO controller-runtime.manager.controller.serviceexport Starting Controller {"reconciler group": "multicluster.x-k8s.io", "reconciler kind": "ServiceExport"}

E1019 20:39:31.959615 1 reflector.go:127] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:156: Failed to watch *v1.Namespace: failed to list *v1.Namespace: namespaces is forbidden: User "system:serviceaccount:migration-system:migration-controller-manager" cannot list resource "namespaces" in API group "" at the cluster scope

Error handling in the reconciler

The current strategy for error handling is to log the error. This story is to figure out ways to extend error-handling capabilities and make the reconciler fault-tolerant.

Handle exported schema changes

We export the controller version as an instance attribute. We should check this attribute to determine whether and how to import the endpoint.

Panic in reconciler causing segmentation violation

During benchmark testing, the controller completely halted with a panic caused by an attempt to access an invalid memory address (nil pointer dereference), resulting in a segmentation violation.

Here are the last logs from the manager:

{"level":"info","ts":1655941251.2284162,"logger":"controllers.Cloudmap","msg":"CalculateChanges_ES_Plan","elapsed":9}
{"level":"info","ts":1655941251.228578,"logger":"controllers.Cloudmap","msg":"CalculateChanges_ES_Plan","elapsed":10}
{"level":"info","ts":1655941252.7608504,"logger":"controllers.ServiceExport","msg":"updating Cloud Map service","namespace":"demov2","name":"nginx-benchmark-service-61"}
{"level":"info","ts":1655941252.7608776,"logger":"cloudmap","msg":"fetching a service","namespace":"demov2","name":"nginx-benchmark-service-61"}
{"level":"info","ts":1655941252.76091,"logger":"cloudmap","msg":"creating a new service","namespace":"demov2","name":"nginx-benchmark-service-61"}
{"level":"info","ts":1655941252.8043206,"logger":"cloudmap","msg":"service created","namespace":"demov2","name":"nginx-benchmark-service-61","id":"srv-vvx3saukvkvkytng"}
{"level":"info","ts":1655941252.8043518,"logger":"cloudmap","msg":"fetching a service","namespace":"demov2","name":"nginx-benchmark-service-61"}
{"level":"info","ts":1655941252.8277564,"msg":"Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference","controller":"serviceexport","controllerGroup":"multicluster.x-k8s.io","controllerKind":"ServiceExport","serviceExport":{"name":"nginx-benchmark-service-61","namespace":"demov2"},"namespace":"demov2","name":"nginx-benchmark-service-61","reconcileID":"f1c0f919-fdb1-4e63-be72-6aa2f4df35e7"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x15204f7]

goroutine 221 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:118 +0x1f4
panic({0x16819e0, 0x271b0e0})
	/usr/local/go/src/runtime/panic.go:1038 +0x215
github.com/aws/aws-cloud-map-mcs-controller-for-k8s/pkg/controllers.(*ServiceExportReconciler).handleUpdate(0xc000461e40, {0x1aad4b0, 0xc0007644c0}, 0xc00032be00, 0xc000d08000)
	/workspace/pkg/controllers/serviceexport_controller.go:136 +0x597
github.com/aws/aws-cloud-map-mcs-controller-for-k8s/pkg/controllers.(*ServiceExportReconciler).Reconcile(0xc000461e40, {0x1aad558, 0xc000b44540}, {{{0xc000d0b876, 0x18}, {0xc000b68780, 0x40d3a7}}})
	/workspace/pkg/controllers/serviceexport_controller.go:93 +0x7d5
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x1aad4b0, {0x1aad558, 0xc000b44540}, {{{0xc000d0b876, 0x17a4d00}, {0xc000b68780, 0x40ee1d}}})
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121 +0xd1
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00031b7c0, {0x1aad4b0, 0xc000461c40}, {0x16e0800, 0xc00098b8c0})
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320 +0x33c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00031b7c0, {0x1aad4b0, 0xc000461c40})
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:230 +0x36f

Note that the code lines do not line up exactly with the controller's code:
serviceexport_controller.go:136 points to the Current: cmService.Endpoints line,
and serviceexport_controller.go:93 to return r.handleUpdate(ctx, &serviceExport, &service).

Strategy to evict stale endpoints

As per the Multi-Cluster Services API, we need to define a strategy for the following.

Endpoint TTL

To prevent stale endpoints from persisting in the event that the mcs-controller is unable to reach a cluster, it is recommended that an implementation provide an in-cluster controller to monitor and remove stale endpoints. This may be the mcs-controller itself in distributed implementations.

We recommend creating leases to represent connectivity with source clusters. These leases should be periodically renewed by the mcs-controller while the connection with the source cluster is confirmed alive. When a lease expires, the cluster name and multicluster.kubernetes.io/source-cluster label may be used to find and remove all EndpointSlices containing endpoints from the unreachable cluster.
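
A hedged sketch of what such a connectivity lease could look like (names and timings are illustrative):

apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: mcs-source-cluster-my-first-cluster
  namespace: cloud-map-mcs-system
spec:
  holderIdentity: my-first-cluster
  leaseDurationSeconds: 60
  renewTime: "2022-06-01T00:00:00.000000Z"

When renewTime plus leaseDurationSeconds has passed, EndpointSlices labelled multicluster.kubernetes.io/source-cluster: my-first-cluster would be eligible for removal.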

M1 CPU support: make test fails due to tar error

setup-envtest is failing:

test -f /Users/dhemmerl/aws-cloud-map-mcs-controller-for-k8s/testbin/setup-envtest.sh || curl -sSLo /Users/dhemmerl/aws-cloud-map-mcs-controller-for-k8s/testbin/setup-envtest.sh https://raw.githubusercontent.com/kubernetes-sigs/controller-runtime/v0.7.2/hack/setup-envtest.sh
source /Users/dhemmerl/aws-cloud-map-mcs-controller-for-k8s/testbin/setup-envtest.sh; fetch_envtest_tools /Users/dhemmerl/aws-cloud-map-mcs-controller-for-k8s/testbin
fetching envtest [email protected] (into '/Users/dhemmerl/aws-cloud-map-mcs-controller-for-k8s/testbin')
tar: Error opening archive: Unrecognized archive format
make: *** [test-setup] Error 1

Integration test does not restore original kubectl context

To recreate:

  1. Set up a cluster according to the CONTRIBUTING.md docs.
  2. Run make integration-suite, which creates its own kind cluster and sets its own context.
  3. After a successful integration test, if you run make clean and then make install again, you may get this error:
The connection to the server localhost:8080 was refused - did you specify the right host or port?

You can fix this by resetting the kubectl context with kind export kubeconfig --name my-cluster, but it may be beneficial for the integration test to restore the user's kubectl context.

Support headless services

Currently the controller supports only services of type ClusterIP β€” or in multi-cluster terms, ClusterSetIP (see code). That's not fully compliant with the MCS API.

  • Implement headless Services on the export side β€” a headless Service is marked by a custom instance attribute
  • Implement headless ServiceImports β€” the import controller translates headless instances to the Headless ServiceImport type
  • Figure out corner cases, e.g. a service with the same name is defined as headless in one cluster but ClusterIP in another

Improve unit tests code coverage

Some packages are missing proper unit tests because the first implementation was just a prototype. The current code coverage is very low. Let's add unit tests where possible to improve code quality.


AWS SDK region is not correctly set

The controller loads the target region from the AWS_REGION env variable, but this variable is never set in the configuration files.

config.WithRegion(os.Getenv("AWS_REGION")),

Error message when running the controller:

2021-10-19T23:07:39.211Z ERROR controllers.ServiceExport error when creating new service in Cloud Map {"serviceexport": "demo/demo-service", "namespace": "demo", "name": "demo-service", "error": "operation error ServiceDiscovery: ListNamespaces, failed to resolve service endpoint, an AWS region is required, but was not found"}

Cloud Map API Throttling

I keep getting this API error from the MCS Cloud Map controller:

{"level":"error","ts":1649927224.8055146,"logger":"controllers.Cloudmap","msg":"Cloud Map reconciliation error","error":"operation error ServiceDiscovery: ListServices, exceeded maximum number of attempts, 3, https response error StatusCode: 400, RequestID: 81d0ef9f-ed2b-49d5-98ee-b3b1f6934f7f, api error ThrottlingException: Rate exceeded","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:132\ngithub.com/aws/aws-cloud-map-mcs-controller-for-k8s/pkg/common.logger.Error\n\t/workspace/pkg/common/logger.go:39\ngithub.com/aws/aws-cloud-map-mcs-controller-for-k8s/pkg/controllers.(*CloudMapReconciler).Start\n\t/workspace/pkg/controllers/cloudmap_controller.go:43\nsigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).startRunnable.func1\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/internal.go:681"}

How can we configure the Cloud Map controller to reduce the number of API calls?

Thanks
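
The thread records no official answer, but one client-side mitigation (a sketch, not the controller's actual configuration) is to widen the aws-sdk-go-v2 retry budget so ThrottlingException responses are absorbed by backoff rather than surfacing after 3 attempts:

package cloudmap

import (
	"context"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/aws/retry"
	"github.com/aws/aws-sdk-go-v2/config"
)

// loadConfigWithRetries raises the standard retryer's attempt budget.
func loadConfigWithRetries(ctx context.Context) (aws.Config, error) {
	return config.LoadDefaultConfig(ctx,
		config.WithRetryer(func() aws.Retryer {
			return retry.AddWithMaxAttempts(retry.NewStandard(), 10)
		}),
	)
}

Reducing the reconciliation frequency (see the "Split Service and EndpointSlices reconciliation" proposal above) would also cut the ListServices call volume.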

Load testing and performance tuning

Propose a realistic testing scenario for the number of clusters, services, and endpoints, and run the controller with that setup. Document the output in a GitHub doc/README to give users confidence in the controller's capabilities.

TBD
