azure / aks-engine

AKS Engine: legacy tool for Kubernetes on Azure (see status)

Home Page: https://github.com/Azure/aks-engine

License: MIT License

Makefile 0.18% Go 92.90% PowerShell 3.17% Shell 3.66% Python 0.04% Batchfile 0.01% C# 0.03% Dockerfile 0.02%
kubernetes docker azure containers orchestration golang

aks-engine's Introduction

AKS Engine - Deprecated tool for self-managed Kubernetes on Azure

Project status

This project is deprecated for Azure public cloud customers. Please use Azure Kubernetes Service (AKS) for managed Kubernetes or Cluster API Provider Azure for self-managed Kubernetes. There are no further releases planned; Kubernetes 1.24 was the final version to receive updates.

On the Azure Stack Hub product, this project is fully supported and will continue to be supported by the Hub team throughout the lifespan of Azure Stack Hub. Development has already moved to a new Azure Stack Hub-specific repository (Azure/aks-engine-azurestack). That new repository is where releases for Azure Stack Hub clouds, starting at v0.75.3, will be published and where issues concerning Azure Stack Hub should be created.

Support

Please see our support policy.

Code of conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Data Collection

The software may collect information about you and your use of the software and send it to Microsoft. Microsoft may use this information to provide services and improve our products and services. You may turn off the telemetry as described in the repository. There are also some features in the software that may enable you and Microsoft to collect data from users of your applications. If you use these features, you must comply with applicable law, including providing appropriate notices to users of your applications together with a copy of Microsoft's privacy statement. Our privacy statement is located at https://go.microsoft.com/fwlink/?LinkID=824704. You can learn more about data collection and use in the help documentation and our privacy statement. Your use of the software operates as your consent to these practices.

For more information, please see the telemetry documentation.


aks-engine's Issues

Add option to keep generated files in _output/ for debugging

Is this a request for help?: No


Is this an ISSUE or FEATURE REQUEST? (choose one): Feature


What version of acs-engine?: all


Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm): Kubernetes

What happened:

When you generate a template, some very long customData strings are added to each VM. They are difficult to read, and it's almost impossible to see how they will change as you make changes to acs-engine.

I propose adding a new verbose flag that writes this content to files for review before it is encoded into the customData strings. That would make it easier to review changes to kubelet flags, apiserver settings and so on before running az group deploy.

How to reproduce it (as minimally and precisely as possible):
acs-engine generate <anything.json>
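Until such a flag exists, one rough way to inspect the payload is to pull a customData value out of the generated template and decode it by hand. This is only a sketch: the output path and jq filter are assumptions about a default `acs-engine generate` run, and it only works if the value is a literal base64 blob rather than an ARM expression.

TEMPLATE=_output/<dnsprefix>/azuredeploy.json   # assumed default output location
# Grab the first customData value found anywhere in the template
jq -r '.. | .customData? // empty' "$TEMPLATE" | head -n 1 > customdata.b64
# Decode it; append "| gunzip" if the payload is also gzip-compressed
base64 -d customdata.b64 > customdata.txt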

Private cluster jumpbox user home owned by root

Is this a request for help?:

YES

Is this an ISSUE or FEATURE REQUEST? (choose one):


ISSUE

Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)


Kubernetes 1.10

What happened:


When creating a private cluster (Kubernetes), the jumpbox (if enabled) user's home is owned by root and not writable for the user, e.g. `/home/azureuser` is owned by `root` and not by `azureuser`.

What you expected to happen:


I would expect `/home/azureuser` to be owned by `azureuser`

How to reproduce it (as minimally and precisely as possible):


Deploy a basic Kubernetes cluster and enable private cluster and jumpbox.

Anything else we need to know:


N/A
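A manual workaround, assuming the default azureuser admin account name, is to fix the ownership from the jumpbox itself:

# On the jumpbox: restore ownership of the admin user's home directory
sudo chown -R azureuser:azureuser /home/azureuser
# Verify
ls -ld /home/azureuser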

Deployment on Azure Stack

Is this a request for help?:
NO

Is this an ISSUE or FEATURE REQUEST? (choose one):
FEATURE REQUEST

What version of acs-engine?:
Latest

Is Azure Stack on the roadmap of ACS Engine? I am trying to get K8s up and running on Azure Stack and am curious whether this should be tracked here, in the same stream as ACS Engine for public Azure, or whether I should just install K8s on IaaS "from scratch".

Move to etcdctl API v3

acs-engine has been using etcd v3 as the default since #1934. etcd v3 introduced a new API (https://coreos.com/blog/migrating-applications-etcd-v3.html), but acs-engine still builds clusters with the v2 API. In order to enable the v3 API, we need to set the env var ETCDCTL_API=3 (https://github.com/coreos/etcd/tree/master/etcdctl). With this change we will need to make sure that commands currently used in acs-engine code, such as etcdctl cluster-health, are changed to be v3 compatible.
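For illustration, the rough mapping between the v2 commands used today and their v3 equivalents looks like this (a sketch; the endpoint URL and TLS flags depend on the cluster and are omitted here):

# v2 API (current behavior)
etcdctl --endpoints=https://127.0.0.1:2379 cluster-health
etcdctl --endpoints=https://127.0.0.1:2379 member list

# v3 API equivalents
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 endpoint health
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 member list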

Allow node selectivity when scaling down or provide replace command

Is this a request for help?: NO


Is this an ISSUE or FEATURE REQUEST? (choose one): FEATURE REQUEST


What version of acs-engine?: 0.11.0 (but same thing on 0.12.x AFAIK)


Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm) Kubernetes 1.7.5+


ACS Engine provides a scale command now, which is great. However, the documentation states "Nodes will always be added or removed from the end of the agent pool."

When scaling down, this is not ideal for every use case. If the use case is simply capacity, then it is fine. However, following the "cattle, not pets" concept (a bad node doesn't deserve attention; it just needs to be shot and replaced), it should be possible to select specific nodes when scaling down.

One example right now with my cluster is that I have Ubuntu nodes that are using the "generic" kernel. I want to update all of these nodes to use the "azure" kernel announced a month or two ago. However, this is impossible since the earlier nodes are generic and the later nodes are azure.

Another example is that I have 2 nodes that are regularly having an issue with omiagent taking 100% CPU. I'd like to remove these specific nodes from my cluster. This is also not possible since these are nodes 2 and 3.

One potentially simple implementation that wouldn't be ideal but could address a few use cases is to scale up by adding nodes to the end of the cluster, but scale down by removing nodes from the beginning. At least this way, one can replace any particular node simply by scaling.

Another possible implementation might be to allow the user to pass the nodes they wish to decommission on the command line, or by adding a specific tag to the relevant VMs.

A third possible implementation, which would actually probably address my use cases even better, is to provide a replace command. Instead of scale, this command would destroy a particular node, and replace it with a new one. This would also have the nice side-effect of keeping the node numbering contiguous.

A last possible implementation is to inspect what is running on each node -- if there are no pods (other than DaemonSet pods) and the node is cordoned, then that node should be a candidate for scaling down before other nodes.
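In the meantime, a specific node can be replaced by hand, roughly as follows (a sketch; the node and resource names are illustrative, and the orphaned NIC/disk may need separate cleanup):

# Move workloads off the bad node
kubectl cordon k8s-agentpool1-12345678-2
kubectl drain k8s-agentpool1-12345678-2 --ignore-daemonsets
# Remove it from the cluster and delete the underlying VM
kubectl delete node k8s-agentpool1-12345678-2
az vm delete -g <resource-group> -n k8s-agentpool1-12345678-2 --yes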

cc: @itowlson

Pods get stuck in `ContainerCreating` on the cluster with Windows nodes of Datacenter 2016

Is this a request for help?:
yes

Is this an ISSUE or FEATURE REQUEST? (choose one):
ISSUE

What version of acs-engine?:
v0.21.1

Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
Kubernetes, 1.10

What happened:
I created a kubernetes cluster with a specific Windows version (2016-Datacenter-with-Containers) using the following config file.

{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorRelease": "1.10"
    },
    "masterProfile": {
      "count": 1,
      "dnsPrefix": "",
      "vmSize": "Standard_D2_v3"
    },
    "agentPoolProfiles": [
      {
        "name": "windowspool2",
        "count": 2,
        "vmSize": "Standard_DS13_v2",
        "availabilityProfile": "AvailabilitySet",
        "osType": "Windows",
        "osDiskSizeGB": 128
      }
    ],
    "windowsProfile": {
      "adminUsername": "azureuser",
      "adminPassword": "passwordEasy1234!",
      "windowsPublisher": "MicrosoftWindowsServer",
      "windowsOffer": "WindowsServer",
      "windowsSku": "2016-Datacenter-with-Containers"
    },
    "linuxProfile": {
      "adminUsername": "azureuser",
      "ssh": {
        "publicKeys": [
          {
            "keyData": ""
          }
        ]
      }
    },
    "servicePrincipalProfile": {
      "clientId": "",
      "secret": ""
    }
  }
}

When I try to create deployments, pods got stuck in ContainerCreating:

kubectl describe pod <pod_name>

Name:           iis-2016-6588ff4745-gplsf
Namespace:      default
Node:           40488k8s9000/10.240.0.4
Start Time:     Tue, 11 Sep 2018 15:18:56 +0800
Labels:         app=iis-2016
                pod-template-hash=2144990301
Annotations:    <none>
Status:         Pending
IP:
Controlled By:  ReplicaSet/iis-2016-6588ff4745
Containers:
  iis:
    Container ID:
    Image:          microsoft/iis:windowsservercore-ltsc2016
    Image ID:
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-5cspd (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  default-token-5cspd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-5cspd
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  beta.kubernetes.io/os=windows
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age               From                   Message
  ----     ------                  ----              ----                   -------
  Normal   Scheduled               36s               default-scheduler      Successfully assigned iis-2016-6588ff4745-gplsf to 40488k8s9000
  Normal   SuccessfulMountVolume   36s               kubelet, 40488k8s9000  MountVolume.SetUp succeeded for volume "default-token-5cspd"
  Warning  FailedCreatePodSandBox  5s (x3 over 36s)  kubelet, 40488k8s9000  Failed create pod sandbox: rpc error: code = Unknown desc = failed pulling image "kubletwin/pause": Error response from daemon: repository kubletwin/pause not found: does not exist or no pull access

What you expected to happen:
The pods become running.

How to reproduce it (as minimally and precisely as possible):
Follow the steps above.

Anything else we need to know:
Occasionally, kubectl get nodes shows only one Windows node (2 VMs created) after creating the cluster.
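One way to confirm the missing sandbox image is to check the Windows node directly (a diagnostic sketch, run from an RDP session on the affected node):

# The kubelet is configured with --pod-infra-container-image=kubletwin/pause;
# if this prints nothing, the image was never built/tagged locally
docker images kubletwin/pause
# If missing, check the provisioning logs under C:\k on the node (an assumption
# about where this acs-engine version keeps its Windows setup artifacts)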

validate quota before attempting upgrade/scale

Before we do long-running operations that will potentially increase the resource overhead in a given subscription, we should validate quota to fail fast in the event that more quota is needed.
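As a sketch of the kind of check intended, the subscription's regional usage and limits can be queried up front and compared against the cores the operation will add:

# Current vCPU usage vs. quota for the target region
az vm list-usage --location westeurope -o table
# The operation should fail fast if (Limit - CurrentValue) for the relevant
# VM family is smaller than the cores the new/temporary VMs will need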

Incomplete upgrade of vmss agent pool

Is this a request for help?: Yes


Is this an ISSUE or FEATURE REQUEST? (choose one): ISSUE


What version of acs-engine?:
Version: v0.21.1
GitCommit: 132dab3a5
GitTreeState: clean


Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
Kubernetes 1.10.6 -> 1.11.2

What happened:
While upgrading a cluster with a vmss agent pool, the upgrade command failed.

It seems that there are two problems:

  1. After upgrading the master, acs-engine has an authentication problem and exits before upgrading the vmss agent pool.
  2. The upgrade of the vmss agent pool updates the vmss model and tags, but does not actually upgrade the existing vmss instances.

What you expected to happen:
I expected the upgrade command to upgrade all nodes in the cluster.

How to reproduce it (as minimally and precisely as possible):
I created a cluster with custom vnet, one master and a vmss agent pool with two nodes, running k8s 1.10.6. Then I ran the upgrade command to upgrade to k8s 1.11.2:

$ acs-engine upgrade --subscription-id xxxx --resource-group mhy-test-k8s-1 --deployment-dir _output/mhy-test-k8s-1 --location westeurope --upgrade-version 1.11.2 --auth-method client_secret --client-id xxxx --client-secret xxxx --debug
INFO[0000] validating...
DEBU[0000] Resolving tenantID for subscriptionID: xxxx
DEBU[0003] Already registered for "Microsoft.Compute"
DEBU[0003] Already registered for "Microsoft.Storage"
DEBU[0003] Already registered for "Microsoft.Network"
INFO[0003] Name suffix: 42855177
INFO[0003] Gathering agent pool names...
INFO[0003] VM k8s-agentpool1-42855177-vmss_0 in VMSS k8s-agentpool1-42855177-vmss has a current tag of Kubernetes:1.10.6 and a desired tag of Kubernetes:1.11.2. Upgrading this node.

INFO[0003] VM k8s-agentpool1-42855177-vmss_1 in VMSS k8s-agentpool1-42855177-vmss has a current tag of Kubernetes:1.10.6 and a desired tag of Kubernetes:1.11.2. Upgrading this node.

INFO[0003] Master VM name: k8s-master-42855177-0, orchestrator: Kubernetes:1.10.6 (MasterVMs)

INFO[0003] Upgrading to Kubernetes version 1.11.2

INFO[0003] Master nodes StorageProfile: ManagedDisks
INFO[0003] Prepping master nodes for upgrade...
INFO[0003] Resource count before running NormalizeResourcesForK8sMasterUpgrade: 10
INFO[0003] Evaluating if agent pool: master, resource: [concat(variables('masterVMNamePrefix'), copyIndex(variables('masterOffset')))] needs to be removed
INFO[0003] Evaluating if extension: [concat(variables('masterVMNamePrefix'), copyIndex(variables('masterOffset')),'/cse', '-master-', copyIndex(variables('masterOffset')))] needs to be removed
INFO[0003] Evaluating if extension: [concat(variables('masterVMNamePrefix'), copyIndex(variables('masterOffset')), '/computeAksLinuxBilling')] needs to be removed
INFO[0003] Resource count after running NormalizeResourcesForK8sMasterUpgrade: 10
INFO[0003] Total expected master count: 1
INFO[0003] Master nodes that need to be upgraded: 1
INFO[0003] Master nodes that have been upgraded: 0
INFO[0003] Starting upgrade of master nodes...
INFO[0003] masterNodesInCluster: 1
INFO[0003] Upgrading Master VM: k8s-master-42855177-0
INFO[0003] fetching VM: mhy-test-k8s-1/k8s-master-42855177-0
INFO[0003] found nic name for VM (mhy-test-k8s-1/k8s-master-42855177-0): k8s-master-42855177-nic-0
INFO[0003] deleting VM: mhy-test-k8s-1/k8s-master-42855177-0
INFO[0003] waiting for vm deletion: mhy-test-k8s-1/k8s-master-42855177-0
INFO[0166] deleting nic: mhy-test-k8s-1/k8s-master-42855177-nic-0
INFO[0166] waiting for nic deletion: mhy-test-k8s-1/k8s-master-42855177-nic-0
INFO[0192] deleting managed disk: mhy-test-k8s-1/k8s-master-42855177-0_OsDisk_1_d2e861bd33ae47a4a85afecabba802af
INFO[0252] Master offset: 0

INFO[0252] Master pool set count to: 1 temporarily during upgrade...

INFO[0252] Starting ARM Deployment (master-18-09-05T14.52.24-1865577621). This will take some time...
INFO[1036] Finished ARM Deployment (master-18-09-05T14.52.24-1865577621). Succeeded
INFO[1036] Error validating upgraded master VM: k8s-master-42855177-0
FATA[1036] Error upgrading cluster: No Auth Provider found for name "azure"

The current state:

$ kubectl get nodes
NAME                                 STATUS    ROLES     AGE       VERSION
k8s-agentpool1-42855177-vmss000000   Ready     agent     21m       v1.10.6
k8s-agentpool1-42855177-vmss000001   Ready     agent     23m       v1.10.6
k8s-master-42855177-0                Ready     master    27m       v1.11.2

I restarted the upgrade command:

$ acs-engine upgrade --subscription-id xxxx --resource-group mhy-test-k8s-1 --deployment-dir _output/mhy-test-k8s-1 --location westeurope --upgrade-version 1.11.2 --auth-method client_secret --client-id xxxx --client-secret xxxx --debug
INFO[0000] validating...
DEBU[0000] Resolving tenantID for subscriptionID: xxxx
DEBU[0002] Already registered for "Microsoft.Compute"
DEBU[0002] Already registered for "Microsoft.Storage"
DEBU[0002] Already registered for "Microsoft.Network"
INFO[0003] Name suffix: 42855177
INFO[0003] Gathering agent pool names...
INFO[0003] Master VM name: k8s-master-42855177-0, orchestrator: Kubernetes:1.11.2 (UpgradedMasterVMs)

INFO[0003] Upgrading to Kubernetes version 1.11.2

INFO[0003] Master nodes StorageProfile: ManagedDisks
INFO[0003] Prepping master nodes for upgrade...
INFO[0003] Resource count before running NormalizeResourcesForK8sMasterUpgrade: 10
INFO[0003] Evaluating if agent pool: master, resource: [concat(variables('masterVMNamePrefix'), copyIndex(variables('masterOffset')))] needs to be removed
INFO[0003] Evaluating if extension: [concat(variables('masterVMNamePrefix'), copyIndex(variables('masterOffset')),'/cse', '-master-', copyIndex(variables('masterOffset')))] needs to be removed
INFO[0003] Evaluating if extension: [concat(variables('masterVMNamePrefix'), copyIndex(variables('masterOffset')), '/computeAksLinuxBilling')] needs to be removed
INFO[0003] Resource count after running NormalizeResourcesForK8sMasterUpgrade: 10
INFO[0003] Total expected master count: 1
INFO[0003] Master nodes that need to be upgraded: 0
INFO[0003] Master nodes that have been upgraded: 1
INFO[0003] Starting upgrade of master nodes...
INFO[0003] masterNodesInCluster: 1
INFO[0003] Master VM: k8s-master-42855177-0 is upgraded to expected orchestrator version
INFO[0003] Expected master count: 1, Creating 0 more master VMs
INFO[0003] Deploying the agent scale sets ARM template...
INFO[0003] Starting ARM Deployment (agentscaleset-18-09-05T15.07.07-954170151). This will take some time...
INFO[0055] Finished ARM Deployment (agentscaleset-18-09-05T15.07.07-954170151). Succeeded
INFO[0055] Upgrading VMSS k8s-agentpool1-42855177-vmss
INFO[0055] No VMs to upgrade for VMSS k8s-agentpool1-42855177-vmss, skipping
INFO[0055] Completed upgrading all VMSS
INFO[0055] Cluster upgraded successfully to Kubernetes version 1.11.2

DEBU[0055] output: wrote _output/mhy-test-k8s-1/apimodel.json

The current state:

$ kubectl get nodes
NAME                                 STATUS    ROLES     AGE       VERSION
k8s-agentpool1-42855177-vmss000000   Ready     agent     23m       v1.10.6
k8s-agentpool1-42855177-vmss000001   Ready     agent     25m       v1.10.6
k8s-master-42855177-0                Ready     master    29m       v1.11.2

In the Azure portal I can see that the two vmss nodes are now not the latest model.

I could manually scale up the vmss.

Then cordon, drain and delete the original two nodes with kubectl, and finally get an upgraded cluster:

$ kubectl get nodes
NAME                                 STATUS    ROLES     AGE       VERSION
k8s-agentpool1-42855177-vmss000002   Ready     agent     21m       v1.11.2
k8s-agentpool1-42855177-vmss000003   Ready     agent     21m       v1.11.2
k8s-master-42855177-0                Ready     master    1h        v1.11.2

Delete the original vmss instances from the Azure portal.
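The same manual recovery can be scripted with the CLI instead of the portal (a sketch using the names from the output above; the instance IDs 0 and 1 are assumptions):

# 1. Scale the VMSS up so new, latest-model instances are created
az vmss scale -g mhy-test-k8s-1 -n k8s-agentpool1-42855177-vmss --new-capacity 4
# 2. Move workloads off the old nodes and remove them from Kubernetes
kubectl drain k8s-agentpool1-42855177-vmss000000 --ignore-daemonsets
kubectl drain k8s-agentpool1-42855177-vmss000001 --ignore-daemonsets
kubectl delete node k8s-agentpool1-42855177-vmss000000 k8s-agentpool1-42855177-vmss000001
# 3. Delete the superseded VMSS instances
az vmss delete-instances -g mhy-test-k8s-1 -n k8s-agentpool1-42855177-vmss --instance-ids 0 1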

Anything else we need to know:

Add Custom NSG rules to template

Is this a request for help?:
No

Is this an ISSUE or FEATURE REQUEST? (choose one):
FEATURE


What version of acs-engine?:
ALL

Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
ALL

What you expected to happen:
When the template is used to generate the resource/deployment files, a master NSG is created with some default security group rules. There is currently no way to override or add custom rules in the template; they need to be added manually after generation. This would be a very nice feature for people trying to lock down vnets/subnets for security and compliance when running ACS clusters in production.
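As a stop-gap, custom rules can be appended to the generated master NSG after deployment (a sketch; the NSG name, priority and CIDR are illustrative):

az network nsg rule create \
  -g <resource-group> \
  --nsg-name <master-nsg-name> \
  -n allow-corp-https \
  --priority 200 \
  --source-address-prefixes <corp-cidr> \
  --destination-port-ranges 443 \
  --access Allow --protocol Tcp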

Group agent variables into an object variable

Is this a request for help?:
NO

Is this an ISSUE or FEATURE REQUEST? (choose one):
FEATURE REQUEST

What version of acs-engine?:
v0.14.5

Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
ALL

Task description
Currently, each agentpool has its own set of variables defined as follows:

    "agentpool2AccountName": "[concat(variables('storageAccountBaseName'), 'agnt2')]",
    "agentpool2AvailabilitySet": "[concat('agentpool2-availabilitySet-', variables('nameSuffix'))]",
    "agentpool2Count": "[parameters('agentpool2Count')]",
    "agentpool2Index": 2,
    "agentpool2Offset": "[parameters('agentpool2Offset')]",
    "agentpool2StorageAccountOffset": "[mul(variables('maxStorageAccountsPerAgent'),variables('agentpool2Index'))]",
    "agentpool2StorageAccountsCount": "[add(div(variables('agentpool2Count'), variables('maxVMsPerStorageAccount')), mod(add(mod(variables('agentpool2Count'), variables('maxVMsPerStorageAccount')),2), add(mod(variables('agentpool2Count'), variables('maxVMsPerStorageAccount')),1)))]",
    "agentpool2SubnetName": "[variables('subnetName')]",
    "agentpool2VMNamePrefix": "[concat(variables('winResourceNamePrefix'), 'acs', add(900,variables('agentpool2Index')))]",
    "agentpool2VMSize": "[parameters('agentpool2VMSize')]",
"agentpool2VnetSubnetID": "[variables('vnetSubnetID')]",

This template definition does not scale well for cluster definitions with many agent pools. We should merge the agent pool variables into an object:

"agentpool2Config": {
   "accountName": "[concat(variables('storageAccountBaseName'), 'agnt2')]",
   "availabilitySet": "[concat('agentpool2-availabilitySet-', variables('nameSuffix'))]",
    ...
}

And reference as "[variables('agentpool2Config').count]"

The difficulty of this task lies in the fact that it might not be backwards compatible and could break upgrade/scale for clusters created with the existing template definition.

In-cluster API calls to kubernetes.default.svc from master nodes fail on multi-master deployments

Is this a request for help?:
yes

Is this an ISSUE or FEATURE REQUEST? (choose one):
ISSUE

What version of acs-engine?:
v0.22.2

Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
Kubernetes

What happened:
On a k8s multi-master installation ("masterProfile": {"count": 3, ...}), a pod scheduled to a master node gets sporadic errors when accessing kubernetes.default.svc (the internal endpoint for the k8s API) via kubectl or curl:
Unable to connect to the server: dial tcp 10.0.0.1:443: i/o timeout

What you expected to happen:
Access to kubernetes.default.svc from master nodes should be stable.

How to reproduce it (as minimally and precisely as possible):
Deploy a 3-master k8s cluster ("masterProfile": {"count": 3, ...} in the JSON).
Ensure that all 3 masters are up:

kubectl get nodes -l kubernetes.io/role=master
NAME                    STATUS    ROLES     AGE       VERSION
k8s-master-17552040-0   Ready     master    6d        v1.10.8
k8s-master-17552040-1   Ready     master    6d        v1.10.8
k8s-master-17552040-2   Ready     master    6d        v1.10.8

Submit a pod like the one below:

---
apiVersion: v1
kind: Pod
metadata:
  name: kubectl-test
  labels:
    app: kubectl
spec:
  containers:
  - image: lachlanevenson/k8s-kubectl:latest
    name: kubectl
    command:
      - sleep 
      - "1000000"
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: "Exists"
  nodeSelector:
    kubernetes.io/role: master
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet  

Enter the pod's shell and try kubectl or curl against the master:

kubectl get pods -owide
NAME           READY     STATUS    RESTARTS   AGE       IP             NODE
kubectl-test   1/1       Running   0          14h       10.240.254.6   k8s-master-17552040-1
kubectl exec -it  kubectl-test sh
~ # kubectl version --short
Client Version: v1.12.1
Server Version: v1.10.8

~ # kubectl version --short
Client Version: v1.12.1
Server Version: v1.10.8

~ # kubectl version --short
Client Version: v1.12.1
Unable to connect to the server: dial tcp 10.0.0.1:443: i/o timeout

Anything else we need to know:
In an acs-engine installation such requests go through an internal load balancer with an endpoint on each master on port 4443. There is also an iptables NAT PREROUTING rule to redirect 4443 to 443; see https://github.com/Azure/acs-engine/blob/master/parts/k8s/kubernetesmastercustomdata.yml

{{if gt .MasterProfile.Count 1}}
    # Azure does not support two LoadBalancers(LB) sharing the same nic and backend port.
    # As a workaround, the Internal LB(ILB) listens for apiserver traffic on port 4443 and the External LB(ELB) on port 443
    # This IPTable rule then redirects ILB traffic to port 443 in the prerouting chain
    iptables -t nat -A PREROUTING -p tcp --dport 4443 -j REDIRECT --to-port 443
{{end}} 

It looks like the PREROUTING chain is not applied when the traffic goes to the same host where the pod is running. In my case the pod was scheduled to k8s-master-17552040-1, so that node cannot be reached, but the other two are fine, which is why the error appears in roughly 1/3 of attempts.
Accessing the local node:

~ # curl -k https://k8s-master-17552040-1:4443
curl: (7) Failed to connect to k8s-master-17552040-1 port 4443: Connection refused 

Accessing the other master nodes:

~ # curl -k https://k8s-master-17552040-0:4443
{
  "kind": "Status",
.....
~ # curl -k https://k8s-master-17552040-2:4443
{
  "kind": "Status",
.....

I tried changing PREROUTING to OUTPUT on all masters (iptables -t nat -A OUTPUT -p tcp --dport 4443 -j REDIRECT --to-port 443); it fixes curl, but does not fix kubectl.
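For reference, the two rules side by side, applied on every master (this is just the partial workaround described above; per the report it fixes curl but not kubectl):

# Shipped rule: only affects traffic arriving from other hosts
iptables -t nat -A PREROUTING -p tcp --dport 4443 -j REDIRECT --to-port 443
# Reporter's additional rule: also redirects locally originated traffic to :4443
iptables -t nat -A OUTPUT -p tcp --dport 4443 -j REDIRECT --to-port 443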

Feature Request: aks based on ubuntu 18.04.1

Is this a request for help?:
No

Is this an ISSUE or FEATURE REQUEST? (choose one):
Feature Request

When I SSH into a cluster deployed by acs-engine with the default distro, it says:

New release '18.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.

Are there any plans to upgrade the default distro (aks?) to a newer version of ubuntu?
Would using the newer version of ubuntu have prevented #3931 / #3933?

Custom vNets prevent Windows containers from reaching public internet

Is this a request for help?:

Yes


Is this an ISSUE or FEATURE REQUEST? (choose one):

Issue


What version of acs-engine?:

v0.18.1, v0.18.5


Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)

Kubernetes v1.10.4

What happened:

Deploy a private Windows/Linux hybrid cluster within a custom vNet

  • Can access public internet from the Windows host

  • Cannot access public internet from the Windows container (iis/windowsservercore-1803)

  • Can access public internet from the Linux host

  • Cannot access public internet from the Linux container.

Deploy a private Windows/Linux hybrid cluster without a custom vNet

  • Can access public internet from the Windows host

  • Can access public internet from the Windows container (iis/windowsservercore-1803)

  • Can access public internet from the Linux host

  • Cannot access public internet from the Linux container.

What you expected to happen:

Windows containers should be able to reach the public internet.

How to reproduce it (as minimally and precisely as possible):

Deploy a hybrid cluster with a custom vNet.

Anything else we need to know:

Working with a customer this week on getting their ASP.NET 4.6 application containerized and orchestrated on Kubernetes. Toward the end of the week we ran into this blocker with custom vNets. I was hoping to document the issue here in more detail, but I am not sure where to look on the Windows agent nodes for relevant logs/information.
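A quick way to narrow the symptom down is to run the same outbound test from inside a Windows pod and on its host (a diagnostic sketch; the pod name is illustrative):

# From inside a Windows container
kubectl exec <windows-pod> -- powershell -Command "Test-NetConnection www.bing.com -Port 443"
# Run the same Test-NetConnection directly on the Windows host (via RDP) to
# confirm that only in-container traffic fails on the custom vNet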

upgrade kube-dns version

Is this a request for help?:


Azure/AKS#445
Based on the issue above, a customer wants a newer kube-dns in AKS.

Is this an ISSUE or FEATURE REQUEST? (choose one):


What version of acs-engine?:


Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)

What happened:

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know:

VMSS masters can't be upgraded

VMSS master support is currently for experimentation only. We need to be able to upgrade such masters before we can call this a first-class feature.

ETCDCTL_API=3

For convenience, set ETCDCTL_API=3 in bash profile for both root and admin user when etcd is v3.
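A minimal sketch of how that could be wired up during provisioning (the profile drop-in path is an assumption, not what acs-engine currently does):

# Make the v3 API the default for interactive shells of all users
echo 'export ETCDCTL_API=3' | sudo tee /etc/profile.d/etcdctl.sh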

Health Probe from Azure Load balancer causes handshake error

Is this a request for help?: NO


Is this an ISSUE or FEATURE REQUEST? (choose one): ISSUE


What version of acs-engine?: v0.10.0


Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)

Kubernetes 1.6.6

What happened:

The Azure load balancer uses a TCP-based health probe on port 443 to check the availability of the master nodes in the cluster. This causes the following in the API server logs:

{"log":"I0124 15:46:58.711408 1 logs.go:41] http: TLS handshake error from 168.63.129.16:59087: EOF\n","stream":"stderr","time":"2018-01-24T15:46:58.711746619Z"}

What you expected to happen:

The probe should not cause errors to be generated in the logs.

How to reproduce it (as minimally and precisely as possible):

Create a cluster with an acs-engine template.

Anything else we need to know:

Similar issue resolved on AWS...

kubernetes-retired/kube-aws#604

Dashboard permission errors from `kubectl proxy` on K8s 1.10.3 cluster deployed from acs-engine master

Is this an ISSUE or FEATURE REQUEST?

Issue


What version of acs-engine?

Version: canary
GitCommit: a1fe789
GitTreeState: dirty

(I had one change to the Makefile so that make dev works on my system with a strange net config)


Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
Kubernetes

What happened:

The dashboard at http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/#!/overview?namespace=default doesn't seem to have any permissions for the service account it's running as.

configmaps is forbidden: User "system:serviceaccount:kube-system:kubernetes-dashboard" cannot list configmaps in the namespace "default"
persistentvolumeclaims is forbidden: User "system:serviceaccount:kube-system:kubernetes-dashboard" cannot list persistentvolumeclaims in the namespace "default"
secrets is forbidden: User "system:serviceaccount:kube-system:kubernetes-dashboard" cannot list secrets in the namespace "default"
services is forbidden: User "system:serviceaccount:kube-system:kubernetes-dashboard" cannot list services in the namespace "default"
ingresses.extensions is forbidden: User "system:serviceaccount:kube-system:kubernetes-dashboard" cannot list ingresses.extensions in the namespace "default"
daemonsets.apps is forbidden: User "system:serviceaccount:kube-system:kubernetes-dashboard" cannot list daemonsets.apps in the namespace "default"
pods is forbidden: User "system:serviceaccount:kube-system:kubernetes-dashboard" cannot list pods in the namespace "default"
events is forbidden: User "system:serviceaccount:kube-system:kubernetes-dashboard" cannot list events in the namespace "default"
deployments.apps is forbidden: User "system:serviceaccount:kube-system:kubernetes-dashboard" cannot list deployments.apps in the namespace "default"
replicasets.apps is forbidden: User "system:serviceaccount:kube-system:kubernetes-dashboard" cannot list replicasets.apps in the namespace "default"
jobs.batch is forbidden: User "system:serviceaccount:kube-system:kubernetes-dashboard" cannot list jobs.batch in the namespace "default"
cronjobs.batch is forbidden: User "system:serviceaccount:kube-system:kubernetes-dashboard" cannot list cronjobs.batch in the namespace "default"
replicationcontrollers is forbidden: User "system:serviceaccount:kube-system:kubernetes-dashboard" cannot list replicationcontrollers in the namespace "default"
statefulsets.apps is forbidden: User "system:serviceaccount:kube-system:kubernetes-dashboard" cannot list statefulsets.apps in the namespace "default"

What you expected to happen:

Dashboard should work, be able to show nodes, pods, and so on

How to reproduce it (as minimally and precisely as possible):

  1. Create a JSON file with
"orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorVersion": "1.10.3"
    },
  2. acs-engine generate, then deploy it

  3. Copy .kube/config from the master node

  4. kubectl proxy

  5. Browse to http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/#!/overview?namespace=default
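A common, very permissive workaround that confirms the diagnosis (not a recommended production setting) is to bind the dashboard's service account to cluster-admin:

kubectl create clusterrolebinding kubernetes-dashboard \
  --clusterrole=cluster-admin \
  --serviceaccount=kube-system:kubernetes-dashboard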

apimodel validation should check that the provided servicePrincipal has the appropriate permissions

Is this a request for help?:
no

Is this an ISSUE or FEATURE REQUEST? (choose one):
ISSUE

What version of acs-engine?:
0.13

Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
Kubernetes 1.9.3

What happened:
After deploying a new cluster with prepopulated ClientId & Secret in the apimodel.json and networkPolicy 'calico', neither the master nor the agents became Ready; they stayed in the NotReady state.

The cluster deployed successfully but failed to schedule or deploy anything because the supplied credentials did not have the Contributor role in the newly created resource group.

What you expected to happen:
An up and running cluster

How to reproduce it (as minimally and precisely as possible):

Deploy a new cluster with a prepopulated ClientId & Secret in the apimodel.json and networkPolicy 'calico' (which makes the problem visible from the start).

Anything else we need to know:

My suggestion would be to at least show a warning to the user when the supplied credentials don't have the correct permissions on the resource group. Better (?) would be to try to assign the correct role during deployment; after all, when you leave it up to acs-engine to create the principal, it also assigns the correct role automatically.
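For reference, checking and assigning the role by hand looks roughly like this (subscription, group and client IDs are placeholders):

# Inspect what the service principal can currently do
az role assignment list --assignee <clientId> -o table
# Grant Contributor on the cluster's resource group
az role assignment create --assignee <clientId> --role Contributor \
  --scope /subscriptions/<subscriptionId>/resourceGroups/<resourceGroupName>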

Unable to upgrade/scale AAD integrated clusters

Is this a request for help? Yes


Is this an ISSUE or FEATURE REQUEST?: This is an ISSUE.


What version of acs-engine?: v0.14.4

Orchestrator and version: Kubernetes v1.9.3


What happened:
Run acs-engine to upgrade a Kubernetes v1.9.3 cluster to v1.9.6.
Upgrade starts but errors out with:

[0m[0009] Error deleting agent VM k8s-nodepool1-93283987-0: No Auth Provider found for name "azure"
FATA[0009] Error upgrading cluster: No Auth Provider found for name "azure"

Rerunning the upgrade results in the same error, leaving the cluster partially upgraded and with an extra node.

What you expected to happen:
ACS-Engine to complete the cluster upgrade.

How to reproduce it (as minimally and precisely as possible):
Deploy a cluster with AAD integration and then run an upgrade.

Anything else we need to know:
I believe this also applies to scaling operations.

Private cluster jumpbox kubeconfig broken when using KeyVault secrets

Is this a request for help?:

Yes

Is this an ISSUE or FEATURE REQUEST? (choose one):

ISSUE

What version of acs-engine?:

0.17.0

Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)

Kubernetes 1.10.2

What happened:

The kubeconfig file on the jump box contains a base64 encoded string in the following section:

        "users": [
            {
                "name": "acs-np-admin",
                "user": {"client-certificate-data":"<base 64 string here>"}
            }
        ]

When decoded, the string says (with redactions):

"/subscriptions/<sub>/resourceGroups/<resgrp>/providers/Microsoft.KeyVault/vaults/<vault>/secrets/np-kubeConfigCertificate"

What you expected to happen:

The client-certificate-data should contain the actual kubeConfigCertificate data, not the reference to it in KeyVault

How to reproduce it (as minimally and precisely as possible):

Build a private Kubernetes cluster using KeyVault secret paths in the acs-engine.json config file, for example:

    "certificateProfile": {
      "caCertificate": "/subscriptions/<sub>/resourceGroups/<resrgp>/providers/Microsoft.KeyVault/vaults/<vault>/secrets/np-caCertificate",
      "caPrivateKey": "/subscriptions/<sub>/resourceGroups/<resgrp>providers/Microsoft.KeyVault/vaults/<vault>/secrets/np-caPrivateKey",
...

Anything else we need to know:

https://github.com/Azure/acs-engine/blob/5c5bba85c75130eb7f82df31e2a5c4a97a4feda2/pkg/acsengine/engine.go#L372-L375
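The symptom can be confirmed on an affected jumpbox by decoding the embedded value (a sketch assuming the first user entry in the kubeconfig):

kubectl config view --raw -o jsonpath='{.users[0].user.client-certificate-data}' | base64 -d
# Expected: PEM certificate data. Actual, per this report: a KeyVault secret path.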

Windows kubelet logs have deprecation notices

Is this a request for help?:


Is this an ISSUE or FEATURE REQUEST? (choose one):


ISSUE

What version of acs-engine?:


Master Branch

Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
Kubernetes 1.11.2 - windows

What happened:
While working on issue #3707 and looking at the Windows kubelet logs, I found deprecation notices:

ERROR: The process "azure-vnet-ipam.exe" not found.
ERROR: The process "azure-vnet.exe" not found.
Flag --resolv-conf has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --allow-privileged has been deprecated, will be removed in a future version
Flag --enable-debugging-handlers has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --cluster-dns has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --cluster-domain has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --hairpin-mode has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --runtime-request-timeout has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --cgroups-per-qos has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --enforce-node-allocatable has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
I0821 22:17:41.366071     884 flags.go:27] FLAG: --address="0.0.0.0"
I0821 22:17:41.400017     884 flags.go:27] FLAG: --allow-privileged="true"
I0821 22:17:41.400017     884 flags.go:27] FLAG: --allowed-unsafe-sysctls="[]"
I0821 22:17:41.400017     884 flags.go:27] FLAG: --alsologtostderr="false"
I0821 22:17:41.400017     884 flags.go:27] FLAG: --anonymous-auth="true"
I0821 22:17:41.400017     884 flags.go:27] FLAG: --authentication-token-webhook="false"
I0821 22:17:41.400017     884 flags.go:27] FLAG: --authentication-token-webhook-cache-ttl="2m0s"
I0821 22:17:41.400017     884 flags.go:27] FLAG: --authorization-mode="AlwaysAllow"
I0821 22:17:41.400017     884 flags.go:27] FLAG: --authorization-webhook-cache-authorized-ttl="5m0s"
I0821 22:17:41.400017     884 flags.go:27] FLAG: --authorization-webhook-cache-unauthorized-ttl="30s"
I0821 22:17:41.400017     884 flags.go:27] FLAG: --azure-container-registry-config="c:\\k\\azure.json"
I0821 22:17:41.400017     884 flags.go:27] FLAG: --bootstrap-checkpoint-path=""
I0821 22:17:41.400017     884 flags.go:27] FLAG: --bootstrap-kubeconfig=""
I0821 22:17:41.400017     884 flags.go:27] FLAG: --cadvisor-port="0"
I0821 22:17:41.400017     884 flags.go:27] FLAG: --cert-dir="/var/lib/kubelet/pki"
I0821 22:17:41.400017     884 flags.go:27] FLAG: --cgroup-driver="cgroupfs"
I0821 22:17:41.400017     884 flags.go:27] FLAG: --cgroup-root=""
I0821 22:17:41.400017     884 flags.go:27] FLAG: --cgroups-per-qos="false"
I0821 22:17:41.400017     884 flags.go:27] FLAG: --chaos-chance="0"
I0821 22:17:41.400017     884 flags.go:27] FLAG: --client-ca-file=""
I0821 22:17:41.400017     884 flags.go:27] FLAG: --cloud-config="c:\\k\\azure.json"
I0821 22:17:41.400017     884 flags.go:27] FLAG: --cloud-provider="azure"
I0821 22:17:41.400017     884 flags.go:27] FLAG: --cluster-dns="[10.0.0.10]"
I0821 22:17:41.400017     884 flags.go:27] FLAG: --cluster-domain="cluster.local"
I0821 22:17:41.400017     884 flags.go:27] FLAG: --cni-bin-dir="c:\\k\\azurecni\\bin"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --cni-conf-dir="c:\\k\\azurecni\\netconf"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --config=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --container-log-max-files="5"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --container-log-max-size="10Mi"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --container-runtime="docker"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --container-runtime-endpoint="tcp://localhost:3735"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --containerized="false"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --contention-profiling="false"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --cpu-cfs-quota="true"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --cpu-manager-policy="none"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --cpu-manager-reconcile-period="10s"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --docker-disable-shared-pid="true"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --docker-endpoint=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --dynamic-config-dir=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --enable-controller-attach-detach="true"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --enable-debugging-handlers="true"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --enable-server="true"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --enforce-node-allocatable="[]"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --event-burst="10"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --event-qps="5"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --eviction-hard="imagefs.available<15%,memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --eviction-max-pod-grace-period="0"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --eviction-minimum-reclaim=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --eviction-pressure-transition-period="5m0s"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --eviction-soft=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --eviction-soft-grace-period=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --exit-on-lock-contention="false"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --experimental-allocatable-ignore-eviction="false"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --experimental-bootstrap-kubeconfig=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --experimental-check-node-capabilities-before-mount="false"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --experimental-dockershim="false"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --experimental-dockershim-root-directory="/var/lib/dockershim"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --experimental-fail-swap-on="true"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --experimental-kernel-memcg-notification="false"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --experimental-mounter-path=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --fail-swap-on="true"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --feature-gates=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --file-check-frequency="20s"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --google-json-key=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --hairpin-mode="promiscuous-bridge"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --healthz-bind-address="127.0.0.1"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --healthz-port="10248"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --help="false"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --host-ipc-sources="[*]"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --host-network-sources="[*]"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --host-pid-sources="[*]"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --hostname-override="18706k8s9010"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --http-check-frequency="20s"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --image-gc-high-threshold="85"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --image-gc-low-threshold="80"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --image-pull-progress-deadline="20m0s"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --image-service-endpoint=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --iptables-drop-bit="15"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --iptables-masquerade-bit="14"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --keep-terminated-pod-volumes="false"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --kube-api-burst="10"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --kube-api-content-type="application/vnd.kubernetes.protobuf"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --kube-api-qps="5"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --kube-reserved=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --kube-reserved-cgroup=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --kubeconfig="c:\\k\\config"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --kubelet-cgroups=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --lock-file=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --log-backtrace-at=":0"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --log-dir=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --log-flush-frequency="5s"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --logtostderr="true"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --make-iptables-util-chains="true"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --manifest-url=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --manifest-url-header=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --master-service-namespace="default"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --max-open-files="1000000"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --max-pods="110"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --maximum-dead-containers="-1"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --maximum-dead-containers-per-container="1"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --minimum-container-ttl-duration="0s"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --minimum-image-ttl-duration="2m0s"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --network-plugin="cni"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --network-plugin-mtu="0"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --node-ip=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --node-labels=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --node-status-max-images="50"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --node-status-update-frequency="10s"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --non-masquerade-cidr="10.0.0.0/8"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --oom-score-adj="-999"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --pod-cidr=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --pod-infra-container-image="kubletwin/pause"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --pod-manifest-path=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --pod-max-pids="-1"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --pods-per-core="0"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --port="10250"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --protect-kernel-defaults="false"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --provider-id=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --qos-reserved=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --read-only-port="10255"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --really-crash-for-testing="false"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --redirect-container-streaming="false"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --register-node="true"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --register-schedulable="true"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --register-with-taints=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --registry-burst="10"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --registry-qps="5"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --resolv-conf=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --root-dir="/var/lib/kubelet"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --rotate-certificates="false"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --runonce="false"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --runtime-cgroups=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --runtime-request-timeout="10m0s"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --seccomp-profile-root="\\var\\lib\\kubelet\\seccomp"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --serialize-image-pulls="true"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --stderrthreshold="2"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --streaming-connection-idle-timeout="4h0m0s"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --sync-frequency="1m0s"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --system-cgroups=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --system-reserved=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --system-reserved-cgroup=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --tls-cert-file=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --tls-cipher-suites="[]"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --tls-min-version=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --tls-private-key-file=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --v="2"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --version="false"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --vmodule=""
I0821 22:17:41.401015     884 flags.go:27] FLAG: --volume-plugin-dir="c:\\k\\volumeplugins"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --volume-stats-agg-period="1m0s"
I0821 22:17:41.401015     884 flags.go:27] FLAG: --windows-service="false"
I0821 22:17:41.402020     884 feature_gate.go:230] feature gates: &{map[]}
I0821 22:17:41.402020     884 feature_gate.go:230] feature gates: &{map[]}
I0821 22:17:41.410005     884 server.go:408] Version: v1.11.2
I0821 22:17:41.410005     884 feature_gate.go:230] feature gates: &{map[]}
I0821 22:17:41.411000     884 feature_gate.go:230] feature gates: &{map[]}

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):
deploy k8s 1.11 cluster

Anything else we need to know:
cc: @PatrickLang, dependent on https://github.com/patricklang/acs-engine/tree/patricklang-2627

Pod's Source IP address is not preserved when connecting to another pod through a Service Cluster IP

Is this a request for help?: No


Is this an ISSUE or FEATURE REQUEST? (choose one): ISSUE


What version of acs-engine?: v0.19.3


Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm) Kubernetes v1.11.0

What happened:

In Kubernetes, you can create Services with specific Cluster IPs, which can be used to communicate with the Kubernetes pods selected by a certain label.

For example, a pod [1] with the label testid: sourceip-test-240c71c5 can be accessed by other pods through its private IP (e.g.: 10.240.0.72), or through its associated service's Cluster IP (10.0.14.95), which has the selector testid: sourceip-test-240c71c5 [2]

However, when a pod is using the Cluster IP, the client's source IP will be the Windows node's IP address, instead of the container's IP address. This issue does not exist when the client pod accesses the destination pod directly through its private IP; the client's source IP will be the source pod's IP address.

This issue causes some Kubernetes e2e tests to fail (Should preserve source pod IP for traffic thru service cluster IP).

[1] echoserver.yaml https://paste.ubuntu.com/p/xF6BQ47pgf/
[2] service.yaml https://paste.ubuntu.com/p/QmNzy2zSPv/
[3] execpod.yaml https://paste.ubuntu.com/p/mfCrvgVPYf/

What you expected to happen:

The source pod IP address should be preserved when connecting through cluster IP.

How to reproduce it (as minimally and precisely as possible):

Create the [1][2][3] yaml files. The echoserver.yaml file uses an image containing nginx, which echoes the client's information, including the client address.

Then execute:

kubectl create namespace test-echo
kubectl create -n test-echo -f echoserver.yaml
kubectl create -n test-echo -f service.yaml
kubectl create -n test-echo -f execpod.yaml

Get the echoserver's and execpod's private IPs by running:

kubectl exec -n test-echo echoserver-sourceip -- powershell -Command "(Get-NetIPConfiguration).IPv4Address.IPAddress"
kubectl exec -n test-echo execpod-sourceip -- powershell -Command "(Get-NetIPConfiguration).IPv4Address.IPAddress"

Observe that the private IP is preserved when the execpod makes a request to the echoserver through its private IP:

kubectl exec -n test-echo execpod-sourceip -- curl -s $private_ip:8080 | grep client_address

Observe that the private IP is not preserved when the execpod makes a request to the echoserver through its service Cluster IP:

kubectl exec -n test-echo execpod-sourceip -- curl -s 10.0.14.95:8080 | grep client_address

The echoed IP is instead the Windows node's IP, which can be seen by running:

kubectl get nodes -o yaml

Anything else we need to know:

vendor and use official kubectl drain code, and use the appropriate version depending on the server version

At the moment it looks like acs-engine broadly copy/pasted node draining code from https://github.com/kubernetes/kubernetes/blob/master/pkg/kubectl/cmd/drain.go to https://github.com/Azure/acs-engine/blob/master/pkg/operations/cordondrainvm.go .

It may be better to get to the point where we are directly vendoring the upstream kubectl draining code, and ideally once for each major supported kubernetes version, such that we can guarantee we're using the corresponding client code against the kubernetes server version.

The downside of this is potential bloat. The upside is better compatibility/supportability. Both the k8s and openshift orchestrators would see the benefit of this.

Add a Windows agent pool flag to enable or disable Windows Update

Is this an ISSUE or FEATURE REQUEST? (choose one):
Feature request needed for conformance testing


What version of acs-engine?: all


Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
kubernetes

What happened:

Windows test passes are scheduled with a specific patch + hotfix level that is deployed using the windows-patches extension. Since tests may take a few hours, we cannot allow any other changes or reboots; those can cause test failures or invalidate test results by testing an unintended version.

What you expected to happen:

There should be an option in windowsProfile to opt in to or out of Windows Update. If a user opts out, they are expected to install patches manually or enable updates later by other means.

Make creating a service principal and granting contributor rights a better experience

Is this a request for help?:
No

Is this an ISSUE or FEATURE REQUEST? (choose one):
ISSUE

What version of acs-engine?:

Version: v0.22.1
GitCommit: 0908e5151
GitTreeState: clean

Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
Kubernetes 1.11.2

What happened:
kubectl create secret failed

What you expected to happen:
kubectl create secret should succeed

How to reproduce it (as minimally and precisely as possible):
tl;dr
The service principal needs to be provisioned with Owner permissions via IAM on both the key vault and storage account resources.

Details:

  1. In apimodel.json, set "enableEncryptionWithExternalKms": true on a new resource group.
  2. Run kubectl create secret generic secret1 -n default --from-literal=mykey=mydata.

Error message:

Error from server (InternalError): Internal error occurred: rpc error: code = Unknown desc = failed to get vault, error: failed to get vault, error: keyvault.VaultsClient#Get: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailed" Message="The client '<clientId>' with object id '<objectId>' does not have authorization to perform action 'Microsoft.KeyVault/vaults/read' over scope '/subscriptions/<subscriptionId>/resourceGroups/<resourceGroupName>/providers/Microsoft.KeyVault/vaults/<keyVaultName>'."
  3. Work around that error by running az role assignment create --subscription $SubscriptionId --role Owner --assignee-object-id $servicePrincipalObjectId --scope $keyVaultResourceId.
  4. Run kubectl create secret generic secret1 -n default --from-literal=mykey=mydata again.

Second error message:

Error from server (InternalError): Internal error occurred: rpc error: code = Unknown desc = failed to create key, error: storage.AccountsClient#ListKeys: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailed" Message="The client '<clientId>' with object id <objectId>' does not have authorization to perform action 'Microsoft.Storage/storageAccounts/listKeys/action' over scope '/subscriptions/<subscriptionId>/resourceGroups/<resourceGroupName>/providers/Microsoft.Storage/storageAccounts/<keyVaultName>'."
  5. Work around the second error by running az role assignment create --subscription $SubscriptionId --role Owner --assignee-object-id $servicePrincipalObjectId --scope $storageAccountResourceId.

The kubectl create secret command will now work.
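
A smoother up-front experience would be a single resource-group-scoped assignment instead of the two per-resource workarounds above (a sketch; the $resourceGroupName placeholder is illustrative):

az role assignment create --subscription $SubscriptionId --role Owner --assignee-object-id $servicePrincipalObjectId --scope /subscriptions/$SubscriptionId/resourceGroups/$resourceGroupName

Granting the role at resource-group scope covers both the key vault and the storage account created for KMS.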

When using custom VM Images (UseAgentCustomImage, UseMasterCustomImage), it should be possible to provide the subscription ID.

Is this a request for help?:
No

Is this an ISSUE or FEATURE REQUEST? (choose one):
FEATURE REQUEST

What version of acs-engine?:
v0.21.2

The ability to use custom VM images opens up more possibilities for enhancing the cluster.
The current solution requires the custom image to reside in the same subscription as the one where the cluster is deployed.

"masterProfile":  {
    ...
    "imageReference": {
        "name": "sourceImageName",
        "resourceGroup": "sourceResourceGroup"
    },
    ...
}

It should be possible to specify the subscription that contains the image, so that an image from a different subscription can be used without copying it, as described below:

"masterProfile":  {
    ...
    "imageReference": {
        "name": "sourceImageName",
        "resourceGroup": "sourceResourceGroup",
        "subscriptionId": "sourceSubscriptionID" 
    },
    ...
}

The documentation should specify that the servicePrincipal given in the template should have read access to the subscription with the image or to the image itself.
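
For example, read access on the image itself could be granted with something like the following (the scope shown is illustrative):

az role assignment create --role Reader --assignee-object-id $servicePrincipalObjectId --scope /subscriptions/<sourceSubscriptionID>/resourceGroups/<sourceResourceGroup>/providers/Microsoft.Compute/images/<sourceImageName>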

Docker log rotation policy is not set for the Windows agent

Is this a request for help?:
Yes

Is this an ISSUE or FEATURE REQUEST? (choose one):
FEATURE REQUEST

What version of acs-engine?:
All

Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
Kubernetes, all versions

What happened:
Docker's container log rotation policy is not set for the Windows agents. Container logs are written to the C:\ProgramData\docker\containers folder. If a container produces a huge amount of log output, it will use up all the disk space on the C: drive, take the VM agent down, and make the VM unreachable from the ARM control plane.

What you expected to happen:
For the Linux agents, the default log rotation policy is:

"log-opts": {
    "max-size": "50m",
    "max-file": "5"
}

I expect the same for the Windows agents.

How to reproduce it (as minimally and precisely as possible):
Create a k8s cluster with a Windows agent. Create a pod that writes log output.
Here is a snippet:

  containers:
  - command:
    - powershell.exe
    - -Command
    - sleep 10; $a="1234567890"; $a="$a-$a"; $a="$a-$a"; $a="$a-$a"; $a="$a-$a"; $a="$a-$a";
      $a="$a-$a"; $a="$a-$a"; $a="$a-$a"; $a="$a-$a"; $a="$a-$a"; $a="$a-$a"; $a="$a-$a";
      $a="$a-$a"; $a="$a-$a"; $a="$a-$a"; do { write-host "$(date)-$a"; sleep 0.1;
      } until($false)
    image: microsoft/nanoserver
    imagePullPolicy: Always
    name: testwin

Anything else we need to know:
You can create the docker config (C:\ProgramData\docker\config\daemon.json) with the following value:
{
    "log-driver": "json-file",
    "log-opts": {
        "max-size": "50m",
        "max-file": "5"
    }
}
and restart the docker service.
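
For example (a sketch of that manual workaround, assuming the default paths on a Windows agent):

PS> Set-Content -Path C:\ProgramData\docker\config\daemon.json -Value '{ "log-driver": "json-file", "log-opts": { "max-size": "50m", "max-file": "5" } }'
PS> Restart-Service docker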

kubelet config: --protect-kernel-defaults

We want to deliver the --protect-kernel-defaults kubelet option, but currently cannot because we build clusters without the expected kernel default settings. On a recently built 1.7 cluster:

kubelet.go:1323] Failed to start ContainerManager [Invalid kernel flag: vm/overcommit_memory, expected value: 1, actual value: 0, Invalid kernel flag: kernel/panic, expected value: 10, actual value: 0, Invalid kernel flag: kernel/panic_on_oops, expected value: 1, actual value: 0]
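
The expected values are exactly the ones listed in that error, so one way to get there is to set the kernel defaults before the kubelet starts, e.g. (a sketch; persisting them under /etc/sysctl.d/ would survive reboots):

sysctl -w vm.overcommit_memory=1
sysctl -w kernel.panic=10
sysctl -w kernel.panic_on_oops=1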

K8s Windows 1803 pod creation fails with "HNS failed with error : Element not found"

Is this a request for help?:
Yes

Is this an ISSUE or FEATURE REQUEST? (choose one):
Issue

What version of acs-engine?:
22.1

Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
Kubernetes

What happened:
I have a cluster w/ k8s 10.8.1 running Windows v1803 nodes. All was well for 6 days with a dozen of my pods deployed, then suddenly I started seeing the error below when deploying new versions of a pod:

Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "dev-fossil-api-777f779f74-fhx88_default" network: Failed to create endpoint: HNS failed with error : Element not found.

I noticed these errors only appear when k8s tries to schedule new pods on a specific node. I have two Windows nodes; one is fine, the other exhibits this problem.

Since this seems network-related, I looked at the ipconfig /all results on each node and do not see a pattern. Here is the output in case it's useful.

First node exhibiting the problem:

Windows IP Configuration
   Host Name . . . . . . . . . . . . : 13832k8s9000
   Primary Dns Suffix  . . . . . . . :
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : No
   WINS Proxy Enabled. . . . . . . . : No
   DNS Suffix Search List. . . . . . : i2fxeuipe3wuzilxz3jq1ieqbb.cx.internal.cloudapp.net

Ethernet adapter vEthernet (Ethernet 2):
   Connection-specific DNS Suffix  . : i2fxeuipe3wuzilxz3jq1ieqbb.cx.internal.cloudapp.net
   Description . . . . . . . . . . . : Hyper-V Virtual Ethernet Adapter #2
   Physical Address. . . . . . . . . : 00-0D-3A-03-52-90
   DHCP Enabled. . . . . . . . . . . : Yes
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::5542:5b81:1f86:875c%6(Preferred)
   IPv4 Address. . . . . . . . . . . : 10.240.0.34(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.240.0.0
   Lease Obtained. . . . . . . . . . : Tuesday, October 16, 2018 1:37:01 AM
   Lease Expires . . . . . . . . . . : Friday, November 22, 2154 8:29:19 AM
   Default Gateway . . . . . . . . . : 10.240.0.1
   DHCP Server . . . . . . . . . . . : 168.63.129.16
   DHCPv6 IAID . . . . . . . . . . . : 218107194
   DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-23-38-4F-57-00-15-5D-1C-3A-59
   DNS Servers . . . . . . . . . . . : 168.63.129.16
   NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter vEthernet (nat):
   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Hyper-V Virtual Ethernet Adapter
   Physical Address. . . . . . . . . : 00-15-5D-3D-2D-76
   DHCP Enabled. . . . . . . . . . . : Yes
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::8d3:f1d2:1e7:4805%10(Preferred)
   IPv4 Address. . . . . . . . . . . : 172.22.176.1(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.240.0
   Default Gateway . . . . . . . . . :
   DHCPv6 IAID . . . . . . . . . . . : 201332061
   DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-23-38-4F-57-00-15-5D-1C-3A-59
   DNS Servers . . . . . . . . . . . : fec0:0:0:ffff::1%1
                                       fec0:0:0:ffff::2%1
                                       fec0:0:0:ffff::3%1
   NetBIOS over Tcpip. . . . . . . . : Enabled

Node that is working fine:

Windows IP Configuration
   Host Name . . . . . . . . . . . . : 13832k8s9001
   Primary Dns Suffix  . . . . . . . :
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : No
   WINS Proxy Enabled. . . . . . . . : No
   DNS Suffix Search List. . . . . . : i2fxeuipe3wuzilxz3jq1ieqbb.cx.internal.cloudapp.net

Ethernet adapter vEthernet (Ethernet 2):
   Connection-specific DNS Suffix  . : i2fxeuipe3wuzilxz3jq1ieqbb.cx.internal.cloudapp.net
   Description . . . . . . . . . . . : Hyper-V Virtual Ethernet Adapter #2
   Physical Address. . . . . . . . . : 00-0D-3A-03-5A-E5
   DHCP Enabled. . . . . . . . . . . : Yes
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::c71:f557:2367:787%8(Preferred)
   IPv4 Address. . . . . . . . . . . : 10.240.0.65(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.240.0.0
   Lease Obtained. . . . . . . . . . : Monday, October 15, 2018 9:14:07 PM
   Lease Expires . . . . . . . . . . : Friday, November 22, 2154 11:38:39 PM
   Default Gateway . . . . . . . . . : 10.240.0.1
   DHCP Server . . . . . . . . . . . : 168.63.129.16
   DHCPv6 IAID . . . . . . . . . . . : 218107194
   DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-23-38-4F-61-00-15-5D-1C-3A-59
   DNS Servers . . . . . . . . . . . : 168.63.129.16
   NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter vEthernet (nat):
   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Hyper-V Virtual Ethernet Adapter
   Physical Address. . . . . . . . . : 00-15-5D-3D-2D-76
   DHCP Enabled. . . . . . . . . . . : Yes
   Autoconfiguration Enabled . . . . : Yes

Ethernet adapter vEthernet (nat) 2:
   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Hyper-V Virtual Ethernet Adapter #3
   Physical Address. . . . . . . . . : 00-15-5D-0D-0F-74
   DHCP Enabled. . . . . . . . . . . : Yes
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::934:49c5:45af:a7%7(Preferred)
   IPv4 Address. . . . . . . . . . . : 172.29.96.1(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.240.0
   Default Gateway . . . . . . . . . :
   DHCPv6 IAID . . . . . . . . . . . : 318772573
   DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-23-38-4F-61-00-15-5D-1C-3A-59
   DNS Servers . . . . . . . . . . . : fec0:0:0:ffff::1%1
                                       fec0:0:0:ffff::2%1
                                       fec0:0:0:ffff::3%1
   NetBIOS over Tcpip. . . . . . . . : Enabled

Second node exhibiting the problem:

Windows IP Configuration
   Host Name . . . . . . . . . . . . : 13832k8s9002
   Primary Dns Suffix  . . . . . . . :
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : No
   WINS Proxy Enabled. . . . . . . . : No
   DNS Suffix Search List. . . . . . : i2fxeuipe3wuzilxz3jq1ieqbb.cx.internal.cloudapp.net

Ethernet adapter vEthernet (Ethernet 2):
   Connection-specific DNS Suffix  . : i2fxeuipe3wuzilxz3jq1ieqbb.cx.internal.cloudapp.net
   Description . . . . . . . . . . . : Hyper-V Virtual Ethernet Adapter #9
   Physical Address. . . . . . . . . : 00-0D-3A-04-8A-BB
   DHCP Enabled. . . . . . . . . . . : Yes
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::3cf5:3c5c:b63e:d6c4%11(Preferred)
   IPv4 Address. . . . . . . . . . . : 10.240.0.96(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.240.0.0
   Lease Obtained. . . . . . . . . . : Monday, October 15, 2018 9:19:12 PM
   Lease Expires . . . . . . . . . . : Friday, November 22, 2154 3:59:03 AM
   Default Gateway . . . . . . . . . : 10.240.0.1
   DHCP Server . . . . . . . . . . . : 168.63.129.16
   DHCPv6 IAID . . . . . . . . . . . : 150998330
   DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-23-51-17-95-00-15-5D-1C-3A-59
   DNS Servers . . . . . . . . . . . : 168.63.129.16
   NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter vEthernet (nat):
   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Hyper-V Virtual Ethernet Adapter
   Physical Address. . . . . . . . . : 00-15-5D-3D-2D-76
   DHCP Enabled. . . . . . . . . . . : Yes
   Autoconfiguration Enabled . . . . : Yes

Ethernet adapter vEthernet (nat) 2:
   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Hyper-V Virtual Ethernet Adapter #2
   Physical Address. . . . . . . . . : 00-15-5D-20-BF-B7
   DHCP Enabled. . . . . . . . . . . : Yes
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::57a:4521:f014:b58c%8(Preferred)
   IPv4 Address. . . . . . . . . . . : 172.17.224.1(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.240.0
   Default Gateway . . . . . . . . . :
   DHCPv6 IAID . . . . . . . . . . . : 268440925
   DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-23-51-17-95-00-15-5D-1C-3A-59
   DNS Servers . . . . . . . . . . . : fec0:0:0:ffff::1%1
                                       fec0:0:0:ffff::2%1
                                       fec0:0:0:ffff::3%1
   NetBIOS over Tcpip. . . . . . . . : Enabled

What you expected to happen:
For the pods to be created on any Windows node in the cluster.

How to reproduce it (as minimally and precisely as possible):
I have a cluster that exhibits this now and can easily repro it, but I do not have steps to reproduce it on a new cluster; it feels random.

Anything else we need to know:
Since I have no idea what causes this or how to fix it, my only recourse was to taint the affected node so k8s stops trying to schedule onto it. I then scaled my cluster up to get a new node to deploy on, and stopped the affected node. The new node worked fine for a few days and then exhibited the same problem. I now have two tainted nodes turned off.

I'm very willing to work with someone who can help me debug this.

docs: how to change Service Principal credentials?

Is this a request for help?: YES


Is this an ISSUE or FEATURE REQUEST? (choose one): QUESTION


What version of acs-engine?: 0.16.2


Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm): Kubernetes 1.9.7


  • I've created a Service Principal and then deployed a K8S cluster providing --client-id and --client-secret to set the Service Principal credentials.
  • Everything goes well, but now I need to change the Service Principal password. I used az ad sp credential reset ... to set a new password and I can login using the new password.
  • Operations in the cluster that need to talk to the Azure API eventually start to fail (as I expected, because I've changed the SP password).
  • The question is: How can I use acs-engine to set the new Service Principal password in a running cluster?

I've tried to use acs-engine deploy again with the new password in the --client-secret argument, but it fails with the error Changing property 'customData' is not allowed.
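
One known manual workaround (a sketch, assuming the standard acs-engine layout where the cloud-provider credentials live in /etc/kubernetes/azure.json on every node) is to update the secret in place and restart the components that read it:

sudo sed -i 's|"aadClientSecret": ".*"|"aadClientSecret": "<newClientSecret>"|' /etc/kubernetes/azure.json
sudo systemctl restart kubelet

Here <newClientSecret> is a placeholder. On master nodes, control-plane components that mount this file (e.g. the controller-manager) would also need to be restarted to pick up the change.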

Add k8s upstream conformance tests to e2e suite

Is this a request for help?: yes


Is this an ISSUE or FEATURE REQUEST? (choose one): Feature


What version of acs-engine?: all


Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm) Kubernetes

What happened:

Integrate the sonobuoy k8s conformance tests into the e2e suite

What you expected to happen:

All k8s versions pass

How to reproduce it (as minimally and precisely as possible):

https://github.com/heptio/sonobuoy/tree/master

sonobuoy run
sonobuoy status
sonobuoy retrieve .
mkdir ./results; tar xzf *.tar.gz -C ./results
look at e2e.log and confirm
Ran 125 of 697 Specs in 3304.570 seconds
SUCCESS! -- 125 Passed | 0 Failed | 0 Pending | 572 Skipped PASS

Anything else we need to know:

Homebrew formula

Is this a request for help?:
NO


Is this an ISSUE or FEATURE REQUEST? (choose one):
FEATURE REQUEST

Any plans for a Homebrew formula to install acs-engine?

re-deployment of private cluster will break existing k8s service load balancers

Is this a request for help?:

Yes

Is this an ISSUE or FEATURE REQUEST? (choose one):

ISSUE

What version of acs-engine?:

0.15.1

Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
Kubernetes 1.10.0

What happened:
If a cluster has provisioned a load balancer inside Azure targeting an availability set, subsequent incremental deployments of the acs-engine ARM template will break the load balancer. The backend pool of the load balancer loses its association with the availability set.

What you expected to happen:
ARM deployment to not touch load balancers it didn't create.

How to reproduce it (as minimally and precisely as possible):

  1. Deploy a k8s cluster using acs-engine
  2. Expose a service using an internal load balancer (external may work also?). Note that the load balancer is created and has your availability set as a backend.
  3. Deploy the same template again in the running cluster RG. (for whatever reason, e.g. trying to fix broken agent node)
  4. Note that the load balancer of the service will stop working.

Anything else we need to know:
Presumably the load balancer configuration is a property of the NIC that is re-deployed. However, that NIC doesn't know that it may have subsequently been associated with a load balancer.

Automate process of pushing new aks-engine releases to MCR

We should add a VSTS pipeline (or other CI) to automate building and pushing the new acs-engine version images to https://hub.docker.com/r/microsoft/acs-engine/tags/.

Instructions for updating the Docker image:

On Windows:

PS> cd releases 
PS> $VERSION="0.19.1" 
PS> docker build --no-cache --build-arg BUILD_DATE=$(Get-Date((Get-Date).ToUniversalTime()) -UFormat "%Y-%m-%dT%H:%M:%SZ") --build-arg ACSENGINE_VERSION="$VERSION" -t microsoft/acs-engine:$VERSION --file .\Dockerfile.linux . 
PS> docker push "microsoft/acs-engine:$VERSION" 

Remove kubelet flags deprecated from k8s 1.10/1.11+

Multiple kubelet flags are now deprecated; they should be removed or set via a config file instead.

May 01 16:48:03 k8s-agentsa-87245758-1 kubelet[8346]: Flag --address has been deprecated, This parameter should be set via the config file specified by the Kubelet's
May 01 16:48:03 k8s-agentsa-87245758-1 kubelet[8346]: Flag --allow-privileged has been deprecated, will be removed in a future version
May 01 16:48:03 k8s-agentsa-87245758-1 kubelet[8346]: Flag --anonymous-auth has been deprecated, This parameter should be set via the config file specified by the Kub
May 01 16:48:03 k8s-agentsa-87245758-1 kubelet[8346]: Flag --authorization-mode has been deprecated, This parameter should be set via the config file specified by the
May 01 16:48:03 k8s-agentsa-87245758-1 kubelet[8346]: Flag --cadvisor-port has been deprecated, The default will change to 0 (disabled) in 1.12, and the cadvisor port
May 01 16:48:03 k8s-agentsa-87245758-1 kubelet[8346]: Flag --cgroups-per-qos has been deprecated, This parameter should be set via the config file specified by the Ku
May 01 16:48:03 k8s-agentsa-87245758-1 kubelet[8346]: Flag --client-ca-file has been deprecated, This parameter should be set via the config file specified by the Kub
May 01 16:48:03 k8s-agentsa-87245758-1 kubelet[8346]: Flag --cluster-dns has been deprecated, This parameter should be set via the config file specified by the Kubele
May 01 16:48:03 k8s-agentsa-87245758-1 kubelet[8346]: Flag --cluster-domain has been deprecated, This parameter should be set via the config file specified by the Kub
May 01 16:48:03 k8s-agentsa-87245758-1 kubelet[8346]: Flag --enforce-node-allocatable has been deprecated, This parameter should be set via the config file specified
May 01 16:48:03 k8s-agentsa-87245758-1 kubelet[8346]: Flag --event-qps has been deprecated, This parameter should be set via the config file specified by the Kubelet'
May 01 16:48:03 k8s-agentsa-87245758-1 kubelet[8346]: Flag --eviction-hard has been deprecated, This parameter should be set via the config file specified by the Kube
May 01 16:48:03 k8s-agentsa-87245758-1 kubelet[8346]: Flag --feature-gates has been deprecated, This parameter should be set via the config file specified by the Kube
May 01 16:48:03 k8s-agentsa-87245758-1 kubelet[8346]: Flag --image-gc-high-threshold has been deprecated, This parameter should be set via the config file specified b
May 01 16:48:03 k8s-agentsa-87245758-1 kubelet[8346]: Flag --image-gc-low-threshold has been deprecated, This parameter should be set via the config file specified by
May 01 16:48:03 k8s-agentsa-87245758-1 kubelet[8346]: Flag --keep-terminated-pod-volumes has been deprecated, will be removed in a future version
May 01 16:48:03 k8s-agentsa-87245758-1 kubelet[8346]: Flag --max-pods has been deprecated, This parameter should be set via the config file specified by the Kubelet's
May 01 16:48:03 k8s-agentsa-87245758-1 kubelet[8346]: Flag --node-status-update-frequency has been deprecated, This parameter should be set via the config file specif
May 01 16:48:03 k8s-agentsa-87245758-1 kubelet[8346]: Flag --non-masquerade-cidr has been deprecated, will be removed in a future version
May 01 16:48:03 k8s-agentsa-87245758-1 kubelet[8346]: Flag --pod-manifest-path has been deprecated, This parameter should be set via the config file specified by the

1.11 support was added in #2814

Ref:

https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/
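
For reference, a minimal sketch of the config-file equivalent for a few of the flags above (field names follow the upstream KubeletConfiguration type; values are illustrative):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
address: 0.0.0.0
authentication:
  anonymous:
    enabled: false
authorization:
  mode: Webhook
clusterDNS:
  - 10.0.0.10
clusterDomain: cluster.local
maxPods: 110

The kubelet would then be started with --config <path-to-this-file> in place of the individual deprecated flags.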

better acceleratedNetworking whitelist

We have a static whitelist introduced here:

Azure/acs-engine#3600

One way to make this better would be to incorporate az output parsing using this script:

pkg/acsengine/Get-AzureConstants.py

Or, even better, figure out an API we can call to determine up-to-date per-VM support.
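
For example, the compute Resource SKUs API already exposes this capability, so a sketch of an up-to-date per-region query would be:

az vm list-skus --location westus2 --query "[?capabilities[?name=='AcceleratedNetworkingEnabled' && value=='True']].name" -o tsv

(The region and output format are illustrative; the capability name is the one Azure reports for accelerated networking support.)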

Add customized config: use /mnt (the ephemeral/temp drive) for pod volumes

Is this a request for help?:
Yes

Is this an ISSUE or FEATURE REQUEST? (choose one):
FEATURE REQUEST

What version of acs-engine?:


Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)

What happened:
We're processing large assets (video files 10GB+ in size) in Kubernetes. Running on agent nodes (Azure H8 instances), we have vast amounts of space under /mnt (on the ephemeral/local SSD). Unfortunately, it seems ACS/AKS agents always have their Docker directories on the OS disk.

Is there a way to specify that the writable filesystem of a k8s pod should be on the fast, large, ephemeral disk (/dev/sdb1) instead of the OS disk?

forensic@k8s-agent-EDD077D5-0:~$ df -H
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 32G 4.1G 28G 13% /
/dev/sdb1 1.1T 75M 1.1T 1% /mnt

There is a manual way:
https://github.com/andyzhangx/Demo/tree/master/acs-engine#change-varlibdocker-to-mnt-which-has-100gb-disk-space
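
One way to do this manually is to point Docker's data directory at the ephemeral disk, e.g. via /etc/docker/daemon.json (a sketch; the data-root key requires a reasonably recent Docker):

{
    "data-root": "/mnt/docker"
}

followed by restarting the Docker service (sudo systemctl restart docker).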

@JiangtianLi @jackfrancis, could we add this as a customized config option in acs-engine?
There are lots of customers asking for this feature, see:
Azure/acs-engine#1307
Azure/acs-engine#543

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know:
