terraform-google-modules / terraform-google-kubernetes-engine

Configures opinionated GKE clusters

Home Page: https://registry.terraform.io/modules/terraform-google-modules/kubernetes-engine/google

License: Apache License 2.0


terraform-google-kubernetes-engine's Introduction

Terraform Kubernetes Engine Module

This module handles opinionated Google Cloud Platform Kubernetes Engine cluster creation and configuration with Node Pools, IP MASQ, Network Policy, etc. The resources/services/activations/deletions that this module will create/trigger are:

  • Create a GKE cluster with the provided addons
  • Create GKE Node Pool(s) with provided configuration and attach to cluster
  • Replace the default kube-dns configmap if stub_domains are provided
  • Activate network policy if network_policy is true
  • Add ip-masq-agent configmap with provided non_masquerade_cidrs if configure_ip_masq is true

Sub modules are provided for creating private clusters, beta private clusters, and beta public clusters as well. Beta sub modules allow for the use of various GKE beta features. See the modules directory for the various sub modules.
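
For example, a private cluster can be created by pointing the module source at the corresponding submodule. The sketch below is illustrative only: the double-slash submodule path follows the standard registry convention, and the private-cluster inputs shown (enable_private_nodes, master_ipv4_cidr_block) should be verified against the submodule's own documentation.

module "gke_private" {
  # Submodules are addressed with a "//modules/<name>" suffix on the registry source.
  source     = "terraform-google-modules/kubernetes-engine/google//modules/private-cluster"
  project_id = "<PROJECT ID>"
  name       = "gke-private-1"
  region     = "us-central1"
  network    = "vpc-01"
  subnetwork = "us-central1-01"

  ip_range_pods     = "us-central1-01-gke-01-pods"
  ip_range_services = "us-central1-01-gke-01-services"

  # Private-cluster specific inputs (names assumed; check the submodule docs).
  enable_private_nodes   = true
  master_ipv4_cidr_block = "10.0.0.0/28"
}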

Compatibility

This module is meant for use with Terraform 1.3+ and tested using Terraform 1.0+. If you find incompatibilities using Terraform >=1.3, please open an issue.

If you haven't upgraded to 1.3 and need a Terraform 0.13.x-compatible version of this module, the last released version intended for Terraform 0.13.x is 27.0.0.

If you haven't upgraded to 0.13 and need a Terraform 0.12.x-compatible version of this module, the last released version intended for Terraform 0.12.x is 12.3.0.

Usage

There are multiple examples included in the examples folder but simple usage is as follows:

# google_client_config and kubernetes provider must be explicitly specified like the following.
data "google_client_config" "default" {}

provider "kubernetes" {
  host                   = "https://${module.gke.endpoint}"
  token                  = data.google_client_config.default.access_token
  cluster_ca_certificate = base64decode(module.gke.ca_certificate)
}

module "gke" {
  source                     = "terraform-google-modules/kubernetes-engine/google"
  project_id                 = "<PROJECT ID>"
  name                       = "gke-test-1"
  region                     = "us-central1"
  zones                      = ["us-central1-a", "us-central1-b", "us-central1-f"]
  network                    = "vpc-01"
  subnetwork                 = "us-central1-01"
  ip_range_pods              = "us-central1-01-gke-01-pods"
  ip_range_services          = "us-central1-01-gke-01-services"
  http_load_balancing        = false
  network_policy             = false
  horizontal_pod_autoscaling = true
  filestore_csi_driver       = false
  dns_cache                  = false

  node_pools = [
    {
      name                        = "default-node-pool"
      machine_type                = "e2-medium"
      node_locations              = "us-central1-b,us-central1-c"
      min_count                   = 1
      max_count                   = 100
      local_ssd_count             = 0
      spot                        = false
      disk_size_gb                = 100
      disk_type                   = "pd-standard"
      image_type                  = "COS_CONTAINERD"
      enable_gcfs                 = false
      enable_gvnic                = false
      logging_variant             = "DEFAULT"
      auto_repair                 = true
      auto_upgrade                = true
      service_account             = "project-service-account@<PROJECT ID>.iam.gserviceaccount.com"
      preemptible                 = false
      initial_node_count          = 80
      accelerator_count           = 1
      accelerator_type            = "nvidia-l4"
      gpu_driver_version          = "LATEST"
      gpu_sharing_strategy        = "TIME_SHARING"
      max_shared_clients_per_gpu  = 2
    },
  ]

  node_pools_oauth_scopes = {
    all = [
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
    ]
  }

  node_pools_labels = {
    all = {}

    default-node-pool = {
      default-node-pool = true
    }
  }

  node_pools_metadata = {
    all = {}

    default-node-pool = {
      node-pool-metadata-custom-value = "my-node-pool"
    }
  }

  node_pools_taints = {
    all = []

    default-node-pool = [
      {
        key    = "default-node-pool"
        value  = true
        effect = "PREFER_NO_SCHEDULE"
      },
    ]
  }

  node_pools_tags = {
    all = []

    default-node-pool = [
      "default-node-pool",
    ]
  }
}

Then run the following commands from the root folder:

  • terraform init to get the plugins
  • terraform plan to see the infrastructure plan
  • terraform apply to apply the infrastructure build
  • terraform destroy to destroy the built infrastructure

Inputs

Name Description Type Default Required
add_cluster_firewall_rules Create additional firewall rules bool false no
add_master_webhook_firewall_rules Create master_webhook firewall rules for ports defined in firewall_inbound_ports bool false no
add_shadow_firewall_rules Create GKE shadow firewall (the same as default firewall rules with firewall logs enabled). bool false no
additional_ip_range_pods List of names of the additional secondary subnet ip ranges to use for pods list(string) [] no
authenticator_security_group The name of the RBAC security group for use with Google security groups in Kubernetes RBAC. Group name must be in format [email protected] string null no
boot_disk_kms_key The Customer Managed Encryption Key used to encrypt the boot disk attached to each node in the node pool, if not overridden in node_pools. This should be of the form projects/[KEY_PROJECT_ID]/locations/[LOCATION]/keyRings/[RING_NAME]/cryptoKeys/[KEY_NAME]. For more information about protecting resources with Cloud KMS Keys please see: https://cloud.google.com/compute/docs/disks/customer-managed-encryption string null no
cluster_autoscaling Cluster autoscaling configuration (see the GKE cluster autoscaler documentation for more details)
object({
enabled = bool
autoscaling_profile = string
min_cpu_cores = number
max_cpu_cores = number
min_memory_gb = number
max_memory_gb = number
gpu_resources = list(object({ resource_type = string, minimum = number, maximum = number }))
auto_repair = bool
auto_upgrade = bool
disk_size = optional(number)
disk_type = optional(string)
image_type = optional(string)
strategy = optional(string)
max_surge = optional(number)
max_unavailable = optional(number)
node_pool_soak_duration = optional(string)
batch_soak_duration = optional(string)
batch_percentage = optional(number)
batch_node_count = optional(number)
enable_secure_boot = optional(bool, false)
enable_integrity_monitoring = optional(bool, true)
})
{
"auto_repair": true,
"auto_upgrade": true,
"autoscaling_profile": "BALANCED",
"disk_size": 100,
"disk_type": "pd-standard",
"enable_integrity_monitoring": true,
"enable_secure_boot": false,
"enabled": false,
"gpu_resources": [],
"image_type": "COS_CONTAINERD",
"max_cpu_cores": 0,
"max_memory_gb": 0,
"min_cpu_cores": 0,
"min_memory_gb": 0
}
no
cluster_dns_domain The suffix used for all cluster service records. string "" no
cluster_dns_provider Which in-cluster DNS provider should be used. PROVIDER_UNSPECIFIED (default) or PLATFORM_DEFAULT or CLOUD_DNS. string "PROVIDER_UNSPECIFIED" no
cluster_dns_scope The scope of access to cluster DNS records. DNS_SCOPE_UNSPECIFIED (default) or CLUSTER_SCOPE or VPC_SCOPE. string "DNS_SCOPE_UNSPECIFIED" no
cluster_ipv4_cidr The IP address range of the kubernetes pods in this cluster. Default is an automatically assigned CIDR. string null no
cluster_resource_labels The GCE resource labels (a map of key/value pairs) to be applied to the cluster map(string) {} no
config_connector Whether ConfigConnector is enabled for this cluster. bool false no
configure_ip_masq Enables the installation of ip masquerading, which is usually no longer required when using aliased IP addresses. IP masquerading uses a kubectl call, so when you have a private cluster, you will need access to the API server. bool false no
create_service_account Defines if service account specified to run nodes should be created. bool true no
database_encryption Application-layer Secrets Encryption settings. The object format is {state = string, key_name = string}. Valid values of state are: "ENCRYPTED"; "DECRYPTED". key_name is the name of a CloudKMS key. list(object({ state = string, key_name = string }))
[
{
"key_name": "",
"state": "DECRYPTED"
}
]
no
datapath_provider The desired datapath provider for this cluster. By default, DATAPATH_PROVIDER_UNSPECIFIED enables the IPTables-based kube-proxy implementation. ADVANCED_DATAPATH enables Dataplane-V2 feature. string "DATAPATH_PROVIDER_UNSPECIFIED" no
default_max_pods_per_node The maximum number of pods to schedule per node number 110 no
deletion_protection Whether or not to allow Terraform to destroy the cluster. bool true no
description The description of the cluster string "" no
disable_default_snat Whether to disable the default SNAT to support the private use of public IP addresses bool false no
disable_legacy_metadata_endpoints Disable the /0.1/ and /v1beta1/ metadata server endpoints on the node. Changing this value will cause all node pools to be recreated. bool true no
dns_cache The status of the NodeLocal DNSCache addon. bool false no
enable_binary_authorization Enable BinAuthZ Admission controller bool false no
enable_cilium_clusterwide_network_policy Enable Cilium Cluster Wide Network Policies on the cluster bool false no
enable_confidential_nodes An optional flag to enable confidential node config. bool false no
enable_cost_allocation Enables Cost Allocation Feature and the cluster name and namespace of your GKE workloads appear in the labels field of the billing export to BigQuery bool false no
enable_default_node_pools_metadata Whether to enable the default node pools metadata key-value pairs such as cluster_name and node_pool bool true no
enable_identity_service Enable the Identity Service component, which allows customers to use external identity providers with the K8S API. bool false no
enable_intranode_visibility Whether Intra-node visibility is enabled for this cluster. This makes same node pod to pod traffic visible for VPC network bool false no
enable_kubernetes_alpha Whether to enable Kubernetes Alpha features for this cluster. Note that when this option is enabled, the cluster cannot be upgraded and will be automatically deleted after 30 days. bool false no
enable_l4_ilb_subsetting Enable L4 ILB Subsetting on the cluster bool false no
enable_mesh_certificates Controls the issuance of workload mTLS certificates. When enabled the GKE Workload Identity Certificates controller and node agent will be deployed in the cluster. Requires Workload Identity. bool false no
enable_network_egress_export Whether to enable network egress metering for this cluster. If enabled, a daemonset will be created in the cluster to meter network egress traffic. bool false no
enable_resource_consumption_export Whether to enable resource consumption metering on this cluster. When enabled, a table will be created in the resource export BigQuery dataset to store resource consumption data. The resulting table can be joined with the resource usage table or with BigQuery billing export. bool true no
enable_shielded_nodes Enable Shielded Nodes features on all nodes in this cluster bool true no
enable_tpu Enable Cloud TPU resources in the cluster. WARNING: changing this after cluster creation is destructive! bool false no
enable_vertical_pod_autoscaling Vertical Pod Autoscaling automatically adjusts the resources of pods controlled by it bool false no
filestore_csi_driver The status of the Filestore CSI driver addon, which allows the usage of filestore instance as volumes bool false no
firewall_inbound_ports List of TCP ports for admission/webhook controllers. Either flag add_master_webhook_firewall_rules or add_cluster_firewall_rules (also adds egress rules) must be set to true for inbound-ports firewall rules to be applied. list(string)
[
"8443",
"9443",
"15017"
]
no
firewall_priority Priority rule for firewall rules number 1000 no
fleet_project (Optional) Register the cluster with the fleet in this project. string null no
gateway_api_channel The gateway api channel of this cluster. Accepted values are CHANNEL_STANDARD and CHANNEL_DISABLED. string null no
gce_pd_csi_driver Whether this cluster should enable the Google Compute Engine Persistent Disk Container Storage Interface (CSI) Driver. bool true no
gcs_fuse_csi_driver Whether GCE FUSE CSI driver is enabled for this cluster. bool false no
gke_backup_agent_config Whether Backup for GKE agent is enabled for this cluster. bool false no
grant_registry_access Grants created cluster-specific service account storage.objectViewer and artifactregistry.reader roles. bool false no
horizontal_pod_autoscaling Enable horizontal pod autoscaling addon bool true no
http_load_balancing Enable HTTP load balancer addon bool true no
identity_namespace The workload pool to attach all Kubernetes service accounts to. (Default value of enabled automatically sets project-based pool [project_id].svc.id.goog) string "enabled" no
initial_node_count The number of nodes to create in this cluster's default node pool. number 0 no
ip_masq_link_local Whether to masquerade traffic to the link-local prefix (169.254.0.0/16). bool false no
ip_masq_resync_interval The interval at which the agent attempts to sync its ConfigMap file from the disk. string "60s" no
ip_range_pods The name of the secondary subnet ip range to use for pods string n/a yes
ip_range_services The name of the secondary subnet range to use for services string n/a yes
issue_client_certificate Issues a client certificate to authenticate to the cluster endpoint. To maximize the security of your cluster, leave this option disabled. Client certificates don't automatically rotate and aren't easily revocable. WARNING: changing this after cluster creation is destructive! bool false no
kubernetes_version The Kubernetes version of the masters. If set to 'latest' it will pull latest available version in the selected region. string "latest" no
logging_enabled_components List of services to monitor: SYSTEM_COMPONENTS, APISERVER, CONTROLLER_MANAGER, SCHEDULER, and WORKLOADS. Empty list is default GKE configuration. list(string) [] no
logging_service The logging service that the cluster should write logs to. Available options include logging.googleapis.com, logging.googleapis.com/kubernetes (beta), and none string "logging.googleapis.com/kubernetes" no
maintenance_end_time Time window specified for recurring maintenance operations in RFC3339 format string "" no
maintenance_exclusions List of maintenance exclusions. A cluster can have up to three list(object({ name = string, start_time = string, end_time = string, exclusion_scope = string })) [] no
maintenance_recurrence Frequency of the recurring maintenance window in RFC5545 format. string "" no
maintenance_start_time Time window specified for daily or recurring maintenance operations in RFC3339 format string "05:00" no
master_authorized_networks List of master authorized networks. If none are provided, disallow external access (except the cluster node IPs, which GKE automatically whitelists). list(object({ cidr_block = string, display_name = string })) [] no
monitoring_enable_managed_prometheus Configuration for Managed Service for Prometheus. Whether or not the managed collection is enabled. bool false no
monitoring_enable_observability_metrics Whether or not the advanced datapath metrics are enabled. bool false no
monitoring_enabled_components List of services to monitor: SYSTEM_COMPONENTS, APISERVER, SCHEDULER, CONTROLLER_MANAGER, STORAGE, HPA, POD, DAEMONSET, DEPLOYMENT, STATEFULSET, KUBELET, CADVISOR and DCGM. In beta provider, WORKLOADS is supported on top of those 12 values. (WORKLOADS is deprecated and removed in GKE 1.24.) KUBELET and CADVISOR are only supported in GKE 1.29.3-gke.1093000 and above. Empty list is default GKE configuration. list(string) [] no
monitoring_observability_metrics_relay_mode Mode used to make advanced datapath metrics relay available. string null no
monitoring_service The monitoring service that the cluster should write metrics to. Automatically send metrics from pods in the cluster to the Google Cloud Monitoring API. VM metrics will be collected by Google Compute Engine regardless of this setting. Available options include monitoring.googleapis.com, monitoring.googleapis.com/kubernetes (beta) and none string "monitoring.googleapis.com/kubernetes" no
name The name of the cluster (required) string n/a yes
network The VPC network to host the cluster in (required) string n/a yes
network_policy Enable network policy addon bool false no
network_policy_provider The network policy provider. string "CALICO" no
network_project_id The project ID of the shared VPC's host (for shared vpc support) string "" no
network_tags (Optional) - List of network tags applied to auto-provisioned node pools. list(string) [] no
node_metadata Specifies how node metadata is exposed to the workload running on the node string "GKE_METADATA" no
node_pools List of maps containing node pools list(map(any))
[
{
"name": "default-node-pool"
}
]
no
node_pools_cgroup_mode Map of strings containing cgroup node config by node-pool name map(string)
{
"all": "",
"default-node-pool": ""
}
no
node_pools_labels Map of maps containing node labels by node-pool name map(map(string))
{
"all": {},
"default-node-pool": {}
}
no
node_pools_linux_node_configs_sysctls Map of maps containing linux node config sysctls by node-pool name map(map(string))
{
"all": {},
"default-node-pool": {}
}
no
node_pools_metadata Map of maps containing node metadata by node-pool name map(map(string))
{
"all": {},
"default-node-pool": {}
}
no
node_pools_oauth_scopes Map of lists containing node oauth scopes by node-pool name map(list(string))
{
"all": [
"https://www.googleapis.com/auth/cloud-platform"
],
"default-node-pool": []
}
no
node_pools_resource_labels Map of maps containing resource labels by node-pool name map(map(string))
{
"all": {},
"default-node-pool": {}
}
no
node_pools_resource_manager_tags Map of maps containing resource manager tags by node-pool name map(map(string))
{
"all": {},
"default-node-pool": {}
}
no
node_pools_tags Map of lists containing node network tags by node-pool name map(list(string))
{
"all": [],
"default-node-pool": []
}
no
node_pools_taints Map of lists containing node taints by node-pool name map(list(object({ key = string, value = string, effect = string })))
{
"all": [],
"default-node-pool": []
}
no
non_masquerade_cidrs List of strings in CIDR notation that specify the IP address ranges that do not use IP masquerading. list(string)
[
"10.0.0.0/8",
"172.16.0.0/12",
"192.168.0.0/16"
]
no
notification_config_topic The desired Pub/Sub topic to which notifications will be sent by GKE. Format is projects/{project}/topics/{topic}. string "" no
notification_filter_event_type Choose what type of notifications you want to receive. If no filters are applied, you'll receive all notification types. Can be used to filter what notifications are sent. Accepted values are UPGRADE_AVAILABLE_EVENT, UPGRADE_EVENT, and SECURITY_BULLETIN_EVENT. list(string) [] no
project_id The project ID to host the cluster in (required) string n/a yes
ray_operator_config The Ray Operator Addon configuration for this cluster.
object({
enabled = bool
logging_enabled = optional(bool, false)
monitoring_enabled = optional(bool, false)
})
{
"enabled": false,
"logging_enabled": false,
"monitoring_enabled": false
}
no
region The region to host the cluster in (optional if zonal cluster / required if regional) string null no
regional Whether this is a regional cluster (zonal cluster if set to false. WARNING: changing this after cluster creation is destructive!) bool true no
registry_project_ids Projects holding Google Container Registries. If empty, we use the cluster project. If a service account is created and the grant_registry_access variable is set to true, the storage.objectViewer and artifactregistry.reader roles are assigned on these projects. list(string) [] no
release_channel The release channel of this cluster. Accepted values are UNSPECIFIED, RAPID, REGULAR and STABLE. Defaults to REGULAR. string "REGULAR" no
remove_default_node_pool Remove default node pool while setting up the cluster bool false no
resource_usage_export_dataset_id The ID of a BigQuery Dataset for using BigQuery as the destination of resource usage export. string "" no
security_posture_mode Security posture mode. Accepted values are DISABLED and BASIC. Defaults to DISABLED. string "DISABLED" no
security_posture_vulnerability_mode Security posture vulnerability mode. Accepted values are VULNERABILITY_DISABLED, VULNERABILITY_BASIC, and VULNERABILITY_ENTERPRISE. Defaults to VULNERABILITY_DISABLED. string "VULNERABILITY_DISABLED" no
service_account The service account to run nodes as if not overridden in node_pools. The create_service_account variable default value (true) will cause a cluster-specific service account to be created. This service account should already exist; it will be used by the node pools. If you wish to only override the service account name, you can use the service_account_name variable. string "" no
service_account_name The name of the service account that will be created if create_service_account is true. If you wish to use an existing service account, use service_account variable. string "" no
service_external_ips Whether external ips specified by a service will be allowed in this cluster bool false no
shadow_firewall_rules_log_config The log_config for shadow firewall rules. You can set this variable to null to disable logging.
object({
metadata = string
})
{
"metadata": "INCLUDE_ALL_METADATA"
}
no
shadow_firewall_rules_priority The firewall priority of GKE shadow firewall rules. The priority should be less than default firewall, which is 1000. number 999 no
stack_type The stack type to use for this cluster. Either IPV4 or IPV4_IPV6. Defaults to IPV4. string "IPV4" no
stateful_ha Whether the Stateful HA Addon is enabled for this cluster. bool false no
stub_domains Map of stub domains and their resolvers to forward DNS queries for a certain domain to an external DNS server map(list(string)) {} no
subnetwork The subnetwork to host the cluster in (required) string n/a yes
timeouts Timeout for cluster operations. map(string) {} no
upstream_nameservers If specified, the values replace the nameservers taken by default from the node’s /etc/resolv.conf list(string) [] no
windows_node_pools List of maps containing Windows node pools list(map(string)) [] no
zones The zones to host the cluster in (optional if regional cluster / required if zonal) list(string) [] no

Outputs

Name Description
ca_certificate Cluster ca certificate (base64 encoded)
cluster_id Cluster ID
dns_cache_enabled Whether DNS Cache enabled
endpoint Cluster endpoint
fleet_membership Fleet membership (if registered)
gateway_api_channel The gateway api channel of this cluster.
horizontal_pod_autoscaling_enabled Whether horizontal pod autoscaling enabled
http_load_balancing_enabled Whether http load balancing enabled
identity_namespace Workload Identity pool
identity_service_enabled Whether Identity Service is enabled
instance_group_urls List of GKE generated instance groups
intranode_visibility_enabled Whether intra-node visibility is enabled
location Cluster location (region if regional cluster, zone if zonal cluster)
logging_service Logging service used
master_authorized_networks_config Networks from which access to master is permitted
master_version Current master kubernetes version
mesh_certificates_config Mesh certificates configuration
min_master_version Minimum master kubernetes version
monitoring_service Monitoring service used
name Cluster name
network_policy_enabled Whether network policy enabled
node_pools_names List of node pools names
node_pools_versions Node pool versions by node pool name
region Cluster region
release_channel The release channel of this cluster
service_account The service account to default running nodes as if not overridden in node_pools.
tpu_ipv4_cidr_block The IP range in CIDR notation used for the TPUs
type Cluster type (regional / zonal)
vertical_pod_autoscaling_enabled Whether vertical pod autoscaling enabled
zones List of zones in which the cluster resides

node_pools variable

Use this variable to provision Linux-based node pools. For Windows-based node pools, use windows_node_pools.

The node_pools variable takes the following parameters:

Name Description Default Requirement
accelerator_count The number of the guest accelerator cards exposed to this instance 0 Optional
accelerator_type The accelerator type resource to expose to the instance " " Optional
auto_repair Whether the nodes will be automatically repaired true Optional
autoscaling Configuration required by cluster autoscaler to adjust the size of the node pool to the current cluster usage true Optional
auto_upgrade Whether the nodes will be automatically upgraded true (if cluster is regional) Optional
boot_disk_kms_key The Customer Managed Encryption Key used to encrypt the boot disk attached to each node in the node pool. This should be of the form projects/[KEY_PROJECT_ID]/locations/[LOCATION]/keyRings/[RING_NAME]/cryptoKeys/[KEY_NAME]. " " Optional
cpu_manager_policy The CPU manager policy on the node. One of "none" or "static". "static" Optional
cpu_cfs_quota Enforces the Pod's CPU limit. Setting this value to false means that the CPU limits for Pods are ignored null Optional
cpu_cfs_quota_period The CPU CFS quota period value, which specifies the period of how often a cgroup's access to CPU resources should be reallocated null Optional
pod_pids_limit Controls the maximum number of processes allowed to run in a pod. The value must be greater than or equal to 1024 and less than 4194304. null Optional
enable_confidential_nodes An optional flag to enable confidential node config. false Optional
disk_size_gb Size of the disk attached to each node, specified in GB. The smallest allowed disk size is 10GB 100 Optional
disk_type Type of the disk attached to each node (e.g. 'pd-standard' or 'pd-ssd') pd-standard Optional
effect Effect for the taint Required
enable_gcfs Google Container File System (gcfs) has to be enabled for image streaming to be active. Needs image_type to be set to COS_CONTAINERD. false Optional
enable_gvnic gVNIC (GVE) is an alternative to the virtIO-based ethernet driver. Needs a Container-Optimized OS node image. false Optional
enable_integrity_monitoring Enables monitoring and attestation of the boot integrity of the instance. The attestation is performed against the integrity policy baseline. This baseline is initially derived from the implicitly trusted boot image when the instance is created. true Optional
enable_secure_boot Secure Boot helps ensure that the system only runs authentic software by verifying the digital signature of all boot components, and halting the boot process if signature verification fails. false Optional
gpu_driver_version Mode for how the GPU driver is installed null Optional
gpu_partition_size Size of partitions to create on the GPU null Optional
image_type The image type to use for this node. Note that changing the image type will delete and recreate all nodes in the node pool COS_CONTAINERD Optional
initial_node_count The initial number of nodes for the pool. In regional or multi-zonal clusters, this is the number of nodes per zone. Changing this will force recreation of the resource. Defaults to the value of min_count " " Optional
key The key required for the taint Required
logging_variant The type of logging agent that is deployed by default for newly created node pools in the cluster. Valid values include DEFAULT and MAX_THROUGHPUT. DEFAULT Optional
local_ssd_count The amount of local SSD disks that will be attached to each cluster node and may be used as a hostpath volume or a local PersistentVolume. 0 Optional
local_ssd_ephemeral_storage_count The amount of local SSD disks that will be attached to each cluster node and assigned as scratch space as an emptyDir volume. If unspecified, ephemeral storage is backed by the cluster node boot disk. 0 Optional
local_nvme_ssd_count Number of raw-block local NVMe SSD disks to be attached to the node. Each local SSD is 375 GB in size. If zero, no raw-block local NVMe SSD disks are attached to the node. 0 Optional
machine_type The name of a Google Compute Engine machine type e2-medium Optional
min_cpu_platform Minimum CPU platform to be used by the nodes in the pool. The nodes may be scheduled on the specified or newer CPU platform. " " Optional
enable_confidential_storage Enabling Confidential Storage will create boot disk with confidential mode. false Optional
max_count Maximum number of nodes in the NodePool. Must be >= min_count. Cannot be used with total limits. 100 Optional
total_max_count Total maximum number of nodes in the NodePool. Must be >= min_count. Cannot be used with per zone limits. null Optional
max_pods_per_node The maximum number of pods per node in this cluster null Optional
strategy The upgrade strategy to be used when upgrading the nodes. Valid values are SURGE or BLUE_GREEN. "SURGE" Optional
threads_per_core The number of threads per physical core. To disable simultaneous multithreading (SMT) set this to 1. If unset, the maximum number of threads supported per core by the underlying processor is assumed null Optional
enable_nested_virtualization Whether the node should have nested virtualization null Optional
max_surge The number of additional nodes that can be added to the node pool during an upgrade. Increasing max_surge raises the number of nodes that can be upgraded simultaneously. Can be set to 0 or greater. Only works with SURGE strategy. 1 Optional
max_unavailable The number of nodes that can be simultaneously unavailable during an upgrade. Increasing max_unavailable raises the number of nodes that can be upgraded in parallel. Can be set to 0 or greater. Only works with SURGE strategy. 0 Optional
node_pool_soak_duration Time needed after draining the entire blue pool. After this period, the blue pool will be cleaned up. By default, it is set to one hour (3600 seconds). The maximum length of the soak time is 7 days (604,800 seconds). Only works with BLUE_GREEN strategy. "3600s" Optional
batch_soak_duration Soak time after each batch gets drained, with the default being zero seconds. Only works with BLUE_GREEN strategy. "0s" Optional
batch_node_count Absolute number of nodes to drain in a batch. If it is set to zero, this phase will be skipped. Cannot be used together with batch_percentage. Only works with BLUE_GREEN strategy. 1 Optional
batch_percentage Percentage of nodes to drain in a batch. Must be in the range of [0.0, 1.0]. If it is set to zero, this phase will be skipped. Cannot be used together with batch_node_count. Only works with BLUE_GREEN strategy. null Optional
min_count Minimum number of nodes in the NodePool. Must be >=0 and <= max_count. Should be used when autoscaling is true. Cannot be used with total limits. 1 Optional
total_min_count Total minimum number of nodes in the NodePool. Must be >=0 and <= max_count. Should be used when autoscaling is true. Cannot be used with per zone limits. null Optional
name The name of the node pool Required
placement_policy Placement type to set for nodes in a node pool. Can be set as COMPACT if desired Optional
pod_range The name of the secondary range for pod IPs. Optional
enable_private_nodes Whether nodes have internal IP addresses only. Optional
node_count The number of nodes in the nodepool when autoscaling is false. Otherwise defaults to 1. Only valid for non-autoscaling clusters Required
node_locations The list of zones in which the cluster's nodes are located. Nodes must be in the region of their regional cluster or in the same region as their cluster's zone for zonal clusters. Defaults to cluster level node locations if nothing is specified " " Optional
node_metadata Options to expose the node metadata to the workload running on the node Optional
preemptible A boolean that represents whether or not the underlying node VMs are preemptible false Optional
spot A boolean that represents whether the underlying node VMs are spot false Optional
service_account The service account to be used by the Node VMs " " Optional
tags The list of instance tags applied to all nodes Required
value The value for the taint Required
version The Kubernetes version for the nodes in this pool. Should only be set if auto_upgrade is false " " Optional
location_policy Location policy specifies the algorithm used when scaling-up the node pool. Location policy is supported only in 1.24.1+ clusters. " " Optional
secondary_boot_disk Image of a secondary boot disk to preload container images and data on new nodes. For details, see the documentation. gcfs_config must be enabled=true for this feature to work. Optional
queued_provisioning Makes nodes obtainable through the ProvisioningRequest API exclusively. Optional
gpu_sharing_strategy The type of GPU sharing strategy to enable on the GPU node. Accepted values are: "TIME_SHARING" and "MPS". Optional
max_shared_clients_per_gpu The maximum number of containers that can share a GPU. Optional

windows_node_pools variable

The windows_node_pools variable takes the same parameters as node_pools but is reserved for provisioning Windows-based node pools only. A separate variable is needed because GKE requires at least one Linux-based node pool to exist in the cluster before a Windows-based node pool can be created.
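
A hedged sketch of combining the two variables is shown below; the parameter names come from the node_pools table above, while the Windows image type value is an assumption that should be checked against the GKE documentation.

module "gke" {
  # ... other required inputs as in the usage example above ...

  # GKE requires at least one Linux-based node pool before a Windows pool can be added.
  node_pools = [
    {
      name         = "linux-pool"
      machine_type = "e2-medium"
    },
  ]

  windows_node_pools = [
    {
      name         = "windows-pool"
      machine_type = "n1-standard-4"
      # Image type value is an assumption; consult the GKE docs for supported Windows images.
      image_type   = "WINDOWS_LTSC_CONTAINERD"
    },
  ]
}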

Requirements

Before this module can be used on a project, you must ensure that the following prerequisites are fulfilled:

  1. Terraform and kubectl are installed on the machine where Terraform is executed.
  2. The Service Account you execute the module with has the right permissions.
  3. The Compute Engine and Kubernetes Engine APIs are active on the project you will launch the cluster in.
  4. If you are using a Shared VPC, the APIs must also be activated on the Shared VPC host project and your service account needs the proper permissions there.

The project factory can be used to provision projects with the correct APIs active and the necessary Shared VPC connections.

Software Dependencies

Kubectl

Terraform and Plugins

gcloud

Some submodules use the terraform-google-gcloud module. By default, this module assumes you already have gcloud installed in your $PATH. See the module documentation for more information.

Configure a Service Account

In order to execute this module you must have a Service Account with the following project roles:

  • roles/compute.viewer
  • roles/compute.securityAdmin (only required if add_cluster_firewall_rules is set to true)
  • roles/container.clusterAdmin
  • roles/container.developer
  • roles/iam.serviceAccountAdmin
  • roles/iam.serviceAccountUser
  • roles/resourcemanager.projectIamAdmin (only required if service_account is set to create)

Additionally, if service_account is set to create and grant_registry_access is requested, the service account requires the following role on the registry_project_ids projects:

  • roles/resourcemanager.projectIamAdmin

Enable APIs

In order to operate with the Service Account you must activate the following APIs on the project where the Service Account was created:

  • Compute Engine API - compute.googleapis.com
  • Kubernetes Engine API - container.googleapis.com
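
If you prefer to manage these APIs with Terraform as well, a minimal sketch using the Google provider's google_project_service resource might look like the following (the project ID is a placeholder):

resource "google_project_service" "gke_apis" {
  for_each = toset([
    "compute.googleapis.com",
    "container.googleapis.com",
  ])

  project = "<PROJECT ID>"
  service = each.key

  # Leave the API enabled even if this resource is later destroyed.
  disable_on_destroy = false
}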

terraform-google-kubernetes-engine's People

Contributors

aaron-lane, adrienthebo, akshaybathija-github, alekhyal, alexkonkin, apeabody, bharathkkb, chrissng, cloud-foundation-bot, cloud-pharaoh, coryodaniel, dependabot[bot], dev25, drfaust92, ericyz, ingwarr, jberlinsky, kopachevsky, lauraseidler, marko7460, mkubaczyk, morgante, omazin, paulpalamarchuk, release-please[bot], renovate[bot], richardmcsong, shashindran, skinlayers, thefirstofthe300


terraform-google-kubernetes-engine's Issues

Add configuration flag for `enable_binary_authorization`

https://www.terraform.io/docs/providers/google/r/container_cluster.html#enable_binary_authorization

Suggest plumbing the flag through with a default of false. It allows enabling the BinAuthZ admission controller, which can enforce a whitelist policy of approved container registry paths and also enforce image signing if desired. Note that it can safely be set to true if desired, as the GCP project's default BinAuthZ policy is allow-all/permissive.
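
Assuming the flag is plumbed through as the enable_binary_authorization input documented above, usage would look roughly like this sketch:

module "gke" {
  source     = "terraform-google-modules/kubernetes-engine/google"
  project_id = "<PROJECT ID>"
  name       = "gke-test-1"
  # ... other required inputs ...

  # Enables the BinAuthZ admission controller. The project-level policy defaults
  # to allow-all, so turning this on before defining a policy is non-disruptive.
  enable_binary_authorization = true
}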

Suggest enabling metadata-concealment by default

Strongly suggest that the metadata-concealment proxy be enabled to protect against cluster privilege escalation attacks (without this control in place, any pod can use the instance metadata API to obtain the kubelet's credentials, which provides a path to gain access to all cluster secrets).

https://www.terraform.io/docs/providers/google/r/container_cluster.html#node_metadata
e.g.

workload_metadata_config {
  node_metadata = "SECURE"
}

See: https://cloud.google.com/kubernetes-engine/docs/how-to/protecting-cluster-metadata#concealment https://www.4armed.com/blog/kubeletmein-kubelet-hacking-tool/ and https://www.qwiklabs.com/focuses/5158?parent=catalog for more background info on why this control is so important.
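
In terms of this module, the setting maps to the node_metadata input documented above; a minimal sketch follows, with the caveat that the accepted values (and whether SECURE is still supported) should be checked against the current variable description:

module "gke" {
  # ... other required inputs ...

  # Conceal instance metadata from workloads. Newer clusters typically use
  # GKE_METADATA (Workload Identity), which is this module's default.
  node_metadata = "SECURE"
}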

Terraform Google provider >= 2.4 throws error: "node_pool": conflicts with remove_default_node_pool

I'm just trying out this module for a PoC and ran into some difficulties following the v2.0.0 release. I created a cluster OK using v1.0.1, but just upgraded and am now getting this error when running a plan.

I've stripped back my config and still see the same issue with the main example config in the README.md.

Error: module.gke.google_container_cluster.primary: "node_pool": conflicts with remove_default_node_pool

Error: module.gke.google_container_cluster.primary: "remove_default_node_pool": conflicts with node_pool

I'm not setting remove_default_node_pool option but have tried explicitly setting both true and false and get the same error.

Terraform Version

Terraform v0.11.13
+ provider.google v2.5.1
+ provider.google-beta v2.5.1
+ provider.kubernetes v1.6.2
+ provider.null v2.1.1
+ provider.random v2.1.1

Affected Resource(s)

  • module.gke.google_container_cluster.primary

Terraform Configuration Files

provider "google" {
  project     = "<PROJECT ID>"
  region      = "us-central1"
  zone        = "us-central1-a"
}
provider "google-beta" {
  project     = "<PROJECT ID>"
  region      = "us-central1"
  zone        = "us-central1-a"
}
module "gke" {

  source                     = "terraform-google-modules/kubernetes-engine/google"
  project_id                 = "<PROJECT ID>"
  name                       = "gke-test-1"
  region                     = "us-central1"
  zones                      = ["us-central1-a", "us-central1-b", "us-central1-f"]
  network                    = "vpc-01"
  subnetwork                 = "us-central1-01"
  ip_range_pods              = "us-central1-01-gke-01-pods"
  ip_range_services          = "us-central1-01-gke-01-services"
  http_load_balancing        = false
  horizontal_pod_autoscaling = true
  kubernetes_dashboard       = true
  network_policy             = true
  remove_default_node_pool   = true

  node_pools = [
    {
      name               = "default-node-pool"
      machine_type       = "n1-standard-2"
      min_count          = 1
      max_count          = 100
      disk_size_gb       = 100
      disk_type          = "pd-standard"
      image_type         = "COS"
      auto_repair        = true
      auto_upgrade       = true
      service_account    = "project-service-account@<PROJECT ID>.iam.gserviceaccount.com"
      preemptible        = false
      initial_node_count = 80
    },
  ]

  node_pools_oauth_scopes = {
    all = []

    default-node-pool = [
      "https://www.googleapis.com/auth/cloud-platform",
    ]
  }

  node_pools_labels = {
    all = {}

    default-node-pool = {
      default-node-pool = "true"
    }
  }

  node_pools_metadata = {
    all = {}

    default-node-pool = {
      node-pool-metadata-custom-value = "my-node-pool"
    }
  }

  node_pools_taints = {
    all = []

    default-node-pool = [
      {
        key    = "default-node-pool"
        value  = "true"
        effect = "PREFER_NO_SCHEDULE"
      },
    ]
  }

  node_pools_tags = {
    all = []

    default-node-pool = [
      "default-node-pool",
    ]
  }
}

Expected Behavior

Terraform creates plan successfully.

Actual Behavior

terraform plan errors with

Warning: module.gke.data.google_container_engine_versions.region: "region": [DEPRECATED] Use location instead



Warning: module.gke.data.google_container_engine_versions.zone: "zone": [DEPRECATED] Use location instead



Warning: module.gke.google_container_cluster.primary: "region": [DEPRECATED] Use location instead



Warning: module.gke.google_container_node_pool.pools: "region": [DEPRECATED] use location instead



Error: module.gke.google_container_cluster.primary: "node_pool": conflicts with remove_default_node_pool



Error: module.gke.google_container_cluster.primary: "remove_default_node_pool": conflicts with node_pool

Steps to Reproduce

  1. terraform plan

Autoscaling cannot be disabled

It appears that users must either specify min_node_count and max_node_count or have them default to 1 and 100; the autoscaling block is always created. Is this by design, or might we in the future be able to specify a static node count and disable autoscaling?
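
For later readers: the node_pools table above now documents an autoscaling flag and a node_count parameter, so a static (non-autoscaling) pool can be sketched roughly as follows. This is an illustration based on those documented inputs, not a confirmation of how the original issue was resolved.

node_pools = [
  {
    name         = "static-pool"
    machine_type = "e2-medium"
    # Disable the autoscaling block and pin the pool to a fixed size.
    autoscaling  = false
    node_count   = 3
  },
]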

Update examples to include example of setting service account

I was setting up a cluster with this module using the examples as a reference and kept running into issues about the service account not existing (I set up the project with the project factory).

As noted in the README [1], the node pools should specify the service account. I got tripped up on this since it wasn't in the examples.

It could be helpful to add a note in the examples or in the examples' README.

Looks like issue #23 would also resolve this.

Footnotes:

  1. Pardon the blame view; needed to link to the line in the readme.
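
For anyone hitting the same problem, a minimal hedged sketch of pinning the node service account (both forms correspond to inputs documented above; the email shown is a placeholder):

module "gke" {
  # ... other required inputs ...

  # Top-level default, used by any pool that does not override it.
  service_account = "project-service-account@<PROJECT ID>.iam.gserviceaccount.com"

  node_pools = [
    {
      name = "default-node-pool"
      # Per-pool override, if a different account is needed for this pool.
      service_account = "project-service-account@<PROJECT ID>.iam.gserviceaccount.com"
    },
  ]
}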

stub_domains test failed

As of current master (3f7527e when writing this), with a following test/fixtures/shared/terraform.tfvars:

project_id="redacted-project-name"
credentials_path_relative="../../../credentials.json"
region="europe-west1"
zones=["europe-west1-c"]
compute_engine_service_account="[email protected]"

make docker_build_kitchen_terraform, make docker_run, kitchen create and kitchen converge passed fine.

kitchen verify passed fine for deploy_service, node_pool, shared_vpc, simple_regional and simple_zonal. It failed at stub_domains as follows:

Verifying stub_domains

Profile: stub_domain
Version: (not specified)
Target:  local://

  ×  gcloud: Google Compute Engine GKE configuration (1 failed)
     ✔  Command: `gcloud --project=redacted-project-name container clusters --zone=europe-west1 describe stub-domains-cluster-zwwa --format=json` exit_status should eq 0
     ✔  Command: `gcloud --project=redacted-project-name container clusters --zone=europe-west1 describe stub-domains-cluster-zwwa --format=json` stderr should eq ""
     ✔  Command: `gcloud --project=redacted-project-name container clusters --zone=europe-west1 describe stub-domains-cluster-zwwa --format=json` cluster is running
     ×  Command: `gcloud --project=redacted-project-name container clusters --zone=europe-west1 describe stub-domains-cluster-zwwa --format=json` cluster has the expected addon settings
     
     expected: {"horizontalPodAutoscaling"=>{}, "httpLoadBalancing"=>{}, "kubernetesDashboard"=>{"disabled"=>true}, "networkPolicyConfig"=>{}}
          got: {"horizontalPodAutoscaling"=>{}, "httpLoadBalancing"=>{}, "kubernetesDashboard"=>{"disabled"=>true}, "networkPolicyConfig"=>{"disabled"=>true}}
     
     (compared using ==)
     
     Diff:
     @@ -1,5 +1,5 @@
      "horizontalPodAutoscaling" => {},
      "httpLoadBalancing" => {},
      "kubernetesDashboard" => {"disabled"=>true},
     -"networkPolicyConfig" => {},
     +"networkPolicyConfig" => {"disabled"=>true},

  ✔  kubectl: Kubernetes configuration
     ✔  kubernetes configmap kube-dns is created by Terraform
     ✔  kubernetes configmap kube-dns reflects the stub_domains configuration
     ✔  kubernetes configmap ipmasq is created by Terraform
     ✔  kubernetes configmap ipmasq is configured properly


Profile Summary: 1 successful control, 1 control failure, 0 controls skipped
Test Summary: 7 successful, 1 failure, 0 skipped
>>>>>> ------Exception-------
>>>>>> Class: Kitchen::ActionFailed
>>>>>> Message: 1 actions failed.
>>>>>>     Verify failed on instance <stub-domains-local>.  Please see .kitchen/logs/stub-domains-local.log for more details
>>>>>> ----------------------
>>>>>> Please see .kitchen/logs/kitchen.log for more details
>>>>>> Also try running `kitchen diagnose --all` for configuration

I have the .kitchen/logs/kitchen.log and kitchen diagnose --all output copied, so let me know if you need that.

Support for preemptible nodes

Is it in the roadmap for support?
I think this can be solved without major issues:

node_config {
    preemptible    = "${lookup(var.node_pools[count.index], "preemptible", false)}"
}

Private cluster timeout

Private cluster creation fails with a timeout when posting the config map (when network_policy is set to true):

1 error(s) occurred:

* module.gke.kubernetes_config_map.ip-masq-agent: 1 error(s) occurred:

* kubernetes_config_map.ip-masq-agent: Post https://192.168.134.2/api/v1/namespaces/kube-system/configmaps: dial tcp 192.168.134.2:443: i/o timeout

I think this might be linked to the lack of VPC transitive peering, which prevents an on-prem network from reaching the GKE master when it is private.

"region" is deprecated in google_container_cluster

$ terraform --version
Terraform v0.11.11
+ provider.google v1.19.1

You guys require the region setting in your module:

$ terraform plan
Error: module "gke": missing required argument "region"

But:

Warning: module.gke.google_container_cluster.primary: "region": [DEPRECATED] This field is in beta and will be removed from this provider. Use it in the the google-beta provider instead. See https://terraform.io/docs/providers/google/provider_versions.html for more details.

Forced recreation of node_pool on every plan

Once a simple zonal cluster with a node_pool has been created correctly, if I run terraform apply again without any changes, Terraform wants to destroy and recreate the cluster and node_pool.

This is my configuration:

module "kubernetes-cluster" {
  source  = "terraform-google-modules/kubernetes-engine/google"
  version = "0.4.0"
  project_id         = "${var.project_id}"
  name               = "internal-cluster"
  regional           = false
  region             = "${var.region}"
  zones              = ["${var.zone}"]
  network            = "${var.network_name}"
  subnetwork         = "${var.network_name}-subnet-01"
  ip_range_pods      = "${var.network_name}-pod-secondary-range"
  ip_range_services  = "${var.network_name}-services-secondary-range"
  kubernetes_version = "${var.kubernetes_version}"
  node_version       = "${var.kubernetes_version}"
  remove_default_node_pool = true

  providers = {
    google = "google-beta"
  }

  node_pools = [
    {
      name            = "forge-pool"
      machine_type    = "n1-standard-2"
      min_count       = 1
      max_count       = 3
      disk_size_gb    = 100
      disk_type       = "pd-standard"
      image_type      = "COS"
      auto_repair     = true
      auto_upgrade    = false
      service_account = "gke-monitoring@${var.project_id}.iam.gserviceaccount.com"
    },
  ]

  node_pools_labels = {
    all = {}

    forge-pool = {
      scope = "forge"
    }
  }

  node_pools_taints = {
    all = []

    forge-pool = []
  }

  node_pools_tags = {
    all = []

    forge-pool = []
  }
}

As you have probably noticed (from the presence of remove_default_node_pool in the cluster config), I applied the patch from #15; after that the problem is somewhat mitigated and Terraform only wants to destroy and recreate the node_pool. This is the output of a terraform plan:

Terraform will perform the following actions:

-/+ module.kubernetes-cluster.google_container_node_pool.zonal_pools (new resource required)
      id:                                              "europe-west3-b/internal-cluster/forge-pool" => <computed> (forces new resource)
      autoscaling.#:                                   "1" => "1"
      autoscaling.0.max_node_count:                    "3" => "3"
      autoscaling.0.min_node_count:                    "1" => "1"
      cluster:                                         "internal-cluster" => "internal-cluster"
      initial_node_count:                              "1" => "1"
      instance_group_urls.#:                           "1" => <computed>
      management.#:                                    "1" => "1"
      management.0.auto_repair:                        "true" => "true"
      management.0.auto_upgrade:                       "false" => "false"
      max_pods_per_node:                               "110" => <computed>
      name:                                            "forge-pool" => "forge-pool"
      name_prefix:                                     "" => <computed>
      node_config.#:                                   "1" => "1"
      node_config.0.disk_size_gb:                      "100" => "100"
      node_config.0.disk_type:                         "pd-standard" => "pd-standard"
      node_config.0.guest_accelerator.#:               "0" => <computed>
      node_config.0.image_type:                        "COS" => "COS"
      node_config.0.labels.%:                          "3" => "3"
      node_config.0.labels.cluster_name:               "internal-cluster" => "internal-cluster"
      node_config.0.labels.node_pool:                  "forge-pool" => "forge-pool"
      node_config.0.labels.scope:                      "forge" => "forge"
      node_config.0.local_ssd_count:                   "0" => <computed>
      node_config.0.machine_type:                      "n1-standard-2" => "n1-standard-2"
      node_config.0.metadata.%:                        "1" => "0" (forces new resource)
      node_config.0.metadata.disable-legacy-endpoints: "true" => "" (forces new resource)
      node_config.0.oauth_scopes.#:                    "1" => "1"
      node_config.0.oauth_scopes.1733087937:           "https://www.googleapis.com/auth/cloud-platform" => "https://www.googleapis.com/auth/cloud-platform"
      node_config.0.preemptible:                       "false" => "false"
      node_config.0.service_account:                   "[email protected]" => "[email protected]"
      node_config.0.tags.#:                            "2" => "2"
      node_config.0.tags.0:                            "gke-internal-cluster" => "gke-internal-cluster"
      node_config.0.tags.1:                            "gke-internal-cluster-forge-pool" => "gke-internal-cluster-forge-pool"
      node_count:                                      "1" => <computed>
      project:                                         "xxx-infrastructure" => "xxx-infrastructure"
      version:                                         "1.12.5-gke.5" => "1.12.5-gke.5"
      zone:                                            "europe-west3-b" => "europe-west3-b"


Plan: 1 to add, 0 to change, 1 to destroy.

Could this be related to hashicorp/terraform-provider-google#2115?

Any help will be appreciated.

Default GKE version failing

The default version for GKE master (1.10.6-gke.2) has been deprecated.

Currently running with the default version throws:

* google_container_cluster.primary: googleapi: Error 400: Master version "1.10.6-gke.2" is unsupported., badRequest

Error: "node_config.0.taint" - module doesn't work with 2.0.0 google provider

Error: module.demo-euw1.google_container_node_pool.zonal_pools: "node_config.0.taint": [REMOVED] This field is in beta. Use it in the the google-beta provider instead. See https://terraform.io/docs/providers/google/provider_versions.html for more details.

Error: module.test-euw1.google_container_node_pool.zonal_pools: "node_config.0.taint": [REMOVED] This field is in beta. Use it in the the google-beta provider instead. See https://terraform.io/docs/providers/google/provider_versions.html for more details.

Deprecation warnings from the 1.20.0 Google provider became errors in 2.0.0, as expected. To fix this we might need to change the way providers are defined inside the module, right? Is there any quick fix for this?
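
One workaround, sketched below using the same Terraform 0.11-style quoted provider reference that appears in a later issue on this page, is to pass the beta provider into the module explicitly; whether this is sufficient depends on the module version in use.

provider "google-beta" {
  project = "<PROJECT ID>"
  region  = "europe-west1"
}

module "gke" {
  source = "terraform-google-modules/kubernetes-engine/google"
  # ... other inputs ...

  # Route the module's google provider requirement to google-beta so that
  # beta-only fields such as node taints are accepted.
  providers = {
    google = "google-beta"
  }
}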

Create a service account for nodes if one isn't provided.

We need a holistic solution here which permanently removes the dependency on the default service account, including:

  1. Adding a top-level variable of service_account which accepts three values:
    a. the email of a custom Service Account,
    b. default-compute (the default compute service account), or
    c. create - automatically creates a service account for use

This top-level service account will be the default for all node pools that don't explicitly provide one.

These flags can optionally be implemented incrementally.
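
For reference, the inputs documented above (create_service_account and service_account) cover part of this proposal; a hedged sketch of the caller-side usage:

module "gke" {
  # ... other required inputs ...

  # Let the module create a cluster-specific service account instead of
  # relying on the default compute service account.
  create_service_account = true

  # Or supply an existing custom service account and skip creation:
  # create_service_account = false
  # service_account        = "custom-nodes@<PROJECT ID>.iam.gserviceaccount.com"
}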

wait-for-cluster.sh throws error jq: command not found

I've been testing this module for a PoC, but it seems that a script in the deployment process assumes the presence of a binary (jq) which is no longer present in COS.

The complete error:

module.gke.module.cluster.null_resource.wait_for_regional_cluster: Error running command '/home/ponky/perso/test-gke-module/.terraform/modules/25f7ceaf47579277166eb8da7c1678a8/terraform-google-modules-terraform-google-kubernetes-engine-79542bc/scripts/wait-for-cluster.sh test-module-240713 gke-test-0f2bc90b': exit status 127.
Output: Waiting for cluster gke-test-0f2bc90b in project test-module-240713 to reconcile...
/home/ponky/perso/test-gke-module/.terraform/modules/25f7ceaf47579277166eb8da7c1678a8/terraform-google-modules-terraform-google-kubernetes-engine-79542bc/scripts/wait-for-cluster.sh: line 25: jq: command not found

Terraform version

Terraform v0.11.13
+ provider.google v2.3.0
+ provider.google-beta v2.3.0
+ provider.kubernetes v1.6.2
+ provider.null v2.1.2
+ provider.random v2.1.2

Configuration file

provider.tf

provider "random" {
  version = "~> 2.1"
}

provider "google" {
  version     = "~> 2.3"
  credentials = "<CREDENTIAL_PATH>"
  region      = "europe-west1"
}

provider "google-beta" {
  version     = "~> 2.3"
  credentials = "<CREDENTIAL_PATH>"
  region      = "europe-west1"
}

main.tf

resource "random_id" "gke_id" {
  byte_length = 4
  prefix      = "gke-test-"
}

module "gke" {
  source            = "terraform-google-modules/kubernetes-engine/google"
  version           = "2.0.1"
  project_id        = "<PROJECT_ID>"
  name              = "${random_id.gke_id.hex}"
  region            = "europe-west1"
  network           = "default"
  subnetwork        = "default"
  ip_range_pods     = "p1"
  ip_range_services = "p2"
}

Expected Behavior

Terraform plans and applies successfully without error.

Actual Behavior

terraform apply errors with

module.gke.module.cluster.null_resource.wait_for_regional_cluster: Error running command '/home/ponky/perso/test-gke-module/.terraform/modules/25f7ceaf47579277166eb8da7c1678a8/terraform-google-modules-terraform-google-kubernetes-engine-79542bc/scripts/wait-for-cluster.sh test-module-240713 gke-test-0f2bc90b': exit status 127.
Output: Waiting for cluster gke-test-0f2bc90b in project test-module-240713 to reconcile...
/home/ponky/perso/test-gke-module/.terraform/modules/25f7ceaf47579277166eb8da7c1678a8/terraform-google-modules-terraform-google-kubernetes-engine-79542bc/scripts/wait-for-cluster.sh: line 25: jq: command not found

Steps to Reproduce

terraform plan -out=test-plan && terraform apply test-plan

Importing existing kube clusters seems to be forbidden

Importing clusters seems to fail with this error:

terraform import module.prod_cluster.google_container_cluster.primary $PROJECT/$REGION/$CLUSTER

Error: Provider "kubernetes" depends on non-var "local.cluster_endpoint". Providers for import can currently
only depend on variables or must be hardcoded. You can stop import
from loading configurations by specifying `-config=""`.

This is a huge problem because it breaks the terraform import functionality.

Fails with dynamic service account variable

This config fails:

  node_pools = [
    {
      name            = "pool-01"
      machine_type    = "n1-standard-1"
      min_count       = 2
      max_count       = 2
      disk_size_gb    = 30
      disk_type       = "pd-standard"
      image_type      = "COS"
      auto_repair     = false
      auto_upgrade    = false
      service_account = "${google_service_account.cluster_nodes.email}"
    },
  ]

While this config works:

  node_pools = [
    {
      name            = "pool-01"
      machine_type    = "n1-standard-1"
      min_count       = 2
      max_count       = 2
      disk_size_gb    = 30
      disk_type       = "pd-standard"
      image_type      = "COS"
      auto_repair     = false
      auto_upgrade    = false
      service_account = "${google_service_account.cluster_nodes.email}"
    },
  ]

Use TravisCI to run checks

It should be really straightforward to set up a TravisCI integration so tests run on every PR and give a status check; this would greatly help external contributors (who may not be familiar with testing) to refine their PRs until tests pass and then ask for review.

Note that TravisCI integration is free for public repos on GitHub.com.

Can't provision a cluster with fewer than 2 "zones"

$ terraform --version
Terraform v0.11.11
+ provider.google v1.19.1

With the code pasted in the bottom-most section of this ticket, which seems valid per your docs and examples, I'm getting the following error at terraform plan:

Error: Error running plan: 1 error(s) occurred:

* module.gke.local.cluster_type_output_zonal_zones: local.cluster_type_output_zonal_zones: Resource 'google_container_cluster.zonal_primary' does not have attribute 'additional_zones' for variable 'google_container_cluster.zonal_primary.*.additional_zones'

Why is the user forced to specify more than 1 zone? This is supposed to be a generic module, after all.

variable "default-scopes" {
  type = "list"

  default = [
    "https://www.googleapis.com/auth/monitoring",
    "https://www.googleapis.com/auth/devstorage.read_only",
    "https://www.googleapis.com/auth/logging.write",
    "https://www.googleapis.com/auth/service.management.readonly",
    "https://www.googleapis.com/auth/servicecontrol",
    "https://www.googleapis.com/auth/trace.append",
  ]
}

module "gke" {
  source                     = "github.com/terraform-google-modules/terraform-google-kubernetes-engine?ref=master"
  ip_range_pods              = ""                 #TODO
  ip_range_services          = ""                 #TODO
  name                       = "cluster-you-name-it"
  network                    = "vpc-you-name-it"
  project_id                 = "project-you-name-it"
  region                     = "europe-west1"
  subnetwork                 = "vpc-sub-you-name-it"
  zones                      = ["europe-west1-c"]
  monitoring_service         = "monitoring.googleapis.com/kubernetes"
  logging_service            = "logging.googleapis.com/kubernetes"
  maintenance_start_time     = "04:00"
  kubernetes_version         = "1.11.3-gke.18"
  horizontal_pod_autoscaling = true
  regional                   = false

  node_pools = [
    {
      name               = "core"
      machine_type       = "n1-standard-2"
      oauth_scopes       = "${var.default-scopes}"
      min_count          = 1
      max_count          = 20
      auto_repair        = true
      auto_upgrade       = false
      initial_node_count = 20
    },
    {
      name               = "cc"
      machine_type       = "custom-6-23040"
      oauth_scopes       = "${var.default-scopes}"
      min_count          = 0
      max_count          = 20
      auto_repair        = true
      auto_upgrade       = false
      initial_node_count = 20
      preemptible        = true
      node_version       = "1.10.9-gke.7"
    },
  ]

  node_pools_labels = {
    all  = {}
    core = {}
    cc   = {}
  }

  node_pools_tags = {
    all  = []
    core = []
    cc   = []
  }

  node_pools_taints = {
    all  = []
    core = []
    cc   = []
  }
}

Can't provision a cluster with shared VPC

terraform apply fails with the following error:

1 error(s) occurred:

  • module.gke.google_container_cluster.primary: 1 error(s) occurred:

  • google_container_cluster.primary: googleapi: Error 400: The user does not have access to service account "[email protected]". Ask a project owner to grant you the iam.serviceAccountUser role on the service account., badRequest

Which user? The service account [email protected] is an owner of the project.

My config with shared VPC:

module "gke" {
source = "./modules/tf-moduel-k8s-2.0.0"
project_id = "${var.project}"
name = "${local.cluster_type}-cluster${var.cluster_name_suffix}"
region = "${var.region}"
network = "${var.network}"
network_project_id = "${var.network_project_id}"
subnetwork = "${var.subnetwork}"
ip_range_pods = "${var.ip_range_pods}"
ip_range_services = "${var.ip_range_services}"
service_account = "${var.compute_engine_service_account}"
}

Thanks.
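A possible fix, sketched with placeholder values rather than taken from the issue, is to grant the account running Terraform the iam.serviceAccountUser role on the node service account named in the error:

resource "google_service_account_iam_member" "terraform_can_use_node_sa" {
  # Placeholders: substitute the service project, the node service account
  # email from the 400 error, and the identity that runs terraform apply.
  service_account_id = "projects/<SERVICE_PROJECT_ID>/serviceAccounts/<NODE_SERVICE_ACCOUNT_EMAIL>"
  role               = "roles/iam.serviceAccountUser"
  member             = "user:<ACCOUNT_RUNNING_TERRAFORM>"
}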

Suggest `network_policy` be enabled by default

https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/blob/master/autogen/variables.tf#L106

Suggest enabling it by default on newly created clusters. The CIS GCP Benchmark recommends it be enabled. See: https://www.cisecurity.org/benchmark/google_cloud_computing_platform/

Pros: Allows for support of NetworkPolicy objects if they are applied without having to modify the cluster.
Cons: The slight overhead of Calico agents and Typha in the cluster if NetworkPolicy is unused.
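Until the default changes, callers can already opt in per cluster via the module's network_policy input, for example:

module "gke" {
  source         = "terraform-google-modules/kubernetes-engine/google"
  # ... other required inputs ...
  network_policy = true
}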

Change node pool resources to use for_each

In the circumstance that a node pool must be replaced and workload transitioned with zero downtime, having all node pools defined in a list and launched with a single google_container_node_pool resource makes it difficult to carry out.

For example, suppose two node pools are defined and node pool 0 must be replaced. It seems like the only safe approach is to temporarily create a third node pool, transition the workload from node pool 0 to the new node pool, change or replace node pool 0, transition the workload back to node pool 0, and finally destroy the temporary node pool so you're back to two. That works, but it's probably more work than necessary.

Another consideration is if there are many node pools defined and you need to destroy node pool 0 completely. There is no way to do this without affecting node pool 1 and any other subsequently defined node pools.

This lack of adaptability seems to be a limitation of this module, but perhaps also of using count in resources in general. It might be better if users had control over their node pools as independent resources; however, leveraging count and a list of node pools is likely the only way to make any GKE module flexible enough for broad adoption given the current limitations of Terraform.

I'm opening this issue to see if there is a better way, or if we can come up with a way to improve the situation.
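A rough sketch of the proposed shape, assuming a Terraform version with resource-level for_each and pools keyed by name (variable names mirror the module's existing inputs; this is not the module's current implementation):

resource "google_container_node_pool" "pools" {
  # Keying by pool name gives each pool a stable state address, so a single
  # pool can be replaced or removed without disturbing the others.
  for_each = { for pool in var.node_pools : pool.name => pool }

  name     = each.key
  cluster  = google_container_cluster.primary.name
  location = var.region

  autoscaling {
    min_node_count = each.value.min_count
    max_node_count = each.value.max_count
  }

  node_config {
    machine_type = each.value.machine_type
    image_type   = each.value.image_type
  }
}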

No self link, apply is failing

We're following the README spec almost exactly and terraform plan works fine, but when we run apply we get this error:

* module.gke.google_container_cluster.primary: Resource 'data.google_compute_subnetwork.gke_subnetwork' not found for variable 'data.google_compute_subnetwork.gke_subnetwork.self_link'

Could this be a versioning problem on our end? We've tried going through the other issues and the README but have struggled to find the source of our problem. For the provider we have:

provider "google-beta" {
  project     = "project-name"
  region      = "region-name"
}

And in main.tf we define the module with that specified provider:

module "gke" {
  providers {
    google = "google-beta"
  }
...

Any help is much appreciated, thanks.
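For reference, releases of the module from this era generally expected both the google and google-beta providers to be configured at the root, rather than aliasing one to the other; a sketch with placeholder values:

provider "google" {
  project = "<PROJECT_ID>"
  region  = "<REGION>"
}

provider "google-beta" {
  project = "<PROJECT_ID>"
  region  = "<REGION>"
}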

Enable easier shared VPC usage

Ran into this issue with creating a GKE cluster.

1 error(s) occurred:

  • module.prod-gke-cluster.google_container_cluster.primary: 1 error(s) occurred:
  • google_container_cluster.primary: googleapi: Error 404: Not found: GAIA email lookup., notFound

TF Debug output in attached file
debug.log

module "dev-gke-cluster" {
  source = "github.com/terraform-google-modules/terraform-google-kubernetes-engine"
  name = "gke-dev"
  kubernetes_version = "latest"
  project_id = "${google_project.dev-project.project_id}"
  region = "${var.region}"
  network = "${google_compute_network.dev-network.name}"
  subnetwork = "${google_compute_subnetwork.dev-app-subnet.name}"
  network_project_id = "${google_compute_shared_vpc_host_project.shared_vpc.project}"
  ip_range_pods = "${google_compute_subnetwork.dev-app-subnet.secondary_ip_range.0.range_name}"
  ip_range_services = "${google_compute_subnetwork.dev-app-subnet.secondary_ip_range.1.range_name}"
  regional = true
  horizontal_pod_autoscaling = true
  network_policy = true 
  master_authorized_networks_config = [{
    cidr_blocks = [{
      cidr_block   = "0.0.0.0/0"
      display_name = "all"
    }]
  }]
  node_pools = [
    {
      name = "default-node-pool"
      machine_type    = "n1-standard-2"
      min_count = 1
      max_count = 3
      disk_size_gb = 100
      disk_type = "pd-standard"
      image_type = "COS"
      auto_repair = true
      auto_upgrade = true
    },
  ]
}

Setup gcloud credentials when running interactively within docker

Pushing this out to a separate issue so that we can get #20 merged.

When using make docker_run, we should source the test config so that CLOUDSDK_AUTH_CREDENTIAL_FILE_OVERRIDE is set in the shell environment.

diff --git i/Makefile w/Makefile
index 6e16919..73ff8bf 100644
--- i/Makefile
+++ w/Makefile
@@ -119,7 +119,7 @@ docker_run:
        docker run --rm -it \
                -v $(CURDIR):/cftk/workdir \
                ${DOCKER_IMAGE_KITCHEN_TERRAFORM}:${DOCKER_TAG_KITCHEN_TERRAFORM} \
-               /bin/bash
+               /bin/bash --rcfile ${TEST_CONFIG_FILE_LOCATION}

 .PHONY: docker_create
 docker_create: docker_build_terraform docker_build_kitchen_terraform

Alternatively, we can set the environment variables within the tests, like so:

ENV['CLOUDSDK_AUTH_CREDENTIAL_FILE_OVERRIDE'] = File.expand_path(
  File.join("../..", credentials_path),
  __FILE__)

Though this feels somewhat brittle.

Can not use dynamic Service Account

I am trying to use this module, based on the provided examples, but can't seem to get it to work. It worked fine a few days ago, but not anymore.

Here is the error I get:

Warning: module.gke-cluster.google_container_cluster.primary: "region": [DEPRECATED] This field is in beta and will be removed from this provider. Use it in the the google-beta provider instead. See https://terraform.io/docs/providers/google/provider_versions.html for more details.

Warning: module.gke-cluster.google_container_node_pool.pools: "node_config.0.taint": [DEPRECATED] This field is in beta and will be removed from this provider. Use it in the the google-beta provider instead. See https://terraform.io/docs/providers/google/provider_versions.html for more details.

Warning: module.gke-cluster.google_container_node_pool.pools: "region": [DEPRECATED] This field is in beta and will be removed from this provider. Use it in the the google-beta provider instead. See https://terraform.io/docs/providers/google/provider_versions.html for more details.

Warning: module.gke-cluster.google_container_node_pool.zonal_pools: "node_config.0.taint": [DEPRECATED] This field is in beta and will be removed from this provider. Use it in the the google-beta provider instead. See https://terraform.io/docs/providers/google/provider_versions.html for more details.

Warning: module.project.google_project.project: "app_engine": [DEPRECATED] Use the google_app_engine_application resource instead.

Error: module.gke-cluster.google_container_node_pool.pools: node_config.0.tags: should be a list

Error: module.gke-cluster.google_container_node_pool.pools: node_config.0.taint: should be a list

Error: module.gke-cluster.google_container_node_pool.zonal_pools: node_config.0.tags: should be a list

Error: module.gke-cluster.google_container_node_pool.zonal_pools: node_config.0.taint: should be a list

And here is the terraform used:

module "gke-cluster" {
  source                     = "github.com/terraform-google-modules/terraform-google-kubernetes-engine"
  project_id                 = "${local.project_id}"
  name                       = "${local.gke_cluster_name}"
  network                    = "${local.network_name}"
  subnetwork                 = "${local.subnetwork_name}"
  region                     = "${var.default_region}"
  zones                      = "${var.default_zones}"
  ip_range_pods              = "${var.default_region}-gke-01-pods"
  ip_range_services          = "${var.default_region}-gke-01-services"
  http_load_balancing        = true
  horizontal_pod_autoscaling = true
  kubernetes_dashboard       = true
  network_policy             = true
  kubernetes_version         = "1.10.6-gke.6"



  node_pools = [
    {
      name            = "default-node-pool"
      machine_type    = "${var.node_pool_machine_type}"
      min_count       = 1
      max_count       = 10
      disk_size_gb    = 100
      disk_type       = "pd-standard"
      image_type      = "COS"
      auto_repair     = true
      auto_upgrade    = true
      service_account = "${module.project.service_account_name}"
    },
  ]

  node_pools_labels = {
    all = {}

    default-node-pool = {
      default-node-pool = "true"
    }
  }

  node_pools_taints = {
    all = []

    default-node-pool = [
      {
        key    = "default-node-pool"
        value  = "true"
        effect = "PREFER_NO_SCHEDULE"
      },
    ]
  }

  node_pools_tags = {
    all = []

    default-node-pool = [
      "default-node-pool",
    ]
  }
}

Cannot update a node pool count

Tried to create a private cluster with one node pool with 3 nodes initially, and min = 1, max = 3 --> this worked.
Tried to update the node pool to 30 nodes initially, and min = 1, max = 300 --> this fails with the error shown in the attached screenshot.

Workaround: comment out the GKE cluster module call, run terraform apply, then uncomment the GKE cluster module call.

Outputs break after failed create

* module.gke-dev-cluster.local.cluster_type_output_zonal_zones: local.cluster_type_output_zonal_zones: concat: unexpected type list in list of type string in:

${concat(slice(var.zones,1,length(var.zones)), list(list()))}
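A possible fix, offered only as an assumption and not the module's actual patch, is to pad the sliced list with an empty string instead of a nested empty list, so the concatenated value stays a flat list of strings:

locals {
  # Hypothetical replacement for the failing expression above.
  cluster_type_output_zonal_zones = "${concat(slice(var.zones, 1, length(var.zones)), list(""))}"
}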

Document requirements for running test-kitchen tests

We need to document the requirements/dependencies for running tests with test-kitchen.

For reference, this is my current set of fixtures:

locals {
  project_name = "thebo-gkefixture"
  project_id   = "${local.project_name}-${random_id.project-suffix.hex}"
}

resource "random_id" "project-suffix" {
  byte_length = 2
}

resource "google_project" "main" {
  name            = "${local.project_name}"
  project_id      = "${local.project_id}"
  folder_id       = "${var.folder_id}"
  billing_account = "${var.billing_account}"
}

resource "google_project_services" "main" {
  project  = "${google_project.main.project_id}"
  services = [
    "compute.googleapis.com",
    "bigquery-json.googleapis.com",
    "container.googleapis.com",
    "containerregistry.googleapis.com",
    "oslogin.googleapis.com",
    "pubsub.googleapis.com",
    "storage-api.googleapis.com",
  ]
}

module "network" {
    source = "github.com/terraform-google-modules/terraform-google-network"

    project_id      = "${google_project.main.project_id}"
    network_name    = "vpc-01"
    //shared_vpc_host = "true"

    subnets = [
        {
            subnet_name   = "us-east4-01"
            subnet_ip     = "10.20.16.0/20"
            subnet_region = "us-east4"
        },
    ]

    secondary_ranges = {
        "us-east4-01" = [
            {
                range_name    = "us-east4-01-gke-01-pod"
                ip_cidr_range = "172.18.16.0/20"
            },
            {
                range_name    = "us-east4-01-gke-01-service"
                ip_cidr_range = "172.18.32.0/20"
            },
            {
                range_name    = "us-east4-01-gke-02-pod"
                ip_cidr_range = "172.18.48.0/20"
            },
            {
                range_name    = "us-east4-01-gke-02-service"
                ip_cidr_range = "172.18.64.0/20"
            },
        ]
    }
}

Networks and Subnetworks are updated everytime

Every time the module is executed, it tries to change the network and subnetwork internal links. Effectively this changes nothing, since the internal Google API only writes the pattern projects/project-id/global/networks/network-name:

~ module.kubernetes.google_container_cluster.primary
      network:    "projects/my-project/global/networks/my-network" => "https://www.googleapis.com/compute/v1/projects/my-project/global/networks/my-network"
      subnetwork: "projects/my-project/regions/southamerica-east1/subnetworks/my-subnet" => "https://www.googleapis.com/compute/v1/projects/my-project/regions/southamerica-east1/subnetworks/my-subnet"

I believe this commit may be the cause. Was this change made to address some issue? I couldn't find a related one.
