microsoft / windows-container-networking
Container networking plugins for Windows containers
License: MIT License
Describe the bug
Most non-Flannel-based CNI config examples include a key named master which contains the name of a host interface such as "Ethernet".
Although this may have been used at some point in the past, the CNI plugin code currently does NOT check/use this key in any way.
Looking through the main code (excluding the unit test code), the main place where the config is parsed is wincni.ParseNetworkConfig, which returns a wincni.NetworkConfig that doesn't load said argument at all.
FWIW I have run the containerd integration test suite (which uses the nat plugin for tests) using a random interface name on my fork and all tests ran fine.
Additionally, my colleague @ionutbalutoiu has recently removed any interface renaming behavior on the Flannel-based CI and also confirmed that Flannel doesn't use said argument either.
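One plausible reason a stray key such as master goes unnoticed is that Go's encoding/json drops unknown keys silently by default. The sketch below illustrates that behavior with a hypothetical, trimmed-down config struct (exampleConfig is not the real wincni.NetworkConfig) and shows how json.Decoder.DisallowUnknownFields would surface the key instead.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// exampleConfig is a hypothetical stand-in for a CNI network config struct.
// It deliberately has no field for "master".
type exampleConfig struct {
	CniVersion string `json:"cniVersion"`
	Name       string `json:"name"`
	Type       string `json:"type"`
}

// parseLenient uses json.Unmarshal, which ignores keys without a matching field.
func parseLenient(data []byte) (exampleConfig, error) {
	var c exampleConfig
	err := json.Unmarshal(data, &c)
	return c, err
}

// parseStrict rejects any key that has no matching struct field.
func parseStrict(data []byte) (exampleConfig, error) {
	var c exampleConfig
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	err := dec.Decode(&c)
	return c, err
}

func main() {
	conf := []byte(`{"cniVersion":"0.2.0","name":"nat","type":"nat","master":"NoSuchInterface"}`)

	_, err := parseLenient(conf)
	fmt.Println("lenient error:", err)

	_, err = parseStrict(conf)
	fmt.Println("strict error:", err)
}
```

Strict decoding only makes sense if the struct models every key a runtime may legitimately pass, so logging a warning for unused keys may be a gentler option than failing.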
To Reproduce
Steps to reproduce the behavior:
… runhcs for running a full container.
Set master to any random string which isn't an interface name on the host.
Expected behavior
For the value of the master key to be used internally in some way; setting it to a random non-existent interface name should have led to the CNI plugins failing or at least logging a warning about it.
CNI Version
Tried with CNI config version 0.2.0 and 0.3.0.
Additional context
There is a chance that the master config param was once used in the past for automatically adding routes to the host interface by name for the HNS endpoint acting as the gateway on SDNBridge networks, similar to how the connectivity tests handle it?
There is also a chance that the example configs featuring the master key were initially meant for a "higher-level" tool internal to Microsoft which used it, only to then later call the plugins in turn (though I have no insider knowledge of such a tool).
There are important files that Microsoft projects should all have that are not present in this repository. A pull request has been opened to add the missing file(s). When the pr is merged this issue will be closed automatically.
Microsoft teams can learn more about this effort and share feedback within the open source guidance available internally.
Following the results of the audit, refactor the code to eliminate overlap and improve code quality.
The network creation code in our testing has a lot of routing/policies/ips hardcoded directly for the test node we have. This should be changed to be more independent.
In order for flannel + sdnoverlay to work in overlay mode this extra endpoint policy should be added:
{
"Name":"EndpointPolicy",
"Value":{
"Type":"ProviderAddress",
"Settings":{
"ProviderAddress":"host ip"
}
}
}
Due to issue microsoft/Windows-Containers#210, I think the v0.2.0 release is really outdated. Would you please publish the latest version as a Release of this repo?
We are trying to run Windows containers, but the Kubernetes pod cannot come up during network setup. The CNI configuration is:
{
"cniVersion": "0.2.0",
"name": "nat",
"type": "nat",
"master": "Ethernet",
"ipam": {
"subnet": "10.172.204.0/8",
"routes": [
{
"GW": "10.172.204.178"
}
]
},
"capabilities": {
"portMappings": true,
"dns": true
}
}
The output from wincni.log
{"level":"debug","msg":"[cni-net] Plugin wcn-net version .","time":"2022-05-26T06:44:29-07:00"}
{"level":"debug","msg":"[net] Network interface: {Index:14 MTU:1500 Name:Ethernet HardwareAddr:54:bf:64:98:1d:c6 Flags:up|broadcast|multicast} with IP addresses: [2404:f801:10:124:883:4be4:12ff:53d4/64 fe80::883:4be4:12ff:53d4/64 10.172.204.178/24]","time":"2022-05-26T06:44:29-07:00"}
{"level":"debug","msg":"[net] Network interface: {Index:1 MTU:-1 Name:Loopback Pseudo-Interface 1 HardwareAddr: Flags:up|loopback|multicast} with IP addresses: [::1/128 127.0.0.1/8]","time":"2022-05-26T06:44:29-07:00"}
{"level":"debug","msg":"[net] Network interface: {Index:21 MTU:1500 Name:vEthernet (nat) HardwareAddr:00:15:5d:5f:9a:47 Flags:up|broadcast|multicast} with IP addresses: [fe80::c1ed:f301:c00e:2af6/64 172.27.0.1/20]","time":"2022-05-26T06:44:29-07:00"}
{"level":"debug","msg":"[cni-net] Plugin started.","time":"2022-05-26T06:44:29-07:00"}
{"level":"debug","msg":"[cni-net] Processing ADD command with args {ContainerID:dcc9786445818f4a843162bfaf0c164d4617ece21ddc692c6ffb554462bafac6 Netns:a19ed0b3-b149-4fd2-9457-66815769ad99 IfName:eth0 Args:K8S_POD_INFRA_CONTAINER_ID=dcc9786445818f4a843162bfaf0c164d4617ece21ddc692c6ffb554462bafac6;K8S_POD_UID=f3c02187-8def-4311-9b2c-13fd75b5440f;IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=kube-flannel-ds-windows-amd64-tbvjd Path:c:/opt/cni/bin}.","time":"2022-05-26T06:44:29-07:00"}
{"level":"debug","msg":"[cni-net] Read network configuration \u0026{CniVersion:0.2.0 Name:nat Type:nat Ipam:{Type: Environment: AddrSpace: Subnet:10.172.204.0/8 Address: QueryInterval: Routes:[{Dst:{IP:\u003cnil\u003e Mask:\u003cnil\u003e} GW:10.172.204.178}]} DNS:{Nameservers:[] Domain: Search:[] Options:[]} OptionalFlags:{LocalRoutePortMapping:false AllowAclPortMapping:false ForceBridgeGateway:false EnableDualStack:false LoopbackDSR:false GatewayFromAdditionalRoutes:false} RuntimeConfig:{PortMappings:[] DNS:{Servers:[] Searches:[] Options:[]}} AdditionalRoutes:[] AdditionalArgs:[]}.","time":"2022-05-26T06:44:29-07:00"}
{"level":"info","msg":"[cni-net] Dual stack is disabled","time":"2022-05-26T06:44:29-07:00"}
{"level":"debug","msg":"Parsing port mappings from []","time":"2022-05-26T06:44:29-07:00"}
{"level":"info","msg":"[cni-net] Creating network.","time":"2022-05-26T06:44:29-07:00"}
{"level":"debug","msg":"hcn::HostComputeNetwork::Create id=","time":"2022-05-26T06:44:29-07:00"}
{"level":"debug","msg":"hcn::HostComputeNetwork::Create JSON: {\"Name\":\"nat\",\"Type\":\"nat\",\"MacPool\":{},\"Dns\":{},\"Ipams\":[{\"Type\":\"Static\",\"Subnets\":[{\"IpAddressPrefix\":\"10.0.0.0/8\",\"Routes\":[{\"NextHop\":\"10.172.204.178\",\"DestinationPrefix\":\"0.0.0.0/0\"}]}]}],\"SchemaVersion\":{\"Major\":2,\"Minor\":0}}","time":"2022-05-26T06:44:29-07:00"}
{"level":"debug","msg":"hcnCreateNetwork failed in Win32: Invalid JSON document string. (0x803b001b) {\"Success\":false,\"Error\":\"Invalid JSON document string. \",\"ErrorCode\":2151350299}","time":"2022-05-26T06:44:29-07:00"}
{"level":"error","msg":"[cni-net] Failed to create network, err:hcnCreateNetwork failed in Win32: Invalid JSON document string. (0x803b001b) {\"Success\":false,\"Error\":\"Invalid JSON document string. \",\"ErrorCode\":2151350299}.","time":"2022-05-26T06:44:29-07:00"}
{"level":"debug","msg":"[cni-net] Plugin stopped.","time":"2022-05-26T06:44:29-07:00"}
{"level":"error","msg":"Failed to Execute network plugin, err:hcnCreateNetwork failed in Win32: Invalid JSON document string. (0x803b001b) {\"Success\":false,\"Error\":\"Invalid JSON document string. \",\"ErrorCode\":2151350299}.\n","time":"2022-05-26T06:44:29-07:00"}
Windows Edition: Windows Server 2019 Datacenter
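In the log above, the configured subnet 10.172.204.0/8 reaches HCN as IpAddressPrefix 10.0.0.0/8, while the host interface reports 10.172.204.178/24. Whether that mismatch is what triggers the "Invalid JSON document string" error is not established here; the sketch below only shows a pre-flight check, using Go's standard net package, that a configured subnet is already in canonical form and that the gateway falls inside it. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// canonicalPrefix returns the network-address form of a CIDR string,
// e.g. the prefix with all host bits cleared.
func canonicalPrefix(cidr string) (string, error) {
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return "", err
	}
	return ipnet.String(), nil
}

// gatewayInSubnet reports whether gw is a valid IP inside cidr.
func gatewayInSubnet(cidr, gw string) (bool, error) {
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return false, err
	}
	ip := net.ParseIP(gw)
	if ip == nil {
		return false, fmt.Errorf("invalid gateway address %q", gw)
	}
	return ipnet.Contains(ip), nil
}

func main() {
	subnet := "10.172.204.0/8" // value taken from the config above
	canon, err := canonicalPrefix(subnet)
	if err != nil {
		fmt.Println("bad subnet:", err)
		return
	}
	if canon != subnet {
		fmt.Printf("subnet %s is not canonical; it will be treated as %s\n", subnet, canon)
	}
	ok, _ := gatewayInSubnet(subnet, "10.172.204.178")
	fmt.Println("gateway inside subnet:", ok)
}
```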
Describe the bug
Currently, the CNI plugins simply look up HCN networks by the name provided in the CNI config and use them unquestioningly if they already exist, regardless of whether their properties line up with what the CNI config expects.
To Reproduce
Steps to reproduce the behavior:
… New-HNSNetwork … ctr, nerdctl, or any other means which calls ADD on the CNI plugins (in which case the plugins will automatically create the HNS network for us)
… name ofc)
… ADD of the plugins with the updated JSON config and notice that the new properties are completely ignored (neither causing an error, nor leading to the HNS network properties being synced to what the CNI config asks for)
Expected behavior
The plugins should notice that the CNI config properties mismatch with the HNS network properties and either:
CNI Version
CNI config version 0.2.0/0.3.0 (or any future ones we plan to support too)
Latest commit on CNI plugins as of time of writing: d502b1b
Additional context
FWIW, at least some of the reference CNI plugins for Linux do take steps to create OS-side resources
I will need to further check how/if they handle updates to CNI settings, however.
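As a rough illustration of the mismatch detection requested above, the following sketch compares the properties a CNI config asks for with those of an already existing network and lists the differences. The networkProps type and its fields are hypothetical simplifications, not the plugins' or HCN's actual data model.

```go
package main

import (
	"fmt"
	"strings"
)

// networkProps is a hypothetical summary of the properties worth comparing.
type networkProps struct {
	Type    string
	Subnet  string
	Gateway string
}

// diffProps describes every field where the existing network differs from
// what the config wants. An empty result means the two line up.
func diffProps(wanted, existing networkProps) []string {
	var diffs []string
	if !strings.EqualFold(wanted.Type, existing.Type) {
		diffs = append(diffs, fmt.Sprintf("type: want %q, have %q", wanted.Type, existing.Type))
	}
	if wanted.Subnet != existing.Subnet {
		diffs = append(diffs, fmt.Sprintf("subnet: want %q, have %q", wanted.Subnet, existing.Subnet))
	}
	if wanted.Gateway != existing.Gateway {
		diffs = append(diffs, fmt.Sprintf("gateway: want %q, have %q", wanted.Gateway, existing.Gateway))
	}
	return diffs
}

func main() {
	wanted := networkProps{Type: "nat", Subnet: "10.0.0.0/16", Gateway: "10.0.0.1"}
	existing := networkProps{Type: "NAT", Subnet: "172.27.0.0/20", Gateway: "172.27.0.1"}
	for _, d := range diffProps(wanted, existing) {
		fmt.Println("mismatch:", d)
	}
}
```

A plugin could then either fail the ADD call with these messages or use them to decide which properties to update.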
Describe the bug
'make test' command is throwing errors. CNI tests shouldn't be skipped in the CI pipeline.
To Reproduce
Steps to reproduce the behavior:
? github.com/Microsoft/windows-container-networking/cni [no test files]
? github.com/Microsoft/windows-container-networking/common [no test files]
? github.com/Microsoft/windows-container-networking/common/core [no test files]
? github.com/Microsoft/windows-container-networking/network [no test files]
=== RUN TestNatCmdAdd
connectivity_testing.go:320: Interface Found: [&{0 0 0}] with ip []
plugin_testing.go:49: Setup for Network Plugin of type: Nat ...
plugin_testing.go:50: [DEBUG] Using Host IP: []
plugin_testing.go:54: Error while creating supplied network: hcnCreateNetwork failed in Win32: Access is denied. (0x5)
plugin_testing.go:227: Running Unit Test
--- FAIL: TestNatCmdAdd (0.10s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x20 pc=0xc04557]
goroutine 18 [running]:
testing.tRunner.func1.2({0xc67f40, 0x10b4960})
C:/Program Files/Go/src/testing/testing.go:1389 +0x24e
testing.tRunner.func1()
C:/Program Files/Go/src/testing/testing.go:1392 +0x39f
panic({0xc67f40, 0x10b4960})
C:/Program Files/Go/src/runtime/panic.go:838 +0x207
github.com/Microsoft/windows-container-networking/test/utilities.(*PluginUnitTest).RunUnitTest(0xc000318780, 0xc000324820)
C:/github/windows-container-networking/test/utilities/plugin_testing.go:228 +0x57
github.com/Microsoft/windows-container-networking/test/utilities.(*PluginUnitTest).RunAll(0x0?, 0x0?)
C:/github/windows-container-networking/test/utilities/plugin_testing.go:311 +0x34
github.com/Microsoft/windows-container-networking/plugins/nat_test.TestNatCmdAdd(0x7a5d3d?)
C:/github/windows-container-networking/plugins/nat/nat_windows_test.go:25 +0xcf
testing.tRunner(0xc000324820, 0xd1ffa0)
C:/Program Files/Go/src/testing/testing.go:1439 +0x102
created by testing.(*T).Run
C:/Program Files/Go/src/testing/testing.go:1486 +0x35f
FAIL github.com/Microsoft/windows-container-networking/plugins/nat 0.692s
=== RUN TestBridgeCmdAdd
connectivity_testing.go:320: Interface Found: [&{0 0 0}] with ip []
plugin_testing.go:49: Setup for Network Plugin of type: L2Bridge ...
plugin_testing.go:50: [DEBUG] Using Host IP: []
plugin_testing.go:54: Error while creating supplied network: hcnCreateNetwork failed in Win32: Access is denied. (0x5)
plugin_testing.go:227: Running Unit Test
--- FAIL: TestBridgeCmdAdd (0.06s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x20 pc=0x884557]
goroutine 18 [running]:
testing.tRunner.func1.2({0x8e8f40, 0xd36960})
C:/Program Files/Go/src/testing/testing.go:1389 +0x24e
testing.tRunner.func1()
C:/Program Files/Go/src/testing/testing.go:1392 +0x39f
panic({0x8e8f40, 0xd36960})
C:/Program Files/Go/src/runtime/panic.go:838 +0x207
github.com/Microsoft/windows-container-networking/test/utilities.(*PluginUnitTest).RunUnitTest(0xc0000a6780, 0xc0000849c0)
C:/github/windows-container-networking/test/utilities/plugin_testing.go:228 +0x57
github.com/Microsoft/windows-container-networking/test/utilities.(*PluginUnitTest).RunAll(0x9770c7?, 0xc?)
C:/github/windows-container-networking/test/utilities/plugin_testing.go:311 +0x34
github.com/Microsoft/windows-container-networking/plugins/sdnbridge_test.TestBridgeCmdAdd(0x425d3d?)
C:/github/windows-container-networking/plugins/sdnbridge/sdnbridge_windows_test.go:30 +0x112
testing.tRunner(0xc0000849c0, 0x9a0fc8)
C:/Program Files/Go/src/testing/testing.go:1439 +0x102
created by testing.(*T).Run
C:/Program Files/Go/src/testing/testing.go:1486 +0x35f
FAIL github.com/Microsoft/windows-container-networking/plugins/sdnbridge 8.458s
=== RUN TestOverlayCmdAdd
connectivity_testing.go:320: Interface Found: [&{0 0 0}] with ip []
plugin_testing.go:49: Setup for Network Plugin of type: Overlay ...
plugin_testing.go:50: [DEBUG] Using Host IP: []
plugin_testing.go:54: Error while creating supplied network: hcnCreateNetwork failed in Win32: Access is denied. (0x5)
plugin_testing.go:227: Running Unit Test
--- FAIL: TestOverlayCmdAdd (0.09s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x20 pc=0x1074557]
Random Seed: 1693719592
Will run 15 of 15 specs
+++++++++++++++
Ran 15 of 15 Specs in 19.675 seconds
SUCCESS! -- 15 Passed | 0 Failed | 0 Pending | 0 Skipped
--- PASS: TestAutogenCniConf (19.68s)
PASS
ok github.com/Microsoft/windows-container-networking/scripts/autogencniconf/test (cached)
? github.com/Microsoft/windows-container-networking/test/container [no test files]
? github.com/Microsoft/windows-container-networking/test/utilities [no test files]
FAIL
Expected behavior
'make test' command should pass.
CNI Version
v0.3.0
Additional context
NA
Due to the scale at which this is being run, tests need to be added that ensure we handle errors and incorrect input gracefully.
Moving the CI/CD system to Azure Pipelines, as it provides similar functionality but is much less of a security risk and gives nicer output formatting for testing of PRs. Also, corporate encouraged me to.
I see this conf in many of the examples available in this repo:
"capabilities": {
"portMappings": true,
"dnsCapabilities": true
},
This should be fixed like this:
"capabilities": {
"portMappings": true,
"dns": true
},
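A misspelled capability key like this is easy to miss by eye. The sketch below is one way to flag it: it reads the capabilities object from a CNI config and reports keys that are not in a known set. The knownCapabilities set here contains only the two keys that appear in this report, which is an assumption for illustration; a real check would need the full list the runtime supports.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// knownCapabilities holds only the keys seen in this report (an assumption).
var knownCapabilities = map[string]bool{"portMappings": true, "dns": true}

// unknownCapabilities returns, in sorted order, the capability keys of a CNI
// config that are not present in knownCapabilities.
func unknownCapabilities(conf []byte) ([]string, error) {
	var parsed struct {
		Capabilities map[string]bool `json:"capabilities"`
	}
	if err := json.Unmarshal(conf, &parsed); err != nil {
		return nil, err
	}
	var unknown []string
	for key := range parsed.Capabilities {
		if !knownCapabilities[key] {
			unknown = append(unknown, key)
		}
	}
	sort.Strings(unknown)
	return unknown, nil
}

func main() {
	conf := []byte(`{"capabilities":{"portMappings":true,"dnsCapabilities":true}}`)
	unknown, err := unknownCapabilities(conf)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("unrecognized capability keys:", unknown)
}
```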
I created a new Windows Server Core 2022 VM and followed the Containerd script from "Get started: Prep Windows for containers - Windows Server":
Invoke-WebRequest -UseBasicParsing "https://raw.githubusercontent.com/microsoft/Windows-Containers/Main/helpful_tools/Install-ContainerdRuntime/install-containerd-runtime.ps1" -o install-containerd-runtime.ps1
.\install-containerd-runtime.ps1
Run script log:
PS C:\> .\install-containerd-runtime.ps1
Querying status of Windows feature: Containers...
Feature Containers is already enabled.
Downloading containerd, nerdCTL, and Windows CNI binaries...
x bin/
x bin/containerd.exe
x bin/containerd-shim-runhcs-v1.exe
x bin/containerd-stress.exe
x bin/ctr.exe
Containerd binaries added to C:\Program Files\containerd
x nerdctl.exe
NerdCTL binary added to C:\Program Files\nerdctl
x sdnoverlay.exe
x sdnbridge.exe
x nat.exe
CNI plugin binaries added to C:\Program Files\containerd\cni\bin
Adding C:\Program Files\containerd, C:\Program Files\nerdctl, C:\Program Files\containerd\cni\bin to the path
Configuring the containerd service
disabled_plugins = []
imports = []
oom_score = 0
plugin_dir = ""
required_plugins = []
root = "C:\\ProgramData\\containerd\\root"
state = "C:\\ProgramData\\containerd\\state"
temp = ""
version = 2
[cgroup]
path = ""
[debug]
address = ""
format = ""
gid = 0
level = ""
uid = 0
[grpc]
address = "\\\\.\\pipe\\containerd-containerd"
gid = 0
max_recv_message_size = 16777216
max_send_message_size = 16777216
tcp_address = ""
tcp_tls_ca = ""
tcp_tls_cert = ""
tcp_tls_key = ""
uid = 0
[metrics]
address = ""
grpc_histogram = false
[plugins]
[plugins."io.containerd.gc.v1.scheduler"]
deletion_threshold = 0
mutation_threshold = 100
pause_threshold = 0.02
schedule_delay = "0s"
startup_delay = "100ms"
[plugins."io.containerd.grpc.v1.cri"]
device_ownership_from_security_context = false
disable_apparmor = false
disable_cgroup = false
disable_hugetlb_controller = false
disable_proc_mount = false
disable_tcp_service = true
enable_selinux = false
enable_tls_streaming = false
enable_unprivileged_icmp = false
enable_unprivileged_ports = false
ignore_image_defined_volumes = false
max_concurrent_downloads = 3
max_container_log_line_size = 16384
netns_mounts_under_state_dir = false
restrict_oom_score_adj = false
sandbox_image = "k8s.gcr.io/pause:3.6"
selinux_category_range = 0
stats_collect_period = 10
stream_idle_timeout = "4h0m0s"
stream_server_address = "127.0.0.1"
stream_server_port = "0"
systemd_cgroup = false
tolerate_missing_hugetlb_controller = false
unset_seccomp_profile = ""
[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "C:\\Program Files\\containerd\\cni\\bin"
conf_dir = "C:\\Program Files\\containerd\\cni\\conf"
conf_template = ""
ip_pref = ""
max_conf_num = 1
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runhcs-wcow-process"
disable_snapshot_annotations = false
discard_unpacked_layers = false
ignore_rdt_not_enabled_errors = false
no_pivot = false
snapshotter = "windows"
[plugins."io.containerd.grpc.v1.cri".containerd.default_runtime]
base_runtime_spec = ""
cni_conf_dir = ""
cni_max_conf_num = 0
container_annotations = []
pod_annotations = []
privileged_without_host_devices = false
runtime_engine = ""
runtime_path = ""
runtime_root = ""
runtime_type = ""
[plugins."io.containerd.grpc.v1.cri".containerd.default_runtime.options]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runhcs-wcow-process]
base_runtime_spec = ""
cni_conf_dir = ""
cni_max_conf_num = 0
container_annotations = []
pod_annotations = []
privileged_without_host_devices = false
runtime_engine = ""
runtime_path = ""
runtime_root = ""
runtime_type = "io.containerd.runhcs.v1"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runhcs-wcow-process.options]
[plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime]
base_runtime_spec = ""
cni_conf_dir = ""
cni_max_conf_num = 0
container_annotations = []
pod_annotations = []
privileged_without_host_devices = false
runtime_engine = ""
runtime_path = ""
runtime_root = ""
runtime_type = ""
[plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime.options]
[plugins."io.containerd.grpc.v1.cri".image_decryption]
key_model = "node"
[plugins."io.containerd.grpc.v1.cri".registry]
config_path = ""
[plugins."io.containerd.grpc.v1.cri".registry.auths]
[plugins."io.containerd.grpc.v1.cri".registry.configs]
[plugins."io.containerd.grpc.v1.cri".registry.headers]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors]
[plugins."io.containerd.grpc.v1.cri".x509_key_pair_streaming]
tls_cert_file = ""
tls_key_file = ""
[plugins."io.containerd.internal.v1.opt"]
path = "C:\\ProgramData\\containerd\\root\\opt"
[plugins."io.containerd.internal.v1.restart"]
interval = "10s"
[plugins."io.containerd.internal.v1.tracing"]
sampling_ratio = 1.0
service_name = "containerd"
[plugins."io.containerd.metadata.v1.bolt"]
content_sharing_policy = "shared"
[plugins."io.containerd.runtime.v2.task"]
platforms = ["windows/amd64", "linux/amd64"]
sched_core = false
[plugins."io.containerd.service.v1.diff-service"]
default = ["windows", "windows-lcow"]
[plugins."io.containerd.service.v1.tasks-service"]
rdt_config_file = ""
[plugins."io.containerd.tracing.processor.v1.otlp"]
endpoint = ""
insecure = false
protocol = ""
[proxy_plugins]
[stream_processors]
[stream_processors."io.containerd.ocicrypt.decoder.v1.tar"]
accepts = ["application/vnd.oci.image.layer.v1.tar+encrypted"]
args = ["--decryption-keys-path", "C:\\Program Files\\containerd\\ocicrypt\\keys"]
env = ["OCICRYPT_KEYPROVIDER_CONFIG=C:\\Program Files\\containerd\\ocicrypt\\ocicrypt_keyprovider.conf"]
path = "ctd-decoder"
returns = "application/vnd.oci.image.layer.v1.tar"
[stream_processors."io.containerd.ocicrypt.decoder.v1.tar.gzip"]
accepts = ["application/vnd.oci.image.layer.v1.tar+gzip+encrypted"]
args = ["--decryption-keys-path", "C:\\Program Files\\containerd\\ocicrypt\\keys"]
env = ["OCICRYPT_KEYPROVIDER_CONFIG=C:\\Program Files\\containerd\\ocicrypt\\ocicrypt_keyprovider.conf"]
path = "ctd-decoder"
returns = "application/vnd.oci.image.layer.v1.tar+gzip"
[timeouts]
"io.containerd.timeout.bolt.open" = "0s"
"io.containerd.timeout.shim.cleanup" = "5s"
"io.containerd.timeout.shim.load" = "5s"
"io.containerd.timeout.shim.shutdown" = "3s"
"io.containerd.timeout.task.state" = "2s"
[ttrpc]
address = ""
gid = 0
uid = 0
Waiting for Containerd daemon...
Successfully connected to Containerd Daemon.
The following images are present on this machine:
REPOSITORY TAG IMAGE ID CREATED PLATFORM SIZE BLOB SIZE
Script complete!
Running a container without --cni works, but with --cni it fails with the error message "ctr: no network config found in /etc/cni/net.d: cni plugin not initialized".
I searched for a solution and created a nat network. (The install-containerd-runtime.ps1 script does not set this part up.)
curl.exe -LO https://raw.githubusercontent.com/microsoft/SDN/master/Kubernetes/windows/hns.psm1
ipmo ./hns.psm1
$subnet="10.0.0.0/16"
$gateway="10.0.0.1"
New-HNSNetwork -Type Nat -AddressPrefix $subnet -Gateway $gateway -Name "nat"
Set up the Containerd network config using the same gateway and subnet.
@"
{
"cniVersion": "0.2.0",
"name": "nat",
"type": "nat",
"master": "Ethernet",
"ipam": {
"subnet": "$subnet",
"routes": [
{
"gateway": "$gateway"
}
]
},
"capabilities": {
"portMappings": true,
"dns": true
}
}
"@ | Set-Content "C:\Program Files\containerd\cni\conf\0-containerd-nat.conf" -Force
I still get the same error message with --cni.
If I use the config file below:
{
"cniVersion": "0.3.0",
"ApiVersion": 2,
"name": "transparent",
"type": "sdnbridge",
"Master": "Ethernet",
"ipam": {
"type": "host-local",
"ranges": [[
{
"subnet": "10.249.5.0/24",
"rangeStart": "10.249.5.220",
"rangeEnd": "10.249.5.240",
"gateway": "10.249.5.254"
}
]]
},
"loopbackDSR": true,
"capabilities": {
"dns": true
}
}
I get this error: time="2021-10-05T22:45:21+02:00" level=fatal msg="run pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "d2bfd729d7396dc8ffdae7d6c4ead23f5fba919e3dcb24da122320449da0a8df": no IP ranges specified"
Using config: https://paste.ubuntu.com/p/c8YmjnjSSs/
Pods don't communicate across nodes in a scenario using flannel in host-gw mode + sdnbridge . I've noticed that sdnbridge sets the endpoint default gateway to x.y.z.1 however, it seems it should be x.y.z.2 ( at least that is what a similar setup using win-bridge does: https://github.com/containernetworking/plugins/blob/master/plugins/main/windows/win-bridge/win-bridge_windows.go#L82-L83 ).
Setting gateway via IPAM is not possible, as flannel does not allow custom ipam field when using delegate: https://github.com/containernetworking/plugins/blob/master/plugins/meta/flannel/flannel.go#L206-L208
I've tested with x.y.z.2 gateway on endpoints and it seems networking works across nodes.
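To make the x.y.z.2 convention concrete, here is a minimal sketch that derives such a gateway from a pod subnet by taking the network address and adding 2 to its last octet. It is not the sdnbridge or win-bridge code; gatewayPlusTwo is a hypothetical helper and only handles IPv4.

```go
package main

import (
	"fmt"
	"net"
)

// gatewayPlusTwo returns the network address of cidr with 2 added to the
// last octet (x.y.z.0/24 becomes x.y.z.2).
func gatewayPlusTwo(cidr string) (string, error) {
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return "", err
	}
	base := ipnet.IP.To4()
	if base == nil {
		return "", fmt.Errorf("%s: only IPv4 is handled in this sketch", cidr)
	}
	gw := make(net.IP, len(base))
	copy(gw, base)
	gw[len(gw)-1] += 2 // may wrap or leave the subnet for tiny prefixes; checked below
	if !ipnet.Contains(gw) {
		return "", fmt.Errorf("%s: subnet too small for a +2 gateway", cidr)
	}
	return gw.String(), nil
}

func main() {
	gw, err := gatewayPlusTwo("10.244.1.0/24") // example subnet, not from the report
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("endpoint gateway:", gw)
}
```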
Describe the bug
Running many pods on a Windows node at the same time leads to CNI failures.
E0413 13:28:59.458937 3596 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"sample-bb44b9dff-kwwnk_default(e4dae002-ca09-4514-9c47-153c5dce79fd)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"sample-bb44b9dff-kwwnk_default(e4dae002-ca09-4514-9c47-153c5dce79fd)\\\": rpc error: code = Unknown desc = failed to setup network for sandbox \\\"f9090b91333f019d8b8ed793bf105436188bbe879bc1ba6c8de41fd534c53cc8\\\": plugin type=\\\"azure-vnet\\\" failed (add): Failed to initialize key-value store of network plugin: timed out locking store\"" pod="default/sample-bb44b9dff-kwwnk" podUID=e4dae002-ca09-4514-9c47-153c5dce79fd
To Reproduce
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample
  labels:
    app: sample
spec:
  replicas: 100
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      name: sample
      labels:
        app: sample
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: sample
      nodeSelector:
        "kubernetes.io/os": windows
      containers:
      - name: sample
        image: mcr.microsoft.com/dotnet/framework/samples:aspnetapp
        resources:
          limits:
            cpu: 100m
            memory: 300M
          requests:
            cpu: 100m
            memory: 300M
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: sample
  labels:
    app: sample
spec:
  type: LoadBalancer
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  selector:
    app: sample
myVMSize="Standard_D16_v3"
az group create --name $myResourceGroup --location $myLocation
az aks create \
--resource-group $myResourceGroup \
--name $myAKSCluster \
--ssh-key-value $mySSHKeyFilePath \
--windows-admin-username $myWindowsUserName \
--windows-admin-password $myWindowsPassword \
--network-plugin azure \
--vm-set-type VirtualMachineScaleSets \
--node-count 1
az aks nodepool add \
--resource-group $myResourceGroup \
--cluster-name $myAKSCluster \
--name $myWindowsNodePool \
--os-type Windows \
--os-sku Windows2019 \
--node-vm-size $myVMSize \
--node-count 1 \
--max-pods 105
az aks get-credentials -g $myResourceGroup -n $myAKSCluster --overwrite-existing
kubectl apply -f https://raw.githubusercontent.com/ShiqianTao/k8s-public-yaml/main/Windows-Stress-Test/aspnetapp.stress.test.yaml
kubectl get nodes -owide
kubectl get pods -owide
Expected behavior
Node has sufficient resources and should run these pods without the above issues.
I want to use sdnbridge in transparent mode with an external DHCP server.
Below is my CNI configuration file:
{
"cniVersion": "0.2.0",
"name": "transparent",
"apiVersion":2,
"type": "sdnbridge",
"Master": "Ethernet"
}
I created a pod with containerd and crictl.
When I execute the command to create the pod, my network is created like this:
ActivityId : 0E550024-EA46-4A66-A049-2C9AED114943
AdditionalParams :
CurrentEndpointCount : 0
Extensions : {@{Id=E7C3B2F0-F3C5-48DF-AF2B-10FED6D72E7A; IsEnabled=False; Name=Plateforme de filtrage Microsoft Windows}, @{Id=E9B59CFA-2BE1-4B21-828F-B6FBDBDDC017; IsEnabled=False; Name=Microsoft Azure VFP Switch Extension},
@{Id=EA24CD6C-D17A-4348-9190-09F0D5BE83DD; IsEnabled=True; Name=Capture NDIS Microsoft}}
Flags : 0
Health : @{LastErrorCode=0; LastUpdateTime=132775553966352126}
ID : 60B2C11D-381C-4F84-91F1-DB1FB2BD9447
IPv6 : False
InterfaceConstraint : @{InterfaceGuid=00000000-0000-0000-0000-000000000000}
LayeredOn : 83B8B8AA-F92D-4451-857E-F0FE36698358
MacPools : {@{EndMacAddress=00-15-5D-15-CF-FF; StartMacAddress=00-15-5D-15-C0-00}}
MaxConcurrentEndpoints : 0
Name : transparent
Policies : {}
Resources : @{AdditionalParams=; AllocationOrder=0; Health=; ID=0E550024-EA46-4A66-A049-2C9AED114943; PortOperationTime=0; State=1; SwitchOperationTime=0; VfpOperationTime=0; parentId=6267B624-6EF8-42EE-8DF9-5EE3044E8707}
State : 1
TotalEndpoints : 0
Type : Transparent
Version : 38654705669
But my pods cannot connect to this network.
The command exits with this error:
time="2021-10-01T11:50:00+02:00" level=fatal msg="run pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "7f493333cfbfe44d82a7e5b87f2050558d25885d1266cc69822c2a5828766c30": netplugin failed: "{\"level\":\"debug\",\"msg\":\"[cni-net] Plugin wcn-net version .\",\"time\":\"2021-10-01T11:49:56+02:00\"}\n{\"level\":\"debug\",\"msg\":\"[net] Network interface: {Index:19 MTU:1500 Name:Ethernet HardwareAddr:c4:37:72:08:35:6f Flags:up|broadcast|multicast} with IP addresses: [fe80::b1df:fcb3:e3fb:cc43/64 10.249.5.50/24]\",\"time\":\"2021-10-01T11:49:56+02:00\"}\n{\"level\":\"debug\",\"msg\":\"[net] Network interface: {Index:23 MTU:1500 Name:Ethernet 3 HardwareAddr:c4:37:72:f9:92:22 Flags:up|broadcast|multicast} with IP addresses: [fe80::f40e:d97e:3699:d3d5/64 10.249.3.249/24]\",\"time\":\"2021-10-01T11:49:56+02:00\"}\n{\"level\":\"debug\",\"msg\":\"[net] Network interface: {Index:1 MTU:-1 Name:Loopback Pseudo-Interface 1 HardwareAddr: Flags:up|loopback|multicast} with IP addresses: [::1/128 127.0.0.1/8]\",\"time\":\"2021-10-01T11:49:56+02:00\"}\n{\"level\":\"debug\",\"msg\":\"[cni-net] Plugin started.\",\"time\":\"2021-10-01T11:49:56+02:00\"}\n{\"level\":\"debug\",\"msg\":\"[cni-net] Processing ADD command with args {ContainerID:7f493333cfbfe44d82a7e5b87f2050558d25885d1266cc69822c2a5828766c30 Netns:057764d1-434b-4c86-8c7e-6329e33bce53 IfName:eth0 Args:K8S_POD_NAMESPACE=default;K8S_POD_NAME=Windows;K8S_POD_INFRA_CONTAINER_ID=7f493333cfbfe44d82a7e5b87f2050558d25885d1266cc69822c2a5828766c30;K8S_POD_UID=hdishd83djaidwnduwk28bcsb;IgnoreUnknown=1 Path:C:/opt/cni/bin}.\",\"time\":\"2021-10-01T11:49:56+02:00\"}\n{\"level\":\"debug\",\"msg\":\"[cni-net] Read network configuration \\u0026{CniVersion:0.2.0 Name:transparent Type:sdnbridge Ipam:{Type: Environment: AddrSpace: Subnet: Address: QueryInterval: Routes:[]} DNS:{Nameservers:[] Domain: Search:[] Options:[]} OptionalFlags:{LocalRoutePortMapping:false AllowAclPortMapping:false} 
RuntimeConfig:{PortMappings:[] DNS:{Servers:[] Searches:[] Options:[]}} AdditionalArgs:[]}.\",\"time\":\"2021-10-01T11:49:56+02:00\"}\n{\"level\":\"debug\",\"msg\":\"Parsing port mappings from []\",\"time\":\"2021-10-01T11:49:56+02:00\"}\n{\"level\":\"info\",\"msg\":\"[cni-net] Creating network.\",\"time\":\"2021-10-01T11:49:56+02:00\"}\n{\"level\":\"debug\",\"msg\":\"hcn::HostComputeNetwork::Create id=\",\"time\":\"2021-10-01T11:49:56+02:00\"}\n{\"level\":\"debug\",\"msg\":\"hcn::HostComputeNetwork::Create JSON: {\\\"Name\\\":\\\"transparent\\\",\\\"Type\\\":\\\"transparent\\\",\\\"MacPool\\\":{},\\\"Dns\\\":{},\\\"SchemaVersion\\\":{\\\"Major\\\":2,\\\"Minor\\\":0}}\",\"time\":\"2021-10-01T11:49:56+02:00\"}\npanic: runtime error: index out of range\n\ngoroutine 1 [running]:\ngithub.com/Microsoft/windows-container-networking/network.GetNetworkInfoFromHostComputeNetwork(0xc0000961e0, 0xc0000961e0)\n\t/home/nagiesek/repo/gopath/src/github.com/Microsoft/windows-container-networking/network/network.go:93 +0x612\ngithub.com/Microsoft/windows-container-networking/network.(*networkManager).CreateNetwork(0xc000098060, 0xc000052dd0, 0x0, 0x0, 0x0)\n\t/home/nagiesek/repo/gopath/src/github.com/Microsoft/windows-container-networking/network/manager.go:68 +0xe2\ngithub.com/Microsoft/windows-container-networking/common/core.getOrCreateNetwork(0xc000044620, 0xc000052dd0, 0xc00007c180, 0x40, 0x0, 0x0)\n\t/home/nagiesek/repo/gopath/src/github.com/Microsoft/windows-container-networking/common/core/network.go:241 +0xe2\ngithub.com/Microsoft/windows-container-networking/common/core.(*netPlugin).Add(0xc000044620, 0xc00009c310, 0x0, 0x0)\n\t/home/nagiesek/repo/gopath/src/github.com/Microsoft/windows-container-networking/common/core/network.go:127 +0x57e\ngithub.com/Microsoft/windows-container-networking/vendor/github.com/containernetworking/cni/pkg/skel.(*dispatcher).checkVersionAndCall(0xc000077e98, 0xc00009c310, 0x5cc600, 0xc000098690, 0xc000077e80, 0x0, 
0x28)\n\t/home/nagiesek/repo/gopath/src/github.com/Microsoft/windows-container-networking/vendor/github.com/containernetworking/cni/pkg/skel/skel.go:185 +0x260\ngithub.com/Microsoft/windows-container-networking/vendor/github.com/containernetworking/cni/pkg/skel.(*dispatcher).pluginMain(0xc000077e98, 0xc000077e80, 0x0, 0xc000077e68, 0x5cc600, 0xc000098690, 0x59d28d, 0x11, 0x53df1a)\n\t/home/nagiesek/repo/gopath/src/github.com/Microsoft/windows-container-networking/vendor/github.com/containernetworking/cni/pkg/skel/skel.go:221 +0x550\ngithub.com/Microsoft/windows-container-networking/vendor/github.com/containernetworking/cni/pkg/skel.PluginMainWithError(...)\n\t/home/nagiesek/repo/gopath/src/github.com/Microsoft/windows-container-networking/vendor/github.com/containernetworking/cni/pkg/skel/skel.go:286\ngithub.com/Microsoft/windows-container-networking/cni.(*Plugin).Execute(0xc000078028, 0x5cc580, 0xc000044620, 0x0, 0xc000078020)\n\t/home/nagiesek/repo/gopath/src/github.com/Microsoft/windows-container-networking/cni/plugin.go:49 +0x225\ngithub.com/Microsoft/windows-container-networking/common/core.Core()\n\t/home/nagiesek/repo/gopath/src/github.com/Microsoft/windows-container-networking/common/core/core.go:47 +0x29c\nmain.main()\n\t/home/nagiesek/repo/gopath/src/github.com/Microsoft/windows-container-networking/plugins/sdnbridge/sdnbridge_windows.go:16 +0x27\n"".
I can see an "index out of range" error; I think it's because my network doesn't have an IP range.
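The HCN create request in the log carries no Ipams entry, which fits the reporter's guess that code indexing into the first IPAM or subnet panics. What network.go:93 actually does is not shown here, so the following is only a sketch of the defensive pattern, with hypothetical types standing in for the HCN network structures.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the HCN network structures.
type subnet struct {
	IPAddressPrefix string
}

type ipam struct {
	Subnets []subnet
}

type network struct {
	Name  string
	Ipams []ipam
}

// firstSubnetPrefix returns the first subnet prefix of the network, or
// false when the network has no IPAM or subnet (e.g. external DHCP).
func firstSubnetPrefix(n network) (string, bool) {
	if len(n.Ipams) == 0 || len(n.Ipams[0].Subnets) == 0 {
		return "", false
	}
	return n.Ipams[0].Subnets[0].IPAddressPrefix, true
}

func main() {
	transparent := network{Name: "transparent"} // no Ipams, as in the log above
	if prefix, ok := firstSubnetPrefix(transparent); ok {
		fmt.Println("subnet:", prefix)
	} else {
		fmt.Println("network has no subnet; skipping subnet-specific setup")
	}
}
```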
Currently, the network.NetworkInfo.Type value passed through to HCS (which must be one of network.NetworkType, i.e. hcn.NetworkType values) is simply the network name. This is pretty surprising, and also breaks the CNI spec if you want multiple networks of the same type, as each plugin or plugin-list must have a host-unique name.
It also means none of the example configs work, as they will all complain with something like the below:
hcnCreateNetwork failed in Win32: Invalid JSON document string. (0x803b001b) {"Success":false,"Error":"Invalid JSON document string. {{Type,UnknownEnumValue}}","ErrorCode":2151350299}
This appears to be deliberate, but it's not very user-friendly.
The obvious approach seems to me to have each of the executables specify their own NetworkType when they call into core.Core, since the network type used appears to be the only intended distinction between the executables. I started trying to plumb this approach through, but got lost in the abstraction layers, trying to work out how to carry that value without needing to put a hard-coded [string]NetworkType map somewhere in the system, or pull a bunch of stuff into the core package.
Another option would be to expose the network type as an additional parameter, and just produce one executable that covers all the use-cases, which is really what we have now anyway, but be intentional about it.
Another option is to have the executables named to match the NetworkType, and then we can just use config.Type instead of config.Name, and there's little-to-no surprise, as long as no one renames their own executable. Again, this is really just one executable compiled three times, so not really a great approach, I feel.
So I'm interested in guidance and suggestions. This has come up while working on porting BuildKit to Windows, as the only network layer it supports natively is CNI, so I needed minimally-functional CNI with NAT, à la the nat CNI plugin I found here. I used the last option as my workaround, but without renaming binaries it only works for nat, not sdnoverlay or sdnbridge, as those are not NetworkType values.