openshift / node-feature-discovery
Node feature discovery detects the available hardware features and configuration in a cluster.
License: Apache License 2.0
The following branches are being fast-forwarded from the current development branch (master) as placeholders for future releases. No merging is allowed into these release branches until they are unfrozen for production release.
release-4.17
release-4.18
For more information, see the branching documentation.
What happened:
Queried kernel parameters in NFD, got mostly good results, but some errors:
2021/01/06 15:28:31 Failed to read /proc/config.gz: open /proc/config.gz: no such file or directory
2021/01/06 15:28:31 ERROR: Failed to read kconfig: open /host-boot/config-4.18.0-240.8.1.el8_3.s390x: no such file or directory
2021/01/06 15:28:31 kernel-version.full = 4.18.0-240.8.1.el8_3.s390x
2021/01/06 15:28:31 kernel-version.major = 4
2021/01/06 15:28:31 kernel-version.minor = 18
2021/01/06 15:28:31 kernel-version.revision = 0
2021/01/06 15:28:31 kernel-selinux.enabled = true
What you expected to happen:
Get complete kernel info from NFD kernel query without errors.
How to reproduce it (as minimally and precisely as possible):
Query kernel parameters in NFD on an IBM Mainframe cluster
Anything else we need to know?:
Environment:
[root@bastion ~]# oc version
Client Version: 4.7.0-0.nightly-s390x-2020-12-21-160105
Server Version: 4.7.0-0.nightly-s390x-2020-12-21-160105
Kubernetes Version: v1.20.0+87544c5
Kernel (e.g. uname -a): RHEL 8.3
Install tools: KVM UPI Install
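The log above shows two lookups failing in turn: /proc/config.gz, then the config-<release> file under the mounted boot directory. A minimal sketch of that kind of fallback search follows, for checking by hand which locations exist on a node. The function names are hypothetical, and the third candidate (a config file under the kernel modules directory) is an assumption about where some images keep the kernel config; it is not confirmed by this report.

```python
import os
from typing import List, Optional


def kconfig_candidates(release: str,
                       proc_config: str = "/proc/config.gz",
                       boot_dir: str = "/boot",
                       modules_dir: str = "/usr/lib/modules") -> List[str]:
    """Return candidate kernel config paths, in lookup order."""
    return [
        proc_config,                                   # first path tried in the log
        os.path.join(boot_dir, f"config-{release}"),   # second path tried in the log
        os.path.join(modules_dir, release, "config"),  # assumption, not from the log
    ]


def find_kconfig(release: str, **dirs: str) -> Optional[str]:
    """Return the first candidate that exists as a file, or None."""
    for path in kconfig_candidates(release, **dirs):
        if os.path.isfile(path):
            return path
    return None
```

Calling find_kconfig("4.18.0-240.8.1.el8_3.s390x") on the affected node would show whether any of the candidates is present; a None result matches the errors in the log.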
There is a virtual PCI bus on Z systems, but the kernel must be recent enough and configured to expose and use it.
What happened:
No PCI features or facts about the virtual PCI bus on system Z are exposed
What you expected to happen:
NFD should at least tell us the status of the Z node's virtual PCI bus.
How to reproduce it (as minimally and precisely as possible):
Environment:
Kubernetes version (use kubectl version): browser
OS (e.g. cat /etc/os-release): RHEL 8.3
What happened:
Ran NFD Operator instantiation on an OCP 4.7 KVM cluster under RHEL 8.3
result:
[root@bastion NFD-test]# oc logs !$
oc logs nfd-worker-87fgl
2021/01/06 15:28:31 Node Feature Discovery Worker 1.15
2021/01/06 15:28:31 NodeName: 'worker-0.pok-242.ocptest.pok.stglabs.ibm.com'
INFO: 2021/01/06 15:28:31 parsed scheme: ""
INFO: 2021/01/06 15:28:31 scheme "" not registered, fallback to default scheme
INFO: 2021/01/06 15:28:31 ccResolverWrapper: sending update to cc: {[{172.30.248.112:12000 0 <nil>}] <nil>}
INFO: 2021/01/06 15:28:31 ClientConn switching balancer to "pick_first"
2021/01/06 15:28:31 Configuration successfully loaded from "/etc/kubernetes/node-feature-discovery/nfd-worker.conf"
2021/01/06 15:28:31 cpu-cpuid.GS = true
2021/01/06 15:28:31 cpu-cpuid.DFP = true
2021/01/06 15:28:31 cpu-cpuid.MSA = true
2021/01/06 15:28:31 cpu-cpuid.ETF3EH = true
2021/01/06 15:28:31 cpu-cpuid.VXD = true
2021/01/06 15:28:31 cpu-cpuid.VXE2 = true
2021/01/06 15:28:31 cpu-cpuid.ESAN3 = true
2021/01/06 15:28:31 cpu-cpuid.STFLE = true
2021/01/06 15:28:31 cpu-cpuid.EDAT = true
2021/01/06 15:28:31 cpu-cpuid.DFLT = true
2021/01/06 15:28:31 cpu-cpuid.VXE = true
2021/01/06 15:28:31 cpu-cpuid.ZARCH = true
2021/01/06 15:28:31 cpu-cpuid.VX = true
2021/01/06 15:28:31 cpu-cpuid.SORT = true
2021/01/06 15:28:31 cpu-cpuid.VXP = true
2021/01/06 15:28:31 cpu-cpuid.HIGHGPRS = true
2021/01/06 15:28:31 cpu-cpuid.LDISP = true
2021/01/06 15:28:31 cpu-cpuid.TE = true
2021/01/06 15:28:31 cpu-cpuid.EIMM = true
2021/01/06 15:28:31 Failed to read /proc/config.gz: open /proc/config.gz: no such file or directory
2021/01/06 15:28:31 ERROR: Failed to read kconfig: open /host-boot/config-4.18.0-240.8.1.el8_3.s390x: no such file or directory
2021/01/06 15:28:31 kernel-version.full = 4.18.0-240.8.1.el8_3.s390x
2021/01/06 15:28:31 kernel-version.major = 4
2021/01/06 15:28:31 kernel-version.minor = 18
2021/01/06 15:28:31 kernel-version.revision = 0
2021/01/06 15:28:31 kernel-selinux.enabled = true
2021/01/06 15:28:31 SR-IOV not supported for network interface: enc1: open /host-sys/class/net/enc1/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: tun0: open /host-sys/class/net/tun0/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth0bb2fb68: open /host-sys/class/net/veth0bb2fb68/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth0e40ae3d: open /host-sys/class/net/veth0e40ae3d/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth1d59273b: open /host-sys/class/net/veth1d59273b/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth353fa8f1: open /host-sys/class/net/veth353fa8f1/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth543eb339: open /host-sys/class/net/veth543eb339/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth6cba9b9e: open /host-sys/class/net/veth6cba9b9e/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth7a56ad17: open /host-sys/class/net/veth7a56ad17/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth868c78d4: open /host-sys/class/net/veth868c78d4/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth89453813: open /host-sys/class/net/veth89453813/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth9106a257: open /host-sys/class/net/veth9106a257/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vetha2cd6200: open /host-sys/class/net/vetha2cd6200/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vetha73c05b1: open /host-sys/class/net/vetha73c05b1/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vethd0c07305: open /host-sys/class/net/vethd0c07305/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vethe3398524: open /host-sys/class/net/vethe3398524/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vethe935563d: open /host-sys/class/net/vethe935563d/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vxlan_sys_4789: open /host-sys/class/net/vxlan_sys_4789/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 system-os_release.VERSION_ID.minor = 7
2021/01/06 15:28:31 system-os_release.ID = rhcos
2021/01/06 15:28:31 system-os_release.VERSION_ID = 4.7
2021/01/06 15:28:31 system-os_release.VERSION_ID.major = 4
2021/01/06 15:28:31 INFO: Custom features: [{Name:rdma.capable MatchOn:[{PciID:0xc0000b0460 UsbID:<nil> LoadedKMod:<nil>}]} {Name:rdma.available MatchOn:[{PciID:<nil> UsbID:<nil> LoadedKMod:0xc0000bf620}]}]
2021/01/06 15:28:31 Sending labeling request to nfd-master
There are no s390x memory features discovered.
What you expected to happen:
Discover s390x memory features
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
[root@bastion ~]# oc version
Client Version: 4.7.0-0.nightly-s390x-2020-12-21-160105
Server Version: 4.7.0-0.nightly-s390x-2020-12-21-160105
Kubernetes Version: v1.20.0+87544c5
Kernel (e.g. uname -a): RHEL 8.3
The Dockerfile hardcodes the GRPC_HEALTH_PROBE_VERSION at 0.3.1 while upstream (https://github.com/grpc-ecosystem/grpc-health-probe/releases/tag/v0.4.6) has 0.4.6.
0.3.3 (https://github.com/grpc-ecosystem/grpc-health-probe/releases/tag/v0.3.3) fixed non-exploitable security issues.
What happened:
Queried Network features on NFD installed on s390x node. Got:
2021/01/06 15:28:31 SR-IOV not supported for network interface: enc1: open /host-sys/class/net/enc1/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: tun0: open /host-sys/class/net/tun0/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth0bb2fb68: open /host-sys/class/net/veth0bb2fb68/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth0e40ae3d: open /host-sys/class/net/veth0e40ae3d/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth1d59273b: open /host-sys/class/net/veth1d59273b/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth353fa8f1: open /host-sys/class/net/veth353fa8f1/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth543eb339: open /host-sys/class/net/veth543eb339/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth6cba9b9e: open /host-sys/class/net/veth6cba9b9e/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth7a56ad17: open /host-sys/class/net/veth7a56ad17/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth868c78d4: open /host-sys/class/net/veth868c78d4/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth89453813: open /host-sys/class/net/veth89453813/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth9106a257: open /host-sys/class/net/veth9106a257/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vetha2cd6200: open /host-sys/class/net/vetha2cd6200/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vetha73c05b1: open /host-sys/class/net/vetha73c05b1/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vethd0c07305: open /host-sys/class/net/vethd0c07305/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vethe3398524: open /host-sys/class/net/vethe3398524/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vethe935563d: open /host-sys/class/net/vethe935563d/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vxlan_sys_4789: open /host-sys/class/net/vxlan_sys_4789/device/sriov_totalvfs: no such file or directory
What you expected to happen:
Obtain z/VM and/or zKVM specific virtual network device features
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
[root@bastion ~]# oc version
Client Version: 4.7.0-0.nightly-s390x-2020-12-21-160105
Server Version: 4.7.0-0.nightly-s390x-2020-12-21-160105
Kubernetes Version: v1.20.0+87544c5
Kernel (e.g. uname -a): RHEL 8.3
Install tools: KVM UPI Install
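Each "SR-IOV not supported" line above corresponds to a missing sriov_totalvfs attribute under the interface's sysfs device directory. A minimal sketch of that check follows, assuming the standard /sys layout (the worker log reads the same tree through its /host-sys mount). The function name is hypothetical. A None result only means the attribute is absent, which is expected for purely virtual interfaces such as veth or tun devices.

```python
import os
from typing import Optional


def sriov_total_vfs(iface: str, sys_root: str = "/sys") -> Optional[int]:
    """Return the VF count advertised by iface, or None if the attribute is absent."""
    path = os.path.join(sys_root, "class", "net", iface, "device", "sriov_totalvfs")
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (FileNotFoundError, NotADirectoryError):
        # No backing device directory, or a device without SR-IOV support.
        return None
```

For example, sriov_total_vfs("enc1") on the node from this report would be expected to return None, matching the log.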
What would you like to be added: Support for taints (as with -enable-taints switch in upstream)
Why is this needed: To dedicate nodes with specific features to particular application workloads only. Otherwise, we cannot mix different types of workload in one cluster.
What happened:
s390x Storage Features not discovered by NFD
What you expected to happen:
Expect to see how the non-volatile storage is configured on the node when querying node features using the NFD Operator
Environment:
Kubernetes version (use kubectl version): browser
OS (e.g. cat /etc/os-release): RHEL 8.3
What would you like to be added:
GPU discovery on nodes
Why is this needed:
Workloads that use GPUs would benefit from scheduling affinity to nodes that have GPUs.
Upstream support for IBM Power LE and IBM Z architectures just landed in kubernetes-sigs/node-feature-discovery#262. In order to support these in OpenShift 4.2+, a merge from upstream master is needed.
Status
The Dockerfile downloads the amd64 version of the grpc-health-probe.
RFE
Detect the architecture with either "uname -m" or "arch" and download the arch-specific version (https://github.com/grpc-ecosystem/grpc-health-probe/releases/). Caveat: both tools output x86_64, but the grpc-health-probe binaries are named with amd64 and so on, so some shell conditionals will be needed.
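The mapping this RFE describes can be sketched as follows. The RFE asks for shell conditionals in the Dockerfile; the same logic is shown here in Python for illustration only. The function name, the mapping table, and the asset naming pattern (grpc_health_probe-linux-<arch>) are assumptions based on the upstream releases page and should be checked against the release actually used.

```python
import platform
from typing import Optional

# Assumed mapping from `uname -m` output to the suffix used in release asset names.
UNAME_TO_ASSET_ARCH = {
    "x86_64": "amd64",
    "aarch64": "arm64",
    "ppc64le": "ppc64le",
    "s390x": "s390x",
}


def probe_url(version: str, machine: Optional[str] = None) -> str:
    """Build a download URL for the grpc-health-probe binary of this architecture."""
    machine = machine or platform.machine()  # same value as `uname -m` on Linux
    try:
        arch = UNAME_TO_ASSET_ARCH[machine]
    except KeyError:
        raise ValueError(f"unsupported architecture: {machine}") from None
    # The asset name pattern is an assumption; verify it on the releases page.
    return (
        "https://github.com/grpc-ecosystem/grpc-health-probe/releases/"
        f"download/v{version}/grpc_health_probe-linux-{arch}"
    )
```

Failing loudly on an unknown architecture, rather than silently falling back to amd64, avoids shipping an image with a binary that cannot run.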
What happened:
Opened https://github.com/openshift/node-feature-discovery/blob/master/docs/get-started/features.md
What you expected to happen:
Expect to see CPU feature documentation for s390x architecture
How to reproduce it (as minimally and precisely as possible):
Open https://github.com/openshift/node-feature-discovery/blob/master/docs/get-started/features.md
Anything else we need to know?:
The s390x CPU features are defined in https://github.com/openshift/node-feature-discovery/blob/master/source/internal/cpuidutils/cpuid_s390x.go
Environment:
Kubernetes version (use kubectl version): browser
OS (e.g. cat /etc/os-release): RHEL 8.3
Kernel (e.g. uname -a): N/A
The following branches are being fast-forwarded from the current development branch (master) as placeholders for future releases. No merging is allowed into these release branches until they are unfrozen for production release.
release-4.6
Contact the Test Platform or Automated Release teams for more information.