openshift / node-feature-discovery

Node feature discovery, detects the available hardware features and configuration in a cluster.

License: Apache License 2.0

Dockerfile 0.09% Makefile 0.75% Shell 2.12% Go 96.79% Assembly 0.11% Python 0.15%

node-feature-discovery's Issues

s390x kernel feature somewhat incomplete

What happened:

Queried kernel parameters in NFD, got mostly good results, but some errors:

2021/01/06 15:28:31 Failed to read /proc/config.gz: open /proc/config.gz: no such file or directory
2021/01/06 15:28:31 ERROR: Failed to read kconfig: open /host-boot/config-4.18.0-240.8.1.el8_3.s390x: no such file or directory
2021/01/06 15:28:31 kernel-version.full = 4.18.0-240.8.1.el8_3.s390x
2021/01/06 15:28:31 kernel-version.major = 4
2021/01/06 15:28:31 kernel-version.minor = 18
2021/01/06 15:28:31 kernel-version.revision = 0
2021/01/06 15:28:31 kernel-selinux.enabled = true

What you expected to happen:

Get complete kernel info from NFD kernel query without errors.

How to reproduce it (as minimally and precisely as possible):

Query kernel parameters in NFD on an IBM Mainframe cluster

Anything else we need to know?:

Environment:
[root@bastion ~]# oc version
Client Version: 4.7.0-0.nightly-s390x-2020-12-21-160105
Server Version: 4.7.0-0.nightly-s390x-2020-12-21-160105
Kubernetes Version: v1.20.0+87544c5

  • Kernel (e.g. uname -a): RHEL 8.3
  • Install tools: KVM UPI Install

  • Network plugin and version (if this is a network-related bug):
  • Others:

s390x virtual PCI bus not discovered in PCI feature discovery

There is a virtual PCI bus on Z systems, but the kernel must be recent and built to expose and use it.

What happened:

No PCI features or facts about the virtual PCI bus on System Z are exposed.

What you expected to happen:

NFD should at least tell us the status of the Z node's virtual PCI bus.

How to reproduce it (as minimally and precisely as possible):

  1. Install and run NFD Operator on OCP on System Z, either KVM or z/VM.
  2. Observe that no PCI features are discovered

Environment:

  • Kubernetes version (use kubectl version): browser
  • Cloud provider or hardware configuration: Z15 KVM Cluster
  • OS (e.g: cat /etc/os-release): RHEL 8.3

s390x memory management feature discovery not implemented

What happened:

Ran NFD Operator instantiation on an OCP 4.7 KVM cluster under RHEL 8.3

result:

[root@bastion NFD-test]# oc logs !$
oc logs nfd-worker-87fgl
2021/01/06 15:28:31 Node Feature Discovery Worker 1.15
2021/01/06 15:28:31 NodeName: 'worker-0.pok-242.ocptest.pok.stglabs.ibm.com'
INFO: 2021/01/06 15:28:31 parsed scheme: ""
INFO: 2021/01/06 15:28:31 scheme "" not registered, fallback to default scheme
INFO: 2021/01/06 15:28:31 ccResolverWrapper: sending update to cc: {[{172.30.248.112:12000 0  <nil>}] <nil>}
INFO: 2021/01/06 15:28:31 ClientConn switching balancer to "pick_first"
2021/01/06 15:28:31 Configuration successfully loaded from "/etc/kubernetes/node-feature-discovery/nfd-worker.conf"
2021/01/06 15:28:31 cpu-cpuid.GS = true
2021/01/06 15:28:31 cpu-cpuid.DFP = true
2021/01/06 15:28:31 cpu-cpuid.MSA = true
2021/01/06 15:28:31 cpu-cpuid.ETF3EH = true
2021/01/06 15:28:31 cpu-cpuid.VXD = true
2021/01/06 15:28:31 cpu-cpuid.VXE2 = true
2021/01/06 15:28:31 cpu-cpuid.ESAN3 = true
2021/01/06 15:28:31 cpu-cpuid.STFLE = true
2021/01/06 15:28:31 cpu-cpuid.EDAT = true
2021/01/06 15:28:31 cpu-cpuid.DFLT = true
2021/01/06 15:28:31 cpu-cpuid.VXE = true
2021/01/06 15:28:31 cpu-cpuid.ZARCH = true
2021/01/06 15:28:31 cpu-cpuid.VX = true
2021/01/06 15:28:31 cpu-cpuid.SORT = true
2021/01/06 15:28:31 cpu-cpuid.VXP = true
2021/01/06 15:28:31 cpu-cpuid.HIGHGPRS = true
2021/01/06 15:28:31 cpu-cpuid.LDISP = true
2021/01/06 15:28:31 cpu-cpuid.TE = true
2021/01/06 15:28:31 cpu-cpuid.EIMM = true
2021/01/06 15:28:31 Failed to read /proc/config.gz: open /proc/config.gz: no such file or directory
2021/01/06 15:28:31 ERROR: Failed to read kconfig: open /host-boot/config-4.18.0-240.8.1.el8_3.s390x: no such file or directory
2021/01/06 15:28:31 kernel-version.full = 4.18.0-240.8.1.el8_3.s390x
2021/01/06 15:28:31 kernel-version.major = 4
2021/01/06 15:28:31 kernel-version.minor = 18
2021/01/06 15:28:31 kernel-version.revision = 0
2021/01/06 15:28:31 kernel-selinux.enabled = true
2021/01/06 15:28:31 SR-IOV not supported for network interface: enc1: open /host-sys/class/net/enc1/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: tun0: open /host-sys/class/net/tun0/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth0bb2fb68: open /host-sys/class/net/veth0bb2fb68/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth0e40ae3d: open /host-sys/class/net/veth0e40ae3d/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth1d59273b: open /host-sys/class/net/veth1d59273b/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth353fa8f1: open /host-sys/class/net/veth353fa8f1/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth543eb339: open /host-sys/class/net/veth543eb339/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth6cba9b9e: open /host-sys/class/net/veth6cba9b9e/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth7a56ad17: open /host-sys/class/net/veth7a56ad17/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth868c78d4: open /host-sys/class/net/veth868c78d4/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth89453813: open /host-sys/class/net/veth89453813/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth9106a257: open /host-sys/class/net/veth9106a257/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vetha2cd6200: open /host-sys/class/net/vetha2cd6200/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vetha73c05b1: open /host-sys/class/net/vetha73c05b1/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vethd0c07305: open /host-sys/class/net/vethd0c07305/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vethe3398524: open /host-sys/class/net/vethe3398524/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vethe935563d: open /host-sys/class/net/vethe935563d/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vxlan_sys_4789: open /host-sys/class/net/vxlan_sys_4789/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 system-os_release.VERSION_ID.minor = 7
2021/01/06 15:28:31 system-os_release.ID = rhcos
2021/01/06 15:28:31 system-os_release.VERSION_ID = 4.7
2021/01/06 15:28:31 system-os_release.VERSION_ID.major = 4
2021/01/06 15:28:31 INFO: Custom features: [{Name:rdma.capable MatchOn:[{PciID:0xc0000b0460 UsbID:<nil> LoadedKMod:<nil>}]} {Name:rdma.available MatchOn:[{PciID:<nil> UsbID:<nil> LoadedKMod:0xc0000bf620}]}]
2021/01/06 15:28:31 Sending labeling request to nfd-master

There are no s390x memory features discovered.

What you expected to happen:

Discover s390x memory features

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:
[root@bastion ~]# oc version
Client Version: 4.7.0-0.nightly-s390x-2020-12-21-160105
Server Version: 4.7.0-0.nightly-s390x-2020-12-21-160105
Kubernetes Version: v1.20.0+87544c5

  • Kernel (e.g. uname -a): RHEL 8.3
  • Install tools: KVM UPI Install
  • Network plugin and version (if this is a network-related bug):
  • Others:

GRPC_HEALTH_PROBE_VERSION is outdated

The Dockerfile hardcodes GRPC_HEALTH_PROBE_VERSION at 0.3.1, while upstream (https://github.com/grpc-ecosystem/grpc-health-probe/releases/tag/v0.4.6) has 0.4.6.

Release 0.3.3 (https://github.com/grpc-ecosystem/grpc-health-probe/releases/tag/v0.3.3) fixed non-exploitable security issues.

s390x Network features not discovered

What happened:

Queried Network features on NFD installed on s390x node. Got:

2021/01/06 15:28:31 SR-IOV not supported for network interface: enc1: open /host-sys/class/net/enc1/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: tun0: open /host-sys/class/net/tun0/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth0bb2fb68: open /host-sys/class/net/veth0bb2fb68/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth0e40ae3d: open /host-sys/class/net/veth0e40ae3d/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth1d59273b: open /host-sys/class/net/veth1d59273b/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth353fa8f1: open /host-sys/class/net/veth353fa8f1/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth543eb339: open /host-sys/class/net/veth543eb339/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth6cba9b9e: open /host-sys/class/net/veth6cba9b9e/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth7a56ad17: open /host-sys/class/net/veth7a56ad17/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth868c78d4: open /host-sys/class/net/veth868c78d4/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth89453813: open /host-sys/class/net/veth89453813/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth9106a257: open /host-sys/class/net/veth9106a257/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vetha2cd6200: open /host-sys/class/net/vetha2cd6200/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vetha73c05b1: open /host-sys/class/net/vetha73c05b1/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vethd0c07305: open /host-sys/class/net/vethd0c07305/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vethe3398524: open /host-sys/class/net/vethe3398524/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vethe935563d: open /host-sys/class/net/vethe935563d/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vxlan_sys_4789: open /host-sys/class/net/vxlan_sys_4789/device/sriov_totalvfs: no such file or directory

What you expected to happen:

Obtain z/VM and/or zKVM specific virtual network device features

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:
[root@bastion ~]# oc version
Client Version: 4.7.0-0.nightly-s390x-2020-12-21-160105
Server Version: 4.7.0-0.nightly-s390x-2020-12-21-160105
Kubernetes Version: v1.20.0+87544c5

  • Kernel (e.g. uname -a): RHEL 8.3
  • Install tools: KVM UPI Install
  • Network plugin and version (if this is a network-related bug):
  • Others:

Please add taints and tolerations to the nfd-master and workers.

What would you like to be added: Support for taints (as with -enable-taints switch in upstream)

Why is this needed: To dedicate nodes with specific features to particular application workloads. Otherwise, we cannot mix different types of workload in one cluster.

s390x Storage Features not discovered by NFD

What happened:
s390x Storage Features not discovered by NFD

What you expected to happen:

Expect to see how the non-volatile storage is configured on the node when querying node features with the NFD Operator.

Environment:

  • Kubernetes version (use kubectl version): browser
  • Cloud provider or hardware configuration: Z15 KVM Cluster
  • OS (e.g: cat /etc/os-release): RHEL 8.3

GPU discovery

What would you like to be added:

GPU discovery on nodes

Why is this needed:

Workloads that use GPUs would benefit from scheduling affinity to nodes that have GPUs.
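
One common way to spot a GPU is by PCI class code: base class 0x03 is the display controller class, which covers VGA (0x0300) and 3D (0x0302) controllers. The sketch below checks a class string in the 24-bit form that Linux sysfs publishes (for example `0x030000`). The function name is hypothetical, and how such a fact would map to a node label is left open.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// isDisplayController reports whether a 24-bit PCI class code, written as in
// the sysfs "class" attribute (e.g. "0x030200"), has base class 0x03.
func isDisplayController(class string) (bool, error) {
	s := strings.TrimPrefix(strings.TrimSpace(class), "0x")
	code, err := strconv.ParseUint(s, 16, 32)
	if err != nil {
		return false, fmt.Errorf("invalid PCI class %q: %w", class, err)
	}
	// Layout: base class (bits 23..16), subclass (15..8), prog-if (7..0).
	return (code>>16)&0xff == 0x03, nil
}

func main() {
	for _, c := range []string{"0x030000", "0x030200", "0x020000"} {
		ok, err := isDisplayController(c)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s display controller: %t\n", c, ok)
	}
}
```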

RFE: download arch-specific grpc_health_probe

Status
The Dockerfile downloads the amd64 version of the grpc-health-probe.

RFE
Detect the arch with either "uname -m" or "arch" and download the arch-specific version (https://github.com/grpc-ecosystem/grpc-health-probe/releases/). Caveat: on x86 both tools output x86_64, but the grpc-health-probe binaries use names such as amd64, so some shell conditionals will be needed to map one naming scheme to the other.
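
The mapping described in the caveat can be sketched as a small lookup. In the Dockerfile it would be written as shell conditionals; it is shown in Go here only to make the logic concrete. The asset naming pattern `grpc_health_probe-linux-<arch>` and the list of architectures are assumptions that should be checked against the releases page for the chosen version. The function and variable names are hypothetical.

```go
package main

import "fmt"

// machineToReleaseArch maps `uname -m` output to the architecture suffix
// assumed to be used in grpc-health-probe release asset names.
var machineToReleaseArch = map[string]string{
	"x86_64":  "amd64",
	"aarch64": "arm64",
	"s390x":   "s390x",
	"ppc64le": "ppc64le",
}

// probeURL builds the download URL for a given version (without the leading
// "v") and machine type, or returns an error for an unknown machine type.
func probeURL(version, machine string) (string, error) {
	arch, ok := machineToReleaseArch[machine]
	if !ok {
		return "", fmt.Errorf("unsupported machine type %q", machine)
	}
	return fmt.Sprintf(
		"https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/v%s/grpc_health_probe-linux-%s",
		version, arch), nil
}

func main() {
	for _, m := range []string{"x86_64", "s390x", "riscv64"} {
		url, err := probeURL("0.4.6", m)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(m, "->", url)
	}
}
```

Failing on an unknown machine type, rather than falling back to amd64, keeps a wrong-architecture binary from ending up in the image unnoticed.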

s390x CPU features not documented in 'quick start'

What happened:

Opened https://github.com/openshift/node-feature-discovery/blob/master/docs/get-started/features.md

What you expected to happen:

Expect to see CPU feature documentation for s390x architecture

How to reproduce it (as minimally and precisely as possible):

Open https://github.com/openshift/node-feature-discovery/blob/master/docs/get-started/features.md

Anything else we need to know?:

The s390x CPU features are defined in https://github.com/openshift/node-feature-discovery/blob/master/source/internal/cpuidutils/cpuid_s390x.go

Environment:

  • Kubernetes version (use kubectl version): browser
  • Cloud provider or hardware configuration: Z15 KVM Cluster
  • OS (e.g: cat /etc/os-release): RHEL 8.3
  • Kernel (e.g. uname -a): N/A
  • Install tools: N/A
  • Network plugin and version (if this is a network-related bug): N/A
  • Others:
