openshift / node-feature-discovery

Node feature discovery, detects the available hardware features and configuration in a cluster.

License: Apache License 2.0

Dockerfile 0.09% Makefile 0.75% Shell 2.12% Go 96.79% Assembly 0.11% Python 0.15%

node-feature-discovery's Introduction

Node Feature Discovery

Welcome to Node Feature Discovery – a Kubernetes add-on for detecting hardware features and system configuration!

See our Documentation for detailed instructions and reference.

Quick-start – the short-short version

$ kubectl apply -k https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default?ref=v0.14.3
  namespace/node-feature-discovery created
...

$ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/node-feature-discovery/v0.6.0/nfd-worker-daemonset.yaml.template
  daemonset.apps/nfd-worker created

$ kubectl -n node-feature-discovery get all
  NAME                              READY   STATUS    RESTARTS   AGE
  pod/nfd-master-555458dbbc-sxg6w   1/1     Running   0          56s
  pod/nfd-worker-mjg9f              1/1     Running   0          17s
...

$ kubectl get no -o json | jq '.items[].metadata.labels'
  {
    "kubernetes.io/arch": "amd64",
    "kubernetes.io/os": "linux",
    "feature.node.kubernetes.io/cpu-cpuid.ADX": "true",
    "feature.node.kubernetes.io/cpu-cpuid.AESNI": "true",
...

Building NFD operand for ARM locally

Currently, the build process must be run on an ARM server. In addition, before running the build process, Dockerfile.arm must be copied over Dockerfile.

node-feature-discovery's People

Contributors

adrianchiris, arangogutierrez, balajismaniam, chr15p, connordoyle, ffromani, fidencio, fmuyassarov, jjacobelli, jlojosnegros, jschintag, k8s-ci-robot, marquiz, mbssaiakhil, mythi, nfd-merge-bot, okartau, openshift-ci[bot], openshift-merge-bot[bot], openshift-merge-robot, piotrprokop, psap-ci-robot, spiffxp, swatisehgal, tal-or, tessaio, uniemimu, vaibhav2107, yevgeny-shnaidman, yselkowitz

node-feature-discovery's Issues

s390x Network features not discovered

What happened:

Queried Network features on NFD installed on s390x node. Got:

2021/01/06 15:28:31 SR-IOV not supported for network interface: enc1: open /host-sys/class/net/enc1/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: tun0: open /host-sys/class/net/tun0/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth0bb2fb68: open /host-sys/class/net/veth0bb2fb68/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth0e40ae3d: open /host-sys/class/net/veth0e40ae3d/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth1d59273b: open /host-sys/class/net/veth1d59273b/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth353fa8f1: open /host-sys/class/net/veth353fa8f1/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth543eb339: open /host-sys/class/net/veth543eb339/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth6cba9b9e: open /host-sys/class/net/veth6cba9b9e/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth7a56ad17: open /host-sys/class/net/veth7a56ad17/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth868c78d4: open /host-sys/class/net/veth868c78d4/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth89453813: open /host-sys/class/net/veth89453813/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth9106a257: open /host-sys/class/net/veth9106a257/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vetha2cd6200: open /host-sys/class/net/vetha2cd6200/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vetha73c05b1: open /host-sys/class/net/vetha73c05b1/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vethd0c07305: open /host-sys/class/net/vethd0c07305/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vethe3398524: open /host-sys/class/net/vethe3398524/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vethe935563d: open /host-sys/class/net/vethe935563d/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vxlan_sys_4789: open /host-sys/class/net/vxlan_sys_4789/device/sriov_totalvfs: no such file or directory
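
Each message above comes from a failed attempt to open the sriov_totalvfs attribute under an interface's device/ directory. Virtual interfaces such as veth pairs and tun devices typically have no device/ directory at all, so the message is expected for them. The following is a minimal sketch for checking the same attribute by hand. The function names sriov_status and list_sriov are hypothetical, not part of NFD, and the sysfs root is passed in as a parameter because the worker log shows it mounted at /host-sys.

```shell
# Report whether one network interface exposes the SR-IOV sysfs attribute.
# $1: sysfs root (e.g. /sys on the host, /host-sys in the worker container)
# $2: interface name
sriov_status() {
    attr="$1/class/net/$2/device/sriov_totalvfs"
    if [ -r "$attr" ]; then
        echo "$2: sriov_totalvfs=$(cat "$attr")"
    else
        echo "$2: no sriov_totalvfs attribute"
    fi
}

# Walk every interface found under the given sysfs root.
list_sriov() {
    for path in "$1"/class/net/*; do
        # Skip the unexpanded glob when the directory is empty or missing.
        [ -e "$path" ] || continue
        sriov_status "$1" "$(basename "$path")"
    done
}
```

Running list_sriov /sys on the affected node would show which interfaces, if any, carry the attribute. It says nothing about z/VM or zKVM specific device features, which would need their own detection logic.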

What you expected to happen:

Obtain z/VM and/or zKVM specific virtual network device features

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:
[root@bastion ~]# oc version
Client Version: 4.7.0-0.nightly-s390x-2020-12-21-160105
Server Version: 4.7.0-0.nightly-s390x-2020-12-21-160105
Kubernetes Version: v1.20.0+87544c5

  • Kernel (e.g. uname -a): RHEL 8.3
  • Install tools: KVM UPI Install
  • Network plugin and version (if this is a network-related bug):
  • Others:

s390x CPU features not documented in 'quick start'

What happened:

Opened https://github.com/openshift/node-feature-discovery/blob/master/docs/get-started/features.md

What you expected to happen:

Expect to see CPU feature documentation for s390x architecture

How to reproduce it (as minimally and precisely as possible):

Open https://github.com/openshift/node-feature-discovery/blob/master/docs/get-started/features.md

Anything else we need to know?:

The s390x CPU features are defined in https://github.com/openshift/node-feature-discovery/blob/master/source/internal/cpuidutils/cpuid_s390x.go

Environment:

  • Kubernetes version (use kubectl version): browser
  • Cloud provider or hardware configuration: Z15 KVM Cluster
  • OS (e.g: cat /etc/os-release): RHEL 8.3
  • Kernel (e.g. uname -a): N/A
  • Install tools: N/A
  • Network plugin and version (if this is a network-related bug): N/A
  • Others:

s390x kernel feature somewhat incomplete

What happened:

Queried kernel parameters in NFD, got mostly good results, but some errors:

2021/01/06 15:28:31 Failed to read /proc/config.gz: open /proc/config.gz: no such file or directory
2021/01/06 15:28:31 ERROR: Failed to read kconfig: open /host-boot/config-4.18.0-240.8.1.el8_3.s390x: no such file or directory
2021/01/06 15:28:31 kernel-version.full = 4.18.0-240.8.1.el8_3.s390x
2021/01/06 15:28:31 kernel-version.major = 4
2021/01/06 15:28:31 kernel-version.minor = 18
2021/01/06 15:28:31 kernel-version.revision = 0
2021/01/06 15:28:31 kernel-selinux.enabled = true
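
The two errors above show the worker looking for the kernel configuration at /proc/config.gz and at /host-boot/config-<release>, and finding neither. The following is a minimal sketch for checking where a node actually keeps its kernel config. The function name find_kconfig is hypothetical, and the third candidate path (/lib/modules/<release>/config) is an assumption about where some distributions install the file, not something the log confirms.

```shell
# Print the first readable kernel config file for a given kernel release.
# $1: root prefix ("" for the host itself, or a mount prefix in a container)
# $2: kernel release, e.g. the output of `uname -r`
find_kconfig() {
    for candidate in \
        "$1/proc/config.gz" \
        "$1/boot/config-$2" \
        "$1/lib/modules/$2/config"   # assumed location, not taken from the log
    do
        if [ -r "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    echo "no kernel config found for $2" >&2
    return 1
}
```

On the node, find_kconfig "" "$(uname -r)" would print the first location that exists, which helps tell a missing file apart from a path the worker does not search or mount.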

What you expected to happen:

Get complete kernel info from NFD kernel query without errors.

How to reproduce it (as minimally and precisely as possible):

Query kernel parameters in NFD on an IBM Mainframe cluster

Anything else we need to know?:

Environment:
[root@bastion ~]# oc version
Client Version: 4.7.0-0.nightly-s390x-2020-12-21-160105
Server Version: 4.7.0-0.nightly-s390x-2020-12-21-160105
Kubernetes Version: v1.20.0+87544c5

  • Kernel (e.g. uname -a): RHEL 8.3
  • Install tools: KVM UPI Install
  • Network plugin and version (if this is a network-related bug):
  • Others:

RFE: download arch-specific grpc_health_probe

Status
The Dockerfile downloads the amd64 version of the grpc-health-probe.

RFE
Detect the architecture with either "uname -m" or "arch" and download the arch-specific version (https://github.com/grpc-ecosystem/grpc-health-probe/releases/). Caveat: both tools output x86_64, but the grpc-health-probe binaries are named with amd64 and similar, so some shell conditionals will be needed.
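
The shell conditionals mentioned in the caveat could look roughly like the sketch below. The function name go_arch is hypothetical, the mapping table is an assumption based on common Go-style architecture names, and the download URL layout in the commented usage is likewise an assumption that should be checked against the release page before use.

```shell
# Map `uname -m` output to the architecture suffix used in binary names.
# The table below is an assumption; extend it to match the actual releases.
go_arch() {
    case "$1" in
        x86_64)        echo amd64 ;;
        aarch64|arm64) echo arm64 ;;
        ppc64le)       echo ppc64le ;;
        s390x)         echo s390x ;;
        *)
            echo "unsupported architecture: $1" >&2
            return 1
            ;;
    esac
}

# Hypothetical usage in a Dockerfile RUN step (URL layout is an assumption):
#   ARCH="$(go_arch "$(uname -m)")" || exit 1
#   curl -fsSL -o /bin/grpc_health_probe \
#     "https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/v${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-${ARCH}"
```

Failing on an unknown architecture, rather than falling back to amd64, keeps a build on an unexpected platform from silently shipping the wrong binary.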

s390x memory management feature discovery not implemented

What happened:

Ran NFD Operator instantiation on an OCP 4.7 KVM cluster under RHEL 8.3

result:

[root@bastion NFD-test]# oc logs !$
oc logs nfd-worker-87fgl
2021/01/06 15:28:31 Node Feature Discovery Worker 1.15
2021/01/06 15:28:31 NodeName: 'worker-0.pok-242.ocptest.pok.stglabs.ibm.com'
INFO: 2021/01/06 15:28:31 parsed scheme: ""
INFO: 2021/01/06 15:28:31 scheme "" not registered, fallback to default scheme
INFO: 2021/01/06 15:28:31 ccResolverWrapper: sending update to cc: {[{172.30.248.112:12000 0  <nil>}] <nil>}
INFO: 2021/01/06 15:28:31 ClientConn switching balancer to "pick_first"
2021/01/06 15:28:31 Configuration successfully loaded from "/etc/kubernetes/node-feature-discovery/nfd-worker.conf"
2021/01/06 15:28:31 cpu-cpuid.GS = true
2021/01/06 15:28:31 cpu-cpuid.DFP = true
2021/01/06 15:28:31 cpu-cpuid.MSA = true
2021/01/06 15:28:31 cpu-cpuid.ETF3EH = true
2021/01/06 15:28:31 cpu-cpuid.VXD = true
2021/01/06 15:28:31 cpu-cpuid.VXE2 = true
2021/01/06 15:28:31 cpu-cpuid.ESAN3 = true
2021/01/06 15:28:31 cpu-cpuid.STFLE = true
2021/01/06 15:28:31 cpu-cpuid.EDAT = true
2021/01/06 15:28:31 cpu-cpuid.DFLT = true
2021/01/06 15:28:31 cpu-cpuid.VXE = true
2021/01/06 15:28:31 cpu-cpuid.ZARCH = true
2021/01/06 15:28:31 cpu-cpuid.VX = true
2021/01/06 15:28:31 cpu-cpuid.SORT = true
2021/01/06 15:28:31 cpu-cpuid.VXP = true
2021/01/06 15:28:31 cpu-cpuid.HIGHGPRS = true
2021/01/06 15:28:31 cpu-cpuid.LDISP = true
2021/01/06 15:28:31 cpu-cpuid.TE = true
2021/01/06 15:28:31 cpu-cpuid.EIMM = true
2021/01/06 15:28:31 Failed to read /proc/config.gz: open /proc/config.gz: no such file or directory
2021/01/06 15:28:31 ERROR: Failed to read kconfig: open /host-boot/config-4.18.0-240.8.1.el8_3.s390x: no such file or directory
2021/01/06 15:28:31 kernel-version.full = 4.18.0-240.8.1.el8_3.s390x
2021/01/06 15:28:31 kernel-version.major = 4
2021/01/06 15:28:31 kernel-version.minor = 18
2021/01/06 15:28:31 kernel-version.revision = 0
2021/01/06 15:28:31 kernel-selinux.enabled = true
2021/01/06 15:28:31 SR-IOV not supported for network interface: enc1: open /host-sys/class/net/enc1/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: tun0: open /host-sys/class/net/tun0/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth0bb2fb68: open /host-sys/class/net/veth0bb2fb68/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth0e40ae3d: open /host-sys/class/net/veth0e40ae3d/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth1d59273b: open /host-sys/class/net/veth1d59273b/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth353fa8f1: open /host-sys/class/net/veth353fa8f1/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth543eb339: open /host-sys/class/net/veth543eb339/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth6cba9b9e: open /host-sys/class/net/veth6cba9b9e/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth7a56ad17: open /host-sys/class/net/veth7a56ad17/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth868c78d4: open /host-sys/class/net/veth868c78d4/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth89453813: open /host-sys/class/net/veth89453813/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: veth9106a257: open /host-sys/class/net/veth9106a257/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vetha2cd6200: open /host-sys/class/net/vetha2cd6200/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vetha73c05b1: open /host-sys/class/net/vetha73c05b1/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vethd0c07305: open /host-sys/class/net/vethd0c07305/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vethe3398524: open /host-sys/class/net/vethe3398524/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vethe935563d: open /host-sys/class/net/vethe935563d/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 SR-IOV not supported for network interface: vxlan_sys_4789: open /host-sys/class/net/vxlan_sys_4789/device/sriov_totalvfs: no such file or directory
2021/01/06 15:28:31 system-os_release.VERSION_ID.minor = 7
2021/01/06 15:28:31 system-os_release.ID = rhcos
2021/01/06 15:28:31 system-os_release.VERSION_ID = 4.7
2021/01/06 15:28:31 system-os_release.VERSION_ID.major = 4
2021/01/06 15:28:31 INFO: Custom features: [{Name:rdma.capable MatchOn:[{PciID:0xc0000b0460 UsbID:<nil> LoadedKMod:<nil>}]} {Name:rdma.available MatchOn:[{PciID:<nil> UsbID:<nil> LoadedKMod:0xc0000bf620}]}]
2021/01/06 15:28:31 Sending labeling request to nfd-master

There are no s390x memory features discovered.

What you expected to happen:

Discover s390x memory features

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:
[root@bastion ~]# oc version
Client Version: 4.7.0-0.nightly-s390x-2020-12-21-160105
Server Version: 4.7.0-0.nightly-s390x-2020-12-21-160105
Kubernetes Version: v1.20.0+87544c5

  • Kernel (e.g. uname -a): RHEL 8.3
  • Install tools: KVM UPI Install
  • Network plugin and version (if this is a network-related bug):
  • Others:

GRPC_HEALTH_PROBE_VERSION is outdated

The Dockerfile hardcodes GRPC_HEALTH_PROBE_VERSION at 0.3.1, while upstream (https://github.com/grpc-ecosystem/grpc-health-probe/releases/tag/v0.4.6) has 0.4.6.

Release 0.3.3 (https://github.com/grpc-ecosystem/grpc-health-probe/releases/tag/v0.3.3) fixed non-exploitable security issues.

GPU discovery

What would you like to be added:

GPU discovery on nodes

Why is this needed:

Workloads that use GPUs would benefit from scheduling affinity to nodes that have GPUs.

Please add taints and tolerations to the nfd-master and workers.

What would you like to be added: Support for taints (as with the -enable-taints switch in upstream)

Why is this needed: To dedicate nodes with specific features to particular application workloads only. Otherwise, we cannot mix different types of workloads in one cluster.

s390x Storage Features not discovered by NFD

What happened:
s390x Storage Features not discovered by NFD

What you expected to happen:

Expect to see how the non-volatile storage is configured on the node when querying node features using the NFD Operator.

Environment:

  • Kubernetes version (use kubectl version): browser
  • Cloud provider or hardware configuration: Z15 KVM Cluster
  • OS (e.g: cat /etc/os-release): RHEL 8.3

s390x virtual PCI bus not discovered in PCI feature discovery

There is a virtual PCI bus on Z systems, but the kernel needs to be recent and tooled to expose and use it.

What happened:

No PCI features or facts about the virtual PCI bus on System Z are exposed.

What you expected to happen:

NFD should at least tell us the status of the Z node's virtual PCI bus.

How to reproduce it (as minimally and precisely as possible):

  1. Install and run NFD Operator on OCP on System Z, either KVM or z/VM.
  2. Observe that no PCI features are discovered

Environment:

  • Kubernetes version (use kubectl version): browser
  • Cloud provider or hardware configuration: Z15 KVM Cluster
  • OS (e.g: cat /etc/os-release): RHEL 8.3
