
d34dc3n73r / netdata-glibc


netdata with glibc package for use with nvidia-docker2

License: GNU General Public License v3.0

Dockerfile 100.00%
netdata nvidia-docker nvidia-container-toolkit docker

netdata-glibc's Introduction

⚠️ DEPRECATED via Netdata v1.43.0 ⚠️

Netdata can now use GPUs in the native image because it is now based on Debian, so this image is no longer needed. Examples using the netdata/netdata image are below.

Docker & nvidia-container-toolkit

# For NETDATA_CLAIM_TOKEN and NETDATA_CLAIM_ROOMS, see
# https://learn.netdata.cloud/docs/agent/claim#connect-an-agent-running-in-docker
# (Comments cannot sit inside a backslash-continued command, so they live up here.)
docker run -d --name=netdata \
  -p 19999:19999 \
  -v <YOUR DOCKER CONFIGS>/netdata/config:/etc/netdata \
  -v netdatalib:/var/lib/netdata \
  -v netdatacache:/var/cache/netdata \
  -v /etc/passwd:/host/etc/passwd:ro \
  -v /etc/group:/host/etc/group:ro \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc:/host/proc:ro \
  -v /sys:/host/sys:ro \
  -v /etc/os-release:/host/etc/os-release:ro \
  -e PGID=<HOST_DOCKER_PGID> \
  -e DO_NOT_TRACK= \
  -e NETDATA_CLAIM_TOKEN= \
  -e NETDATA_CLAIM_URL=https://app.netdata.cloud \
  -e NETDATA_CLAIM_ROOMS= \
  --gpus all \
  --restart unless-stopped \
  --cap-add SYS_PTRACE \
  --security-opt apparmor=unconfined \
  netdata/netdata:stable

Docker Compose

version: '3.8'
services:
  netdata:
    image: netdata/netdata:stable
    container_name: netdata
    hostname: netdata.example.com
    ports:
      - 19999:19999
    restart: unless-stopped
    depends_on:
      - proxy
    cap_add:
      - SYS_PTRACE
    security_opt:
      - apparmor:unconfined
    environment:
      - DOCKER_HOST=proxy:2375
      - NETDATA_CLAIM_TOKEN= # See https://learn.netdata.cloud/docs/agent/claim#connect-an-agent-running-in-docker
      - NETDATA_CLAIM_URL=https://app.netdata.cloud
      - NETDATA_CLAIM_ROOMS= # See https://learn.netdata.cloud/docs/agent/claim#connect-an-agent-running-in-docker
    volumes:
      - <YOUR DOCKER CONFIGS>/netdata/config:/etc/netdata
      - netdatalib:/var/lib/netdata
      - netdatacache:/var/cache/netdata
      - /etc/passwd:/host/etc/passwd:ro
      - /etc/group:/host/etc/group:ro
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /etc/os-release:/host/etc/os-release:ro
      - /var/log/journal:/var/log/journal:ro
      - /run/systemd/private:/run/systemd/private:ro
      - /mnt/media:/mnt/media:ro
    labels:
      - swag=enable
    deploy:
      resources:
        reservations:
          devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]
  proxy:
    container_name: proxy
    image: tecnativa/docker-socket-proxy
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    environment:
      - CONTAINERS=1

volumes:
  netdatalib:
  netdatacache:

netdata-glibc

This is an automated build of netdata with glibc package for use with nvidia-container-toolkit. Also available in Unraid Community Applications.

Netdata with Nvidia GPU monitoring in a container. This image was created due to netdata/netdata using Alpine, a musl distribution, as a base. Nvidia drivers are only compatible with glibc distributions. This image uses netdata/netdata as a base and adds a GNU C library to run binaries linked against glibc. This image does not contain nvidia-smi, but is compatible with nvidia-container-toolkit and the Unraid Nvidia Plugin.


Docker & nvidia-container-toolkit

# For NETDATA_CLAIM_TOKEN and NETDATA_CLAIM_ROOMS, see
# https://learn.netdata.cloud/docs/agent/claim#connect-an-agent-running-in-docker
# (Comments cannot sit inside a backslash-continued command, so they live up here.)
docker run -d --name=netdata \
  -p 19999:19999 \
  -v <YOUR DOCKER CONFIGS>/netdata/config:/etc/netdata \
  -v <YOUR DOCKER CONFIGS>/netdata/lib:/var/lib/netdata \
  -v <YOUR DOCKER CONFIGS>/netdata/cache:/var/cache/netdata \
  -v /etc/passwd:/host/etc/passwd:ro \
  -v /etc/group:/host/etc/group:ro \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc:/host/proc:ro \
  -v /sys:/host/sys:ro \
  -v /etc/os-release:/host/etc/os-release:ro \
  -e PGID=<HOST_DOCKER_PGID> \
  -e DO_NOT_TRACK= \
  -e NETDATA_CLAIM_TOKEN= \
  -e NETDATA_CLAIM_URL=https://app.netdata.cloud \
  -e NETDATA_CLAIM_ROOMS= \
  --gpus all \
  --restart unless-stopped \
  --cap-add SYS_PTRACE \
  --security-opt apparmor=unconfined \
  d34dc3n73r/netdata-glibc

Docker Compose

version: '3.8'
services:
  netdata:
    image: d34dc3n73r/netdata-glibc
    container_name: netdata
    hostname: example.com # set to fqdn of host
    ports:
      - 19999:19999
    restart: unless-stopped
    depends_on:
      - proxy
    cap_add:
      - SYS_PTRACE
    security_opt:
      - apparmor:unconfined
    environment:
      - DOCKER_HOST=proxy:2375
      - NETDATA_CLAIM_TOKEN= # See https://learn.netdata.cloud/docs/agent/claim#connect-an-agent-running-in-docker
      - NETDATA_CLAIM_URL=https://app.netdata.cloud
      - NETDATA_CLAIM_ROOMS= # See https://learn.netdata.cloud/docs/agent/claim#connect-an-agent-running-in-docker
    volumes:
      - <YOUR DOCKER CONFIGS>/netdata/config:/etc/netdata
      - <YOUR DOCKER CONFIGS>/netdata/lib:/var/lib/netdata
      - <YOUR DOCKER CONFIGS>/netdata/cache:/var/cache/netdata
      - /etc/passwd:/host/etc/passwd:ro
      - /etc/group:/host/etc/group:ro
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /etc/os-release:/host/etc/os-release:ro
    deploy:
      resources:
        reservations:
          devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]
  proxy:
    container_name: proxy
    image: tecnativa/docker-socket-proxy
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    environment:
      - CONTAINERS=1

Available Tags

  • d34dc3n73r/netdata-glibc:stable
    • built from netdata/netdata:stable and updated with new netdata official releases
  • d34dc3n73r/netdata-glibc:latest
    • an automated nightly build using netdata/netdata:latest

Prerequisites

  • Nvidia container toolkit installed on the host system
  • Nvidia drivers installed on the host system
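Both prerequisites can be checked from a host shell before starting the container. The following is a minimal sketch, not part of the image: check_cmd is a hypothetical helper name, and the assumption that the toolkit provides a nvidia-container-cli binary on PATH may not hold for every distribution's packaging.

```shell
#!/bin/sh
# Sketch: report whether a prerequisite command is available on PATH.
# Prints "ok: <name>" and returns 0, or "missing: <name>" and returns 1.
# check_cmd is a hypothetical helper, not something shipped with Netdata.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Example invocation on the host (uncomment to run):
#   check_cmd nvidia-smi             # installed with the Nvidia driver
#   check_cmd nvidia-container-cli   # assumption: provided by the toolkit's packages
#   docker info --format '{{json .Runtimes}}'   # lists the runtimes Docker knows about
```

If docker info does not list an nvidia runtime, Docker will likely reject --runtime=nvidia; whether --gpus all still works depends on the toolkit setup.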

Container Name Resolution

docker run

  • Use the host docker PGID environment variable. To get this value run grep docker /etc/group | cut -d ':' -f 3 on the host system.
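The grep shown above matches any line that contains the string "docker", which could include a similarly named group or a group with a user called docker in its member list. A slightly stricter sketch that matches the group name exactly is shown here; group_gid and its optional second argument are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch: print the numeric GID (third colon-separated field) of an exactly named group.
# $1 = group name
# $2 = group file to read (hypothetical optional argument, defaults to /etc/group)
group_gid() {
  awk -F: -v name="$1" '$1 == name { print $3; exit }' "${2:-/etc/group}"
}

# Example on the host (uncomment to run):
#   PGID="$(group_gid docker)"
#   echo "PGID=$PGID"   # pass this value to the container with -e PGID=...
```

The function prints nothing when the group does not exist, so an empty PGID is a hint that Docker's group has a different name on that host.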

docker-compose

  • Container name resolution no longer requires the host docker PGID and mounting docker.sock. Instead this is handled by HAProxy so that connections are restricted to read-only access. For more information check out the Netdata Docker Installation Page.

Override Directory

Netdata now has override support built into their docker images. See Configure Agent Containers for more information. Vi is the default editor, but I like nano so this image includes nano. Use it with ./edit-config --editor nano <config filename>.

Notes

  • Netdata collects anonymous statistics. If you wish to opt out, set the environment variable DO_NOT_TRACK=1.
  • This image uses the default python.d.conf with nvidia_smi: yes uncommented. Use ./edit-config for further customization.
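If a customized python.d.conf is mounted over the default, it may be worth confirming that the nvidia_smi line is still uncommented. This is a minimal sketch under that assumption; nvidia_smi_enabled is a hypothetical helper, and the file path in the usage comment is an assumption based on the /etc/netdata mount shown earlier.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) if the given python.d.conf contains an
# uncommented "nvidia_smi: yes" line, fail otherwise.
# $1 = path to a python.d.conf file
nvidia_smi_enabled() {
  grep -Eq '^[[:space:]]*nvidia_smi:[[:space:]]*yes[[:space:]]*$' "$1"
}

# Example (uncomment to run; the path is an assumption):
#   if nvidia_smi_enabled "<YOUR DOCKER CONFIGS>/netdata/config/python.d.conf"; then
#     echo "nvidia_smi collector is enabled"
#   else
#     echo "nvidia_smi collector is commented out or disabled"
#   fi
```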

netdata-glibc's People

Contributors

d34dc3n73r, joly0


netdata-glibc's Issues

libnvidia-ml.so

I'm having trouble getting netdata to work with nvidia. I am able to run nvidia-smi on the host machine (openmediavault), as well as in another docker container (plex media server). I was getting the same error in the plex container as in netdata; editing config.toml to use ldconfig = "/sbin/ldconfig.real" fixed the issue with plex, but it doesn't help netdata.

Here's my kernel version and docker version:
Linux 5.10.0-0.bpo.9-amd64 #1 SMP Debian 5.10.70-1~bpo10+1 (2021-10-10) x86_64 GNU/Linux

Client: Docker Engine - Community
 Version: 20.10.12
 API version: 1.41
 Go version: go1.16.12
 Git commit: e91ed57
 Built: Mon Dec 13 11:45:37 2021
 OS/Arch: linux/amd64
 Context: default
 Experimental: true

Server: Docker Engine - Community
 Engine:
  Version: 20.10.12
  API version: 1.41 (minimum version 1.12)
  Go version: go1.16.12
  Git commit: 459d0df
  Built: Mon Dec 13 11:43:46 2021
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: 1.4.12
  GitCommit: 7b11cfaabd73bb80907dd23182b9347b4245eb5d
 nvidia:
  Version: 1.0.2
  GitCommit: v1.0.2-0-g52b36a2
 docker-init:
  Version: 0.19.0
  GitCommit: de40ad0

I'm getting this error when running nvidia-smi in the container:

NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.

As well as errors like this in the error log:

2022-01-13 21:05:35: go.d ERROR: prometheus[nvidia_gpu_exporter_local] Get "http://127.0.0.1:9445/metrics": dial tcp 127.0.0.1:9445: connect: connection refused

2022-01-13 21:05:35: go.d ERROR: prometheus[nvidia_gpu_exporter_local] check failed

2022-01-13 21:05:35: go.d ERROR: prometheus[nvidia_smi_exporter_local] Get "http://127.0.0.1:9454/metrics": dial tcp 127.0.0.1:9454: connect: connection refused

2022-01-13 21:05:35: go.d ERROR: prometheus[nvidia_smi_exporter_local] check failed

2022-01-13 21:05:35: python.d INFO: plugin[main] : [nvidia_smi] built 1 job(s) configs

2022-01-13 21:05:36: netdata ERROR : PLUGIN[diskspace] : DISKSPACE: Mount point '/usr/bin/nvidia-smi' (disk '_usr_bin_nvidia-smi', filesystem 'ext4', root '/usr/lib/nvidia/current/nvidia-smi') is not a directory. (errno 22, Invalid argument)

2022-01-13 21:05:36: netdata ERROR : PLUGIN[diskspace] : DISKSPACE: Mount point '/usr/bin/nvidia-debugdump' (disk '_usr_bin_nvidia-debugdump', filesystem 'ext4', root '/usr/lib/nvidia/current/nvidia-debugdump') is not a directory.

2022-01-13 21:05:36: netdata ERROR : PLUGIN[diskspace] : DISKSPACE: Mount point '/usr/lib64/libnvidia-ml.so.460.73.01' (disk '_usr_lib64_libnvidia-ml.so.460.73.01', filesystem 'ext4', root '/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.460.73.01') is not a directory. (errno 22, Invalid argument)

2022-01-13 21:05:36: netdata ERROR : PLUGIN[diskspace] : DISKSPACE: Mount point '/usr/lib64/libcuda.so.460.73.01' (disk '_usr_lib64_libcuda.so.460.73.01', filesystem 'ext4', root '/usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.460.73.01') is not a directory.

2022-01-13 21:05:36: netdata ERROR : PLUGIN[diskspace] : DISKSPACE: Mount point '/usr/lib64/libnvidia-ptxjitcompiler.so.460.73.01' (disk '_usr_lib64_libnvidia-ptxjitcompiler.so.460.73.01', filesystem 'ext4', root '/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.460.73.01') is not a directory. (errno 22, Invalid argument)

2022-01-13 21:05:36: netdata ERROR : PLUGIN[diskspace] : DISKSPACE: Mount point '/dev/nvidiactl' (disk '_dev_nvidiactl', filesystem 'devtmpfs', root '/nvidiactl') is not a directory.

2022-01-13 21:05:36: netdata ERROR : PLUGIN[diskspace] : DISKSPACE: Mount point '/dev/nvidia-uvm' (disk '_dev_nvidia-uvm', filesystem 'devtmpfs', root '/nvidia-uvm') is not a directory. (errno 22, Invalid argument)

2022-01-13 21:05:36: netdata ERROR : PLUGIN[diskspace] : DISKSPACE: Mount point '/dev/nvidia-uvm-tools' (disk '_dev_nvidia-uvm-tools', filesystem 'devtmpfs', root '/nvidia-uvm-tools') is not a directory.

2022-01-13 21:05:36: netdata ERROR : PLUGIN[diskspace] : DISKSPACE: Mount point '/dev/nvidia0' (disk '_dev_nvidia0', filesystem 'devtmpfs', root '/nvidia0') is not a directory. (errno 22, Invalid argument)

2022-01-13 21:06:06: python.d ERROR: nvidia_smi[nvidia_smi] : xml parse failed: "b"NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.\nPlease also try adding directory that contains libnvidia-ml.so to your system PATH.\n"", error: syntax error: line 1, column 0

2022-01-13 21:06:06: python.d INFO: plugin[main] : nvidia_smi[nvidia_smi] : check failed

netdata cloud?

Any idea what im doing wrong? Ive setup an account and added

--runtime=nvidia --cap-add SYS_PTRACE --security-opt apparmor=unconfined -e NETDATA_CLAIM_TOKEN=XXX -e NETDATA_CLAIM_URL=https://app.netdata.cloud

It spams the log, but netdata says it doesn't get data. It might have to do with it getting a new unique name after each restart (and thus a new claim token). What have I been doing wrong?


2023-03-09 22:33:27: 10: 350 '[192.168.0.100]:61915' 'DATA' (sent/all = 1526/3946 bytes -61%, prep/sent/total = 0.15/0.15/0.30 ms) 200 '/api/v1/data?chart=system.net&_=1678401198273&format=array&points=364&group=average&gtime=0&options=absolute|jsonwrap|nonzero&after=-780&dimensions=sent'
2023-03-09 22:33:27: 10: 350 '[192.168.0.100]:61915' 'DATA' (sent/all = 1534/3885 bytes -61%, prep/sent/total = 0.12/0.12/0.24 ms) 200 '/api/v1/data?chart=system.io&_=1678401198276&format=array&points=364&group=average&gtime=0&options=absolute|jsonwrap|nonzero&after=-780&dimensions=out'
2023-03-09 22:33:27: 10: 350 '[192.168.0.100]:61915' 'DATA' (sent/all = 1689/4267 bytes -60%, prep/sent/total = 0.47/0.11/0.58 ms) 200 '/api/v1/data?chart=system.cpu&_=1678401198279&format=array&points=364&group=average&gtime=0&options=absolute|jsonwrap|nonzero&after=-780'
2023-03-09 22:33:27: 10: 350 '[192.168.0.100]:61915' 'DATA' (sent/all = 1554/4056 bytes -62%, prep/sent/total = 0.09/0.10/0.19 ms) 200 '/api/v1/data?chart=system.net&_=1678401198282&format=array&points=364&group=average&gtime=0&options=absolute|jsonwrap|nonzero&after=-780&dimensions=received'
2023-03-09 22:33:27: 10: 350 '[192.168.0.100]:61915' 'DATA' (sent/all = 993/3028 bytes -67%, prep/sent/total = 0.12/0.08/0.20 ms) 200 '/api/v1/data?chart=system.io&_=1678401198285&format=array&points=364&group=average&gtime=0&options=absolute|jsonwrap|nonzero&after=-780&dimensions=in'
2023-03-09 22:33:27: 10: 350 '[192.168.0.100]:61915' 'DATA' (sent/all = 1451/4258 bytes -66%, prep/sent/total = 0.30/0.10/0.41 ms) 200 '/api/v1/dat

Can't run nvidia-smi in container

Hello

first of all, thanks for figuring out a way to have NVIDIA GPU benchmarking working by just extending the base netdata image 🙏

I followed the instructions as reported on the DockerHub page.
I can start the container, and then access the webserver running at :19999.
However, I can't see any section hinting at a GPU / nvidia-smi benchmarking.

Not seeing any stats, I thought that maybe there was some issue with the execution of nvidia-smi (if they use it internally in netdata).

I tried executing nvidia-smi in the container:

docker exec netdata  nvidia-smi

but received this error:

NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.

The only way that I found for having nvidia-smi successfully executing via docker exec was the following:

docker exec netdata bash -c 'LD_PRELOAD=$(find /usr/lib64/ -name "libnvidia-ml.so.*")  nvidia-smi'

based on this StackOverflow answer

Any clues about how this issue could be solved?

Maybe I'll try to give a peek at netdata's sources to see if I can "patch" the system (supposing that the solution is indeed using LD_PRELOAD).

Best regards.


latest netdata releases not working?

I see the latest versioned tag on this image is v1.31.0 and the latest tag uses netdata v1.32.1-7-nightly, while the netdata/netdata image has v1.33.0. Where are the latest releases?

Symbol not found /usr/bin/nvidia-smi

Hello,

I just updated my Netdata container this morning after upgrading to v6.10-RC3 of Unraid and noticed the Nvidia Graphs were no longer loading. Looking in the Netdata container logs I am seeing this plugin load error:

Error relocating /usr/bin/nvidia-smi: __strtok_r: symbol not found
Error relocating /usr/bin/nvidia-smi: __strdup: symbol not found
2022-03-17 15:24:04: python.d ERROR: nvidia_smi[nvidia_smi] : failed to invoke 'nvidia-smi' binary
2022-03-17 15:24:04: python.d INFO: plugin[main] : nvidia_smi[nvidia_smi] : check failed

I'm pretty sure the v6.10-RC3 update to Unraid didn't affect this, as Netdata was working after the RC3 update. This started happening after clicking the "apply update" button for the Netdata container in Unraid.

Potentially unrelated: I notice this container image says there is an update almost every day in Unraid. Is that normal?

Unknown nvidia runtime

thanks for integrating nvidia-smi in netdata
I tried many times to reproduce it, but no luck.

Neither docker run/nvidia-docker run with --runtime option or docker-compose worked for me.
I am getting "Unknown runtime specified nvidia".

I have also added the configuration in daemon.json
Any ideas?
