Comments (4)

ilyam8 commented on July 22, 2024

Hi, @gregjhogan. I can't reproduce the issue.

Running go.d.plugin in debug mode has nothing to do with working/not working when running as a systemd service.

You are right:

  • the "nvidia_smi" module is disabled by default and needs to be manually enabled in "go.d.conf".
  • you need to restart Netdata after making changes to "go.d.conf".

These are your go.d/nvidia_smi related logs:

Jun 07 20:32:44 gpu-server-11 netdata[45841]: level=info msg="check success" plugin=go.d collector=nvidia_smi job=nvidia_smi
Jun 07 20:32:44 gpu-server-11 netdata[45841]: level=info msg="started, data collection interval 10s" plugin=go.d collector=nvidia_smi job=nvidia_smi
Jun 07 21:30:01 gpu-server-11 netdata[45841]: level=info msg=stopped plugin=go.d collector=nvidia_smi job=nvidia_smi

Jun 12 18:19:17 gpu-server-11 netdata[403331]: level=info msg="check success" plugin=go.d collector=nvidia_smi job=nvidia_smi
Jun 12 18:19:17 gpu-server-11 netdata[403331]: level=info msg="started, data collection interval 10s" plugin=go.d collector=nvidia_smi job=nvidia_smi
Jun 12 18:55:41 gpu-server-11 netdata[403331]: level=info msg=stopped plugin=go.d collector=nvidia_smi job=nvidia_smi

Jun 12 18:56:08 gpu-server-11 netdata[435423]: level=info msg="check success" plugin=go.d collector=nvidia_smi job=nvidia_smi
Jun 12 18:56:08 gpu-server-11 netdata[435423]: level=info msg="started, data collection interval 10s" plugin=go.d collector=nvidia_smi job=nvidia_smi

As you can see, the collector was stopped on Jun 7 about an hour after it started. I don't know why; you need to check the other logs from that time.
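To make the start/stop timing in the logs above easier to check, here is a minimal Python sketch that pairs each "started" line with the matching "stopped" line by netdata PID and reports how long each run lasted. The helper names (`parse_ts`, `run_durations`) are hypothetical and not part of Netdata. The year 2024 is an assumption, since journal-style timestamps carry no year, and the month abbreviations are parsed assuming an English locale.

```python
import re
from datetime import datetime

# The nvidia_smi log lines quoted above, copied verbatim.
LOG_LINES = """\
Jun 07 20:32:44 gpu-server-11 netdata[45841]: level=info msg="check success" plugin=go.d collector=nvidia_smi job=nvidia_smi
Jun 07 20:32:44 gpu-server-11 netdata[45841]: level=info msg="started, data collection interval 10s" plugin=go.d collector=nvidia_smi job=nvidia_smi
Jun 07 21:30:01 gpu-server-11 netdata[45841]: level=info msg=stopped plugin=go.d collector=nvidia_smi job=nvidia_smi
Jun 12 18:19:17 gpu-server-11 netdata[403331]: level=info msg="check success" plugin=go.d collector=nvidia_smi job=nvidia_smi
Jun 12 18:19:17 gpu-server-11 netdata[403331]: level=info msg="started, data collection interval 10s" plugin=go.d collector=nvidia_smi job=nvidia_smi
Jun 12 18:55:41 gpu-server-11 netdata[403331]: level=info msg=stopped plugin=go.d collector=nvidia_smi job=nvidia_smi
Jun 12 18:56:08 gpu-server-11 netdata[435423]: level=info msg="check success" plugin=go.d collector=nvidia_smi job=nvidia_smi
Jun 12 18:56:08 gpu-server-11 netdata[435423]: level=info msg="started, data collection interval 10s" plugin=go.d collector=nvidia_smi job=nvidia_smi
""".splitlines()

# Timestamp, netdata PID, and the msg field (quoted or bare).
LINE_RE = re.compile(
    r'^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) \S+ netdata\[(?P<pid>\d+)\]: '
    r'.*?msg=(?P<msg>"[^"]*"|\S+)'
)


def parse_ts(ts, year=2024):
    """Parse a journal-style timestamp; the year is an assumption."""
    return datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S")


def run_durations(lines, year=2024):
    """Return ([(pid, start, stop), ...], [pids with no 'stopped' line])."""
    started = {}
    runs = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        pid = int(m["pid"])
        msg = m["msg"].strip('"')
        ts = parse_ts(m["ts"], year)
        if msg.startswith("started"):
            started[pid] = ts
        elif msg == "stopped" and pid in started:
            runs.append((pid, started.pop(pid), ts))
    return runs, sorted(started)


if __name__ == "__main__":
    runs, still_running = run_durations(LOG_LINES)
    for pid, start, stop in runs:
        print(f"pid {pid}: ran for {stop - start}")
    for pid in still_running:
        print(f"pid {pid}: no 'stopped' line in this excerpt")
```

On this excerpt the first run lasted a little under an hour, the second a little over half an hour, and the third has no "stopped" line. The sketch only looks at the quoted lines, so it says nothing about why the collector stopped.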

from netdata.

gregjhogan commented on July 22, 2024

Yeah, it was a weird one, as I haven't had this problem in the past. As you can see from the UI screenshots, even though it was running it wasn't collecting any data. If you want any additional logs, just let me know what to send and where to send them.

Thendon commented on July 22, 2024

Had the same issue; restarting the Docker container ~10 times did not help. The only solution was running the debug command for some time and then restarting the container again, as described above.

The logs always contained these lines, even when nvidia_smi didn't show up:

netdata | time=2024-06-19T21:54:11.614Z level=info msg="check success" plugin=go.d collector=nvidia_smi job=nvidia_smi
netdata | time=2024-06-19T21:54:11.614Z level=info msg="started, data collection interval 10s" plugin=go.d collector=nvidia_smi job=nvidia_smi

gregjhogan commented on July 22, 2024

@ilyam8 I just had it happen again and this time it was on a server that was previously working.

  • The UI last showed data for nvidia_smi on June 19, 2024 (I only noticed today).
  • Today I tried restarting the netdata service several times, but still no nvidia_smi data showed up in the UI.
  • I ran the command ./go.d.plugin -d -m nvidia_smi and everything looked good.
  • I restarted the netdata service again and now I see new nvidia_smi data coming in.

That was everything I did to get it working again, so no config changed between it not working and then working again.