Comments (4)
Hi, @gregjhogan. I can't reproduce the issue.
Running go.d.plugin in debug mode has nothing to do with working/not working when running as a systemd service.
You are right:
- the "nvidia_smi" module is disabled by default and needs to be manually enabled in "go.d.conf".
- you need to restart Netdata after modifying "go.d.conf".
These are your go.d/nvidia_smi related logs:
Jun 07 20:32:44 gpu-server-11 netdata[45841]: level=info msg="check success" plugin=go.d collector=nvidia_smi job=nvidia_smi
Jun 07 20:32:44 gpu-server-11 netdata[45841]: level=info msg="started, data collection interval 10s" plugin=go.d collector=nvidia_smi job=nvidia_smi
Jun 07 21:30:01 gpu-server-11 netdata[45841]: level=info msg=stopped plugin=go.d collector=nvidia_smi job=nvidia_smi
Jun 12 18:19:17 gpu-server-11 netdata[403331]: level=info msg="check success" plugin=go.d collector=nvidia_smi job=nvidia_smi
Jun 12 18:19:17 gpu-server-11 netdata[403331]: level=info msg="started, data collection interval 10s" plugin=go.d collector=nvidia_smi job=nvidia_smi
Jun 12 18:55:41 gpu-server-11 netdata[403331]: level=info msg=stopped plugin=go.d collector=nvidia_smi job=nvidia_smi
Jun 12 18:56:08 gpu-server-11 netdata[435423]: level=info msg="check success" plugin=go.d collector=nvidia_smi job=nvidia_smi
Jun 12 18:56:08 gpu-server-11 netdata[435423]: level=info msg="started, data collection interval 10s" plugin=go.d collector=nvidia_smi job=nvidia_smi
As you can see, the collector was stopped on Jun 7 about an hour after it started. I don't know why; you need to check the other logs from that time.
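To make the start/stop timing in these logs easier to check, here is a minimal Python sketch that pairs each "started" line with the next "stopped" line from the same netdata PID and reports how long the collector ran. The function name run_durations, the regular expression, and the year (journal timestamps carry none, so 2024 is assumed) are illustrative assumptions, not part of Netdata.

```python
import re
from datetime import datetime

# Start/stop lines copied from the log excerpt above.
LOG = """\
Jun 07 20:32:44 gpu-server-11 netdata[45841]: level=info msg="started, data collection interval 10s" plugin=go.d collector=nvidia_smi job=nvidia_smi
Jun 07 21:30:01 gpu-server-11 netdata[45841]: level=info msg=stopped plugin=go.d collector=nvidia_smi job=nvidia_smi
Jun 12 18:19:17 gpu-server-11 netdata[403331]: level=info msg="started, data collection interval 10s" plugin=go.d collector=nvidia_smi job=nvidia_smi
Jun 12 18:55:41 gpu-server-11 netdata[403331]: level=info msg=stopped plugin=go.d collector=nvidia_smi job=nvidia_smi
Jun 12 18:56:08 gpu-server-11 netdata[435423]: level=info msg="started, data collection interval 10s" plugin=go.d collector=nvidia_smi job=nvidia_smi
"""

# Captures the syslog-style timestamp, the netdata PID, and the msg field
# (either a quoted string or a single bare word).
LINE_RE = re.compile(
    r'^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) \S+ netdata\[(?P<pid>\d+)\]: '
    r'.*msg=(?P<msg>"[^"]*"|\S+)'
)


def run_durations(text, year=2024):
    """Pair each 'started' event with the next 'stopped' event of the same PID.

    Returns (runs, still_open): runs is a list of (pid, start, stop) tuples,
    still_open maps PIDs that never logged 'stopped' to their start time.
    The year is an assumption because the journal lines do not include one.
    """
    started = {}
    runs = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        msg = m["msg"].strip('"')
        pid = int(m["pid"])
        if msg.startswith("started"):
            started[pid] = ts
        elif msg == "stopped" and pid in started:
            runs.append((pid, started.pop(pid), ts))
    return runs, started


if __name__ == "__main__":
    runs, still_open = run_durations(LOG)
    for pid, start, stop in runs:
        print(f"pid {pid}: ran for {stop - start}")
    for pid, start in still_open.items():
        print(f"pid {pid}: started {start}, no stop logged")
```

A run that ends with "stopped" while the netdata process itself keeps running is the case worth cross-checking against the other logs from the same time window.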
from netdata.
Yeah, it was a weird one, as I haven't had this problem in the past. As you can see from the UI screenshots, even if it was running, it wasn't collecting any data. If you want any additional logs, just let me know what to send and where to send them.
I had the same issue; restarting the Docker container ~10 times did not help. The only solution was running the debug command for some time and then restarting the container again, as described above.
The logs always contained these lines, even when nvidia_smi didn't show up:
netdata | time=2024-06-19T21:54:11.614Z level=info msg="check success" plugin=go.d collector=nvidia_smi job=nvidia_smi
netdata | time=2024-06-19T21:54:11.614Z level=info msg="started, data collection interval 10s" plugin=go.d collector=nvidia_smi job=nvidia_smi
@ilyam8 I just had it happen again and this time it was on a server that was previously working.
- UI last shows data for nvidia smi on June 19, 2024 (but noticed today)
- Today I tried restarting the netdata service several times, but still no data showed up in the UI for nvidia smi
- Run the command
./go.d.plugin -d -m nvidia_smi
and everything looks good
- Restart the netdata service again and I see new data coming in now for nvidia smi
That was everything I did to get it working again, so no config changed between it not working and then working again.