KeithMyers commented on August 28, 2024

Here you go Rick.
keith@Serenity:~/Downloads/amdgpu-utils-extended$ ./amdgpu-ls --debug

Ubuntu: Validated
Warning: could not read AMD Featuremask [[Errno 2] No such file or directory: '/sys/module/amdgpu/parameters/ppfeaturemask']
Detected GPUs: NVIDIA: 3
3 total GPUs, 0 rw, 0 r-only, 0 w-only

Card Number: 0
   Vendor: NVIDIA
   Readable: False
   Writable: False
   Compute: True
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   PCIe ID: 08:00.0
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   GPU Frequency/Voltage Control Type: Unsupported
   HWmon: None
   Card Path: /sys/class/drm/card0/device
   System Card Path: /sys/devices/pci0000:00/0000:00:01.3/0000:02:00.2/0000:03:04.0/0000:08:00.0

Card Number: 1
   Vendor: NVIDIA
   Readable: False
   Writable: False
   Compute: True
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   PCIe ID: 0a:00.0
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   GPU Frequency/Voltage Control Type: Unsupported
   HWmon: None
   Card Path: /sys/class/drm/card1/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.1/0000:0a:00.0

Card Number: 2
   Vendor: NVIDIA
   Readable: False
   Writable: False
   Compute: True
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   PCIe ID: 0b:00.0
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   GPU Frequency/Voltage Control Type: Unsupported
   HWmon: None
   Card Path: /sys/class/drm/card2/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0

keith@Serenity:~/Downloads/amdgpu-utils-extended$

DEBUG:gpu-utils:env.set_args:Command line arguments:
  Namespace(about=False, clinfo=False, debug=True, no_fan=False, ppm=False, pstates=False, short=False, table=False)
DEBUG:gpu-utils:env.set_args:Local TZ: PDT
DEBUG:gpu-utils:amdgpu-ls.main:########## amdgpu-ls v3.3.0
DEBUG:gpu-utils:env.check_env:Using python: 3.8.2
DEBUG:gpu-utils:env.check_env:Using Linux Kernel: 5.4.0-37-generic
DEBUG:gpu-utils:env.check_env:Using Linux Distro: Ubuntu
DEBUG:gpu-utils:env.check_env:Linux Distro Description: Ubuntu 20.04 LTS
DEBUG:gpu-utils:env.check_env:Ubuntu package query tool: /usr/bin/dpkg
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_NAME: [GeForce RTX 2080]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_VERSION: [OpenCL 1.2 CUDA]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DRIVER_VERSION: [440.64]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_OPENCL_C_VERSION: [OpenCL C 1.2]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:NV ocl_pcie_id [08:00.0]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_COMPUTE_UNITS: [46]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: [3]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_WORK_ITEM_SIZES: [1024 1024 64]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_WORK_GROUP_SIZE: [1024]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE: [32]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_MEM_ALLOC_SIZE: [2092515328]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:cl_index: {'prf_wg_multiple': '32', 'max_wg_size': '1024', 'prf_wg_size': None, 'max_wi_sizes': '1024 1024 64', 'max_wi_dim': '3', 'max_mem_allocation': '2092515328', 'simd_ins_width': None, 'simd_width': None, 'simd_per_cu': None, 'max_cu': '46', 'device_name': 'GeForce RTX 2080', 'opencl_version': 'OpenCL C 1.2', 'driver_version': '440.64', 'device_version': 'OpenCL 1.2 CUDA'}
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_NAME: [GeForce RTX 2080]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_VERSION: [OpenCL 1.2 CUDA]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DRIVER_VERSION: [440.64]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_OPENCL_C_VERSION: [OpenCL C 1.2]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:NV ocl_pcie_id [0a:00.0]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_COMPUTE_UNITS: [46]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: [3]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_WORK_ITEM_SIZES: [1024 1024 64]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_WORK_GROUP_SIZE: [1024]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE: [32]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_MEM_ALLOC_SIZE: [2091696128]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:cl_index: {'prf_wg_multiple': '32', 'max_wg_size': '1024', 'prf_wg_size': None, 'max_wi_sizes': '1024 1024 64', 'max_wi_dim': '3', 'max_mem_allocation': '2091696128', 'simd_ins_width': None, 'simd_width': None, 'simd_per_cu': None, 'max_cu': '46', 'device_name': 'GeForce RTX 2080', 'opencl_version': 'OpenCL C 1.2', 'driver_version': '440.64', 'device_version': 'OpenCL 1.2 CUDA'}
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_NAME: [GeForce RTX 2080]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_VERSION: [OpenCL 1.2 CUDA]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DRIVER_VERSION: [440.64]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_OPENCL_C_VERSION: [OpenCL C 1.2]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:NV ocl_pcie_id [0b:00.0]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_COMPUTE_UNITS: [46]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: [3]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_WORK_ITEM_SIZES: [1024 1024 64]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_WORK_GROUP_SIZE: [1024]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE: [32]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_MEM_ALLOC_SIZE: [2092515328]
DEBUG:gpu-utils:GPUmodule.set_gpu_list:OpenCL map: {'08:00.0': {'prf_wg_multiple': '32', 'max_wg_size': '1024', 'prf_wg_size': None, 'max_wi_sizes': '1024 1024 64', 'max_wi_dim': '3', 'max_mem_allocation': '2092515328', 'simd_ins_width': None, 'simd_width': None, 'simd_per_cu': None, 'max_cu': '46', 'device_name': 'GeForce RTX 2080', 'opencl_version': 'OpenCL C 1.2', 'driver_version': '440.64', 'device_version': 'OpenCL 1.2 CUDA'}, '0a:00.0': {'prf_wg_multiple': '32', 'max_wg_size': '1024', 'prf_wg_size': None, 'max_wi_sizes': '1024 1024 64', 'max_wi_dim': '3', 'max_mem_allocation': '2091696128', 'simd_ins_width': None, 'simd_width': None, 'simd_per_cu': None, 'max_cu': '46', 'device_name': 'GeForce RTX 2080', 'opencl_version': 'OpenCL C 1.2', 'driver_version': '440.64', 'device_version': 'OpenCL 1.2 CUDA'}, '0b:00.0': {'prf_wg_multiple': '32', 'max_wg_size': '1024', 'prf_wg_size': None, 'max_wi_sizes': '1024 1024 64', 'max_wi_dim': '3', 'max_mem_allocation': '2092515328', 'simd_ins_width': None, 'simd_width': None, 'simd_per_cu': None, 'max_cu': '46', 'device_name': 'GeForce RTX 2080', 'opencl_version': 'OpenCL C 1.2', 'driver_version': '440.64', 'device_version': 'OpenCL 1.2 CUDA'}}
DEBUG:gpu-utils:GPUmodule.set_gpu_list:Found 3 GPUs
DEBUG:gpu-utils:GPUmodule.add:Added GPU Item 0a4eaef50da94c43aa9680928bfbe96f to GPU List
DEBUG:gpu-utils:GPUmodule.set_gpu_list:GPU: 08:00.0
DEBUG:gpu-utils:GPUmodule.set_gpu_list:lspci output items:
 ['08:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)', '\tSubsystem: eVga.com. Corp. TU104 [GeForce RTX 2080 Rev. A]', '\tKernel driver in use: nvidia', '\tKernel modules: nvidiafb, nouveau, nvidia_drm, nvidia', '']
DEBUG:gpu-utils:GPUmodule.set_gpu_list:sysfpath: /sys/devices/pci0000:00/0000:00:03.1/0000:0a:00.0
device_dir: /sys/class/drm/card1/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:sysfpath: /sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0
device_dir: /sys/class/drm/card2/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:sysfpath: /sys/devices/pci0000:00/0000:00:01.3/0000:02:00.2/0000:03:04.0/0000:08:00.0
device_dir: /sys/class/drm/card0/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:card_path set to: /sys/class/drm/card0/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:Card dir [/sys/class/drm/card0/device] contents:
['uevent', 'resource3_wc', 'resource5', 'resource3', 'broken_parity_status', 'subsystem_device', 'rom', 'dma_mask_bits', 'vendor', 'resource1', 'i2c-17', 'iommu_group', 'local_cpus', 'firmware_node', 'power', 'class', 'reset', 'i2c-15', 'numa_node', 'resource', 'rescan', 'max_link_width', 'msi_bus', 'device', 'i2c-13', 'boot_vga', 'aer_dev_nonfatal', 'current_link_width', 'driver', 'max_link_speed', 'local_cpulist', 'driver_override', 'subsystem', 'd3cold_allowed', 'irq', 'revision', 'current_link_speed', 'i2c-18', 'resource1_wc', 'aer_dev_correctable', 'consistent_dma_mask_bits', 'resource0', 'i2c-16', 'config', 'ari_enabled', 'msi_irqs', 'remove', 'iommu', 'aer_dev_fatal', 'enable', 'link', 'i2c-14', 'modalias', 'i2c-12', 'subsystem_vendor', 'drm']
DEBUG:gpu-utils:GPUmodule.set_gpu_list:HW file search: []
DEBUG:gpu-utils:GPUmodule.populate_prm_from_dict:prm dict:
{'pcie_id': '08:00.0', 'model': 'NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)', 'model_short': 'UNKNOWN', 'vendor': <vendor.NVIDIA: 4>, 'driver': 'nvidiafb, nouveau, nvidia_drm, nvidia', 'card_path': '/sys/class/drm/card0/device', 'sys_card_path': '/sys/devices/pci0000:00/0000:00:01.3/0000:02:00.2/0000:03:04.0/0000:08:00.0', 'gpu_type': <type.Unsupported: 2>, 'hwmon_path': '', 'readable': False, 'writable': False, 'compute': True, 'compute_platform': 'OpenCL 1.2 CUDA'}
DEBUG:gpu-utils:GPUmodule.set_gpu_list:Card flags: readable: False, writable: False, type: Unsupported
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor:read_gpu_sensor set to [/sys/class/drm/card0/device]
DEBUG:gpu-utils:GPUmodule.read_pciid_model:Logger active in module
DEBUG:gpu-utils:GPUmodule.add:Added GPU Item 66652196e2d44a2aa079683b0605d8a3 to GPU List
DEBUG:gpu-utils:GPUmodule.set_gpu_list:GPU: 0a:00.0
DEBUG:gpu-utils:GPUmodule.set_gpu_list:lspci output items:
 ['0a:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)', '\tSubsystem: eVga.com. Corp. TU104 [GeForce RTX 2080 Rev. A]', '\tKernel driver in use: nvidia', '\tKernel modules: nvidiafb, nouveau, nvidia_drm, nvidia', '']
DEBUG:gpu-utils:GPUmodule.set_gpu_list:sysfpath: /sys/devices/pci0000:00/0000:00:03.1/0000:0a:00.0
device_dir: /sys/class/drm/card1/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:card_path set to: /sys/class/drm/card1/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:sysfpath: /sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0
device_dir: /sys/class/drm/card2/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:sysfpath: /sys/devices/pci0000:00/0000:00:01.3/0000:02:00.2/0000:03:04.0/0000:08:00.0
device_dir: /sys/class/drm/card0/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:Card dir [/sys/class/drm/card1/device] contents:
['uevent', 'resource3_wc', 'resource5', 'i2c-10', 'resource3', 'broken_parity_status', 'subsystem_device', 'rom', 'dma_mask_bits', 'vendor', 'resource1', 'iommu_group', 'local_cpus', 'firmware_node', 'i2c-8', 'power', 'class', 'reset', 'numa_node', 'resource', 'rescan', 'i2c-6', 'max_link_width', 'msi_bus', 'device', 'boot_vga', 'aer_dev_nonfatal', 'current_link_width', 'i2c-11', 'driver', 'max_link_speed', 'local_cpulist', 'driver_override', 'subsystem', 'd3cold_allowed', 'irq', 'revision', 'current_link_speed', 'resource1_wc', 'i2c-9', 'aer_dev_correctable', 'consistent_dma_mask_bits', 'resource0', 'config', 'ari_enabled', 'msi_irqs', 'remove', 'i2c-7', 'iommu', 'aer_dev_fatal', 'enable', 'link', 'i2c-5', 'modalias', 'subsystem_vendor', 'drm']
DEBUG:gpu-utils:GPUmodule.set_gpu_list:HW file search: []
DEBUG:gpu-utils:GPUmodule.populate_prm_from_dict:prm dict:
{'pcie_id': '0a:00.0', 'model': 'NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)', 'model_short': 'UNKNOWN', 'vendor': <vendor.NVIDIA: 4>, 'driver': 'nvidiafb, nouveau, nvidia_drm, nvidia', 'card_path': '/sys/class/drm/card1/device', 'sys_card_path': '/sys/devices/pci0000:00/0000:00:03.1/0000:0a:00.0', 'gpu_type': <type.Unsupported: 2>, 'hwmon_path': '', 'readable': False, 'writable': False, 'compute': True, 'compute_platform': 'OpenCL 1.2 CUDA'}
DEBUG:gpu-utils:GPUmodule.set_gpu_list:Card flags: readable: False, writable: False, type: Unsupported
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor:read_gpu_sensor set to [/sys/class/drm/card1/device]
DEBUG:gpu-utils:GPUmodule.read_pciid_model:Logger active in module
DEBUG:gpu-utils:GPUmodule.add:Added GPU Item aec7577d8dd847c78a9b8755c9b22321 to GPU List
DEBUG:gpu-utils:GPUmodule.set_gpu_list:GPU: 0b:00.0
DEBUG:gpu-utils:GPUmodule.set_gpu_list:lspci output items:
 ['0b:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)', '\tSubsystem: eVga.com. Corp. TU104 [GeForce RTX 2080 Rev. A]', '\tKernel driver in use: nvidia', '\tKernel modules: nvidiafb, nouveau, nvidia_drm, nvidia', '']
DEBUG:gpu-utils:GPUmodule.set_gpu_list:sysfpath: /sys/devices/pci0000:00/0000:00:03.1/0000:0a:00.0
device_dir: /sys/class/drm/card1/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:sysfpath: /sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0
device_dir: /sys/class/drm/card2/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:card_path set to: /sys/class/drm/card2/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:sysfpath: /sys/devices/pci0000:00/0000:00:01.3/0000:02:00.2/0000:03:04.0/0000:08:00.0
device_dir: /sys/class/drm/card0/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:Card dir [/sys/class/drm/card2/device] contents:
['uevent', 'resource3_wc', 'resource5', 'i2c-20', 'resource3', 'i2c-19', 'broken_parity_status', 'subsystem_device', 'rom', 'dma_mask_bits', 'vendor', 'resource1', 'iommu_group', 'local_cpus', 'firmware_node', 'power', 'i2c-25', 'class', 'reset', 'numa_node', 'resource', 'rescan', 'max_link_width', 'msi_bus', 'i2c-23', 'device', 'boot_vga', 'aer_dev_nonfatal', 'i2c-21', 'current_link_width', 'driver', 'max_link_speed', 'local_cpulist', 'driver_override', 'subsystem', 'd3cold_allowed', 'irq', 'revision', 'current_link_speed', 'resource1_wc', 'aer_dev_correctable', 'consistent_dma_mask_bits', 'resource0', 'config', 'ari_enabled', 'msi_irqs', 'remove', 'iommu', 'aer_dev_fatal', 'i2c-24', 'enable', 'link', 'i2c-22', 'modalias', 'subsystem_vendor', 'drm']
DEBUG:gpu-utils:GPUmodule.set_gpu_list:HW file search: []
DEBUG:gpu-utils:GPUmodule.populate_prm_from_dict:prm dict:
{'pcie_id': '0b:00.0', 'model': 'NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)', 'model_short': 'UNKNOWN', 'vendor': <vendor.NVIDIA: 4>, 'driver': 'nvidiafb, nouveau, nvidia_drm, nvidia', 'card_path': '/sys/class/drm/card2/device', 'sys_card_path': '/sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0', 'gpu_type': <type.Unsupported: 2>, 'hwmon_path': '', 'readable': False, 'writable': False, 'compute': True, 'compute_platform': 'OpenCL 1.2 CUDA'}
DEBUG:gpu-utils:GPUmodule.set_gpu_list:Card flags: readable: False, writable: False, type: Unsupported
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor:read_gpu_sensor set to [/sys/class/drm/card2/device]
DEBUG:gpu-utils:GPUmodule.read_pciid_model:Logger active in module

Ricks-Lab commented on August 28, 2024

Thanks for the details! This confirms that NV only exposes the generic PCIe sensors in the card path; there appear to be no GPU-specific sensors. In that case, perhaps nvidia-smi is the only choice for reading card details. I was hoping to first read just the details necessary for the monitor and plot utilities and use only those parameters in ls. Here is a list of those sensors for AMD:

SensorSet.Monitor: {'HWMON':  ['power', 'power_cap', 'temperatures', 'voltages',
                               'frequencies', 'fan_pwm'],
                    'DEVICE': ['loading', 'mem_loading', 'mem_gtt_used', 'mem_vram_used',
                               'sclk_ps', 'mclk_ps', 'ppm']},

Can you help provide the nvidia command line and sample output for this set?
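
For reference, a rough mapping of that Monitor set onto nvidia-smi query fields might look like the sketch below. This is a hypothetical pairing based on nvidia-smi's documented --query-gpu properties, not confirmed equivalents; 'voltages' and the memory p-state in particular may have no nvidia-smi counterpart.

# Hypothetical mapping from the AMD Monitor sensor set to candidate
# nvidia-smi --query-gpu fields; the names on the right are assumptions.
NV_MONITOR_FIELDS = {
    'power':        'power.draw',
    'power_cap':    'power.limit',
    'temperatures': 'temperature.gpu',
    'frequencies':  'clocks.current.graphics,clocks.mem',
    'fan_pwm':      'fan.speed',
    'loading':      'utilization.gpu',
    'mem_loading':  'utilization.memory',
    'sclk_ps':      'pstate',
}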

Ricks-Lab commented on August 28, 2024

Here is the code we worked out for benchMT for power reading:

import subprocess

try:
    # Query one GPU's power draw via nvidia-smi; the PCIe ID selects the card.
    nsmi_items = subprocess.check_output(
        '{} -i {} --query-gpu=power.draw --format=csv,noheader,nounits'.format(
            MB_CONST.cmd_nvidia_smi, self.pcie_id), shell=True).decode().split('\n')
    # With noheader/nounits, the first line is just the reading, e.g. '133.91'.
    power_reading = float(nsmi_items[0].strip())
except (subprocess.CalledProcessError, OSError) as except_err:
    power_reading = None

KeithMyers commented on August 28, 2024

I can't seem to get any sensible output from that command stack. Nothing but syntax errors.

keith@Serenity:~$ nvidia-smi nsmi_items = subprocess.check_output(
bash: syntax error near unexpected token `('
keith@Serenity:~$         '{} -i {} --query-gpu=power.draw --format=csv,noheader,nounits'.format(
bash: syntax error near unexpected token `newline'
keith@Serenity:~$          MB_CONST.cmd_nvidia_smi, self.pcie_id), shell=True).decode().split('\n')
bash: syntax error near unexpected token `)'
keith@Serenity:~$     power_reading = float(nsmi_items[0].strip())
bash: syntax error near unexpected token `('
keith@Serenity:~$ except (subprocess.CalledProcessError, OSError) as except_err:
bash: syntax error near unexpected token `subprocess.CalledProcessError,'
keith@Serenity:~$     power_reading = None

If I break it down to just:
nvidia-smi --query-gpu=power.draw --format=csv,noheader,nounits
I get something:
133.91
189.84
104.34
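
Wrapped in Python via subprocess, that one-liner could form a minimal per-system reader; here is a sketch under that assumption (the helper name read_power_draw is illustrative, not from the repo):

import subprocess

def read_power_draw(nvidia_smi='/usr/bin/nvidia-smi'):
    # Returns one power reading (in watts) per detected GPU; empty list on failure.
    cmd = [nvidia_smi, '--query-gpu=power.draw', '--format=csv,noheader,nounits']
    try:
        output = subprocess.check_output(cmd).decode()
    except (subprocess.CalledProcessError, OSError):
        return []
    # One line per GPU, e.g. '133.91'
    return [float(line.strip()) for line in output.splitlines() if line.strip()]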

KeithMyers commented on August 28, 2024

I can't seem to find any way to get voltages out of nvidia-smi. I also can't get fan_pwm, just fan.speed.

nvidia-smi --query-gpu=power.limit --format=csv,noheader,nounits
200.00
200.00
200.00
nvidia-smi --query-gpu=power.max_limit --format=csv,noheader,nounits
292.00
292.00
292.00
nvidia-smi --query-gpu=power.default_limit --format=csv,noheader,nounits
225.00
225.00
225.00
nvidia-smi --query-gpu=power.min_limit --format=csv,noheader,nounits
105.00
105.00
105.00
nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits
45
43
37
nvidia-smi --query-gpu=temperature.memory --format=csv,noheader,nounits
N/A
N/A
N/A
nvidia-smi --query-gpu=clocks.current.graphics --format=csv,noheader,nounits
1515
1965
1995
nvidia-smi --query-gpu=clocks.sm --format=csv,noheader,nounits
2010
1965
1995
nvidia-smi --query-gpu=clocks.mem --format=csv,noheader,nounits
7199
7199
7199
nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits
88
1
98
nvidia-smi --query-gpu=utilization.memory --format=csv,noheader,nounits
6
0
3
nvidia-smi --query-gpu=fan.speed --format=csv,noheader,nounits
100
100
100
nvidia-smi --query-gpu=pcie.link.width.max --format=csv,noheader,nounits
16
16
16
nvidia-smi --query-gpu=pcie.link.width.current --format=csv,noheader,nounits
4
8
8

Ricks-Lab commented on August 28, 2024

@KeithMyers Thanks for the details. Do you know if several query items can be provided in a single call? Something like:

nvidia-smi --query-gpu=pcie.link.width.current,pcie.link.width.max,power.draw  --format=csv,noheader,nounits

Ricks-Lab commented on August 28, 2024

The latest on the extended branch includes one read statement, which is printed in raw form. Let me know when you have a chance to test. Just execute amdgpu-ls.

KeithMyers commented on August 28, 2024

@KeithMyers Thanks for the details. Do you know if several query items can be provided in a single call? Something like:

nvidia-smi --query-gpu=pcie.link.width.current,pcie.link.width.max,power.draw  --format=csv,noheader,nounits

Should be able to. You're supposed to separate the query items with just a comma, as in your example.

nvidia-smi --query-gpu=pcie.link.width.current,pcie.link.width.max,power.draw  --format=csv,noheader,nounits
4, 16, 99.70
8, 16, 192.69
8, 16, 195.36
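
Parsing such a combined query into per-GPU dictionaries is then one call and one zip per line; a minimal sketch assuming this output format (the function name is illustrative):

import subprocess

QUERY_FIELDS = ['pcie.link.width.current', 'pcie.link.width.max', 'power.draw']

def read_combined(nvidia_smi='/usr/bin/nvidia-smi'):
    # One nvidia-smi call; one dict of readings per GPU, keyed by query field.
    cmd = [nvidia_smi, '--query-gpu=' + ','.join(QUERY_FIELDS),
           '--format=csv,noheader,nounits']
    output = subprocess.check_output(cmd).decode()
    results = []
    for line in output.splitlines():
        if not line.strip():
            continue
        # Values come back comma-separated in query order, e.g. '4, 16, 99.70'.
        values = [item.strip() for item in line.split(',')]
        results.append(dict(zip(QUERY_FIELDS, values)))
    return results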

Ricks-Lab commented on August 28, 2024

I have a single read implemented. Let me know the output. It should just be a single string printed before the ls output.

KeithMyers commented on August 28, 2024

I'm not sure what you want executed. I thought you had updated the repo. I see a commit that is two hours old. I just downloaded the repo again and don't see any change from the last test. Same output:

Warning: could not read AMD Featuremask [[Errno 2] No such file or directory: '/sys/module/amdgpu/parameters/ppfeaturemask']
/bin/sh: 1: None: not found
/bin/sh: 1: None: not found
/bin/sh: 1: None: not found
Detected GPUs: NVIDIA: 3
3 total GPUs, 0 rw, 0 r-only, 0 w-only

Ricks-Lab commented on August 28, 2024

I had a typo in the command name. I fixed that, and the full command is now printed out for examination.

KeithMyers commented on August 28, 2024

Ok, here is something different after the typo fix.

./amdgpu-ls
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
Warning: could not read AMD Featuremask [[Errno 2] No such file or directory: '/sys/module/amdgpu/parameters/ppfeaturemask']
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current --format=csv,noheader,nounits
NV query result: [['98.49, 42, N/A, 2010, 2010, 7199, 100, 4, 100, 4', '']]
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current --format=csv,noheader,nounits
NV query result: [['157.72, 42, N/A, 1980, 1980, 7199, 98, 49, 100, 8', '']]
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current --format=csv,noheader,nounits
NV query result: [['196.12, 47, N/A, 1935, 1935, 7199, 96, 24, 100, 8', '']]
Detected GPUs: NVIDIA: 3
3 total GPUs, 0 rw, 3 r-only, 0 w-only

Ricks-Lab commented on August 28, 2024

Looks good! Do you know if the current p-state is available? It would be easiest if I can fit NV cards into some of the reports I have already developed.

KeithMyers commented on August 28, 2024

nvidia-smi --query-gpu=pstate --format=csv,noheader,nounits
P2
P2
P2

Ricks-Lab commented on August 28, 2024

Are memory pstates also available?

KeithMyers commented on August 28, 2024

Would the help output from nvidia-smi --help-query-gpu be helpful? It basically covers all parameters available from the query-gpu function.

nvidia-smi --help-query-gpu
List of valid properties to query for the switch "--query-gpu=":

"timestamp"
The timestamp of where the query was made in format "YYYY/MM/DD HH:MM:SS.msec".

"driver_version"
The version of the installed NVIDIA display driver. This is an alphanumeric string.

"count"
The number of NVIDIA GPUs in the system.

"name" or "gpu_name"
The official product name of the GPU. This is an alphanumeric string. For all products.

"serial" or "gpu_serial"
This number matches the serial number physically printed on each board. It is a globally unique immutable alphanumeric value.

"uuid" or "gpu_uuid"
This value is the globally unique immutable alphanumeric identifier of the GPU. It does not correspond to any physical label on the board.

"pci.bus_id" or "gpu_bus_id"
PCI bus id as "domain:bus:device.function", in hex.

"pci.domain"
PCI domain number, in hex.

"pci.bus"
PCI bus number, in hex.

"pci.device"
PCI device number, in hex.

"pci.device_id"
PCI vendor device id, in hex

"pci.sub_device_id"
PCI Sub System id, in hex

"pcie.link.gen.current"
The current PCI-E link generation. These may be reduced when the GPU is not in use.

"pcie.link.gen.max"
The maximum PCI-E link generation possible with this GPU and system configuration. For example, if the GPU supports a higher PCIe generation than the system supports then this reports the system PCIe generation.

"pcie.link.width.current"
The current PCI-E link width. These may be reduced when the GPU is not in use.

"pcie.link.width.max"
The maximum PCI-E link width possible with this GPU and system configuration. For example, if the GPU supports a higher PCIe generation than the system supports then this reports the system PCIe generation.

"index"
Zero based index of the GPU. Can change at each boot.

"display_mode"
A flag that indicates whether a physical display (e.g. monitor) is currently connected to any of the GPU's connectors. "Enabled" indicates an attached display. "Disabled" indicates otherwise.

"display_active"
A flag that indicates whether a display is initialized on the GPU's (e.g. memory is allocated on the device for display). Display can be active even when no monitor is physically attached. "Enabled" indicates an active display. "Disabled" indicates otherwise.

"persistence_mode"
A flag that indicates whether persistence mode is enabled for the GPU. Value is either "Enabled" or "Disabled". When persistence mode is enabled the NVIDIA driver remains loaded even when no active clients, such as X11 or nvidia-smi, exist. This minimizes the driver load latency associated with running dependent apps, such as CUDA programs. Linux only.

"accounting.mode"
A flag that indicates whether accounting mode is enabled for the GPU. Value is either "Enabled" or "Disabled". When accounting is enabled statistics are calculated for each compute process running on the GPU. Statistics can be queried during the lifetime or after termination of the process. The execution time of process is reported as 0 while the process is in running state and updated to actual execution time after the process has terminated. See --help-query-accounted-apps for more info.

"accounting.buffer_size"
The size of the circular buffer that holds list of processes that can be queried for accounting stats. This is the maximum number of processes that accounting information will be stored for before information about oldest processes will get overwritten by information about new processes.

Section about driver_model properties
On Windows, the TCC and WDDM driver models are supported. The driver model can be changed with the (-dm) or (-fdm) flags. The TCC driver model is optimized for compute applications. I.E. kernel launch times will be quicker with TCC. The WDDM driver model is designed for graphics applications and is not recommended for compute applications. Linux does not support multiple driver models, and will always have the value of "N/A". Only for selected products. Please see feature matrix in NVML documentation.

"driver_model.current"
The driver model currently in use. Always "N/A" on Linux.

"driver_model.pending"
The driver model that will be used on the next reboot. Always "N/A" on Linux.

"vbios_version"
The BIOS of the GPU board.

Section about inforom properties
Version numbers for each object in the GPU board's inforom storage. The inforom is a small, persistent store of configuration and state data for the GPU. All inforom version fields are numerical. It can be useful to know these version numbers because some GPU features are only available with inforoms of a certain version or higher.

"inforom.img" or "inforom.image"
Global version of the infoROM image. Image version just like VBIOS version uniquely describes the exact version of the infoROM flashed on the board in contrast to infoROM object version which is only an indicator of supported features.

"inforom.oem"
Version for the OEM configuration data.

"inforom.ecc"
Version for the ECC recording data.

"inforom.pwr" or "inforom.power"
Version for the power management data.

Section about gom properties
GOM allows to reduce power usage and optimize GPU throughput by disabling GPU features. Each GOM is designed to meet specific user needs.
In "All On" mode everything is enabled and running at full speed.
The "Compute" mode is designed for running only compute tasks. Graphics operations are not allowed.
The "Low Double Precision" mode is designed for running graphics applications that don't require high bandwidth double precision.
GOM can be changed with the (--gom) flag.

"gom.current" or "gpu_operation_mode.current"
The GOM currently in use.

"gom.pending" or "gpu_operation_mode.pending"
The GOM that will be used on the next reboot.

"fan.speed"
The fan speed value is the percent of maximum speed that the device's fan is currently intended to run at. It ranges from 0 to 100 %. Note: The reported speed is the intended fan speed. If the fan is physically blocked and unable to spin, this output will not match the actual fan speed. Many parts do not report fan speeds because they rely on cooling via fans in the surrounding enclosure.

"pstate"
The current performance state for the GPU. States range from P0 (maximum performance) to P12 (minimum performance).

Section about clocks_throttle_reasons properties
Retrieves information about factors that are reducing the frequency of clocks. If all throttle reasons are returned as "Not Active" it means that clocks are running as high as possible.

"clocks_throttle_reasons.supported"
Bitmask of supported clock throttle reasons. See nvml.h for more details.

"clocks_throttle_reasons.active"
Bitmask of active clock throttle reasons. See nvml.h for more details.

"clocks_throttle_reasons.gpu_idle"
Nothing is running on the GPU and the clocks are dropping to Idle state. This limiter may be removed in a later release.

"clocks_throttle_reasons.applications_clocks_setting"
GPU clocks are limited by applications clocks setting. E.g. can be changed by nvidia-smi --applications-clocks=

"clocks_throttle_reasons.sw_power_cap"
SW Power Scaling algorithm is reducing the clocks below requested clocks because the GPU is consuming too much power. E.g. SW power cap limit can be changed with nvidia-smi --power-limit=

"clocks_throttle_reasons.hw_slowdown"
HW Slowdown (reducing the core clocks by a factor of 2 or more) is engaged. This is an indicator of:
 * HW Thermal Slowdown: temperature being too high
 * HW Power Brake Slowdown: External Power Brake Assertion is triggered (e.g. by the system power supply)
 * Power draw is too high and Fast Trigger protection is reducing the clocks
 * May be also reported during PState or clock change
 * This behavior may be removed in a later release

"clocks_throttle_reasons.hw_thermal_slowdown"
HW Thermal Slowdown (reducing the core clocks by a factor of 2 or more) is engaged. This is an indicator of temperature being too high

"clocks_throttle_reasons.hw_power_brake_slowdown"
HW Power Brake Slowdown (reducing the core clocks by a factor of 2 or more) is engaged. This is an indicator of External Power Brake Assertion being triggered (e.g. by the system power supply)

"clocks_throttle_reasons.sw_thermal_slowdown"
SW Thermal capping algorithm is reducing clocks below requested clocks because GPU temperature is higher than Max Operating Temp.

"clocks_throttle_reasons.sync_boost"
Sync Boost. This GPU has been added to a Sync boost group with nvidia-smi or DCGM in order to maximize performance per watt. All GPUs in the sync boost group will boost to the minimum possible clocks across the entire group. Look at the throttle reasons for other GPUs in the system to see why those GPUs are holding this one at lower clocks.

Section about memory properties
On-board memory information. Reported total memory is affected by ECC state. If ECC is enabled the total available memory is decreased by several percent, due to the requisite parity bits. The driver may also reserve a small amount of memory for internal use, even without active work on the GPU.

"memory.total"
Total installed GPU memory.

"memory.used"
Total memory allocated by active contexts.

"memory.free"
Total free memory.

"compute_mode"
The compute mode flag indicates whether individual or multiple compute applications may run on the GPU.
"Default" means multiple contexts are allowed per device.
"Exclusive_Process" means only one context is allowed per device, usable from multiple threads at a time.
"Prohibited" means no contexts are allowed per device (no compute apps).

Section about utilization properties
Utilization rates report how busy each GPU is over time, and can be used to determine how much an application is using the GPUs in the system.

"utilization.gpu"
Percent of time over the past sample period during which one or more kernels was executing on the GPU.
The sample period may be between 1 second and 1/6 second depending on the product.

"utilization.memory"
Percent of time over the past sample period during which global (device) memory was being read or written.
The sample period may be between 1 second and 1/6 second depending on the product.

Section about encoder.stats properties
Encoder stats report number of encoder sessions, average FPS and average latency in ms for given GPUs in the system.

"encoder.stats.sessionCount"
Number of encoder sessions running on the GPU.

"encoder.stats.averageFps"
Average FPS of all sessions running on the GPU.

"encoder.stats.averageLatency"
Average latency in microseconds of all sessions running on the GPU.

Section about ecc.mode properties
A flag that indicates whether ECC support is enabled. May be either "Enabled" or "Disabled". Changes to ECC mode require a reboot. Requires Inforom ECC object version 1.0 or higher.

"ecc.mode.current"
The ECC mode that the GPU is currently operating under.

"ecc.mode.pending"
The ECC mode that the GPU will operate under after the next reboot.

Section about ecc.errors properties
NVIDIA GPUs can provide error counts for various types of ECC errors. Some ECC errors are either single or double bit, where single bit errors are corrected and double bit errors are uncorrectable. Texture memory errors may be correctable via resend or uncorrectable if the resend fails. These errors are available across two timescales (volatile and aggregate). Single bit ECC errors are automatically corrected by the HW and do not result in data corruption. Double bit errors are detected but not corrected. Please see the ECC documents on the web for information on compute application behavior when double bit errors occur. Volatile error counters track the number of errors detected since the last driver load. Aggregate error counts persist indefinitely and thus act as a lifetime counter.

"ecc.errors.corrected.volatile.device_memory"
Errors detected in global device memory.

"ecc.errors.corrected.volatile.register_file"
Errors detected in register file memory.

"ecc.errors.corrected.volatile.l1_cache"
Errors detected in the L1 cache.

"ecc.errors.corrected.volatile.l2_cache"
Errors detected in the L2 cache.

"ecc.errors.corrected.volatile.texture_memory"
Parity errors detected in texture memory.

"ecc.errors.corrected.volatile.total"
Total errors detected across entire chip. Sum of device_memory, register_file, l1_cache, l2_cache and texture_memory.

"ecc.errors.corrected.aggregate.device_memory"
Errors detected in global device memory.

"ecc.errors.corrected.aggregate.register_file"
Errors detected in register file memory.

"ecc.errors.corrected.aggregate.l1_cache"
Errors detected in the L1 cache.

"ecc.errors.corrected.aggregate.l2_cache"
Errors detected in the L2 cache.

"ecc.errors.corrected.aggregate.texture_memory"
Parity errors detected in texture memory.

"ecc.errors.corrected.aggregate.total"
Total errors detected across entire chip. Sum of device_memory, register_file, l1_cache, l2_cache and texture_memory.

"ecc.errors.uncorrected.volatile.device_memory"
Errors detected in global device memory.

"ecc.errors.uncorrected.volatile.register_file"
Errors detected in register file memory.

"ecc.errors.uncorrected.volatile.l1_cache"
Errors detected in the L1 cache.

"ecc.errors.uncorrected.volatile.l2_cache"
Errors detected in the L2 cache.

"ecc.errors.uncorrected.volatile.texture_memory"
Parity errors detected in texture memory.

"ecc.errors.uncorrected.volatile.total"
Total errors detected across entire chip. Sum of device_memory, register_file, l1_cache, l2_cache and texture_memory.

"ecc.errors.uncorrected.aggregate.device_memory"
Errors detected in global device memory.

"ecc.errors.uncorrected.aggregate.register_file"
Errors detected in register file memory.

"ecc.errors.uncorrected.aggregate.l1_cache"
Errors detected in the L1 cache.

"ecc.errors.uncorrected.aggregate.l2_cache"
Errors detected in the L2 cache.

"ecc.errors.uncorrected.aggregate.texture_memory"
Parity errors detected in texture memory.

"ecc.errors.uncorrected.aggregate.total"
Total errors detected across entire chip. Sum of device_memory, register_file, l1_cache, l2_cache and texture_memory.

Section about retired_pages properties
NVIDIA GPUs can retire pages of GPU device memory when they become unreliable. This can happen when multiple single bit ECC errors occur for the same page, or on a double bit ECC error. When a page is retired, the NVIDIA driver will hide it such that no driver, or application memory allocations can access it.

"retired_pages.single_bit_ecc.count" or "retired_pages.sbe"
The number of GPU device memory pages that have been retired due to multiple single bit ECC errors.

"retired_pages.double_bit.count" or "retired_pages.dbe"
The number of GPU device memory pages that have been retired due to a double bit ECC error.

"retired_pages.pending"
Checks if any GPU device memory pages are pending retirement on the next reboot. Pages that are pending retirement can still be allocated, and may cause further reliability issues.

"temperature.gpu"
Core GPU temperature, in degrees C.

"temperature.memory"
HBM memory temperature, in degrees C.

"power.management"
A flag that indicates whether power management is enabled. Either "Supported" or "[Not Supported]". Requires Inforom PWR object version 3.0 or higher or Kepler device.

"power.draw"
The last measured power draw for the entire board, in watts. Only available if power management is supported. This reading is accurate to within +/- 5 watts.

"power.limit"
The software power limit in watts. Set by software like nvidia-smi. On Kepler devices Power Limit can be adjusted using [-pl | --power-limit=] switches.

"enforced.power.limit"
The power management algorithm's power ceiling, in watts. Total board power draw is manipulated by the power management algorithm such that it stays under this value. This value is the minimum of various power limiters.

"power.default_limit"
The default power management algorithm's power ceiling, in watts. Power Limit will be set back to Default Power Limit after driver unload.

"power.min_limit"
The minimum value in watts that power limit can be set to.

"power.max_limit"
The maximum value in watts that power limit can be set to.

"clocks.current.graphics" or "clocks.gr"
Current frequency of graphics (shader) clock.

"clocks.current.sm" or "clocks.sm"
Current frequency of SM (Streaming Multiprocessor) clock.

"clocks.current.memory" or "clocks.mem"
Current frequency of memory clock.

"clocks.current.video" or "clocks.video"
Current frequency of video encoder/decoder clock.

Section about clocks.applications properties
User specified frequency at which applications will be running at. Can be changed with [-ac | --applications-clocks] switches.

"clocks.applications.graphics" or "clocks.applications.gr"
User specified frequency of graphics (shader) clock.

"clocks.applications.memory" or "clocks.applications.mem"
User specified frequency of memory clock.

Section about clocks.default_applications properties
Default frequency at which applications will be running at. Application clocks can be changed with [-ac | --applications-clocks] switches. Application clocks can be set to default using [-rac | --reset-applications-clocks] switches.

"clocks.default_applications.graphics" or "clocks.default_applications.gr"
Default frequency of applications graphics (shader) clock.

"clocks.default_applications.memory" or "clocks.default_applications.mem"
Default frequency of applications memory clock.

Section about clocks.max properties
Maximum frequency at which parts of the GPU are designed to run.

"clocks.max.graphics" or "clocks.max.gr"
Maximum frequency of graphics (shader) clock.

"clocks.max.sm" or "clocks.max.sm"
Maximum frequency of SM (Streaming Multiprocessor) clock.

"clocks.max.memory" or "clocks.max.mem"
Maximum frequency of memory clock.

KeithMyers commented on August 28, 2024

Are memory pstates also available?

Apparently not.

Ricks-Lab commented on August 28, 2024

I have implemented much of the core code to support NV, but it currently will just display a dictionary of raw data. Let me know if it works. It was a lot of code to write with no ability to test it out...

KeithMyers commented on August 28, 2024

Something regressed in the e3cec7b commit.

 ./amdgpu-ls
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
Traceback (most recent call last):
  File "./amdgpu-ls", line 150, in <module>
    main()
  File "./amdgpu-ls", line 98, in main
    gpu_list.set_gpu_list(clinfo_flag=True)
  File "/home/keith/Downloads/amdgpu-utils-extended/GPUmodules/GPUmodule.py", line 1675, in set_gpu_list
    self[gpu_uuid].read_gpu_sensor_set_nv()
  File "/home/keith/Downloads/amdgpu-utils-extended/GPUmodules/GPUmodule.py", line 1126, in read_gpu_sensor_set_nv
    raise TypeError('Invalid SensorSet value: [{}]'.format(data_type))
TypeError: Invalid SensorSet value: [set.All]

Ricks-Lab commented on August 28, 2024

It was an error in my error checking! I just pushed a change.

KeithMyers commented on August 28, 2024

I still don't think this is what you expected: no values for the parameters you were able to retrieve previously.

keith@Serenity:~/Downloads/amdgpu-utils-extended$ ./amdgpu-ls

OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [<zip object at 0x7fb72b78c440>]
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [<zip object at 0x7fb72bd04e40>]
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [<zip object at 0x7fb72bd04640>]
Detected GPUs: NVIDIA: 3
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [<zip object at 0x7fb72bd04640>]
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [<zip object at 0x7fb72bd04700>]
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [<zip object at 0x7fb72bd04600>]
3 total GPUs, 0 rw, 3 r-only, 0 w-only

Card Number: 0
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: None
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   Display Card Model: None
   PCIe ID: 08:00.0
      Link Speed: None
      Link Width: None
   ##################################################
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   vBIOS Version: None
   Compute Platform: OpenCL 1.2 CUDA
   GPU Frequency/Voltage Control Type: Unsupported
   HWmon: None
   Card Path: /sys/class/drm/card0/device
   System Card Path: /sys/devices/pci0000:00/0000:00:01.3/0000:02:00.2/0000:03:04.0/0000:08:00.0
   ##################################################
   Current Power (W): None
   Power Cap (W): None
      Power Cap Range (W): [None, None]
   Fan Enable: None
   Fan PWM Mode: [None, 'UNK']
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): None
   Current Fan PWM (%): None
      Fan Speed Range (rpm): [None, None]
      Fan PWM Range (%): [None, None]
   ##################################################
   Current GPU Loading (%): None
   Current Memory Loading (%): None
   Current GTT Memory Usage (%): None
      Current GTT Memory Used (GB): None
      Total GTT Memory (GB): None
   Current VRAM Usage (%): None
      Current VRAM Used (GB): None
      Total VRAM (GB): None
   Current  Temps (C): None
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): None
   Current SCLK P-State: ['', '']
      SCLK Range: ['', '']
   Current MCLK P-State: ['', '']
      MCLK Range: ['', '']
   Power Profile Mode: None
   Power DPM Force Performance Level: None

Card Number: 1
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: None
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   Display Card Model: None
   PCIe ID: 0a:00.0
      Link Speed: None
      Link Width: None
   ##################################################
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   vBIOS Version: None
   Compute Platform: OpenCL 1.2 CUDA
   GPU Frequency/Voltage Control Type: Unsupported
   HWmon: None
   Card Path: /sys/class/drm/card1/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.1/0000:0a:00.0
   ##################################################
   Current Power (W): None
   Power Cap (W): None
      Power Cap Range (W): [None, None]
   Fan Enable: None
   Fan PWM Mode: [None, 'UNK']
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): None
   Current Fan PWM (%): None
      Fan Speed Range (rpm): [None, None]
      Fan PWM Range (%): [None, None]
   ##################################################
   Current GPU Loading (%): None
   Current Memory Loading (%): None
   Current GTT Memory Usage (%): None
      Current GTT Memory Used (GB): None
      Total GTT Memory (GB): None
   Current VRAM Usage (%): None
      Current VRAM Used (GB): None
      Total VRAM (GB): None
   Current  Temps (C): None
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): None
   Current SCLK P-State: ['', '']
      SCLK Range: ['', '']
   Current MCLK P-State: ['', '']
      MCLK Range: ['', '']
   Power Profile Mode: None
   Power DPM Force Performance Level: None

Card Number: 2
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: None
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   Display Card Model: None
   PCIe ID: 0b:00.0
      Link Speed: None
      Link Width: None
   ##################################################
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   vBIOS Version: None
   Compute Platform: OpenCL 1.2 CUDA
   GPU Frequency/Voltage Control Type: Unsupported
   HWmon: None
   Card Path: /sys/class/drm/card2/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0
   ##################################################
   Current Power (W): None
   Power Cap (W): None
      Power Cap Range (W): [None, None]
   Fan Enable: None
   Fan PWM Mode: [None, 'UNK']
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): None
   Current Fan PWM (%): None
      Fan Speed Range (rpm): [None, None]
      Fan PWM Range (%): [None, None]
   ##################################################
   Current GPU Loading (%): None
   Current Memory Loading (%): None
   Current GTT Memory Usage (%): None
      Current GTT Memory Used (GB): None
      Total GTT Memory (GB): None
   Current VRAM Usage (%): None
      Current VRAM Used (GB): None
      Total VRAM (GB): None
   Current  Temps (C): None
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): None
   Current SCLK P-State: ['', '']
      SCLK Range: ['', '']
   Current MCLK P-State: ['', '']
      MCLK Range: ['', '']
   Power Profile Mode: None
   Power DPM Force Performance Level: None

KeithMyers commented on August 28, 2024

The errors from amdgpu-monitor might point to something.
keith@Serenity:~/Downloads/amdgpu-utils-extended$ ./amdgpu-monitor

OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [<zip object at 0x7f5ddba25680>]
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [<zip object at 0x7f5ddba25700>]
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [<zip object at 0x7f5ddba2c240>]
Detected GPUs: NVIDIA: 3
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [<zip object at 0x7f5ddba2c240>]
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [<zip object at 0x7f5ddba2c6c0>]
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [<zip object at 0x7f5ddba2c480>]
3 total GPUs, 0 rw, 3 r-only, 0 w-only

Traceback (most recent call last):
  File "./amdgpu-monitor", line 384, in <module>
    main()
  File "./amdgpu-monitor", line 366, in main
    com_gpu_list.read_gpu_sensor_set(data_type=Gpu.GpuItem.SensorSet.Monitor)
  File "/home/keith/Downloads/amdgpu-utils-extended/GPUmodules/GPUmodule.py", line 1918, in read_gpu_sensor_set
    gpu.read_gpu_sensor_set(data_type)
  File "/home/keith/Downloads/amdgpu-utils-extended/GPUmodules/GPUmodule.py", line 1115, in read_gpu_sensor_set
    return self.read_gpu_sensor_set_nv(data_type)
  File "/home/keith/Downloads/amdgpu-utils-extended/GPUmodules/GPUmodule.py", line 1126, in read_gpu_sensor_set_nv
    raise TypeError('Invalid SensorSet value: [{}]'.format(data_type))
TypeError: Invalid SensorSet value: [set.Monitor]

Ricks-Lab commented on August 28, 2024

I still have lots of work to do before any of the utilities are functional. I am currently just making sure I have a way to read sensors into a dictionary. I had an error in the zip statement meant to accomplish that. Please check out the latest. The only place that you will see the parameters is in the display of the dictionary for now.

KeithMyers commented on August 28, 2024

OK, here is the dictionary.
./amdgpu-ls

OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [{'power.limit': '200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-089608fe-cba5-4711-bf68-085fd0711d8c, 159.92, 52, N/A, 1995, 1995, 7199, 87, 6, 100, 4, P2', 'power.min_limit': ''}]
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [{'power.limit': '200.00, 105.00, 292.00, 7979, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-22b2c6ac-2d49-4863-197c-9c469071178a, 173.23, 46, N/A, 1965, 1965, 7199, 75, 51, 100, 8, P2', 'power.min_limit': ''}]
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [{'power.limit': '200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d, 103.02, 41, N/A, 1980, 1980, 7199, 98, 4, 100, 8, P2', 'power.min_limit': ''}]
Detected GPUs: NVIDIA: 3
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [{'power.limit': '200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-089608fe-cba5-4711-bf68-085fd0711d8c, 157.27, 53, N/A, 1995, 1995, 7199, 87, 6, 100, 4, P2', 'power.min_limit': ''}]
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [{'power.limit': '200.00, 105.00, 292.00, 7979, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-22b2c6ac-2d49-4863-197c-9c469071178a, 172.96, 47, N/A, 1965, 1965, 7199, 75, 51, 100, 8, P2', 'power.min_limit': ''}]
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV query result: [{'power.limit': '200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d, 103.02, 41, N/A, 1980, 1980, 7199, 98, 4, 100, 8, P2', 'power.min_limit': ''}]
3 total GPUs, 0 rw, 3 r-only, 0 w-only

from gpu-utils.

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

I have added some more debug. The output is different than expected, so I need to work out how to get it into a dictionary.

from gpu-utils.

KeithMyers avatar KeithMyers commented on August 28, 2024

Do you want just the normal dictionary output from amdgpu-ls? Or do you want the full debug output?

from gpu-utils.

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

Just amdgpu-ls normal output is needed.

from gpu-utils.

KeithMyers avatar KeithMyers commented on August 28, 2024

./amdgpu-ls
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
nsmi_items: [2]
['200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-089608fe-cba5-4711-bf68-085fd0711d8c, 98.11, 43, N/A, 1995, 1995, 7199, 100, 4, 100, 4, P2', '']
nsmi_items: [21]
['200.00,', '105.00,', '292.00,', '7982,', '90.04.23.00.5F,', '440.64,', 'GeForce', 'RTX', '2080,', 'GPU-089608fe-cba5-4711-bf68-085fd0711d8c,', '98.11,', '43,', 'N/A,', '1995,', '1995,', '7199,', '100,', '4,', '100,', '4,', 'P2']
NV query result: [{'power.limit': '200.00,', 'power.min_limit': '105.00,', 'power.max_limit': '292.00,', 'memory.total': '7982,', 'vbios_version': '90.04.23.00.5F,', 'driver_version': '440.64,', 'name': 'GeForce', 'gpu_uuid': 'RTX', 'power.draw': '2080,', 'temperature.gpu': 'GPU-089608fe-cba5-4711-bf68-085fd0711d8c,', 'temperature.memory': '98.11,', 'clocks.current.graphics': '43,', 'clocks.sm': 'N/A,', 'clocks.mem': '1995,', 'utilization.gpu': '1995,', 'utilization.memory': '7199,', 'fan.speed': '100,', 'pcie.link.width.current': '4,', 'pstate': '100,'}]
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
nsmi_items: [2]
['200.00, 105.00, 292.00, 7979, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-22b2c6ac-2d49-4863-197c-9c469071178a, 150.82, 45, N/A, 1965, 1965, 7199, 86, 55, 100, 8, P2', '']
nsmi_items: [21]
['200.00,', '105.00,', '292.00,', '7979,', '90.04.23.00.5F,', '440.64,', 'GeForce', 'RTX', '2080,', 'GPU-22b2c6ac-2d49-4863-197c-9c469071178a,', '150.82,', '45,', 'N/A,', '1965,', '1965,', '7199,', '86,', '55,', '100,', '8,', 'P2']
NV query result: [{'power.limit': '200.00,', 'power.min_limit': '105.00,', 'power.max_limit': '292.00,', 'memory.total': '7979,', 'vbios_version': '90.04.23.00.5F,', 'driver_version': '440.64,', 'name': 'GeForce', 'gpu_uuid': 'RTX', 'power.draw': '2080,', 'temperature.gpu': 'GPU-22b2c6ac-2d49-4863-197c-9c469071178a,', 'temperature.memory': '150.82,', 'clocks.current.graphics': '45,', 'clocks.sm': 'N/A,', 'clocks.mem': '1965,', 'utilization.gpu': '1965,', 'utilization.memory': '7199,', 'fan.speed': '86,', 'pcie.link.width.current': '55,', 'pstate': '100,'}]
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
nsmi_items: [2]
['200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d, 200.21, 49, N/A, 1950, 1950, 7199, 97, 19, 100, 8, P2', '']
nsmi_items: [21]
['200.00,', '105.00,', '292.00,', '7982,', '90.04.23.00.5F,', '440.64,', 'GeForce', 'RTX', '2080,', 'GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d,', '200.21,', '49,', 'N/A,', '1950,', '1950,', '7199,', '97,', '19,', '100,', '8,', 'P2']
NV query result: [{'power.limit': '200.00,', 'power.min_limit': '105.00,', 'power.max_limit': '292.00,', 'memory.total': '7982,', 'vbios_version': '90.04.23.00.5F,', 'driver_version': '440.64,', 'name': 'GeForce', 'gpu_uuid': 'RTX', 'power.draw': '2080,', 'temperature.gpu': 'GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d,', 'temperature.memory': '200.21,', 'clocks.current.graphics': '49,', 'clocks.sm': 'N/A,', 'clocks.mem': '1950,', 'utilization.gpu': '1950,', 'utilization.memory': '7199,', 'fan.speed': '97,', 'pcie.link.width.current': '19,', 'pstate': '100,'}]
Detected GPUs: NVIDIA: 3
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
nsmi_items: [2]
['200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-089608fe-cba5-4711-bf68-085fd0711d8c, 98.44, 43, N/A, 1995, 1995, 7199, 100, 4, 100, 4, P2', '']
nsmi_items: [21]
['200.00,', '105.00,', '292.00,', '7982,', '90.04.23.00.5F,', '440.64,', 'GeForce', 'RTX', '2080,', 'GPU-089608fe-cba5-4711-bf68-085fd0711d8c,', '98.44,', '43,', 'N/A,', '1995,', '1995,', '7199,', '100,', '4,', '100,', '4,', 'P2']
NV query result: [{'power.limit': '200.00,', 'power.min_limit': '105.00,', 'power.max_limit': '292.00,', 'memory.total': '7982,', 'vbios_version': '90.04.23.00.5F,', 'driver_version': '440.64,', 'name': 'GeForce', 'gpu_uuid': 'RTX', 'power.draw': '2080,', 'temperature.gpu': 'GPU-089608fe-cba5-4711-bf68-085fd0711d8c,', 'temperature.memory': '98.44,', 'clocks.current.graphics': '43,', 'clocks.sm': 'N/A,', 'clocks.mem': '1995,', 'utilization.gpu': '1995,', 'utilization.memory': '7199,', 'fan.speed': '100,', 'pcie.link.width.current': '4,', 'pstate': '100,'}]
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
nsmi_items: [2]
['200.00, 105.00, 292.00, 7979, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-22b2c6ac-2d49-4863-197c-9c469071178a, 176.98, 45, N/A, 1965, 1965, 7199, 86, 55, 100, 8, P2', '']
nsmi_items: [21]
['200.00,', '105.00,', '292.00,', '7979,', '90.04.23.00.5F,', '440.64,', 'GeForce', 'RTX', '2080,', 'GPU-22b2c6ac-2d49-4863-197c-9c469071178a,', '176.98,', '45,', 'N/A,', '1965,', '1965,', '7199,', '86,', '55,', '100,', '8,', 'P2']
NV query result: [{'power.limit': '200.00,', 'power.min_limit': '105.00,', 'power.max_limit': '292.00,', 'memory.total': '7979,', 'vbios_version': '90.04.23.00.5F,', 'driver_version': '440.64,', 'name': 'GeForce', 'gpu_uuid': 'RTX', 'power.draw': '2080,', 'temperature.gpu': 'GPU-22b2c6ac-2d49-4863-197c-9c469071178a,', 'temperature.memory': '176.98,', 'clocks.current.graphics': '45,', 'clocks.sm': 'N/A,', 'clocks.mem': '1965,', 'utilization.gpu': '1965,', 'utilization.memory': '7199,', 'fan.speed': '86,', 'pcie.link.width.current': '55,', 'pstate': '100,'}]
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
nsmi_items: [2]
['200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d, 200.97, 49, N/A, 1935, 1935, 7199, 97, 19, 100, 8, P2', '']
nsmi_items: [21]
['200.00,', '105.00,', '292.00,', '7982,', '90.04.23.00.5F,', '440.64,', 'GeForce', 'RTX', '2080,', 'GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d,', '200.97,', '49,', 'N/A,', '1935,', '1935,', '7199,', '97,', '19,', '100,', '8,', 'P2']
NV query result: [{'power.limit': '200.00,', 'power.min_limit': '105.00,', 'power.max_limit': '292.00,', 'memory.total': '7982,', 'vbios_version': '90.04.23.00.5F,', 'driver_version': '440.64,', 'name': 'GeForce', 'gpu_uuid': 'RTX', 'power.draw': '2080,', 'temperature.gpu': 'GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d,', 'temperature.memory': '200.97,', 'clocks.current.graphics': '49,', 'clocks.sm': 'N/A,', 'clocks.mem': '1935,', 'utilization.gpu': '1935,', 'utilization.memory': '7199,', 'fan.speed': '97,', 'pcie.link.width.current': '19,', 'pstate': '100,'}]
3 total GPUs, 0 rw, 3 r-only, 0 w-only

from gpu-utils.

KeithMyers avatar KeithMyers commented on August 28, 2024
Card Number: 0
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: None
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   Display Card Model: None
   PCIe ID: 08:00.0
      Link Speed: None
      Link Width: None
   ##################################################
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   vBIOS Version: None
   Compute Platform: OpenCL 1.2 CUDA
   GPU Frequency/Voltage Control Type: Unsupported
   HWmon: None
   Card Path: /sys/class/drm/card0/device
   System Card Path: /sys/devices/pci0000:00/0000:00:01.3/0000:02:00.2/0000:03:04.0/0000:08:00.0
   ##################################################
   Current Power (W): None
   Power Cap (W): None
      Power Cap Range (W): [None, None]
   Fan Enable: None
   Fan PWM Mode: [None, 'UNK']
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): None
   Current Fan PWM (%): None
      Fan Speed Range (rpm): [None, None]
      Fan PWM Range (%): [None, None]
   ##################################################
   Current GPU Loading (%): None
   Current Memory Loading (%): None
   Current GTT Memory Usage (%): None
      Current GTT Memory Used (GB): None
      Total GTT Memory (GB): None
   Current VRAM Usage (%): None
      Current VRAM Used (GB): None
      Total VRAM (GB): None
   Current  Temps (C): None
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): None
   Current SCLK P-State: ['', '']
      SCLK Range: ['', '']
   Current MCLK P-State: ['', '']
      MCLK Range: ['', '']
   Power Profile Mode: None
   Power DPM Force Performance Level: None

Card Number: 1
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: None
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   Display Card Model: None
   PCIe ID: 0a:00.0
      Link Speed: None
      Link Width: None
   ##################################################
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   vBIOS Version: None
   Compute Platform: OpenCL 1.2 CUDA
   GPU Frequency/Voltage Control Type: Unsupported
   HWmon: None
   Card Path: /sys/class/drm/card1/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.1/0000:0a:00.0
   ##################################################
   Current Power (W): None
   Power Cap (W): None
      Power Cap Range (W): [None, None]
   Fan Enable: None
   Fan PWM Mode: [None, 'UNK']
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): None
   Current Fan PWM (%): None
      Fan Speed Range (rpm): [None, None]
      Fan PWM Range (%): [None, None]
   ##################################################
   Current GPU Loading (%): None
   Current Memory Loading (%): None
   Current GTT Memory Usage (%): None
      Current GTT Memory Used (GB): None
      Total GTT Memory (GB): None
   Current VRAM Usage (%): None
      Current VRAM Used (GB): None
      Total VRAM (GB): None
   Current  Temps (C): None
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): None
   Current SCLK P-State: ['', '']
      SCLK Range: ['', '']
   Current MCLK P-State: ['', '']
      MCLK Range: ['', '']
   Power Profile Mode: None
   Power DPM Force Performance Level: None

Card Number: 2
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: None
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   Display Card Model: None
   PCIe ID: 0b:00.0
      Link Speed: None
      Link Width: None
   ##################################################
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   vBIOS Version: None
   Compute Platform: OpenCL 1.2 CUDA
   GPU Frequency/Voltage Control Type: Unsupported
   HWmon: None
   Card Path: /sys/class/drm/card2/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0
   ##################################################
   Current Power (W): None
   Power Cap (W): None
      Power Cap Range (W): [None, None]
   Fan Enable: None
   Fan PWM Mode: [None, 'UNK']
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): None
   Current Fan PWM (%): None
      Fan Speed Range (rpm): [None, None]
      Fan PWM Range (%): [None, None]
   ##################################################
   Current GPU Loading (%): None
   Current Memory Loading (%): None
   Current GTT Memory Usage (%): None
      Current GTT Memory Used (GB): None
      Total GTT Memory (GB): None
   Current VRAM Usage (%): None
      Current VRAM Used (GB): None
      Total VRAM (GB): None
   Current  Temps (C): None
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): None
   Current SCLK P-State: ['', '']
      SCLK Range: ['', '']
   Current MCLK P-State: ['', '']
      MCLK Range: ['', '']
   Power Profile Mode: None
   Power DPM Force Performance Level: None

from gpu-utils.

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

Hi Keith, did the dictionary info get displayed? Never mind, I see it now.

from gpu-utils.

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

Looks like I did not split the results correctly. I have pushed another version with more debug print statements. Can you run again? Thanks!
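
The nsmi_items: [21] dump above shows the problem: splitting the whole line on whitespace breaks a multi-word value like GeForce RTX 2080 into three tokens and shifts every later field. A minimal sketch of the fix (illustrative, abbreviated line):

# one abbreviated nvidia-smi CSV line, for illustration
line = '200.00, 105.00, GeForce RTX 2080, P2'
bad = line.split()                                  # 'GeForce', 'RTX', '2080,' become separate items
good = [item.strip() for item in line.split(',')]   # 'GeForce RTX 2080' stays intact
print(good)   # ['200.00', '105.00', 'GeForce RTX 2080', 'P2']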

from gpu-utils.

KeithMyers avatar KeithMyers commented on August 28, 2024

./amdgpu-ls
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
nsmi_items: [2]
['200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-089608fe-cba5-4711-bf68-085fd0711d8c, 96.73, 41, N/A, 1995, 1995, 7199, 99, 4, 100, 4, P2', '']
nsmi_items: [19]
['200.00', ' 105.00', ' 292.00', ' 7982', ' 90.04.23.00.5F', ' 440.64', ' GeForce RTX 2080', ' GPU-089608fe-cba5-4711-bf68-085fd0711d8c', ' 96.73', ' 41', ' N/A', ' 1995', ' 1995', ' 7199', ' 99', ' 4', ' 100', ' 4', ' P2']
new_nsmi_items: [19]
['200.00', '105.00', '292.00', '7982', '90.04.23.00.5F', '440.64', 'GeForce RTX 2080', 'GPU-089608fe-cba5-4711-bf68-085fd0711d8c', '96.73', '41', 'N/A', '1995', '1995', '7199', '99', '4', '100', '4', 'P2']
query_list: [19]
['power.limit', 'power.min_limit', 'power.max_limit', 'memory.total', 'vbios_version', 'driver_version', 'name', 'gpu_uuid', 'power.draw', 'temperature.gpu', 'temperature.memory', 'clocks.current.graphics', 'clocks.sm', 'clocks.mem', 'utilization.gpu', 'utilization.memory', 'fan.speed', 'pcie.link.width.current', 'pstate']
NV query result: [{'power.limit': '200.00', 'power.min_limit': '105.00', 'power.max_limit': '292.00', 'memory.total': '7982', 'vbios_version': '90.04.23.00.5F', 'driver_version': '440.64', 'name': 'GeForce RTX 2080', 'gpu_uuid': 'GPU-089608fe-cba5-4711-bf68-085fd0711d8c', 'power.draw': '96.73', 'temperature.gpu': '41', 'temperature.memory': 'N/A', 'clocks.current.graphics': '1995', 'clocks.sm': '1995', 'clocks.mem': '7199', 'utilization.gpu': '99', 'utilization.memory': '4', 'fan.speed': '100', 'pcie.link.width.current': '4', 'pstate': 'P2'}]
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
nsmi_items: [2]
['200.00, 105.00, 292.00, 7979, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-22b2c6ac-2d49-4863-197c-9c469071178a, 163.16, 42, N/A, 1965, 1965, 7199, 88, 56, 100, 8, P2', '']
nsmi_items: [19]
['200.00', ' 105.00', ' 292.00', ' 7979', ' 90.04.23.00.5F', ' 440.64', ' GeForce RTX 2080', ' GPU-22b2c6ac-2d49-4863-197c-9c469071178a', ' 163.16', ' 42', ' N/A', ' 1965', ' 1965', ' 7199', ' 88', ' 56', ' 100', ' 8', ' P2']
new_nsmi_items: [19]
['200.00', '105.00', '292.00', '7979', '90.04.23.00.5F', '440.64', 'GeForce RTX 2080', 'GPU-22b2c6ac-2d49-4863-197c-9c469071178a', '163.16', '42', 'N/A', '1965', '1965', '7199', '88', '56', '100', '8', 'P2']
query_list: [19]
['power.limit', 'power.min_limit', 'power.max_limit', 'memory.total', 'vbios_version', 'driver_version', 'name', 'gpu_uuid', 'power.draw', 'temperature.gpu', 'temperature.memory', 'clocks.current.graphics', 'clocks.sm', 'clocks.mem', 'utilization.gpu', 'utilization.memory', 'fan.speed', 'pcie.link.width.current', 'pstate']
NV query result: [{'power.limit': '200.00', 'power.min_limit': '105.00', 'power.max_limit': '292.00', 'memory.total': '7979', 'vbios_version': '90.04.23.00.5F', 'driver_version': '440.64', 'name': 'GeForce RTX 2080', 'gpu_uuid': 'GPU-22b2c6ac-2d49-4863-197c-9c469071178a', 'power.draw': '163.16', 'temperature.gpu': '42', 'temperature.memory': 'N/A', 'clocks.current.graphics': '1965', 'clocks.sm': '1965', 'clocks.mem': '7199', 'utilization.gpu': '88', 'utilization.memory': '56', 'fan.speed': '100', 'pcie.link.width.current': '8', 'pstate': 'P2'}]
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
nsmi_items: [2]
['200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d, 201.40, 47, N/A, 1950, 1950, 7199, 97, 17, 100, 8, P2', '']
nsmi_items: [19]
['200.00', ' 105.00', ' 292.00', ' 7982', ' 90.04.23.00.5F', ' 440.64', ' GeForce RTX 2080', ' GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d', ' 201.40', ' 47', ' N/A', ' 1950', ' 1950', ' 7199', ' 97', ' 17', ' 100', ' 8', ' P2']
new_nsmi_items: [19]
['200.00', '105.00', '292.00', '7982', '90.04.23.00.5F', '440.64', 'GeForce RTX 2080', 'GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d', '201.40', '47', 'N/A', '1950', '1950', '7199', '97', '17', '100', '8', 'P2']
query_list: [19]
['power.limit', 'power.min_limit', 'power.max_limit', 'memory.total', 'vbios_version', 'driver_version', 'name', 'gpu_uuid', 'power.draw', 'temperature.gpu', 'temperature.memory', 'clocks.current.graphics', 'clocks.sm', 'clocks.mem', 'utilization.gpu', 'utilization.memory', 'fan.speed', 'pcie.link.width.current', 'pstate']
NV query result: [{'power.limit': '200.00', 'power.min_limit': '105.00', 'power.max_limit': '292.00', 'memory.total': '7982', 'vbios_version': '90.04.23.00.5F', 'driver_version': '440.64', 'name': 'GeForce RTX 2080', 'gpu_uuid': 'GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d', 'power.draw': '201.40', 'temperature.gpu': '47', 'temperature.memory': 'N/A', 'clocks.current.graphics': '1950', 'clocks.sm': '1950', 'clocks.mem': '7199', 'utilization.gpu': '97', 'utilization.memory': '17', 'fan.speed': '100', 'pcie.link.width.current': '8', 'pstate': 'P2'}]
Detected GPUs: NVIDIA: 3
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
nsmi_items: [2]
['200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-089608fe-cba5-4711-bf68-085fd0711d8c, 96.01, 41, N/A, 1995, 1995, 7199, 99, 4, 100, 4, P2', '']
nsmi_items: [19]
['200.00', ' 105.00', ' 292.00', ' 7982', ' 90.04.23.00.5F', ' 440.64', ' GeForce RTX 2080', ' GPU-089608fe-cba5-4711-bf68-085fd0711d8c', ' 96.01', ' 41', ' N/A', ' 1995', ' 1995', ' 7199', ' 99', ' 4', ' 100', ' 4', ' P2']
new_nsmi_items: [19]
['200.00', '105.00', '292.00', '7982', '90.04.23.00.5F', '440.64', 'GeForce RTX 2080', 'GPU-089608fe-cba5-4711-bf68-085fd0711d8c', '96.01', '41', 'N/A', '1995', '1995', '7199', '99', '4', '100', '4', 'P2']
query_list: [19]
['power.limit', 'power.min_limit', 'power.max_limit', 'memory.total', 'vbios_version', 'driver_version', 'name', 'gpu_uuid', 'power.draw', 'temperature.gpu', 'temperature.memory', 'clocks.current.graphics', 'clocks.sm', 'clocks.mem', 'utilization.gpu', 'utilization.memory', 'fan.speed', 'pcie.link.width.current', 'pstate']
NV query result: [{'power.limit': '200.00', 'power.min_limit': '105.00', 'power.max_limit': '292.00', 'memory.total': '7982', 'vbios_version': '90.04.23.00.5F', 'driver_version': '440.64', 'name': 'GeForce RTX 2080', 'gpu_uuid': 'GPU-089608fe-cba5-4711-bf68-085fd0711d8c', 'power.draw': '96.01', 'temperature.gpu': '41', 'temperature.memory': 'N/A', 'clocks.current.graphics': '1995', 'clocks.sm': '1995', 'clocks.mem': '7199', 'utilization.gpu': '99', 'utilization.memory': '4', 'fan.speed': '100', 'pcie.link.width.current': '4', 'pstate': 'P2'}]
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
nsmi_items: [2]
['200.00, 105.00, 292.00, 7979, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-22b2c6ac-2d49-4863-197c-9c469071178a, 174.71, 42, N/A, 1965, 1965, 7199, 88, 56, 100, 8, P2', '']
nsmi_items: [19]
['200.00', ' 105.00', ' 292.00', ' 7979', ' 90.04.23.00.5F', ' 440.64', ' GeForce RTX 2080', ' GPU-22b2c6ac-2d49-4863-197c-9c469071178a', ' 174.71', ' 42', ' N/A', ' 1965', ' 1965', ' 7199', ' 88', ' 56', ' 100', ' 8', ' P2']
new_nsmi_items: [19]
['200.00', '105.00', '292.00', '7979', '90.04.23.00.5F', '440.64', 'GeForce RTX 2080', 'GPU-22b2c6ac-2d49-4863-197c-9c469071178a', '174.71', '42', 'N/A', '1965', '1965', '7199', '88', '56', '100', '8', 'P2']
query_list: [19]
['power.limit', 'power.min_limit', 'power.max_limit', 'memory.total', 'vbios_version', 'driver_version', 'name', 'gpu_uuid', 'power.draw', 'temperature.gpu', 'temperature.memory', 'clocks.current.graphics', 'clocks.sm', 'clocks.mem', 'utilization.gpu', 'utilization.memory', 'fan.speed', 'pcie.link.width.current', 'pstate']
NV query result: [{'power.limit': '200.00', 'power.min_limit': '105.00', 'power.max_limit': '292.00', 'memory.total': '7979', 'vbios_version': '90.04.23.00.5F', 'driver_version': '440.64', 'name': 'GeForce RTX 2080', 'gpu_uuid': 'GPU-22b2c6ac-2d49-4863-197c-9c469071178a', 'power.draw': '174.71', 'temperature.gpu': '42', 'temperature.memory': 'N/A', 'clocks.current.graphics': '1965', 'clocks.sm': '1965', 'clocks.mem': '7199', 'utilization.gpu': '88', 'utilization.memory': '56', 'fan.speed': '100', 'pcie.link.width.current': '8', 'pstate': 'P2'}]
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
nsmi_items: [2]
['200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, GeForce RTX 2080, GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d, 201.40, 47, N/A, 1965, 1965, 7199, 97, 17, 100, 8, P2', '']
nsmi_items: [19]
['200.00', ' 105.00', ' 292.00', ' 7982', ' 90.04.23.00.5F', ' 440.64', ' GeForce RTX 2080', ' GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d', ' 201.40', ' 47', ' N/A', ' 1965', ' 1965', ' 7199', ' 97', ' 17', ' 100', ' 8', ' P2']
new_nsmi_items: [19]
['200.00', '105.00', '292.00', '7982', '90.04.23.00.5F', '440.64', 'GeForce RTX 2080', 'GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d', '201.40', '47', 'N/A', '1965', '1965', '7199', '97', '17', '100', '8', 'P2']
query_list: [19]
['power.limit', 'power.min_limit', 'power.max_limit', 'memory.total', 'vbios_version', 'driver_version', 'name', 'gpu_uuid', 'power.draw', 'temperature.gpu', 'temperature.memory', 'clocks.current.graphics', 'clocks.sm', 'clocks.mem', 'utilization.gpu', 'utilization.memory', 'fan.speed', 'pcie.link.width.current', 'pstate']
NV query result: [{'power.limit': '200.00', 'power.min_limit': '105.00', 'power.max_limit': '292.00', 'memory.total': '7982', 'vbios_version': '90.04.23.00.5F', 'driver_version': '440.64', 'name': 'GeForce RTX 2080', 'gpu_uuid': 'GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d', 'power.draw': '201.40', 'temperature.gpu': '47', 'temperature.memory': 'N/A', 'clocks.current.graphics': '1965', 'clocks.sm': '1965', 'clocks.mem': '7199', 'utilization.gpu': '97', 'utilization.memory': '17', 'fan.speed': '100', 'pcie.link.width.current': '8', 'pstate': 'P2'}]
3 total GPUs, 0 rw, 3 r-only, 0 w-only

Card Number: 0
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: None
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   Display Card Model: None
   PCIe ID: 08:00.0
      Link Speed: None
      Link Width: None
   ##################################################
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   vBIOS Version: None
   Compute Platform: OpenCL 1.2 CUDA
   GPU Frequency/Voltage Control Type: Unsupported
   HWmon: None
   Card Path: /sys/class/drm/card0/device
   System Card Path: /sys/devices/pci0000:00/0000:00:01.3/0000:02:00.2/0000:03:04.0/0000:08:00.0
   ##################################################
   Current Power (W): None
   Power Cap (W): None
      Power Cap Range (W): [None, None]
   Fan Enable: None
   Fan PWM Mode: [None, 'UNK']
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): None
   Current Fan PWM (%): None
      Fan Speed Range (rpm): [None, None]
      Fan PWM Range (%): [None, None]
   ##################################################
   Current GPU Loading (%): None
   Current Memory Loading (%): None
   Current GTT Memory Usage (%): None
      Current GTT Memory Used (GB): None
      Total GTT Memory (GB): None
   Current VRAM Usage (%): None
      Current VRAM Used (GB): None
      Total VRAM (GB): None
   Current  Temps (C): None
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): None
   Current SCLK P-State: ['', '']
      SCLK Range: ['', '']
   Current MCLK P-State: ['', '']
      MCLK Range: ['', '']
   Power Profile Mode: None
   Power DPM Force Performance Level: None

Card Number: 1
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: None
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   Display Card Model: None
   PCIe ID: 0a:00.0
      Link Speed: None
      Link Width: None
   ##################################################
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   vBIOS Version: None
   Compute Platform: OpenCL 1.2 CUDA
   GPU Frequency/Voltage Control Type: Unsupported
   HWmon: None
   Card Path: /sys/class/drm/card1/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.1/0000:0a:00.0
   ##################################################
   Current Power (W): None
   Power Cap (W): None
      Power Cap Range (W): [None, None]
   Fan Enable: None
   Fan PWM Mode: [None, 'UNK']
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): None
   Current Fan PWM (%): None
      Fan Speed Range (rpm): [None, None]
      Fan PWM Range (%): [None, None]
   ##################################################
   Current GPU Loading (%): None
   Current Memory Loading (%): None
   Current GTT Memory Usage (%): None
      Current GTT Memory Used (GB): None
      Total GTT Memory (GB): None
   Current VRAM Usage (%): None
      Current VRAM Used (GB): None
      Total VRAM (GB): None
   Current  Temps (C): None
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): None
   Current SCLK P-State: ['', '']
      SCLK Range: ['', '']
   Current MCLK P-State: ['', '']
      MCLK Range: ['', '']
   Power Profile Mode: None
   Power DPM Force Performance Level: None

Card Number: 2
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: None
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   Display Card Model: None
   PCIe ID: 0b:00.0
      Link Speed: None
      Link Width: None
   ##################################################
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   vBIOS Version: None
   Compute Platform: OpenCL 1.2 CUDA
   GPU Frequency/Voltage Control Type: Unsupported
   HWmon: None
   Card Path: /sys/class/drm/card2/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0
   ##################################################
   Current Power (W): None
   Power Cap (W): None
      Power Cap Range (W): [None, None]
   Fan Enable: None
   Fan PWM Mode: [None, 'UNK']
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): None
   Current Fan PWM (%): None
      Fan Speed Range (rpm): [None, None]
      Fan PWM Range (%): [None, None]
   ##################################################
   Current GPU Loading (%): None
   Current Memory Loading (%): None
   Current GTT Memory Usage (%): None
      Current GTT Memory Used (GB): None
      Total GTT Memory (GB): None
   Current VRAM Usage (%): None
      Current VRAM Used (GB): None
      Total VRAM (GB): None
   Current  Temps (C): None
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): None
   Current SCLK P-State: ['', '']
      SCLK Range: ['', '']
   Current MCLK P-State: ['', '']
      MCLK Range: ['', '']
   Power Profile Mode: None
   Power DPM Force Performance Level: None

from gpu-utils.

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

I have pushed a version with basic amdgpu-ls functionality. Still needs a lot of work. Let me know if it works on your system.

from gpu-utils.

KeithMyers avatar KeithMyers commented on August 28, 2024

No readings in this one. Is that what you wanted?

./amdgpu-ls
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
Detected GPUs: NVIDIA: 3
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
3 total GPUs, 0 rw, 3 r-only, 0 w-only

Error getting p-states: /sys/class/drm/card0/device/pp_od_clk_voltage
Error getting p-states: /sys/class/drm/card1/device/pp_od_clk_voltage
Error getting p-states: /sys/class/drm/card2/device/pp_od_clk_voltage
Card Number: 0
   Vendor: NVIDIA
   Readable: False
   Writable: False
   Compute: True
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: GeForce RTX 2080
   PCIe ID: 08:00.0
   Driver: 440.64
   GPU Frequency/Voltage Control Type: Supported
   HWmon: None
   Card Path: /sys/class/drm/card0/device
   System Card Path: /sys/devices/pci0000:00/0000:00:01.3/0000:02:00.2/0000:03:04.0/0000:08:00.0

Card Number: 1
   Vendor: NVIDIA
   Readable: False
   Writable: False
   Compute: True
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: GeForce RTX 2080
   PCIe ID: 0a:00.0
   Driver: 440.64
   GPU Frequency/Voltage Control Type: Supported
   HWmon: None
   Card Path: /sys/class/drm/card1/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.1/0000:0a:00.0

Card Number: 2
   Vendor: NVIDIA
   Readable: False
   Writable: False
   Compute: True
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: GeForce RTX 2080
   PCIe ID: 0b:00.0
   Driver: 440.64
   GPU Frequency/Voltage Control Type: Supported
   HWmon: None
   Card Path: /sys/class/drm/card2/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0

from gpu-utils.

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

I pushed a new version. It was still checking an AMD-specific file and resetting readable to False. Hopefully this will work. It will need major refactoring afterward.
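
The guard needed is roughly this (a sketch with hypothetical names, not the actual GPUmodule code): only an AMD card should lose its readable flag when the amdgpu sysfs file is missing, since NV cards are read through nvidia-smi instead:

import os

def still_readable(vendor: str, card_path: str, readable: bool) -> bool:
    # pp_od_clk_voltage only exists for amdgpu; its absence says nothing about NV cards
    if vendor == 'AMD' and not os.path.isfile(os.path.join(card_path, 'pp_od_clk_voltage')):
        return False
    return readable

# still_readable('NVIDIA', '/sys/class/drm/card0/device', True) -> True even without the amdgpu file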

from gpu-utils.

KeithMyers avatar KeithMyers commented on August 28, 2024

./amdgpu-ls
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
Detected GPUs: NVIDIA: 3
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.current.graphics,clocks.sm,clocks.mem,utilization.gpu,utilization.memory,fan.speed,pcie.link.width.current,pstate --format=csv,noheader,nounits
3 total GPUs, 0 rw, 3 r-only, 0 w-only

Card Number: 0
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: GPU-089608fe-cba5-4711-bf68-085fd0711d8c
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: GeForce RTX 2080
   Display Card Model: None
   PCIe ID: 08:00.0
      Link Speed: None
      Link Width: 4
   ##################################################
   Driver: 440.64
   vBIOS Version: 90.04.23.00.5F
   Compute Platform: OpenCL 1.2 CUDA
   GPU Frequency/Voltage Control Type: Supported
   HWmon: None
   Card Path: /sys/class/drm/card0/device
   System Card Path: /sys/devices/pci0000:00/0000:00:01.3/0000:02:00.2/0000:03:04.0/0000:08:00.0
   ##################################################
   Current Power (W): 164.400
   Power Cap (W): 200.00
      Power Cap Range (W): [105.0, 292.0]
   Fan Enable: None
   Fan PWM Mode: [None, 'UNK']
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): 100
   Current Fan PWM (%): None
      Fan Speed Range (rpm): [None, None]
      Fan PWM Range (%): [None, None]
   ##################################################
   Current GPU Loading (%): 82
   Current Memory Loading (%): 23
   Current GTT Memory Usage (%): None
      Current GTT Memory Used (GB): None
      Total GTT Memory (GB): None
   Current VRAM Usage (%): None
      Current VRAM Used (GB): None
      Total VRAM (GB): 7982
   Current  Temps (C): {'temperature.gpu': 55.0, 'temperature.memory': None}
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): {'clocks.current.graphics': 1980.0, 'clocks.mem': 7199.0, 'clocks.sm': 1980.0}
   Current SCLK P-State: [2, '']
      SCLK Range: ['', '']
   Current MCLK P-State: [2, '']
      MCLK Range: ['', '']
   Power Profile Mode: None
   Power DPM Force Performance Level: None

Card Number: 1
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: GPU-22b2c6ac-2d49-4863-197c-9c469071178a
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: GeForce RTX 2080
   Display Card Model: None
   PCIe ID: 0a:00.0
      Link Speed: None
      Link Width: 8
   ##################################################
   Driver: 440.64
   vBIOS Version: 90.04.23.00.5F
   Compute Platform: OpenCL 1.2 CUDA
   GPU Frequency/Voltage Control Type: Supported
   HWmon: None
   Card Path: /sys/class/drm/card1/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.1/0000:0a:00.0
   ##################################################
   Current Power (W): 166.300
   Power Cap (W): 200.00
      Power Cap Range (W): [105.0, 292.0]
   Fan Enable: None
   Fan PWM Mode: [None, 'UNK']
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): 100
   Current Fan PWM (%): None
      Fan Speed Range (rpm): [None, None]
      Fan PWM Range (%): [None, None]
   ##################################################
   Current GPU Loading (%): 98
   Current Memory Loading (%): 42
   Current GTT Memory Usage (%): None
      Current GTT Memory Used (GB): None
      Total GTT Memory (GB): None
   Current VRAM Usage (%): None
      Current VRAM Used (GB): None
      Total VRAM (GB): 7979
   Current  Temps (C): {'temperature.gpu': 44.0, 'temperature.memory': None}
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): {'clocks.current.graphics': 1980.0, 'clocks.mem': 7199.0, 'clocks.sm': 1980.0}
   Current SCLK P-State: [2, '']
      SCLK Range: ['', '']
   Current MCLK P-State: [2, '']
      MCLK Range: ['', '']
   Power Profile Mode: None
   Power DPM Force Performance Level: None

Card Number: 2
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: GeForce RTX 2080
   Display Card Model: None
   PCIe ID: 0b:00.0
      Link Speed: None
      Link Width: 8
   ##################################################
   Driver: 440.64
   vBIOS Version: 90.04.23.00.5F
   Compute Platform: OpenCL 1.2 CUDA
   GPU Frequency/Voltage Control Type: Supported
   HWmon: None
   Card Path: /sys/class/drm/card2/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0
   ##################################################
   Current Power (W): 200.800
   Power Cap (W): 200.00
      Power Cap Range (W): [105.0, 292.0]
   Fan Enable: None
   Fan PWM Mode: [None, 'UNK']
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): 100
   Current Fan PWM (%): None
      Fan Speed Range (rpm): [None, None]
      Fan PWM Range (%): [None, None]
   ##################################################
   Current GPU Loading (%): 97
   Current Memory Loading (%): 8
   Current GTT Memory Usage (%): None
      Current GTT Memory Used (GB): None
      Total GTT Memory (GB): None
   Current VRAM Usage (%): None
      Current VRAM Used (GB): None
      Total VRAM (GB): 7982
   Current  Temps (C): {'temperature.gpu': 52.0, 'temperature.memory': None}
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): {'clocks.current.graphics': 1965.0, 'clocks.mem': 7199.0, 'clocks.sm': 1965.0}
   Current SCLK P-State: [2, '']
      SCLK Range: ['', '']
   Current MCLK P-State: [2, '']
      MCLK Range: ['', '']
   Power Profile Mode: None
   Power DPM Force Performance Level: None

from gpu-utils.

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

I just pushed a version that reads more items and skips those not applicable in the amdgpu-ls output. I may still be able to implement clock ranges, but I need to know which is most relevant. Please post the output.
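
The skipping works along these lines (hypothetical helper, not the actual code): nvidia-smi reports unsupported sensors as N/A, or [N/A] for fields like gom.current, and those get mapped to None rather than parsed:

def parse_field(raw: str):
    # unsupported sensors come back from nvidia-smi as N/A or [N/A]
    raw = raw.strip()
    if raw in ('N/A', '[N/A]'):
        return None
    try:
        return float(raw)        # numeric sensor values
    except ValueError:
        return raw               # names, UUIDs, pstates, etc.

print(parse_field(' N/A '))      # None, as seen for temperature.memory above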

from gpu-utils.

KeithMyers avatar KeithMyers commented on August 28, 2024

OK, here is the output from your latest.

./amdgpu-ls
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,clocks.max.video,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,clocks.max.video,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,clocks.max.video,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
Detected GPUs: NVIDIA: 3
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,clocks.max.video,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,clocks.max.video,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,clocks.max.video,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
3 total GPUs, 0 rw, 3 r-only, 0 w-only

Card Number: 0
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: None
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   Display Card Model: None
   PCIe ID: 08:00.0
      Link Speed: None
      Link Width: None
   ##################################################
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   vBIOS Version: None
   Compute Platform: OpenCL 1.2 CUDA
   Compute Mode: None
   GPU Frequency/Voltage Control Type: Supported
   HWmon: None
   Card Path: /sys/class/drm/card0/device
   System Card Path: /sys/devices/pci0000:00/0000:00:01.3/0000:02:00.2/0000:03:04.0/0000:08:00.0
   ##################################################
   Current Power (W): None
   Power Cap (W): None
      Power Cap Range (W): [None, None]
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): None
      Fan Speed Range (rpm): [None, None]
   ##################################################
   Current GPU Loading (%): None
   Current Memory Loading (%): None
   Current VRAM Usage (%): None
      Current VRAM Used (GB): None
      Total VRAM (GB): None
   Current  Temps (C): None
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): None
Traceback (most recent call last):
  File "./amdgpu-ls", line 150, in <module>
    main()
  File "./amdgpu-ls", line 145, in main
    gpu_list.print(short=args.short, clflag=args.clinfo)
  File "/home/keith/Downloads/amdgpu-utils-extended/GPUmodules/GPUmodule.py", line 2023, in print
    gpu.print(short=short, clflag=clflag)
  File "/home/keith/Downloads/amdgpu-utils-extended/GPUmodules/GPUmodule.py", line 1366, in print
    if isinstance(self.get_params_value(k), float):
  File "/home/keith/Downloads/amdgpu-utils-extended/GPUmodules/GPUmodule.py", line 612, in get_params_value
    return self.prm[name]
KeyError: 'frequencies_max'

from gpu-utils.

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

Just pushed a fix.

from gpu-utils.

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

Not sure why the params are all None before the KeyError. Maybe something went wrong with the query string...

Can you try running the command that is printed before the utility output:

/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,clocks.max.video,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits

from gpu-utils.

KeithMyers avatar KeithMyers commented on August 28, 2024

/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,clocks.max.video,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
Field "clocks.max.video" is not a valid field to query.

from gpu-utils.

KeithMyers avatar KeithMyers commented on August 28, 2024

Removed the invalid clocks.max.video field:

/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, Default, GeForce RTX 2080, GPU-089608fe-cba5-4711-bf68-085fd0711d8c, 169.38, 51, N/A, 1995, 1995, 7199, 1845, 2160, 2160, 7000, 84, 16, 305, 100, [N/A], 4, 2, P2


from gpu-utils.

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

I just pushed a fix.
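
One way to keep an invalid field from slipping into the query string again is to probe nvidia-smi once for the properties it supports and drop anything unknown before building the command. A rough sketch (assuming nvidia-smi --help-query-gpu lists the supported properties, as it appears to on this driver generation; the substring check is deliberately coarse):

import subprocess

def supported_fields(candidates):
    # ask this nvidia-smi build which query-gpu properties it knows about
    help_text = subprocess.run(['/usr/bin/nvidia-smi', '--help-query-gpu'],
                               capture_output=True, text=True).stdout
    return [field for field in candidates if field in help_text]

print(supported_fields(['clocks.max.mem', 'clocks.max.video']))   # drops clocks.max.video here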

from gpu-utils.

KeithMyers avatar KeithMyers commented on August 28, 2024

./amdgpu-ls
Warning: could not read AMD Featuremask [[Errno 2] No such file or directory: '/sys/module/amdgpu/parameters/ppfeaturemask']
Detected GPUs: NVIDIA: 3
3 total GPUs, 0 rw, 0 r-only, 0 w-only

Card Number: 0
   Vendor: NVIDIA
   Readable: False
   Writable: False
   Compute: True
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   PCIe ID: 08:00.0
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   GPU Frequency/Voltage Control Type: Unsupported
   HWmon: None
   Card Path: /sys/class/drm/card0/device
   System Card Path: /sys/devices/pci0000:00/0000:00:01.3/0000:02:00.2/0000:03:04.0/0000:08:00.0

Card Number: 1
   Vendor: NVIDIA
   Readable: False
   Writable: False
   Compute: True
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   PCIe ID: 0a:00.0
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   GPU Frequency/Voltage Control Type: Unsupported
   HWmon: None
   Card Path: /sys/class/drm/card1/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.1/0000:0a:00.0

Card Number: 2
   Vendor: NVIDIA
   Readable: False
   Writable: False
   Compute: True
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
   PCIe ID: 0b:00.0
   Driver: nvidiafb, nouveau, nvidia_drm, nvidia
   GPU Frequency/Voltage Control Type: Unsupported
   HWmon: None
   Card Path: /sys/class/drm/card2/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0

from gpu-utils.

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

Are you sure that is from the extended branch?

from gpu-utils.

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

If you want to git clone the repo, you will need to do a git checkout extended in the project directory after you clone. Then you can do a git pull for the latest.
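
For example, from the cloned project directory:

git checkout extended
git pull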

from gpu-utils.

KeithMyers avatar KeithMyers commented on August 28, 2024

No, the page refreshed back to master and I didn't notice. Will test again.

./amdgpu-ls
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
NV query result: [['200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, Default, GeForce RTX 2080, GPU-089608fe-cba5-4711-bf68-085fd0711d8c, 129.79, 46, N/A, 1995, 1995, 7199, 1845, 2160, 2160, 7000, 85, 4, 283, 100, [N/A], 4, 2, P2', '']]
Traceback (most recent call last):
  File "./amdgpu-ls", line 150, in <module>
    main()
  File "./amdgpu-ls", line 98, in main
    gpu_list.set_gpu_list(clinfo_flag=True)
  File "/home/keith/Downloads/amdgpu-utils-extended/GPUmodules/GPUmodule.py", line 1788, in set_gpu_list
    self[gpu_uuid].read_gpu_sensor_set_nv()
  File "/home/keith/Downloads/amdgpu-utils-extended/GPUmodules/GPUmodule.py", line 1229, in read_gpu_sensor_set_nv
    mem_value = int(results[param_name]) if results[param_name].isnumeric else None
KeyError: 'mem_vram_total'

from gpu-utils.
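
Incidentally, the line in that traceback tests the bound method results[param_name].isnumeric without calling it, so the condition is always true, and the KeyError shows the sensor key may simply be absent. A hedged sketch of a more defensive lookup, with illustrative names rather than the project's actual ones:

def read_int_sensor(results, key):
    """Return results[key] as an int, or None if missing or non-numeric."""
    value = results.get(key)          # .get() avoids KeyError on a missing sensor
    if value is None:
        return None
    value = value.strip()
    return int(value) if value.isnumeric() else None   # note the () call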

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

Just pushed a fix.

from gpu-utils.

KeithMyers avatar KeithMyers commented on August 28, 2024
./amdgpu-ls
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
NV query result: [['200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, Default, GeForce RTX 2080, GPU-089608fe-cba5-4711-bf68-085fd0711d8c, 185.95, 56, N/A, 1980, 1980, 7199, 1830, 2160, 2160, 7000, 89, 22, 325, 100, [N/A], 4, 2, P2', '']]
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
NV query result: [['200.00, 105.00, 292.00, 7979, 90.04.23.00.5F, 440.64, Default, GeForce RTX 2080, GPU-22b2c6ac-2d49-4863-197c-9c469071178a, 170.53, 45, N/A, 1965, 1965, 7199, 1815, 2160, 2160, 7000, 88, 57, 3728, 100, [N/A], 8, 3, P2', '']]
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
NV query result: [['200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, Default, GeForce RTX 2080, GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d, 106.01, 41, N/A, 1995, 1995, 7199, 1845, 2160, 2160, 7000, 100, 4, 908, 100, [N/A], 8, 3, P2', '']]
Detected GPUs: NVIDIA: 3
NV command:
/usr/bin/nvidia-smi -i 08:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
NV query result: [['200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, Default, GeForce RTX 2080, GPU-089608fe-cba5-4711-bf68-085fd0711d8c, 202.65, 56, N/A, 1980, 1980, 7199, 1830, 2160, 2160, 7000, 89, 22, 325, 100, [N/A], 4, 2, P2', '']]
NV command:
/usr/bin/nvidia-smi -i 0a:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
NV query result: [['200.00, 105.00, 292.00, 7979, 90.04.23.00.5F, 440.64, Default, GeForce RTX 2080, GPU-22b2c6ac-2d49-4863-197c-9c469071178a, 170.53, 45, N/A, 1965, 1965, 7199, 1815, 2160, 2160, 7000, 88, 57, 3728, 100, [N/A], 8, 3, P2', '']]
NV command:
/usr/bin/nvidia-smi -i 0b:00.0 --query-gpu=power.limit,power.min_limit,power.max_limit,memory.total,vbios_version,driver_version,compute_mode,name,gpu_uuid,power.draw,temperature.gpu,temperature.memory,clocks.gr,clocks.sm,clocks.mem,clocks.video,clocks.max.gr,clocks.max.sm,clocks.max.mem,utilization.gpu,utilization.memory,memory.used,fan.speed,gom.current,pcie.link.width.current,pcie.link.gen.current,pstate --format=csv,noheader,nounits
NV query result: [['200.00, 105.00, 292.00, 7982, 90.04.23.00.5F, 440.64, Default, GeForce RTX 2080, GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d, 101.62, 41, N/A, 1995, 1995, 7199, 1845, 2160, 2160, 7000, 100, 4, 908, 100, [N/A], 8, 3, P2', '']]
3 total GPUs, 0 rw, 3 r-only, 0 w-only

Card Number: 0
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: GPU-089608fe-cba5-4711-bf68-085fd0711d8c
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: GeForce RTX 2080
   Display Card Model: GeForce RTX 2080
   PCIe ID: 08:00.0
      Link Speed: 2
      Link Width: 4
   ##################################################
   Driver: 440.64
   vBIOS Version: 90.04.23.00.5F
   Compute Platform: OpenCL 1.2 CUDA
   Compute Mode: Default
   GPU Frequency/Voltage Control Type: Supported
   HWmon: None
   Card Path: /sys/class/drm/card0/device
   System Card Path: /sys/devices/pci0000:00/0000:00:01.3/0000:02:00.2/0000:03:04.0/0000:08:00.0
   ##################################################
   Current Power (W): 202.700
   Power Cap (W): 200.00
      Power Cap Range (W): [105.0, 292.0]
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): 100
      Fan Speed Range (rpm): [None, None]
   ##################################################
   Current GPU Loading (%): 89
   Current Memory Loading (%): 22
   Current VRAM Usage (%): 4.072
      Current VRAM Used (GB): 0.317
      Total VRAM (GB): 7.795
   Current  Temps (C): {'temperature.gpu': 56.0, 'temperature.memory': None}
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): {'clocks.gr': 1980.0, 'clocks.mem': 7199.0, 'clocks.sm': 1980.0, 'clocks.video': 1830.0}
   Maximum Clk Frequencies (MHz): {'clocks.max.gr': 2160.0, 'clocks.max.mem': 7000.0, 'clocks.max.sm': 2160.0}
   Current SCLK P-State: [2, '']
      SCLK Range: ['', '']
   Power Profile Mode: [N/A]
   Power DPM Force Performance Level: None

Card Number: 1
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: GPU-22b2c6ac-2d49-4863-197c-9c469071178a
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: GeForce RTX 2080
   Display Card Model: GeForce RTX 2080
   PCIe ID: 0a:00.0
      Link Speed: 3
      Link Width: 8
   ##################################################
   Driver: 440.64
   vBIOS Version: 90.04.23.00.5F
   Compute Platform: OpenCL 1.2 CUDA
   Compute Mode: Default
   GPU Frequency/Voltage Control Type: Supported
   HWmon: None
   Card Path: /sys/class/drm/card1/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.1/0000:0a:00.0
   ##################################################
   Current Power (W): 170.500
   Power Cap (W): 200.00
      Power Cap Range (W): [105.0, 292.0]
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): 100
      Fan Speed Range (rpm): [None, None]
   ##################################################
   Current GPU Loading (%): 88
   Current Memory Loading (%): 57
   Current VRAM Usage (%): 46.723
      Current VRAM Used (GB): 3.641
      Total VRAM (GB): 7.792
   Current  Temps (C): {'temperature.gpu': 45.0, 'temperature.memory': None}
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): {'clocks.gr': 1965.0, 'clocks.mem': 7199.0, 'clocks.sm': 1965.0, 'clocks.video': 1815.0}
   Maximum Clk Frequencies (MHz): {'clocks.max.gr': 2160.0, 'clocks.max.mem': 7000.0, 'clocks.max.sm': 2160.0}
   Current SCLK P-State: [2, '']
      SCLK Range: ['', '']
   Power Profile Mode: [N/A]
   Power DPM Force Performance Level: None

Card Number: 2
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: GeForce RTX 2080
   Display Card Model: GeForce RTX 2080
   PCIe ID: 0b:00.0
      Link Speed: 3
      Link Width: 8
   ##################################################
   Driver: 440.64
   vBIOS Version: 90.04.23.00.5F
   Compute Platform: OpenCL 1.2 CUDA
   Compute Mode: Default
   GPU Frequency/Voltage Control Type: Supported
   HWmon: None
   Card Path: /sys/class/drm/card2/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0
   ##################################################
   Current Power (W): 101.600
   Power Cap (W): 200.00
      Power Cap Range (W): [105.0, 292.0]
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): 100
      Fan Speed Range (rpm): [None, None]
   ##################################################
   Current GPU Loading (%): 100
   Current Memory Loading (%): 4
   Current VRAM Usage (%): 11.376
      Current VRAM Used (GB): 0.887
      Total VRAM (GB): 7.795
   Current  Temps (C): {'temperature.gpu': 41.0, 'temperature.memory': None}
   Critical Temps (C): None
   Current Voltages (V): None
      Vddc Range: ['', '']
   Current Clk Frequencies (MHz): {'clocks.gr': 1995.0, 'clocks.mem': 7199.0, 'clocks.sm': 1995.0, 'clocks.video': 1845.0}
   Maximum Clk Frequencies (MHz): {'clocks.max.gr': 2160.0, 'clocks.max.mem': 7000.0, 'clocks.max.sm': 2160.0}
   Current SCLK P-State: [2, '']
      SCLK Range: ['', '']
   Power Profile Mode: [N/A]
   Power DPM Force Performance Level: None

from gpu-utils.

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

Can you try amdgpu-monitor? If it looks good, then try it with the --gui option. Trying it with the --plot option would be pushing it. You need to make sure you have loaded the requirements as defined in the UsersGuide.

from gpu-utils.

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

I have just pushed a minor update to eliminate some of the irrelevant ls output and tuned the monitor's set of parameters.

To run with gui and plotting, you need several packages, including pandas, Gtk, and matplotlib. You can pip install to meet all requirements by running:

sudo -H pip3 install --no-cache-dir -r requirements.txt

from gpu-utils.

KeithMyers avatar KeithMyers commented on August 28, 2024

Looks like amdgpu-monitor needs some work.

┌─────────────┬────────────────┬────────────────┬────────────────┐
│Card #       │card0           │card1           │card2           │
├─────────────┼────────────────┼────────────────┼────────────────┤
│Model        │GeForce RTX 2080│GeForce RTX 2080│GeForce RTX 2080│
│GPU Load %   │100             │98              │96              │
│Mem Load %   │4               │48              │12              │
│VRAM Usage % │10.085          │14.989          │3.621           │
│GTT Usage %  │None            │None            │None            │
│Power (W)    │96.2            │165.8           │191.6           │
│Power Cap (W)│200.00          │200.00          │200.00          │
│Energy (kWh) │0.0             │0.0             │0.0             │
│T (C)        Traceback (most recent call last):
  File "./amdgpu-monitor", line 384, in <module>
    main()
  File "./amdgpu-monitor", line 373, in main
    com_gpu_list.print_table()
  File "/home/keith/Downloads/amdgpu-utils-extended/GPUmodules/GPUmodule.py", line 2087, in print_table
    data_value_raw = gpu.get_params_value(table_item)
  File "/home/keith/Downloads/amdgpu-utils-extended/GPUmodules/GPUmodule.py", line 604, in get_params_value
    return self.prm['temperatures'].keys()[0]
TypeError: 'dict_keys' object is not subscriptable

from gpu-utils.
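
For the record, dict views are not indexable in Python 3, so grabbing the first temperature label needs an explicit iterator or a list copy. A minimal sketch using the temps dict shown in the ls output above:

temps = {'temperature.gpu': 56.0, 'temperature.memory': None}

first_sensor = next(iter(temps))          # 'temperature.gpu', no list copy needed
# equivalently: first_sensor = list(temps.keys())[0]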

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

I just pushed a fix. Can you also provide the ls output? I tuned the list of parameters displayed.

from gpu-utils.

KeithMyers avatar KeithMyers commented on August 28, 2024

Looks like you went backwards for both.

 ./amdgpu-ls
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
Traceback (most recent call last):
  File "./amdgpu-ls", line 150, in <module>
    main()
  File "./amdgpu-ls", line 98, in main
    gpu_list.set_gpu_list(clinfo_flag=True)
  File "/home/keith/Downloads/amdgpu-utils-extended/GPUmodules/GPUmodule.py", line 1798, in set_gpu_list
    self[gpu_uuid].read_gpu_sensor_set_nv()
  File "/home/keith/Downloads/amdgpu-utils-extended/GPUmodules/GPUmodule.py", line 1274, in read_gpu_sensor_set_nv
    self.prm.fan_pwm = self.prm.fanspeed
  File "/home/keith/Downloads/amdgpu-utils-extended/GPUmodules/GPUmodule.py", line 75, in __getattr__
    raise AttributeError('No such attribute: ' + name)
AttributeError: No such attribute: fanspeed
keith@Serenity:~/Downloads/amdgpu-utils-extended$ ./amdgpu-monitor
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
Traceback (most recent call last):
  File "./amdgpu-monitor", line 384, in <module>
    main()
  File "./amdgpu-monitor", line 289, in main
    gpu_list.set_gpu_list()
  File "/home/keith/Downloads/amdgpu-utils-extended/GPUmodules/GPUmodule.py", line 1798, in set_gpu_list
    self[gpu_uuid].read_gpu_sensor_set_nv()
  File "/home/keith/Downloads/amdgpu-utils-extended/GPUmodules/GPUmodule.py", line 1274, in read_gpu_sensor_set_nv
    self.prm.fan_pwm = self.prm.fanspeed
  File "/home/keith/Downloads/amdgpu-utils-extended/GPUmodules/GPUmodule.py", line 75, in __getattr__
    raise AttributeError('No such attribute: ' + name)
AttributeError: No such attribute: fanspeed

from gpu-utils.

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

Actually, I added new code in anticipation of issues in running plot. The key should have been fan_speed instead of fanspeed. Just pushed the fix.

from gpu-utils.
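
The failure pattern above is the usual dict-backed parameter object: __getattr__ raises for any unknown key, so a one-character key mismatch like fanspeed vs fan_speed surfaces as an AttributeError. A minimal sketch of the idiom, not the project's actual class:

class ParamDict(dict):
    """Dictionary whose keys can also be read as attributes."""
    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError('No such attribute: ' + name)

prm = ParamDict(fan_speed=100)
print(prm.fan_speed)    # 100
# print(prm.fanspeed)   # AttributeError: No such attribute: fanspeed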

KeithMyers avatar KeithMyers commented on August 28, 2024

First the monitor, since the session erased the ls output.

┌─────────────┬────────────────┬────────────────┬────────────────┐
│Card #       │card0           │card1           │card2           │
├─────────────┼────────────────┼────────────────┼────────────────┤
│Model        │GeForce RTX 2080│GeForce RTX 2080│GeForce RTX 2080│
│GPU Load %   │85              │98              │94              │
│Mem Load %   │16              │49              │4               │
│VRAM Usage % │3.821           │13.849          │8.394           │
│GTT Usage %  │None            │None            │None            │
│Power (W)    │173.2           │165.8           │101.3           │
│Power Cap (W)│200.00          │200.00          │200.00          │
│Energy (kWh) │0.001           │0.001           │0.0             │
│T (C)        │52.0            │40.0            │37.0            │
│VddGFX (mV)  │0               │0               │0               │
│Fan Spd (%)  │100.0           │100.0           │100.0           │
│Sclk (MHz)   │1980            │1980            │1995            │
│Sclk Pstate  │2               │2               │2               │
│Mclk (MHz)   │7199            │7199            │7199            │
│Mclk Pstate  │2               │2               │2               │
│Perf Mode    │[N/A]           │[N/A]           │[N/A]           │
└─────────────┴────────────────┴────────────────┴────────────────┘

from gpu-utils.

KeithMyers avatar KeithMyers commented on August 28, 2024

And here is the ls output. Looks like everything works. Do you want the plot output next?

./amdgpu-ls
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
Detected GPUs: NVIDIA: 3
3 total GPUs, 0 rw, 3 r-only, 0 w-only

Card Number: 0
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: GPU-089608fe-cba5-4711-bf68-085fd0711d8c
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: GeForce RTX 2080
   Display Card Model: GeForce RTX 2080
   PCIe ID: 08:00.0
      Link Speed: GEN2
      Link Width: 4
   ##################################################
   Driver: 440.64
   vBIOS Version: 90.04.23.00.5F
   Compute Platform: OpenCL 1.2 CUDA
   Compute Mode: Default
   GPU Frequency/Voltage Control Type: Supported
   HWmon: None
   Card Path: /sys/class/drm/card0/device
   System Card Path: /sys/devices/pci0000:00/0000:00:01.3/0000:02:00.2/0000:03:04.0/0000:08:00.0
   ##################################################
   Current Power (W): 166.200
   Power Cap (W): 200.00
      Power Cap Range (W): [105.0, 292.0]
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): 100.000
      Fan Speed Range (rpm): [None, None]
   ##################################################
   Current GPU Loading (%): 84
   Current Memory Loading (%): 16
   Current VRAM Usage (%): 3.821
      Current VRAM Used (GB): 0.298
      Total VRAM (GB): 7.795
   Current  Temps (C): {'temperature.gpu': 51.0, 'temperature.memory': None}
   Current Voltages (V): None
   Current Clk Frequencies (MHz): {'clocks.gr': 1980.0, 'clocks.mem': 7199.0, 'clocks.sm': 1980.0, 'clocks.video': 1830.0}
   Maximum Clk Frequencies (MHz): {'clocks.max.gr': 2160.0, 'clocks.max.mem': 7000.0, 'clocks.max.sm': 2160.0}
   Current SCLK P-State: [2, '']
   Power Profile Mode: [N/A]

Card Number: 1
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: GPU-22b2c6ac-2d49-4863-197c-9c469071178a
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: GeForce RTX 2080
   Display Card Model: GeForce RTX 2080
   PCIe ID: 0a:00.0
      Link Speed: GEN3
      Link Width: 8
   ##################################################
   Driver: 440.64
   vBIOS Version: 90.04.23.00.5F
   Compute Platform: OpenCL 1.2 CUDA
   Compute Mode: Default
   GPU Frequency/Voltage Control Type: Supported
   HWmon: None
   Card Path: /sys/class/drm/card1/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.1/0000:0a:00.0
   ##################################################
   Current Power (W): 167.000
   Power Cap (W): 200.00
      Power Cap Range (W): [105.0, 292.0]
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): 100.000
      Fan Speed Range (rpm): [None, None]
   ##################################################
   Current GPU Loading (%): 98
   Current Memory Loading (%): 45
   Current VRAM Usage (%): 14.726
      Current VRAM Used (GB): 1.147
      Total VRAM (GB): 7.792
   Current  Temps (C): {'temperature.gpu': 40.0, 'temperature.memory': None}
   Current Voltages (V): None
   Current Clk Frequencies (MHz): {'clocks.gr': 1980.0, 'clocks.mem': 7199.0, 'clocks.sm': 1980.0, 'clocks.video': 1830.0}
   Maximum Clk Frequencies (MHz): {'clocks.max.gr': 2160.0, 'clocks.max.mem': 7000.0, 'clocks.max.sm': 2160.0}
   Current SCLK P-State: [2, '']
   Power Profile Mode: [N/A]

Card Number: 2
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: GeForce RTX 2080
   Display Card Model: GeForce RTX 2080
   PCIe ID: 0b:00.0
      Link Speed: GEN3
      Link Width: 8
   ##################################################
   Driver: 440.64
   vBIOS Version: 90.04.23.00.5F
   Compute Platform: OpenCL 1.2 CUDA
   Compute Mode: Default
   GPU Frequency/Voltage Control Type: Supported
   HWmon: None
   Card Path: /sys/class/drm/card2/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0
   ##################################################
   Current Power (W): 102.300
   Power Cap (W): 200.00
      Power Cap Range (W): [105.0, 292.0]
   Fan Target Speed (rpm): None
   Current Fan Speed (rpm): 100.000
      Fan Speed Range (rpm): [None, None]
   ##################################################
   Current GPU Loading (%): 98
   Current Memory Loading (%): 4
   Current VRAM Usage (%): 11.376
      Current VRAM Used (GB): 0.887
      Total VRAM (GB): 7.795
   Current  Temps (C): {'temperature.gpu': 37.0, 'temperature.memory': None}
   Current Voltages (V): None
   Current Clk Frequencies (MHz): {'clocks.gr': 1995.0, 'clocks.mem': 7199.0, 'clocks.sm': 1995.0, 'clocks.video': 1845.0}
   Maximum Clk Frequencies (MHz): {'clocks.max.gr': 2160.0, 'clocks.max.mem': 7000.0, 'clocks.max.sm': 2160.0}
   Current SCLK P-State: [2, '']
   Power Profile Mode: [N/A]

from gpu-utils.

KeithMyers avatar KeithMyers commented on August 28, 2024

Forgot you wanted the --gui option too.
Screenshot from 2020-06-18 19-42-43

from gpu-utils.

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

Can you try amdgpu-monitor --plot?

from gpu-utils.

KeithMyers avatar KeithMyers commented on August 28, 2024

So here is your --plot image. Looks like everything works as far as I can tell.
Screenshot from 2020-06-18 19-59-16

from gpu-utils.

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

It would be great if there were a way to get the voltage values. Do you know if there is any other way to access them? If not, I will have to find a way to remove Vddgfx from the plot, since it makes the Y-axis label impossible to read. In the meantime, pressing the VddGfx button will toggle the plot.

from gpu-utils.

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

I just pushed an update with dynamic tick spacing for plot y-axes.

from gpu-utils.

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

I just noticed that the power_cap line is not on your plot. Did you disable it by pressing the PowerCap button, or is it just missing? It looks like it is in the gui table, so I think there is no issue getting the data.

from gpu-utils.

KeithMyers avatar KeithMyers commented on August 28, 2024

No, I did not push any buttons; I didn't know there were buttons to push. I just looked at the "pretty picture" and took a screenshot for you. So there must be something wrong in the plot code.

No, as far as I know or have read anywhere in the Linux universe, there is no way to access the Nvidia card voltages, at least not for any hardware past Maxwell.

from gpu-utils.

KeithMyers avatar KeithMyers commented on August 28, 2024

Maybe it is there but it isn't being shown, because your vertical scaling ends at 195 W while the power is limited to 200 W. Was it just off the top of the chart?

from gpu-utils.

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

The upper limit is set by this code:

y1lim_max_val = 10*(ldf.loc[:, ['loading', 'power_cap', 'power', 'temp_val']].max().max() // 10) + 5

I guess there is a possibility it is cut off under certain values. I have pushed a version with the last 5 increased to 15. Can you give it a try?

from gpu-utils.
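
Worked through with the numbers in this thread: if the column maximum is the 200 W power cap, then 10*(200 // 10) + 5 = 205, which leaves the cap line hugging the top of the axis; the pushed change pads to 215 instead. A quick illustrative check of just the rounding step:

def y1lim_max(max_val, pad):
    """Round the largest plotted value down to a multiple of 10, then pad."""
    return 10 * (max_val // 10) + pad

print(y1lim_max(200, 5))    # 205 -> a 200 W cap trace sits only 5 units from the top
print(y1lim_max(200, 15))   # 215 -> visible headroom above the cap line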

KeithMyers avatar KeithMyers commented on August 28, 2024

I don't think the PowerCap trace is enabled by default. When I restarted the plot it was not showing; only when I pushed its button did it show up. Or it could have been hidden behind the MCLK trace.

The PowerCap trace at 200 is almost exactly underneath the MCLK trace at 7200 (7199).
I toggled MCLK off so the PowerCap trace could be easily seen.
Screenshot from 2020-06-18 22-32-49

from gpu-utils.

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

Yes, it must have been underneath Mclk. Maybe I can offset the left-axis plot line labels...

How does the usefulness of this tool compare to the existing capabilities for NV GPUs?

from gpu-utils.

KeithMyers avatar KeithMyers commented on August 28, 2024

It is definitely directly underneath the MCLK line. Look very closely.
Screenshot from 2020-06-18 22-40-34

from gpu-utils.

KeithMyers avatar KeithMyers commented on August 28, 2024

Yes, it must have been underneath Mclk. Maybe I can offset the left-axis plot line labels...

How does the usefulness of this tool compare to the existing capabilities for NV GPUs?

Ha ha ha, LOL. WHAT!!!??? Existing capabilities for NV GPUs?? There are none in the Linux world as far as I know, unless something exists in the developer/professional/scientific space, which I don't have any direct knowledge of.

Definitely none in the consumer/hobbyist space. Yours is the first. Congratz!

from gpu-utils.

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

Thanks! Now I just need to think of another name for the project...

from gpu-utils.

KeithMyers avatar KeithMyers commented on August 28, 2024

OK, I dropped the memory OC on GPU1 for a bit to show the power cap trace, with the power trace now scaled below it.

Screenshot from 2020-06-18 22-50-32

The only flaw is something you can't get around, and that is identifying the cards. Unfortunately, nvidia-smi and Nvidia X Server Settings number the cards differently: Nvidia Settings uses decimal and nvidia-smi uses hexadecimal. The only way to absolutely identify a card is to use the GPU UUID value; if you used that to identify the card, there could be no confusion. I adjusted the memory clock of the card Nvidia Settings calls GPU#0 and it showed up as the GPU #1 plot.

from gpu-utils.
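
Since nvidia-smi can report the UUID and the PCI bus ID side by side, a stable card identity is straightforward to build; a minimal sketch:

import subprocess

def uuid_to_bus_id():
    """Map each GPU UUID to its (hex) PCI bus id, as reported by nvidia-smi."""
    out = subprocess.run(
        ['nvidia-smi', '--query-gpu=gpu_uuid,pci.bus_id', '--format=csv,noheader'],
        capture_output=True, text=True, check=True).stdout
    return dict(line.split(', ') for line in out.strip().splitlines())

print(uuid_to_bus_id())   # e.g. {'GPU-089608fe-...': '00000000:08:00.0', ...}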

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

I am using the card number from the PCIe card directory. It is assigned by the system, not the driver. Perhaps I can add another card indicator in the plot title bar. The UUID is quite long, so would the PCIe address be useful instead? That is what I use to read the cards.

from gpu-utils.

KeithMyers avatar KeithMyers commented on August 28, 2024

Yes, the PCIe ID would be useful. The problem arises every time you apply the coolbits tweak: it normally swaps the card positions between the card driving the display and one of the other non-display cards. Which card gets swapped depends on whether it's an Intel mobo or an AMD one; it has to do with how the PCIe slots are enumerated. On Intel boards the PCIe slots normally increment from the slot closest to the CPU socket outward to the bottom of the board, but on AMD boards the PCIe slots normally enumerate from the bottom of the board toward the CPU socket.

Applying the coolbits tweak almost always catches people out when they reboot and are presented with a black display. They think they have broken the system and often just blow the installation away and needlessly start from scratch. What really happens is that the cards got renumbered and the display output got switched to one of the other cards in the system. All you have to do is move the display cable to one of the other cards to find the display output. Or, before you apply the coolbits tweak, print out the /etc/X11/xorg.conf file; then, after you apply the tweak and BEFORE you reboot, edit xorg.conf to swap the PCIe BusIDs back to what they were originally. Problem solved before it begins.

But this then confuses Nvidia X Server Settings as to which cards are numbered GPU#0, GPU#1, GPU#2, etc.

As long as someone is familiar with the output of nvidia-smi, knows that it never changes, and knows that the BusID is in hex, that would work for labeling the cards in Plot.

from gpu-utils.
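
For anyone bitten by the renumbering described above, the fix is pinning each card's BusID in the Device sections of /etc/X11/xorg.conf; a fragment might look like the following, with the decimal BusID from this system used purely as an example:

Section "Device"
    Identifier  "Device0"
    Driver      "nvidia"
    BusID       "PCI:8:0:0"    # pin this Device to the card at bus 8 (0x08)
EndSection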

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

I have pushed a version that will include the PCIe ID in plot titles and tuned the parameters displayed for ls.

from gpu-utils.

KeithMyers avatar KeithMyers commented on August 28, 2024

Hey Rick, can you explain what the energy value in the plot displays? Is it watt-hours per task or something?

from gpu-utils.

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

It is the cumulative energy in kWh since the utility started running. It is meant to be similar to your home's energy meter. To make better use of it, I would typically import the log file from amdgpu-monitor --gui --log into Excel for analysis.

from gpu-utils.
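
In other words, energy is just power integrated over time, the same arithmetic a household meter performs. A tiny illustrative example:

def energy_kwh(power_w, hours):
    """Cumulative energy of a constant load, as a plug-in power meter would show."""
    return power_w * hours / 1000.0

print(energy_kwh(200.0, 5.0))   # 1.0 kWh: one 200 W card running for five hours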

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

I added units in the latest push to extended.

from gpu-utils.

KeithMyers avatar KeithMyers commented on August 28, 2024

Ahhh, thanks for the explanation. Or better still, the example would be a Kill-A-Watt meter, since that only measures the load of the device you plug into it, like a crunching host.
Thanks, I will grab it now.

from gpu-utils.

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

Also, can you post the latest from amdgpu-ls? I want to make sure I resolved all the issues in the displayed values.

Do you have or know anyone with both AMD and NV GPUs in a system? It would be cool to get a plot image showing both.

from gpu-utils.

KeithMyers avatar KeithMyers commented on August 28, 2024

On an aside, have you ever played around with the Zenpower driver or the Zen Monitor application?
I have it installed on both the 2920X Threadripper and the 3950X. Works great. But the TR installation shows a few niggling issues because the developer doesn't have a TR to test with.

The latest thing I noticed was that the labels for Tdie and Tctl are swapped for their respective values. I'm still trying to get the developer's attention to look at the issue I opened.

from gpu-utils.

KeithMyers avatar KeithMyers commented on August 28, 2024

Tom M. from our GPUUG team would be the best choice. He has been experimenting with running both AMD and Nvidia cards in his mining rigs.

from gpu-utils.

KeithMyers avatar KeithMyers commented on August 28, 2024

./amdgpu-ls
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
Detected GPUs: NVIDIA: 3
3 total GPUs, 0 rw, 3 r-only, 0 w-only

Card Number: 0
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: GPU-089608fe-cba5-4711-bf68-085fd0711d8c
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: GeForce RTX 2080
   Display Card Model: GeForce RTX 2080
   PCIe ID: 08:00.0
      Link Speed: GEN2
      Link Width: 4
   ##################################################
   Driver: 440.64
   vBIOS Version: 90.04.23.00.5F
   Compute Platform: OpenCL 1.2 CUDA
   Compute Mode: Default
   GPU Frequency/Voltage Control Type: Supported
   HWmon: None
   Card Path: /sys/class/drm/card0/device
   System Card Path: /sys/devices/pci0000:00/0000:00:01.3/0000:02:00.2/0000:03:04.0/0000:08:00.0
   ##################################################
   Current Power (W): 168.670
   Power Cap (W): 200.00
      Power Cap Range (W): [105.0, 292.0]
   Fan Target Speed (rpm): None
   Current Fan PWM (%): 100.000
   ##################################################
   Current GPU Loading (%): 85
   Current Memory Loading (%): 16
   Current VRAM Usage (%): 3.846
      Current VRAM Used (GB): 0.300
      Total VRAM (GB): 7.795
   Current Temps (C): {'temperature.gpu': 49.0, 'temperature.memory': None}
   Current Clk Frequencies (MHz): {'clocks.gr': 1995.0, 'clocks.mem': 7199.0, 'clocks.sm': 1995.0, 'clocks.video': 1845.0}
   Maximum Clk Frequencies (MHz): {'clocks.max.gr': 2160.0, 'clocks.max.mem': 7000.0, 'clocks.max.sm': 2160.0}
   Current SCLK P-State: [2, '']
   Power Profile Mode: [N/A]

Card Number: 1
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: GPU-22b2c6ac-2d49-4863-197c-9c469071178a
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: GeForce RTX 2080
   Display Card Model: GeForce RTX 2080
   PCIe ID: 0a:00.0
      Link Speed: GEN3
      Link Width: 8
   ##################################################
   Driver: 440.64
   vBIOS Version: 90.04.23.00.5F
   Compute Platform: OpenCL 1.2 CUDA
   Compute Mode: Default
   GPU Frequency/Voltage Control Type: Supported
   HWmon: None
   Card Path: /sys/class/drm/card1/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.1/0000:0a:00.0
   ##################################################
   Current Power (W): 157.880
   Power Cap (W): 200.00
      Power Cap Range (W): [105.0, 292.0]
   Fan Target Speed (rpm): None
   Current Fan PWM (%): 100.000
   ##################################################
   Current GPU Loading (%): 98
   Current Memory Loading (%): 44
   Current VRAM Usage (%): 16.318
      Current VRAM Used (GB): 1.271
      Total VRAM (GB): 7.792
   Current Temps (C): {'temperature.gpu': 41.0, 'temperature.memory': None}
   Current Clk Frequencies (MHz): {'clocks.gr': 1980.0, 'clocks.mem': 7199.0, 'clocks.sm': 1980.0, 'clocks.video': 1830.0}
   Maximum Clk Frequencies (MHz): {'clocks.max.gr': 2160.0, 'clocks.max.mem': 7000.0, 'clocks.max.sm': 2160.0}
   Current SCLK P-State: [2, '']
   Power Profile Mode: [N/A]

Card Number: 2
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: GeForce RTX 2080
   Display Card Model: GeForce RTX 2080
   PCIe ID: 0b:00.0
      Link Speed: GEN3
      Link Width: 8
   ##################################################
   Driver: 440.64
   vBIOS Version: 90.04.23.00.5F
   Compute Platform: OpenCL 1.2 CUDA
   Compute Mode: Default
   GPU Frequency/Voltage Control Type: Supported
   HWmon: None
   Card Path: /sys/class/drm/card2/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0
   ##################################################
   Current Power (W): 198.850
   Power Cap (W): 200.00
      Power Cap Range (W): [105.0, 292.0]
   Fan Target Speed (rpm): None
   Current Fan PWM (%): 100.000
   ##################################################
   Current GPU Loading (%): 98
   Current Memory Loading (%): 27
   Current VRAM Usage (%): 4.097
      Current VRAM Used (GB): 0.319
      Total VRAM (GB): 7.795
   Current Temps (C): {'temperature.gpu': 47.0, 'temperature.memory': None}
   Current Clk Frequencies (MHz): {'clocks.gr': 1920.0, 'clocks.mem': 7199.0, 'clocks.sm': 1920.0, 'clocks.video': 1785.0}
   Maximum Clk Frequencies (MHz): {'clocks.max.gr': 2160.0, 'clocks.max.mem': 7000.0, 'clocks.max.sm': 2160.0}
   Current SCLK P-State: [2, '']
   Power Profile Mode: [N/A]

from gpu-utils.

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

I will reach out to Tom to get him to try it. Also need to find someone with an Intel GPU used for compute.

I just checked out ZenPower. I am building a 3990X system for my daughter. Maybe I can get her to take on the challenge of developing a Python version of the app.

from gpu-utils.

KeithMyers avatar KeithMyers commented on August 28, 2024

What I like about it is that the telemetry comes directly from the CPU. That is the SVI2 TFN data that Windows utilities like HWiNFO use. No mobo SIO chip involved, so it is very accurate. I have the out-of-tree asus-wmi-sensors driver on this 3950X host, which gets everything else, since my mobo is one of the few that ASUS developed a WMI BIOS for. The WMI values are exported. Much better than the old out-of-tree it87 driver that Guenter Roeck developed back in the beginning.
Screenshot from 2020-06-20 18-05-45

keith@Serenity:~$ sensors
asuswmisensors-isa-0000
Adapter: ISA adapter
CPU Core Voltage:          1.22 V  
CPU SOC Voltage:           1.07 V  
DRAM Voltage:              1.42 V  
VDDP Voltage:            589.00 mV 
1.8V PLL Voltage:          2.14 V  
+12V Voltage:             11.50 V  
+5V Voltage:               4.74 V  
3VSB Voltage:              3.33 V  
VBAT Voltage:              3.18 V  
AVCC3 Voltage:             3.33 V  
SB 1.05V Voltage:          1.08 V  
CPU Core Voltage:          1.24 V  
CPU SOC Voltage:           1.09 V  
DRAM Voltage:              1.47 V  
CPU Fan:                 1934 RPM
Chassis Fan 1:              0 RPM
Chassis Fan 2:              0 RPM
Chassis Fan 3:              0 RPM
HAMP Fan:                   0 RPM
Water Pump:                 0 RPM
CPU OPT:                    0 RPM
Water Flow:                 0 RPM
AIO Pump:                   0 RPM
CPU Temperature:          +71.0°C  
CPU Socket Temperature:   +52.0°C  
Motherboard Temperature:  +31.0°C  
Chipset Temperature:      +50.0°C  
Tsensor 1 Temperature:   +216.0°C  
CPU VRM Temperature:      +63.0°C  
Water In:                +216.0°C  
Water Out:                +33.0°C  
CPU VRM Output Current:   90.00 A  

zenpower-pci-00c3
Adapter: PCI adapter
SVI2_Core:     1.24 V  
SVI2_SoC:      1.09 V  
Tdie:         +72.0°C  (high = +95.0°C)
Tctl:         +72.0°C  
Tccd1:        +72.0°C  
Tccd2:        +69.2°C  
SVI2_P_Core:  84.42 W  
SVI2_P_SoC:   18.03 W  
SVI2_C_Core:  67.86 A  
SVI2_C_SoC:   16.48 A  

from gpu-utils.

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

I have added the card index and serial number to the ls output. Please try the latest on extended for verification.

from gpu-utils.
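
Both additions map to real nvidia-smi query fields, index and serial; consumer GeForce boards generally report the serial as [N/A], which is consistent with the output below. A quick way to check directly:

import subprocess

# GeForce cards usually return "[N/A]" for serial, so amdgpu-ls showing
# GPU S/N: [N/A] is expected rather than a parsing problem.
print(subprocess.run(
    ['nvidia-smi', '--query-gpu=index,serial,gpu_uuid', '--format=csv'],
    capture_output=True, text=True).stdout)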

KeithMyers avatar KeithMyers commented on August 28, 2024

I don't see the S/N. But then I have never seen the S/N reported in any software before; the only place I have ever seen it is on a sticker on the card.

Also, I am not sure what the card index is supposed to represent. It looks like it is linked to the Card #. If it is supposed to agree with how Nvidia X Server Settings labels the cards, it is wrong:
Card0 and Card1 have swapped PCI BusIDs. This is the issue I wrote about previously, caused by enabling the coolbits tweak.

./amdgpu-ls
OS command [nvidia-smi] executable found: [/usr/bin/nvidia-smi]
Detected GPUs: NVIDIA: 3
3 total GPUs, 0 rw, 3 r-only, 0 w-only

Card Number: 0
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: GPU-089608fe-cba5-4711-bf68-085fd0711d8c
   GPU S/N: [N/A]
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: GeForce RTX 2080
   Display Card Model: GeForce RTX 2080
   Card Index: 0
   PCIe ID: 08:00.0
      Link Speed: GEN2
      Link Width: 4
   ##################################################
   Driver: 440.64
   vBIOS Version: 90.04.23.00.5F
   Compute Platform: OpenCL 1.2 CUDA
   Compute Mode: Default
   GPU Type: Supported
   HWmon: None
   Card Path: /sys/class/drm/card0/device
   System Card Path: /sys/devices/pci0000:00/0000:00:01.3/0000:02:00.2/0000:03:04.0/0000:08:00.0
   ##################################################
   Current Power (W): 100.700
   Power Cap (W): 200.00
      Power Cap Range (W): [105.0, 292.0]
   Fan Target Speed (rpm): None
   Current Fan PWM (%): 100.000
   ##################################################
   Current GPU Loading (%): 100
   Current Memory Loading (%): 4
   Current VRAM Usage (%): 12.791
      Current VRAM Used (GB): 0.997
      Total VRAM (GB): 7.795
   Current  Temps (C): {'temperature.gpu': 42.0, 'temperature.memory': None}
   Current Clk Frequencies (MHz): {'clocks.gr': 2010.0, 'clocks.mem': 7199.0, 'clocks.sm': 2010.0, 'clocks.video': 1860.0}
   Maximum Clk Frequencies (MHz): {'clocks.max.gr': 2160.0, 'clocks.max.mem': 7000.0, 'clocks.max.sm': 2160.0}
   Current SCLK P-State: [2, '']
   Power Profile Mode: [N/A]

Card Number: 1
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: GPU-22b2c6ac-2d49-4863-197c-9c469071178a
   GPU S/N: [N/A]
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: GeForce RTX 2080
   Display Card Model: GeForce RTX 2080
   Card Index: 1
   PCIe ID: 0a:00.0
      Link Speed: GEN3
      Link Width: 8
   ##################################################
   Driver: 440.64
   vBIOS Version: 90.04.23.00.5F
   Compute Platform: OpenCL 1.2 CUDA
   Compute Mode: Default
   GPU Type: Supported
   HWmon: None
   Card Path: /sys/class/drm/card1/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.1/0000:0a:00.0
   ##################################################
   Current Power (W): 174.520
   Power Cap (W): 200.00
      Power Cap Range (W): [105.0, 292.0]
   Fan Target Speed (rpm): None
   Current Fan PWM (%): 100.000
   ##################################################
   Current GPU Loading (%): 87
   Current Memory Loading (%): 62
   Current VRAM Usage (%): 47.136
      Current VRAM Used (GB): 3.673
      Total VRAM (GB): 7.792
   Current  Temps (C): {'temperature.gpu': 41.0, 'temperature.memory': None}
   Current Clk Frequencies (MHz): {'clocks.gr': 1965.0, 'clocks.mem': 7199.0, 'clocks.sm': 1965.0, 'clocks.video': 1815.0}
   Maximum Clk Frequencies (MHz): {'clocks.max.gr': 2160.0, 'clocks.max.mem': 7000.0, 'clocks.max.sm': 2160.0}
   Current SCLK P-State: [2, '']
   Power Profile Mode: [N/A]

Card Number: 2
   Vendor: NVIDIA
   Readable: True
   Writable: False
   Compute: True
   GPU UID: GPU-6933bcb7-7072-2181-a92f-0e5c42c3fa0d
   GPU S/N: [N/A]
   Device ID: {'device': '0x1e87', 'subsystem_device': '0x2184', 'subsystem_vendor': '0x3842', 'vendor': '0x10de'}
   Decoded Device ID: TU104 [GeForce RTX 2080 Rev. A]
   Card Model: GeForce RTX 2080
   Display Card Model: GeForce RTX 2080
   Card Index: 2
   PCIe ID: 0b:00.0
      Link Speed: GEN3
      Link Width: 8
   ##################################################
   Driver: 440.64
   vBIOS Version: 90.04.23.00.5F
   Compute Platform: OpenCL 1.2 CUDA
   Compute Mode: Default
   GPU Type: Supported
   HWmon: None
   Card Path: /sys/class/drm/card2/device
   System Card Path: /sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0
   ##################################################
   Current Power (W): 201.270
   Power Cap (W): 200.00
      Power Cap Range (W): [105.0, 292.0]
   Fan Target Speed (rpm): None
   Current Fan PWM (%): 100.000
   ##################################################
   Current GPU Loading (%): 96
   Current Memory Loading (%): 14
   Current VRAM Usage (%): 3.646
      Current VRAM Used (GB): 0.284
      Total VRAM (GB): 7.795
   Current  Temps (C): {'temperature.gpu': 47.0, 'temperature.memory': None}
   Current Clk Frequencies (MHz): {'clocks.gr': 1965.0, 'clocks.mem': 7199.0, 'clocks.sm': 1965.0, 'clocks.video': 1815.0}
   Maximum Clk Frequencies (MHz): {'clocks.max.gr': 2160.0, 'clocks.max.mem': 7000.0, 'clocks.max.sm': 2160.0}
   Current SCLK P-State: [2, '']
   Power Profile Mode: [N/A]

Nvidia X Server Settings labels my cards as:
GPU0 = PCI BusId = PCI:10:0:0
GPU1 = PCI BusID = PCI:8:0:0
GPU2 = PCI BusID = PCI:11:0:0

This is the same ordering in which BOINC lists the cards at startup in the Event Log.
It also matches which card is carrying which task load.

from gpu-utils.

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

Looks like the card index is the same as the card number from the device directory. I was hoping it would align with the X Server Settings labels. I will need to stick with the system device number since it is a common method across all vendors. I will leave both the index and serial number in, as it looks like they are there for some capability.

from gpu-utils.

KeithMyers avatar KeithMyers commented on August 28, 2024

I was reading through the readme and saw the disclaimer that the utilities can only write to AMD cards. Would you consider adding that function for Nvidia cards?
There are two Nvidia applications that are used to write to Nvidia cards. One of them, nvidia-smi, you already know about and use. You can set power levels with nvidia-smi, for example.

The other Nvidia utility that can write to their cards is nvidia-settings. It is included with the driver package, like nvidia-smi.
nvidia-settings can be used to control fans, core clocks, and memory clocks. I don't know if you realized that was possible in Linux. This is an example of how I set up my cards on my daily driver; it is simply a bash file that I run every time I boot.

#!/bin/bash

# Drop the power limit on all three cards to 200 W (stock is 225 W on these 2080s)
nvidia-smi -i 0 -pl 200
nvidia-smi -i 1 -pl 200
nvidia-smi -i 2 -pl 200

# Prefer maximum performance (PowerMizer mode 1) on each GPU
/usr/bin/nvidia-settings -a "[gpu:0]/GPUPowerMizerMode=1"
/usr/bin/nvidia-settings -a "[gpu:1]/GPUPowerMizerMode=1"
/usr/bin/nvidia-settings -a "[gpu:2]/GPUPowerMizerMode=1"

# Take manual control of each card's fans (two per card) and run them at 100%
/usr/bin/nvidia-settings -a "[gpu:0]/GPUFanControlState=1"
/usr/bin/nvidia-settings -a "[fan:0]/GPUTargetFanSpeed=100"
/usr/bin/nvidia-settings -a "[fan:1]/GPUTargetFanSpeed=100"
/usr/bin/nvidia-settings -a "[gpu:1]/GPUFanControlState=1"
/usr/bin/nvidia-settings -a "[fan:2]/GPUTargetFanSpeed=100"
/usr/bin/nvidia-settings -a "[fan:3]/GPUTargetFanSpeed=100"
/usr/bin/nvidia-settings -a "[gpu:2]/GPUFanControlState=1"
/usr/bin/nvidia-settings -a "[fan:4]/GPUTargetFanSpeed=100"
/usr/bin/nvidia-settings -a "[fan:5]/GPUTargetFanSpeed=100"

# Offset the P2 memory clock by +800 MHz and the core clock by +40 MHz per card
/usr/bin/nvidia-settings -a "[gpu:0]/GPUMemoryTransferRateOffset[4]=800" -a "[gpu:0]/GPUGraphicsClockOffset[4]=40"
/usr/bin/nvidia-settings -a "[gpu:1]/GPUMemoryTransferRateOffset[4]=800" -a "[gpu:1]/GPUGraphicsClockOffset[4]=40"
/usr/bin/nvidia-settings -a "[gpu:2]/GPUMemoryTransferRateOffset[4]=800" -a "[gpu:2]/GPUGraphicsClockOffset[4]=40"

I first limit the power level to 200 W, down from the stock 225 W of my 2080s. Then I set all the fans on the cards to 100% fan speed. Then I increase the memory clocks by 800 MHz and the core clocks by 40 MHz.

The reason I increase the memory clocks is that all Nvidia consumer cards above entry level are hamstrung by the Nvidia drivers to run in the P2 power state whenever the driver detects a compute load. So I add the 800 MHz of memory clock on top of the default penalized P2 memory clock to get the card running at the normal P0 power-state clocks it would use if it were simply driving a gaming graphics load. The additional 40 MHz of core clock is really just a token bump; the card will already run at its highest core clock based on the thermal and power-limit headroom it has, which is managed by the default firmware GPU Boost 3.0 algorithm.

I am just providing some examples of what is possible for controlling and writing parameter changes to Nvidia cards, not actually requesting that you add those functions. Up to you to decide if you want to tackle that.

from gpu-utils.

Ricks-Lab avatar Ricks-Lab commented on August 28, 2024

It looks like it will be possible to include NV in gpu-pac, but I will delay work on this until after the release of the package under its new name.

from gpu-utils.
