Comments (22)
Hi, just to let you know: I added NV_VFIO_DEVICE_MIG_STATE_PRESENT=1 to kernel/nvidia-vgpu-vfio/nvidia-vgpu-vfio.Kbuild as you suggested, and it is working fine with NVIDIA-GRID-Linux-KVM-525.60.12.
Thanks 0:)
from vgpu_unlock-rs.
Hi, it's been some time since I visited this issue, as the need has gone away for me. I originally wanted this feature to enable live snapshots on my Proxmox host. Some time ago, I upgraded my drivers to 525.85.07, and everything seems very stable (using the patched driver from Polloloco). Hope that helps.
Hmm, it seems that the option has been removed in 510.85.03:
    if (event_type == NV_VFIO_VGPU_EVENT_MIGRATION_STATE) {
    #if defined(NV_VFIO_DEVICE_MIGRATION_HAS_START_PFN)
        vgpu_dev->migration_enabled = NV_TRUE;
        NV_VGPU_DEV_LOG(VGPU_ERR, vgpu_dev->mdev,
                        "vGPU migration enabled with v3.2 Kernel UAPI\n");
    #else
        NV_VGPU_DEV_LOG(VGPU_ERR, vgpu_dev->mdev,
                        "vGPU migration disabled\n");
    #endif
but it exists in the previous 460.73.01:
    #if defined(NV_VFIO_DEVICE_MIGRATION_HAS_START_PFN)
        vgpu_dev->migration_enabled = NV_TRUE;
        NV_VGPU_DEV_LOG(VGPU_ERR, vgpu_dev->mdev,
                        "vGPU migration enabled with v3.2 Kernel UAPI\n");
    #elif defined(NV_KVM_MIGRATION_UAPI)
        vgpu_dev->migration_enabled = NV_TRUE;
        NV_VGPU_DEV_LOG(VGPU_ERR, vgpu_dev->mdev,
                        "vGPU migration enabled with upstreamed Kernel UAPI\n");
    #else
        NV_VGPU_DEV_LOG(VGPU_ERR, vgpu_dev->mdev,
                        "vGPU migration disabled\n");
With the latest version and a default build, the kernel logs:
    [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000101: vGPU migration disabled
but the vgpu-mgr service shows:
    notice: vmiop_log: (0x0): vGPU migration enabled
So I don't know whether it's enabled or not. (The kernel log seems to say that the old start_pfn path is not used.)
But when I try to live migrate a QEMU 6.2 machine with vfio x-enable-migration=on:
    -device vfio-pci,x-enable-migration=on,sysfsdev=/sys/bus/pci/devices/0000:02:00.0/00000000-0000-0000-0000-000000000101,id=hostpci0,bus=pci.0,addr=0x10
I get:
    VM 101 qmp command 'migrate' failed - VFIO device doesn't support migration
Ahh, so it was removed. That kind of sucks, but it was to be expected.
OK, NVIDIA documented how the NV_VFIO_DEVICE_MIGRATION_HAS_START_PFN flag is set in conftest.sh. It requires specific support in the kernel to work. It appears NVIDIA redid their approach to be based on what they sent upstream for inclusion in the mainline Linux kernel.
OK, thanks!
I can get live migration working with QEMU 6.2/7.0 using NV_KVM_MIGRATION_UAPI with the old driver.
I'll try NV_VFIO_DEVICE_MIGRATION_HAS_START_PFN with the newer driver to see whether it also works with kernel 5.15.
(I'm currently working on adding live vGPU support to the Proxmox hypervisor.)
Hi. Could you make it work with the 510 driver?
NV_KVM_MIGRATION_UAPI is still present in 510.73.06 but seems to have been removed from 510.85.03.
(I don't know whether they forgot to add support in this version. I have tried NV_VFIO_DEVICE_MIGRATION_HAS_START_PFN, but it's not compatible with the current kernel/QEMU implementation.)
I haven't tested live migration with 510.73.06 yet.
I have tried it, and after saving the VM, when I try to loadvm I get the following error:
    qemu-system-x86_64: vfio_region_read(60b0a8e4-1696-4668-badf-9e43a8b34d9d:region0+0x9410, 4) failed: Bad address
Not sure if the issue is with the driver version, though.
@theonlyfoxy
Hmm, so it seems to be enabled (otherwise the migration would have stopped with a different error message, something like "vfio migration not enabled"; I don't remember the exact wording). I have never seen this error before. I'll try to test it next week.
Hi, just wondering if there is any update on this? It would be great to get it working with the new 15.0 (525.60.12) patch.
Hi,
I have looked at 525.60.12.
NV_KVM_MIGRATION_UAPI (missing from the last version) has been replaced by NV_VFIO_DEVICE_MIG_STATE_PRESENT:
    if (event_type == NV_VFIO_VGPU_EVENT_MIGRATION_STATE) {
    #if defined(NV_VFIO_DEVICE_MIGRATION_HAS_START_PFN)
        vgpu_dev->migration_enabled = NV_TRUE;
        NV_VGPU_DEV_LOG(VGPU_ERR, vgpu_dev,
                        "vGPU migration enabled with v3.2 Kernel UAPI\n");
    #elif defined(NV_VFIO_DEVICE_MIG_STATE_PRESENT)
        vgpu_dev->migration_enabled = NV_TRUE;
        NV_VGPU_DEV_LOG(VGPU_ERR, vgpu_dev,
                        "vGPU migration enabled with upstream V2 migration protocol\n");
    #else
        NV_VGPU_DEV_LOG(VGPU_ERR, vgpu_dev,
                        "vGPU migration disabled\n");
    #endif
    }
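
To make the branch order in the 525.60.12 snippet explicit, here is a minimal stand-alone sketch in plain C. It is not driver code: it models the two compile-time macros as runtime booleans, and the names `mig_mode`, `migration_mode` and `mig_mode_message` are made up for illustration. The message strings are the ones quoted in the snippet.

```c
#include <stdbool.h>

/* Hypothetical labels for the three outcomes of the #if / #elif / #else chain. */
enum mig_mode {
    MIG_DISABLED,
    MIG_V3_2_UAPI,
    MIG_UPSTREAM_V2
};

/*
 * has_start_pfn stands in for NV_VFIO_DEVICE_MIGRATION_HAS_START_PFN,
 * has_mig_state stands in for NV_VFIO_DEVICE_MIG_STATE_PRESENT.
 * In the real driver these are compile-time macros, not runtime flags.
 */
enum mig_mode migration_mode(bool has_start_pfn, bool has_mig_state)
{
    if (has_start_pfn) {
        /* The first branch wins, even if the second macro is also defined. */
        return MIG_V3_2_UAPI;
    }
    if (has_mig_state) {
        return MIG_UPSTREAM_V2;
    }
    /* Neither macro is defined: the module is built without migration. */
    return MIG_DISABLED;
}

/* Maps each outcome to the log message quoted from the driver source. */
const char *mig_mode_message(enum mig_mode mode)
{
    switch (mode) {
    case MIG_V3_2_UAPI:
        return "vGPU migration enabled with v3.2 Kernel UAPI";
    case MIG_UPSTREAM_V2:
        return "vGPU migration enabled with upstream V2 migration protocol";
    default:
        return "vGPU migration disabled";
    }
}
```

With both inputs false, which corresponds to a build where neither macro is defined, the result is the "vGPU migration disabled" message.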
Thanks very much for this. So what would be the right way to compile this?
Is it as easy as adding: NV_VFIO_DEVICE_MIG_STATE_PRESENT=1 to kernel/conftest.sh?
Thanks for all your help.
I haven't tested it yet, but with the previous NV_KVM_MIGRATION_UAPI, it worked after adding the following to kernel/nvidia-vgpu-vfio/nvidia-vgpu-vfio.Kbuild:
    NV_KVM_MIGRATION_UAPI=1
(tested on Proxmox with a live migration)
So it should be the same. (Maybe it also works in conftest.sh.)
Was there anything else you needed to do to recompile nvidia-vgpu-vfio with NV_VFIO_DEVICE_MIG_STATE_PRESENT=1 besides just putting it in the kernel/nvidia-vgpu-vfio/nvidia-vgpu-vfio.Kbuild file?
I'm trying to replicate this setup with NVIDIA-Linux-x86_64-525.60.12-vgpu-kvm on the latest Proxmox build, and no matter what I try, the kernel driver just doesn't seem to take NV_VFIO_DEVICE_MIG_STATE_PRESENT into consideration when building. dmesg still shows vGPU migration disabled.
If all you did was add that line to the Kbuild file, would you mind posting a copy/screenshot of the edit? This feels like it should be a simple change, but it just doesn't want to cooperate.
@CornHead764 Have you solved the problem yet?
I'm also trying to replicate this but get the same issue. I found in conftest.sh that NV_VFIO_DEVICE_MIG_STATE_PRESENT depends on a kernel change added in v5.18, but I don't have such an environment on hand. This could be the reason that NV_VFIO_DEVICE_MIG_STATE_PRESENT does not take effect. The relevant check in conftest.sh is:
    vfio_device_mig_state)
        #
        # Determine if vfio_device_mig_state enum is present or not
        #
        # Added by commit 115dcec65f61d ("vfio: Define device
        # migration protocol v2") in v5.18
        #
        CODE="
        #include <linux/pci.h>
        #include <linux/vfio.h>
        enum vfio_device_mig_state device_state;
        "
        compile_check_conftest "$CODE" "NV_VFIO_DEVICE_MIG_STATE_PRESENT" "" "types"
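
The conftest check above compiles a tiny test program against the kernel headers, so the macro only ends up defined when those headers declare `enum vfio_device_mig_state`. As a rough first check before rebuilding, one can compare the kernel release against v5.18. The sketch below is only a heuristic and assumes a release string in the usual `major.minor...` form; distribution kernels may backport the commit, and the helper name `kernel_at_least_5_18` is hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns true if a kernel release string such as
 * "5.15.74-1-pve" or "6.2.16" is at least 5.18, the version in which
 * conftest.sh says enum vfio_device_mig_state was added.
 * Returns false if the string cannot be parsed.
 */
bool kernel_at_least_5_18(const char *release)
{
    int major = 0;
    int minor = 0;

    /* Only the leading "major.minor" part matters for this comparison. */
    if (release == NULL || sscanf(release, "%d.%d", &major, &minor) != 2) {
        return false;
    }
    return major > 5 || (major == 5 && minor >= 18);
}
```

On a live system the input would typically be the `release` field filled in by `uname(2)`. A `false` result for a 5.15 kernel would be consistent with the kernel log still reporting "vGPU migration disabled" in the reports above.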
Unfortunately I have not. I lost track of this project as other things came up. Best of luck, and I'll be keeping an eye on this thread for when I get back to it!
Thanks @CornHead764, I will keep updating here if I make any progress.
@susadmin Have you succeeded in vGPU live migration with NVIDIA-GRID-Linux-KVM-525.60.12? Could you share your Linux kernel version and card name (Tesla A100, etc.) with me? Thanks a lot!
Thanks for your helpful reply!
I am closing this because the migration API is more feature complete in newer vGPU KVM host versions and there is better upstream support in Linux. Please do not hesitate to comment on this issue if there are still issues.