
Comments (22)

susadmin commented on September 22, 2024

Hi, just to let you know, I added NV_VFIO_DEVICE_MIG_STATE_PRESENT=1 to kernel/nvidia-vgpu-vfio/nvidia-vgpu-vfio.Kbuild as you suggested, and it is working fine with NVIDIA-GRID-Linux-KVM-525.60.12.

Thanks 0:)

from vgpu_unlock-rs.

susadmin commented on September 22, 2024

Hi, it's been some time since I visited this issue, as the need has gone away for me. I originally wanted this feature to enable live snapshots on my Proxmox host. Some time ago, I upgraded my drivers to 525.85.07 and everything seems very stable (using the patched driver from Polloloco). Hope that helps.

aderumier commented on September 22, 2024

Hmm, it seems that the option has been removed in 510.85.03:

    if (event_type == NV_VFIO_VGPU_EVENT_MIGRATION_STATE) {
    #if defined(NV_VFIO_DEVICE_MIGRATION_HAS_START_PFN)
        vgpu_dev->migration_enabled = NV_TRUE;
        NV_VGPU_DEV_LOG(VGPU_ERR, vgpu_dev->mdev,
                        "vGPU migration enabled with v3.2 Kernel UAPI\n");
    #else
        NV_VGPU_DEV_LOG(VGPU_ERR, vgpu_dev->mdev,
                        "vGPU migration disabled\n");
    #endif

but it exists in the earlier 460.73.01:

    #if defined(NV_VFIO_DEVICE_MIGRATION_HAS_START_PFN)
        vgpu_dev->migration_enabled = NV_TRUE;
        NV_VGPU_DEV_LOG(VGPU_ERR, vgpu_dev->mdev,
                        "vGPU migration enabled with v3.2 Kernel UAPI\n");
    #elif defined(NV_KVM_MIGRATION_UAPI)
        vgpu_dev->migration_enabled = NV_TRUE;
        NV_VGPU_DEV_LOG(VGPU_ERR, vgpu_dev->mdev,
                        "vGPU migration enabled with upstreamed Kernel UAPI\n");
    #else
        NV_VGPU_DEV_LOG(VGPU_ERR, vgpu_dev->mdev,
                        "vGPU migration disabled\n");

With the latest version and a default compilation, the kernel logs:

    [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000101: vGPU migration disabled

but the vgpu-mgr service shows:

    notice: vmiop_log: (0x0): vGPU migration enabled

So I don't know whether it is enabled or not. (The kernel log seems to say that the old start_pfn path is not used.)

But when I try to live migrate a QEMU 6.2 machine with the vfio option x-enable-migration=on:

    -device vfio-pci,x-enable-migration=on,sysfsdev=/sys/bus/pci/devices/0000:02:00.0/00000000-0000-0000-0000-000000000101,id=hostpci0,bus=pci.0,addr=0x10

I get:

    VM 101 qmp command 'migrate' failed - VFIO device doesn't support migration

mbilker commented on September 22, 2024

Ahh, so it was removed. That kind of sucks, but it was to be expected.

mbilker commented on September 22, 2024

Ok, NVIDIA documented how the NV_VFIO_DEVICE_MIGRATION_HAS_START_PFN flag is set in conftest.sh. It requires specific support to be in the kernel for it to work. It appears NVIDIA redid their approach to be based off what they sent upstream for inclusion into the mainline Linux kernel.

aderumier commented on September 22, 2024

OK, thanks!

I can get live migration working with QEMU 6.2/7.0 using NV_KVM_MIGRATION_UAPI with the old driver.

I'll try NV_VFIO_DEVICE_MIGRATION_HAS_START_PFN with the newer driver to see whether it also works with kernel 5.15.

(I'm currently working on adding live vGPU support to the Proxmox hypervisor.)

theonlyfoxy commented on September 22, 2024

> OK, thanks!
>
> I can get live migration working with QEMU 6.2/7.0 using NV_KVM_MIGRATION_UAPI with the old driver.
>
> I'll try NV_VFIO_DEVICE_MIGRATION_HAS_START_PFN with the newer driver to see whether it also works with kernel 5.15.
>
> (I'm currently working on adding live vGPU support to the Proxmox hypervisor.)

Hi. Were you able to make it work with the 510 driver?

aderumier commented on September 22, 2024

NV_KVM_MIGRATION_UAPI is still present in 510.73.06

https://github.com/VGPU-Community-Drivers/NV-VGPU-Driver/releases/download/1.0.1/NVIDIA-Linux-x86_64-510.73.06-vgpu-kvm.run

but it seems to have been removed from 510.85.03

https://github.com/VGPU-Community-Drivers/NV-VGPU-Driver/releases/download/1.0.2/NVIDIA-Linux-x86_64-510.85.03-vgpu-kvm.run

(I don't know whether they forgot to add support in this version. I have tried NV_VFIO_DEVICE_MIGRATION_HAS_START_PFN, but it is not compatible with the current kernel/QEMU implementation.)

aderumier commented on September 22, 2024

@theonlyfoxy

I haven't tested live migration with 510.73.06 yet.

theonlyfoxy commented on September 22, 2024

> @theonlyfoxy
>
> I haven't tested live migration with 510.73.06 yet.

I have tried it. After saving the VM, when I try to loadvm I get the following error:

    qemu-system-x86_64: vfio_region_read(60b0a8e4-1696-4668-badf-9e43a8b34d9d:region0+0x9410, 4) failed: Bad address

Not sure if the issue is with the driver version, though.

aderumier commented on September 22, 2024

> > @theonlyfoxy
> > I haven't tested live migration with 510.73.06 yet.
>
> I have tried it. After saving the VM, when I try to loadvm I get the following error: qemu-system-x86_64: vfio_region_read(60b0a8e4-1696-4668-badf-9e43a8b34d9d:region0+0x9410, 4) failed: Bad address
>
> Not sure if the issue is with the driver version, though.

Hmm, so it seems to be enabled (otherwise the migration would have stopped with a different error message, something like "vfio migration not enabled"; I don't remember exactly). I have never seen this error before. I'll try to test it next week.

susadmin commented on September 22, 2024

Hi, just wondering if there is any update on this? It would be great to get it working with the new 15.0 (525.60.12) patch.

aderumier commented on September 22, 2024

Hi,
I have looked at 525.60.12.

NV_KVM_MIGRATION_UAPI (missing from the previous version) has been replaced by NV_VFIO_DEVICE_MIG_STATE_PRESENT:

    if (event_type == NV_VFIO_VGPU_EVENT_MIGRATION_STATE) {
    #if defined(NV_VFIO_DEVICE_MIGRATION_HAS_START_PFN)
        vgpu_dev->migration_enabled = NV_TRUE;
        NV_VGPU_DEV_LOG(VGPU_ERR, vgpu_dev,
                        "vGPU migration enabled with v3.2 Kernel UAPI\n");
    #elif defined(NV_VFIO_DEVICE_MIG_STATE_PRESENT)
        vgpu_dev->migration_enabled = NV_TRUE;
        NV_VGPU_DEV_LOG(VGPU_ERR, vgpu_dev,
                        "vGPU migration enabled with upstream V2 migration protocol\n");
    #else
        NV_VGPU_DEV_LOG(VGPU_ERR, vgpu_dev,
                        "vGPU migration disabled\n");
    #endif
    }
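
Each branch of this chain logs a different message, so a single nvidia-vgpu-vfio line in the kernel log is enough to tell which macro was defined at build time. A minimal shell sketch of that mapping, assuming the message texts are exactly the ones quoted in this thread (the function name `classify_vgpu_migration_log` is hypothetical and not part of the driver; other driver versions may use different strings):

```shell
#!/bin/sh
# Hypothetical helper: map one nvidia-vgpu-vfio kernel log line to the
# preprocessor macro whose branch must have been compiled in.
# The message texts are the ones quoted in this thread.
classify_vgpu_migration_log() {
    case "$1" in
        *"vGPU migration enabled with v3.2 Kernel UAPI"*)
            echo "NV_VFIO_DEVICE_MIGRATION_HAS_START_PFN" ;;
        *"vGPU migration enabled with upstream V2 migration protocol"*)
            echo "NV_VFIO_DEVICE_MIG_STATE_PRESENT" ;;
        *"vGPU migration enabled with upstreamed Kernel UAPI"*)
            echo "NV_KVM_MIGRATION_UAPI" ;;
        *"vGPU migration disabled"*)
            echo "none" ;;
        *)
            # Anything else, e.g. the vgpu-mgr "vmiop_log" notice,
            # says nothing about the kernel module's build flags.
            echo "unknown" ;;
    esac
}

# Example use against the live kernel log (may need root):
#   dmesg | grep 'vGPU migration' | while IFS= read -r line; do
#       classify_vgpu_migration_log "$line"
#   done
```

Note that the vgpu-mgr line `vGPU migration enabled` falls into the `unknown` case on purpose: it comes from the userspace service and does not show which branch the kernel module took.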

susadmin commented on September 22, 2024

Thanks very much for this. So what would be the right way to compile this?
Is it as easy as adding NV_VFIO_DEVICE_MIG_STATE_PRESENT=1 to kernel/conftest.sh?

Thanks for all your help.

aderumier commented on September 22, 2024

I haven't tested it yet, but with the previous NV_KVM_MIGRATION_UAPI it worked by adding the following to kernel/nvidia-vgpu-vfio/nvidia-vgpu-vfio.Kbuild:

    NV_KVM_MIGRATION_UAPI=1

(tested on Proxmox with a live migration)

So it should be the same. (Maybe it also works in conftest.sh.)

CornHead764 commented on September 22, 2024

@susadmin

Was there anything else you needed to do to recompile nvidia-vgpu-vfio with NV_VFIO_DEVICE_MIG_STATE_PRESENT=1 besides just putting it in the kernel/nvidia-vgpu-vfio/nvidia-vgpu-vfio.Kbuild file?

I'm trying to replicate this setup with NVIDIA-Linux-x86_64-525.60.12-vgpu-kvm on the latest Proxmox build, and no matter what I try, the kernel driver just doesn't seem to take NV_VFIO_DEVICE_MIG_STATE_PRESENT into consideration when building. dmesg still shows vGPU migration disabled.

If all you did was add that line to the Kbuild file, would you mind posting a copy/screenshot of the edit? This feels like it should be a simple change, but it just doesn't want to cooperate.

yubo-li commented on September 22, 2024

@CornHead764 Have you solved the problem yet?
I'm also trying to replicate this but get the same issue. I found in conftest.sh that NV_VFIO_DEVICE_MIG_STATE_PRESENT depends on a kernel change added in v5.18, but I don't have such an environment on hand. This could be the reason that NV_VFIO_DEVICE_MIG_STATE_PRESENT does not take effect:

        vfio_device_mig_state)
            #
            # Determine if vfio_device_mig_state enum is present or not
            #
            # Added by commit 115dcec65f61d ("vfio: Define device
            # migration protocol v2") in v5.18
            #
            CODE="
            #include <linux/pci.h>
            #include <linux/vfio.h>
            enum vfio_device_mig_state device_state;
            "

            compile_check_conftest "$CODE" "NV_VFIO_DEVICE_MIG_STATE_PRESENT" "" "types"
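
The conftest check above can only succeed if the kernel headers used for the build declare `enum vfio_device_mig_state`. A rough shell sketch of the same idea, to run before rebuilding: it greps a `vfio.h` header for the enum and compares a kernel release string against 5.18. The function names and the header path in the usage comment are assumptions, and `sort -V` is assumed to be available (GNU coreutils). Distribution kernels may backport the change, so the header check is the more direct of the two.

```shell
#!/bin/sh
# Hypothetical pre-build checks; not part of the NVIDIA installer.

# Succeeds if the given vfio.h header declares enum vfio_device_mig_state.
header_has_mig_state() {
    grep -q 'enum vfio_device_mig_state' "$1"
}

# Succeeds if kernel release $1 (e.g. "5.15.0-91-generic") is at least
# version $2 (e.g. "5.18"). Relies on version sort ("sort -V").
kernel_at_least() {
    have=${1%%-*}   # drop any "-suffix" such as "-91-generic"
    lowest=$(printf '%s\n%s\n' "$have" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Example (the header path is an assumption; adjust it to your headers):
#   header_has_mig_state "/lib/modules/$(uname -r)/build/include/uapi/linux/vfio.h" \
#       && echo "enum vfio_device_mig_state found"
#   kernel_at_least "$(uname -r)" 5.18 && echo "kernel is 5.18 or newer"
```

If the header check fails, the conftest result is expected to leave NV_VFIO_DEVICE_MIG_STATE_PRESENT undefined, which would match the "vGPU migration disabled" message reported earlier in the thread for kernel 5.15.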

CornHead764 commented on September 22, 2024

@yubo-li

Unfortunately I have not. I lost track of this project as other things came up. Best of luck, and I'll be keeping an eye on this thread for when I get back to it!

yubo-li commented on September 22, 2024

Thanks @CornHead764, I will keep updating here if I make any progress.

yubo-li commented on September 22, 2024

@susadmin Have you succeeded in vGPU live migration with NVIDIA-GRID-Linux-KVM-525.60.12? Could you share your Linux kernel version and card name (Tesla A100, etc.) with me? Thanks a lot!

yubo-li commented on September 22, 2024

Thanks for your helpful reply!

mbilker commented on September 22, 2024

I am closing this because the migration API is more feature complete in newer vGPU KVM host versions and there is better upstream support in Linux. Please do not hesitate to comment on this issue if there are still issues.
