
KDE Plasma Desktop container designed for Kubernetes supporting OpenGL GLX and Vulkan for NVIDIA GPUs with WebRTC and HTML5, providing an open-source remote cloud graphics or game streaming platform. Spawns its own fully isolated X Server instead of using the host X server, not requiring /tmp/.X11-unix host sockets or host configuration.

Home Page: https://github.com/selkies-project/docker-nvidia-glx-desktop/pkgs/container/nvidia-glx-desktop

License: Mozilla Public License 2.0

Dockerfile 69.67% Shell 30.33%
nvidia-docker nvidia docker-image docker html5 opengl ubuntu gpu vulkan kubernetes


docker-nvidia-glx-desktop


Use docker-nvidia-egl-desktop for a KDE Plasma Desktop container that directly accesses NVIDIA (and unofficially Intel and AMD) GPUs without an X11 server, supports sharing a GPU across many containers, and automatically falls back to software acceleration in the absence of GPUs (with limited graphics performance).

Read the Troubleshooting section first before raising an issue. Support is also available on the Selkies Discord. Please direct issues or discussions regarding the selkies-gstreamer WebRTC HTML5 interface to that project.

Usage

Container startup may take some time on first launch because the container automatically installs NVIDIA drivers compatible with the host.

Wine, Winetricks, Lutris, and PlayOnLinux are bundled by default. Comment out the section of the Dockerfile where they are installed if you want to remove them from the container.

There are two web interfaces to choose from in this container: the default selkies-gstreamer WebRTC HTML5 interface (requires a TURN server or host networking), and the fallback noVNC WebSocket HTML5 interface. While the noVNC interface does not support audio forwarding or remote cursors for gaming, it can be useful for troubleshooting the selkies-gstreamer WebRTC interface or for low-bandwidth environments.

The noVNC interface can be enabled by setting NOVNC_ENABLE to true. When the noVNC interface is used, all environment variables related to the selkies-gstreamer WebRTC interface are ignored, with the exception of BASIC_AUTH_PASSWORD. As with the selkies-gstreamer WebRTC interface, the noVNC interface password is set to BASIC_AUTH_PASSWORD, defaulting to PASSWD if not set. The noVNC interface additionally accepts the NOVNC_VIEWPASS environment variable, which sets a view-only password allowing the desktop to be observed but not controlled.
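For example, enabling the noVNC interface with Docker might look like the following sketch; the passwords here are placeholders:

```shell
docker run --gpus 1 -it --tmpfs /dev/shm:rw \
  -e PASSWD=mypasswd -e BASIC_AUTH_PASSWORD=mypasswd \
  -e NOVNC_ENABLE=true -e NOVNC_VIEWPASS=viewpasswd \
  -p 8080:8080 ghcr.io/selkies-project/nvidia-glx-desktop:latest
```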

The container requires a host NVIDIA GPU driver version of at least 450.80.02, preferably 470.42.01 or newer, with the NVIDIA Container Toolkit also configured on the host for allocating GPUs. All Maxwell or later generation GPUs in the consumer, professional, or datacenter lineups should run this container without significant issues, although the selkies-gstreamer high-performance NVENC backend may not be available (see the next paragraph). Kepler GPUs are untested and likely do not support the NVENC backend, but should be mostly functional using fallback software acceleration.

The high-performance NVENC backend for the selkies-gstreamer WebRTC interface is only supported on GPUs listed as supporting H.264 (AVCHD) under the NVENC - Encoding section of NVIDIA's Video Encode and Decode GPU Support Matrix. If your GPU is not listed as supporting H.264 (AVCHD), add the environment variable WEBRTC_ENCODER with the value x264enc, vp8enc, or vp9enc in your container configuration to fall back to software acceleration, which can also perform very well depending on your CPU.

The username is user in both the container user account and the web authentication prompt. The environment variable PASSWD is the password of the container user account, and BASIC_AUTH_PASSWORD is the password for the HTML5 interface authentication prompt. If ENABLE_BASIC_AUTH is set to true for selkies-gstreamer (not required for noVNC) but BASIC_AUTH_PASSWORD is unspecified, the HTML5 interface password will default to PASSWD.
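The fallback behavior can be sketched with shell parameter expansion; WEB_PASSWORD is an illustrative variable name for this sketch, not one used by the container:

```shell
# If BASIC_AUTH_PASSWORD is unset or empty, fall back to PASSWD.
PASSWD="mypasswd"
BASIC_AUTH_PASSWORD=""
WEB_PASSWORD="${BASIC_AUTH_PASSWORD:-$PASSWD}"
echo "$WEB_PASSWORD"
```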

NOTES: Only one web browser can be connected at a time with the selkies-gstreamer WebRTC interface. If the signaling connection works, but the WebRTC connection fails, read the Using a TURN Server section.

Running with Docker

  1. Run the container with Docker (or other similar container CLIs like Podman):
docker run --gpus 1 -it --tmpfs /dev/shm:rw \
  -e TZ=UTC -e SIZEW=1920 -e SIZEH=1080 -e REFRESH=60 -e DPI=96 -e CDEPTH=24 \
  -e VIDEO_PORT=DFP -e PASSWD=mypasswd -e WEBRTC_ENCODER=nvh264enc \
  -e BASIC_AUTH_PASSWORD=mypasswd \
  -p 8080:8080 ghcr.io/selkies-project/nvidia-glx-desktop:latest

NOTES: The container tags available are latest and 22.04 for Ubuntu 22.04, and 20.04 for Ubuntu 20.04. Persistent container tags are available in the form 22.04-20210101010101. Replace all instances of mypasswd with your desired password. BASIC_AUTH_PASSWORD will default to PASSWD if unspecified. The container must not be run in privileged mode.

Change WEBRTC_ENCODER to x264enc, vp8enc, or vp9enc when using the selkies-gstreamer interface if your GPU does not support H.264 (AVCHD) under the NVENC - Encoding section in NVIDIA's Video Encode and Decode GPU Support Matrix.

  1. Connect to the web server with a browser on port 8080. You may also separately configure a reverse proxy to this port for external connectivity.

NOTES: Additional configurations and environment variables for the selkies-gstreamer WebRTC HTML5 interface are listed in lines that start with parser.add_argument within the selkies-gstreamer main script.

  1. (Not Applicable for noVNC) Read carefully if the selkies-gstreamer WebRTC HTML5 interface does not connect. Choose whether to use host networking or a TURN server. The selkies-gstreamer WebRTC HTML5 interface will likely just start working if you add --network host to the above docker run command. However, this may be restricted or undesired for security reasons. If so, check whether the container works after omitting --network host. If it does not, you need a TURN server. Read the Using a TURN Server section and add the environment variables -e TURN_HOST= and -e TURN_PORT=, plus either -e TURN_SHARED_SECRET= or both -e TURN_USERNAME= and -e TURN_PASSWORD=, to the docker run command based on your authentication method.
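For instance, combining the options from step 1 with time-limited shared secret TURN authentication might look like the following sketch; the TURN hostname, port, and secret are placeholder values:

```shell
docker run --gpus 1 -it --tmpfs /dev/shm:rw \
  -e TZ=UTC -e SIZEW=1920 -e SIZEH=1080 -e REFRESH=60 -e DPI=96 -e CDEPTH=24 \
  -e VIDEO_PORT=DFP -e PASSWD=mypasswd -e WEBRTC_ENCODER=nvh264enc \
  -e BASIC_AUTH_PASSWORD=mypasswd \
  -e TURN_HOST=turn.example.com -e TURN_PORT=3478 \
  -e TURN_SHARED_SECRET=MY_TURN_SHARED_SECRET \
  -p 8080:8080 ghcr.io/selkies-project/nvidia-glx-desktop:latest
```

For legacy long-term authentication, replace -e TURN_SHARED_SECRET= with both -e TURN_USERNAME= and -e TURN_PASSWORD=.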

Running with Kubernetes

  1. Create the Kubernetes Secret with your authentication password:
kubectl create secret generic my-pass --from-literal=my-pass=YOUR_PASSWORD

NOTES: Replace YOUR_PASSWORD with your desired password, and change the name my-pass to your preferred Kubernetes secret name, updating the xgl.yml file accordingly. It is possible to skip this step and provide the password directly with value: in xgl.yml, but this exposes the password in plain text.

  1. Create the pod after editing the xgl.yml file to your needs; explanations are available in the file:
kubectl create -f xgl.yml

NOTES: The container tags available are latest and 22.04 for Ubuntu 22.04, and 20.04 for Ubuntu 20.04. Persistent container tags are available in the form 22.04-20210101010101. BASIC_AUTH_PASSWORD will default to PASSWD if unspecified.

Change WEBRTC_ENCODER to x264enc, vp8enc, or vp9enc when using the selkies-gstreamer WebRTC interface if your GPU does not support H.264 (AVCHD) under the NVENC - Encoding section in NVIDIA's Video Encode and Decode GPU Support Matrix.

  1. Connect to the web server spawned at port 8080. You may configure the ingress endpoint or reverse proxy that your Kubernetes cluster provides to this port for external connectivity.

NOTES: Additional configurations and environment variables for the selkies-gstreamer WebRTC HTML5 interface are listed in lines that start with parser.add_argument within the selkies-gstreamer main script.

  1. (Not Applicable for noVNC) Read carefully if the selkies-gstreamer WebRTC HTML5 interface does not connect. Choose whether to use host networking or a TURN server. The selkies-gstreamer WebRTC HTML5 interface will likely just start working if you uncomment hostNetwork: true in xgl.yml. However, this may be restricted or be undesired because of security reasons. If so, check if the container starts working after commenting out hostNetwork: true. If it does not work, you need a TURN server. Read the Using a TURN Server section and fill in the environment variables TURN_HOST and TURN_PORT, then pick one of TURN_SHARED_SECRET or both TURN_USERNAME and TURN_PASSWORD environment variables based on your authentication method.

Using a TURN server

Note that this section is only required for the selkies-gstreamer WebRTC HTML5 interface. For an easy fix to when the signaling connection works but the WebRTC connection fails, add the option --network host to your Docker command, or uncomment hostNetwork: true in your xgl.yml file when using Kubernetes (note that your cluster may not allow this, resulting in an error). This exposes your container to the host network, which disables network isolation. If this does not fix the connection issue (commonly when the host is behind another firewall), or you cannot use this fix for security or technical reasons, read the text below.

In most cases, when either your server or client has a permissive firewall, the default Google STUN server configuration will work without additional configuration. However, when connecting from networks that cannot be traversed with STUN, a TURN server is required.

Deploying a TURN server

Read the instructions from selkies-gstreamer if you want to deploy a TURN server or use a public TURN server instance.

Configuring with Docker

With Docker (or Podman), use the -e option to add the TURN_HOST and TURN_PORT environment variables. These are the hostname or IP and the port of the TURN server (3478 in most cases).

You may set TURN_PROTOCOL to tcp if you are only able to open TCP ports for the coTURN container to the internet, or if the UDP protocol is blocked or throttled in your client network. You may also set TURN_TLS to true with the -e option if TURN over TLS/DTLS is properly configured.

You must also provide either TURN_SHARED_SECRET for time-limited shared secret TURN authentication, or both TURN_USERNAME and TURN_PASSWORD for legacy long-term TURN authentication, depending on your TURN server configuration. Provide only one of these authentication methods, not both.

Configuring with Kubernetes

Your TURN server will use only one of two ways to authenticate the client, so provide only one type of authentication method. Time-limited shared secret TURN authentication requires only the Base64-encoded TURN_SHARED_SECRET. Legacy long-term TURN authentication requires both the TURN_USERNAME and TURN_PASSWORD credentials.

Time-limited shared secret authentication

  1. Create a secret containing the TURN shared secret:
kubectl create secret generic turn-shared-secret --from-literal=turn-shared-secret=MY_TURN_SHARED_SECRET

NOTES: Replace MY_TURN_SHARED_SECRET with the shared secret of the TURN server, and change the name turn-shared-secret to your preferred Kubernetes secret name, updating the xgl.yml file accordingly.

  1. Uncomment the lines in the xgl.yml file related to TURN server usage, updating the TURN_HOST and TURN_PORT environment variables as needed:
- name: TURN_HOST
  value: "turn.example.com"
- name: TURN_PORT
  value: "3478"
- name: TURN_SHARED_SECRET
  valueFrom:
    secretKeyRef:
      name: turn-shared-secret
      key: turn-shared-secret
- name: TURN_PROTOCOL
  value: "udp"
- name: TURN_TLS
  value: "false"

NOTES: It is possible to skip the first step and directly provide the shared secret with value:, but this exposes the shared secret in plain text. Set TURN_PROTOCOL to tcp if you were able to only open TCP ports while creating your own coTURN Deployment/DaemonSet, or if your client network throttles or blocks the UDP protocol.

Legacy long-term authentication

  1. Create a secret containing the TURN password:
kubectl create secret generic turn-password --from-literal=turn-password=MY_TURN_PASSWORD

NOTES: Replace MY_TURN_PASSWORD with the password of the TURN server, and change the name turn-password to your preferred Kubernetes secret name, updating the xgl.yml file accordingly.

  1. Uncomment the lines in the xgl.yml file related to TURN server usage, updating the TURN_HOST, TURN_PORT, and TURN_USERNAME environment variables as needed:
- name: TURN_HOST
  value: "turn.example.com"
- name: TURN_PORT
  value: "3478"
- name: TURN_USERNAME
  value: "username"
- name: TURN_PASSWORD
  valueFrom:
    secretKeyRef:
      name: turn-password
      key: turn-password
- name: TURN_PROTOCOL
  value: "udp"
- name: TURN_TLS
  value: "false"

NOTES: It is possible to skip the first step and directly provide the TURN password with value:, but this exposes the TURN password in plain text. Set TURN_PROTOCOL to tcp if you were able to only open TCP ports while creating your own coTURN Deployment/DaemonSet, or if your client network throttles or blocks the UDP protocol.

Troubleshooting

I have an issue related to the WebRTC HTML5 interface.

See the selkies-gstreamer project.

I want to use the keyboard layout of my own language.

Run Input Method: Configure Input Method from the start menu, uncheck Only Show Current Language, search and add from available input methods (Hangul, Mozc, Pinyin, and others) by moving to the right, then use Ctrl + Space to switch between the input methods. Raise an issue if you need more layouts.

The container does not work.

Check that the NVIDIA Container Toolkit is properly configured on the host. Next, check whether your host NVIDIA GPU driver is the nvidia-headless variant, which lacks the display and graphics capabilities required by this container.

After that, check the environment variable NVIDIA_DRIVER_CAPABILITIES after starting a shell interface inside the container. NVIDIA_DRIVER_CAPABILITIES should be set to all, or include a comma-separated list of compute (requirement for CUDA and OpenCL, or for the selkies-gstreamer WebRTC remote desktop interface), utility (requirement for nvidia-smi and NVML), graphics (requirement for OpenGL and part of the requirement for Vulkan), video (required for encoding or decoding videos using NVIDIA GPUs, or for the selkies-gstreamer WebRTC remote desktop interface), display (the other requirement for Vulkan), and optionally compat32 if you use Wine or 32-bit graphics applications.
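A quick way to verify this from a shell inside the container is sketched below; check_caps is a hypothetical helper for this sketch, not part of this project:

```shell
# Verify that NVIDIA_DRIVER_CAPABILITIES covers the capabilities listed above.
# "all" satisfies everything; otherwise every required capability must appear
# in the comma-separated list.
check_caps() {
  caps="$1"
  if [ "$caps" = "all" ]; then
    echo "ok"
    return 0
  fi
  for cap in compute utility graphics video display; do
    case ",$caps," in
      *",$cap,"*) ;;                            # capability present
      *) echo "missing: $cap"; return 1 ;;      # capability absent
    esac
  done
  echo "ok"
}
check_caps "${NVIDIA_DRIVER_CAPABILITIES:-all}"
```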

Moreover, if you are using custom configurations, check if your shared memory path /dev/shm has sufficient capacity, where expanding the capacity is done by adding --tmpfs /dev/shm:rw to your Docker command or adding the below lines to your Kubernetes configuration file.

spec:
  template:
    spec:
      containers:
      - name: xgl  # the container name here is an example; match your own spec
        volumeMounts:
        - mountPath: /dev/shm
          name: dshm
      volumes:
      - name: dshm
        emptyDir:
          medium: Memory

If you checked everything here, scroll down.

I want to use systemd, polkit, FUSE mounts, or sandboxed (containerized) application distribution systems like Flatpak, Snapcraft (snap), and AppImage.

Use the option --appimage-extract-and-run or --appimage-extract with your AppImage to run it in a container. Alternatively, set export APPIMAGE_EXTRACT_AND_RUN=1 in your current shell. For controlling PulseAudio, use pactl instead of pacmd, as the latter corrupts the audio system within the container. Use sudoedit to edit protected files in the desktop instead of sudo followed by the name of the editor.


Do not use systemd, polkit, FUSE mounts, or sandboxed application distribution systems with containers. You can use them if you add unsafe capabilities to your containers, but doing so breaks container isolation. This is especially bad if you are using Kubernetes. For controlling PulseAudio, use pactl instead of pacmd, as the latter corrupts the audio system within the container. Because polkit does not work, use sudoedit to edit protected files with the GUI instead of sudo followed by the name of the editor. There is usually an alternative way to install an application, such as a Personal Package Archive, and some applications provide options to disable sandboxing or to extract files before running.

I want to share one GPU with multiple containers to run GUI workloads.

Note that because of restrictions from Xorg, it is not possible to share one GPU across multiple Xorg servers running in different containers. Use docker-nvidia-egl-desktop if you intend to do this.

The container does not work if an existing GUI, desktop environment, or X server is running in the host outside the container. / I want to use this container in --privileged mode or with --cap-add and do not want other containers to interfere.


In order to use an X server on the host for your monitor with one GPU, and provision the other GPUs to containers, you must change the /etc/X11/xorg.conf configuration of the host.

First, use sudo nvidia-xconfig --no-probe-all-gpus --busid=$BUS_ID --only-one-x-screen to generate /etc/X11/xorg.conf where BUS_ID is generated with the below script. Set GPU_SELECT to the ID (from nvidia-smi) of the specific GPU you want to provision.

# Query the PCI bus ID (hexadecimal) of the selected GPU; sed picks the second CSV line
HEX_ID=$(nvidia-smi --query-gpu=pci.bus_id --id="$GPU_SELECT" --format=csv | sed -n 2p)
# Split domain:bus:device.function into an array and convert the hex fields to decimal
IFS=":." ARR_ID=($HEX_ID)
unset IFS
BUS_ID=PCI:$((16#${ARR_ID[1]})):$((16#${ARR_ID[2]})):$((16#${ARR_ID[3]}))
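As a worked example of the conversion above, a hypothetical bus ID of 00000000:82:00.0 as reported by nvidia-smi becomes PCI:130:0:0:

```shell
# Split domain:bus:device.function and convert each hex field to decimal.
HEX_ID="00000000:82:00.0"
IFS=":." read -r dom bus dev fn <<< "$HEX_ID"
BUS_ID="PCI:$((16#$bus)):$((16#$dev)):$((16#$fn))"
echo "$BUS_ID"  # PCI:130:0:0
```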

Then, edit the /etc/X11/xorg.conf file of your host outside the container and add the below snippet to the end of the file. If you want to use containers in --privileged mode or with --cap-add, add the snippet to the /etc/X11/xorg.conf files of all other containers running an Xorg server as well (it has already been added for this container). The exact file location may vary if you are not using the NVIDIA graphics driver.

Section "ServerFlags"
    Option "AutoAddGPU" "false"
EndSection

The below command adds the above snippet automatically. The exact file location may vary if not using the NVIDIA graphics driver.

echo -e "Section \"ServerFlags\"\n    Option \"AutoAddGPU\" \"false\"\nEndSection" | sudo tee -a /etc/X11/xorg.conf > /dev/null


After restarting your OS or the Xorg server, you will be able to use one GPU for your host X server and your real monitor, and the rest of the GPUs for containers.

Then, avoid the GPU that your host X server is using. For example, if you set GPU_SELECT to ID 0, use docker --gpus '"device=1,2"' to provision the GPUs with device IDs 1 and 2 to the container, avoiding GPU 0 used by the host X server. Note that --gpus 1 means any single GPU, not the GPU with device ID 1.

Vulkan does not work.

Make sure the NVIDIA_DRIVER_CAPABILITIES environment variable is set to all, or includes both graphics and display. The display capability is especially crucial for Vulkan; despite the capability's name, the container starts without noticeable issues other than broken Vulkan when display is missing.

The container does not work if I set the resolution above 1920 x 1200 or 2560 x 1600 at 60 Hz.

If your GPU is a consumer or professional GPU, change the VIDEO_PORT environment variable from DFP to DP-0 if DP-0 is empty, or to any empty DP-* port. Set VIDEO_PORT to the port your monitor is connected to if you want to show the remote desktop on a real monitor. If your GPU is a Datacenter (Tesla) GPU, keep the VIDEO_PORT environment variable at DFP; your maximum resolution is then 2560 x 1600. To go above this restriction, you may set VIDEO_PORT to none, but you must then use a borderless window instead of fullscreen, and many applications may fail to start, showing errors related to XRANDR or RANDR.


The container simulates the GPU being plugged into a physical DVI-D/HDMI/DisplayPort digital video interface on consumer and professional GPUs with the ConnectedMonitor NVIDIA driver option. On Datacenter (Tesla) GPUs, the container uses virtualized DVI-D ports for this purpose.

The ports used should only be connected to an actual monitor if you want the remote desktop screen shown on that monitor. If you want to show the remote desktop screen spawned by the container on a physical monitor, connect the monitor and set VIDEO_PORT to the video interface identifier connected to the monitor. If not, avoid the video interface identifier that is connected to the monitor.

VIDEO_PORT identifiers and their connection states can be obtained by running xrandr -q with the DISPLAY environment variable set to the spawned X server display (for example :0). Alternatively, you may set VIDEO_PORT to none (which effectively sets --use-display-device=None), but you must then use a borderless window instead of fullscreen, and many applications may fail to start because the RANDR extension is not available in the X server.
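For example, assuming the container's X server runs on display :0, the port identifiers and connection states can be listed with:

```shell
DISPLAY=:0 xrandr -q
```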

NOTES: Do not start two or more X servers for a single GPU. Use a separate GPU (or use Xvfb/Xdummy/Xvnc without hardware acceleration to use no GPUs at all) if you need a host X server unaffiliated with containers, and do not make the GPU available to the container runtime.

Since this container simulates the GPU being virtually plugged into a physical monitor while it actually is not, make sure the resolutions specified with the environment variables SIZEW and SIZEH are within the maximum size supported by the GPU. The environment variable VIDEO_PORT can override which video port is used (it defaults to DFP, the first interface detected by the driver). Specifying VIDEO_PORT as an unplugged DisplayPort (numbered DP-0, DP-1, and so on) is recommended for resolutions above 1920 x 1200 at 60 Hz, because driver restrictions apply when the default resolves to an unplugged physical DVI-D or HDMI port. The maximum size that should work in all cases is 1920 x 1200 at 60 Hz, mainly for when the default VIDEO_PORT identifier DFP is not a DisplayPort. Screen sizes over 1920 x 1200 at 60 Hz, but under the maximum display size supported by each port (per GPU specifications), are possible when the port is set to a DisplayPort (whether physically connected or not), or when a physical monitor or dummy plug is connected to any other type of display port (including DVI-D and HDMI). If all GPUs in the cluster have at least one DisplayPort and they are not physically connected to any monitors, simply setting VIDEO_PORT to DP-0 is recommended (this is not the default for legacy GPU compatibility reasons).

Datacenter (Tesla) GPUs seem to support resolutions only up to around 2560 x 1600 at 60 Hz (VIDEO_PORT must be kept at DFP instead of changing to DP-0 or other DisplayPort identifiers). The K40 (Kepler) GPU did not support RandR (required by some graphical applications using SDL and other graphical frameworks). Other Kepler generation Datacenter GPUs (except perhaps the GRID K1 and K2 GPUs with vGPU capabilities) are also unlikely to support RandR, so Datacenter GPU RandR support probably starts with Maxwell. Other tested Datacenter GPUs (V100, T4, A40, A100) support all graphical applications that consumer GPUs support. However, their performance was not better than consumer GPUs that usually cost a fraction of the price, and their maximum supported resolutions were even lower.


This work was supported in part by National Science Foundation (NSF) awards CNS-1730158, ACI-1540112, ACI-1541349, OAC-1826967, OAC-2112167, CNS-2100237, CNS-2120019, the University of California Office of the President, and the University of California San Diego's California Institute for Telecommunications and Information Technology/Qualcomm Institute. Thanks to CENIC for the 100Gbps networks.

docker-nvidia-glx-desktop's People

Contributors

danisla, ehfd, justinbowes, numerical2017


docker-nvidia-glx-desktop's Issues

Can't connect to hardly any Hosts

Hi.

Please help. I used to use Vast.ai a few months ago & it was hit & miss with guessing which Host will run the GLX Desktop, but at least then I was having more success than now. Used exact same GLX Desktop settings now as back then.
Now it's the worst I've ever seen it. I get like 80% fail rate when trying Hosts. It's just a lotto. I got told to try Hosts running driver 535.129.03, which so far has been the only version that's worked, but in saying that, there's also been Hosts with same driver 535.129.03 & don't work. It's just a mixed bag of shit.

I just use the default GLX Desktop template at Vast. If the driver & cuda versions of 2 different Hosts are the same, what else could cause one to work & the other to fail?
My internet speed itself is not the factor. This is a quick run-down of what I got today. I've also attached 2 log files of Status Logs & Debug Logs.

m-8914.txt
m-11864.txt

SUCCESS ::

** m:88435 - 8x 4090 driver 535.129.03 running cuda 12.2 = YES, working.

** m:14018 - 8x 4090 driver 535.129.03 cuda 12.2 = YES, working.

** m:14791 - 1x 4090 driver 535.129.03 cuda 12.2 = YES, working.

FAIL ::

** m:5150 - 8x 4090 driver 525.105.17 running cuda 12.0 = NO, entered Login info, "Connection Failed" with a RELOAD button.

** m:8914 DataCentre:18 - 8x 4090 driver 525.105.17 cuda 12.0 = NO, entered Login info, "Connection Failed" with a RELOAD button. I have Status Log & Debug Log for this one.

** m:8874 Host 54858 - 12x 4090 driver 535.104.05 cuda 12.2 = NO, error "Site Can't Be Reached".

** m:13471 Host 61247 - 12x 4090 driver 535.113.01 cuda 12.2 = NO, error "Site Can't Be Reached".

** m:11773 Host 61247 - 12x 4090 driver 535.129.03 running cuda 12.2 = NO, error, "Error response from daemon: --storage-opt is supported only for overlay over xfs with 'pquota' mount option".

** m:8965 DataCentre:18 - 8x 4090 driver 535.129.03 cuda 12.2 = NO, entered Login info, "Connection Failed" with a RELOAD button. (( 535.129.03 with cuda 12.2 & still didn't work? Something extra odd there as I've succeeded with those driver & cuda versions on other machine. )) I have Status Log & Debug Log for this one.

** m:9111 - 8x 4090 driver 535.129.03 cuda 12.2 = NO, entered Login info, "Connection Failed" with a RELOAD button. I have Status Log & Debug Log for this one. (( 535.129.03 with cuda 12.2 & still didn't work? Something extra odd there as I've succeeded with those driver & cuda versions on other machine. ))

** m:10921 Host 36803 - 8x 4090 driver 535.54.03 cuda 12.2 = NO, error "Site Can't Be Reached".

** m:8335 - 8x 4090 driver 545.23.08 cuda 12.3 = NO, error "Site Can't Be Reached".

Is this something we're going to deal with forever? Just a lotto guessing game trying to find compatible Hosts which always change/get broken because of driver incompatibility ?? Why is it worse right now than ever before? Also doesn't seem like driver version is the only thing at play here, as how can it be if 2 Hosts identical driver & cuda yet one works & other doesn't.

I've spoken with Vast Support at their site & with Hosts at the Vast Discord, but it seems no one can get this working in a more reliable way. What else can we try?

Thanks for your time.

Runpod Secure Cloud issues

Hello there. I am facing an issue when trying to deploy the container in Secure cloud with runpod. With community cloud, I can mostly connect without any problems, but with every single secure cloud I am getting errors. I am attaching screenshots of all the logs.

[Screenshots attached: 2023-12-15 at 15 41 07, 15 34 17, and 15 47 36]

Let me know if there is something I am missing.

Troubleshooting help - unable to access VNC session in Kubernetes

I need a little help troubleshooting this container.

I got it working with docker using the one-line command provided in the documentation - no problems there.

I decided to set it up on a test (one node) k8s cluster - I set up the nvidia/gpu-operator and the GPU appears as a resource that can be claimed. I created some yaml which enables the container to be deployed and a Nodeport based service to connect to it.

The container comes up and seems to claim the GPU resources correctly (I can see Xorg when I run nvidia-smi and I can see all the X processes running inside the pod). I can connect to guacamole via the Nodeport and I'm able to log in.

When I try to connect to the VNC session via the browser, however, I see the following:

[Screenshot from 2021-08-28 17-38-36]

I can see this in the tomcat logs:

10.244.0.1 - - [28/Aug/2021:15:39:25 +0000] "POST /tunnel?connect HTTP/1.1" 500 673

which seems to be at the crux of the issue.

However, I don't know where to look for more logs to try to understand why this 500 arises.

Can you tell me where I could find more info on how to determine the reason for this 500 response?

Thanks!

Passthrough of host usb devices to container?

Hello, thanks for maintaining this interesting project/container, which has been awesome for the intended use case.

But an odd admittedly edge case I have use for and can't quite get to work: I've tried using --device and volume options for docker run to give the container access to host (Ubuntu 20.04.1) usb devices (specifically input devices, keyboard etc).

I've tried:
--device /dev/bus/usb
-v /dev/bus/usb:/dev/bus/usb
--privileged

The devices appear to be visible from within a shell in the container but are not available/listed in xinput. I'm guessing there's something I don't understand going on with either the configuration that allows webRTC virtual input devices or just with containerized x in general that's preventing these from being picked up normally by the desktop environment there.

Thanks for any pointers in the right direction but feel free to close if too off-topic.

nvh264enc no longer works after upgrading to 22.04

The 22.04 docker image shows a black screen (no video) in both Chrome and Firefox.
The web GUI works, but the HTTP server seems to struggle when more than one browser connects (the second connection times out).

I am using the docker image sha256:23bb0ceae21a6cd0e89f1441bb8d12f751276c653db3c67da2d33b8e809ea770 from 6c6dd17

update-manager crashed

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/dbus/bus.py", line 177, in activate_name_owner
    return self.get_name_owner(bus_name)
  File "/usr/lib/python3/dist-packages/dbus/bus.py", line 361, in get_name_owner
    return self.call_blocking(BUS_DAEMON_NAME, BUS_DAEMON_PATH,
  File "/usr/lib/python3/dist-packages/dbus/connection.py", line 652, in call_blocking
    reply_message = self.send_message_with_reply_and_block(
dbus.exceptions.DBusException: org.freedesktop.DBus.Error.NameHasNoOwner: Could not get owner of name 'org.freedesktop.UpdateManager': no such name

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/UpdateManager/UpdateManager.py", line 458, in _setup_dbus
    proxy_obj = bus.get_object('org.freedesktop.UpdateManager',
  File "/usr/lib/python3/dist-packages/dbus/bus.py", line 241, in get_object
    return self.ProxyObjectClass(self, bus_name, object_path,
  File "/usr/lib/python3/dist-packages/dbus/proxies.py", line 250, in __init__
    self._named_service = conn.activate_name_owner(bus_name)
  File "/usr/lib/python3/dist-packages/dbus/bus.py", line 182, in activate_name_owner
    self.start_service_by_name(bus_name)
  File "/usr/lib/python3/dist-packages/dbus/bus.py", line 277, in start_service_by_name
    return (True, self.call_blocking(BUS_DAEMON_NAME, BUS_DAEMON_PATH,
  File "/usr/lib/python3/dist-packages/dbus/connection.py", line 652, in call_blocking
    reply_message = self.send_message_with_reply_and_block(
dbus.exceptions.DBusException: org.freedesktop.DBus.Error.ServiceUnknown: The name org.freedesktop.UpdateManager was not provided by any .service files

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/update-manager", line 117, in <module>
    app = UpdateManager(data_dir, options)
  File "/usr/lib/python3/dist-packages/UpdateManager/UpdateManager.py", line 109, in __init__
    self._setup_dbus()
  File "/usr/lib/python3/dist-packages/UpdateManager/UpdateManager.py", line 469, in _setup_dbus
    self.dbusController = UpdateManagerDbusController(self, bus_name)
  File "/usr/lib/python3/dist-packages/UpdateManager/UpdateManager.py", line 478, in __init__
    self.alert_watcher = AlertWatcher()
  File "/usr/lib/python3/dist-packages/UpdateManager/Core/AlertWatcher.py", line 49, in __init__
    self.bus = dbus.Bus(dbus.Bus.TYPE_SYSTEM)
  File "/usr/lib/python3/dist-packages/dbus/_dbus.py", line 102, in __new__
    bus = BusConnection.__new__(subclass, bus_type, mainloop=mainloop)
  File "/usr/lib/python3/dist-packages/dbus/bus.py", line 124, in __new__
    bus = cls._new_for_bus(address_or_type, mainloop=mainloop)
dbus.exceptions.DBusException: org.freedesktop.DBus.Error.FileNotFound: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory

X11 not starting

Hello! Kudos for this great image. I've been trying to run a deployment on my Kubernetes cluster with nvidia-glx-desktop. Sometimes it works, but sometimes it doesn't (could be how things are cached by the cloud provider?). According to the logs, it looks like the entrypoint scripts are waiting for X11 to start and are stuck in a perpetual loop. Why do you think X11 isn't starting and how can I manually start it?
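The entrypoint's wait essentially polls for the X socket. A minimal sketch of such a loop with a timeout added so a failed Xorg start surfaces quickly instead of hanging forever (`wait_for_x` is a hypothetical name, not the image's actual script):

```shell
# Poll for the X socket of a given display, giving up after a timeout.
wait_for_x() {
  disp="${1:-0}"; timeout="${2:-120}"; t=0
  until [ -S "/tmp/.X11-unix/X${disp}" ]; do
    t=$((t + 1))
    if [ "$t" -ge "$timeout" ]; then
      echo "X server on display :${disp} did not start within ${timeout}s" >&2
      return 1
    fi
    sleep 1
  done
  echo "X server on display :${disp} is up"
}

# Typical use in an entrypoint:
# wait_for_x 0 120 || exit 1
```

If the loop times out, the Xorg log inside the pod (under the user's `~/.local/share/xorg/` in rootless setups, or `/var/log/Xorg.0.log`) usually explains why the server exited.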

Audio through Apache Guacamole

Is it possible to get audio through the web browser session in Apache Guacamole? I can see that PulseAudio is started and listening on the right port. I don't know what the next step is to enable audio to go through the VNC session.
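On the Guacamole side, VNC audio is bridged through PulseAudio via connection parameters rather than anything in this container. A hedged sketch of a `user-mapping.xml` connection entry, assuming Guacamole's documented `enable-audio`/`audio-servername` VNC parameters; the connection name, hostname, and port here are placeholder values:

```xml
<connection name="glx-desktop">
  <protocol>vnc</protocol>
  <param name="hostname">localhost</param>
  <param name="port">5900</param>
  <!-- Placeholder values: point these at the PulseAudio TCP listener
       already running in the container. -->
  <param name="enable-audio">true</param>
  <param name="audio-servername">localhost</param>
</connection>
```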

Running with non-root user

I'm trying to create a chrome container based on this image (for rendering browser output headlessly), but am running into permission problems. Here's an example Dockerfile:

FROM nvidia/cudagl:11.2.2-runtime-ubuntu18.04

RUN groupadd -g 1000 testgroup && \
    useradd -ms /bin/bash testuser -u 1000 -g 1000 && \
    usermod -a -G adm,audio,cdrom,dialout,dip,fax,floppy,lp,plugdev,sudo,tape,tty,video,voice testuser

USER testuser
ENTRYPOINT ["/bin/bash"]

Running nvidia-smi gives the following error: Failed to initialize NVML: Insufficient Permissions

My application uses VirtualGL and Xvfb to render Chrome with a GPU if that's relevant. Works perfectly fine with the root user.

What am I missing here? From what I've read, the group video should be adequate, but this is not the case.

Thanks for the great repo! It's been a big help.
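NVML permission failures like this usually come down to the `/dev/nvidia*` device nodes rather than the package setup, so checking group membership against the nodes' owner/group/mode narrows it down. A small diagnostic sketch (`in_group` is a hypothetical helper, and whether `video` is sufficient depends on how the host created the device nodes):

```shell
# Check whether the current user belongs to a given group.
in_group() {
  id -nG | tr ' ' '\n' | grep -qx "$1"
}

for g in video render; do
  if in_group "$g"; then
    echo "current user IS in group: $g"
  else
    echo "current user is NOT in group: $g"
  fi
done

# The owner/group/mode on these nodes is what NVML actually checks.
ls -l /dev/nvidia* 2>/dev/null || echo "no /dev/nvidia* nodes visible"
```

If the nodes are group-owned by something other than `video` (or are `root:root` with restrictive modes, as some hosts configure), adding the user to `video` alone won't help.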

XDG_RUNTIME_DIR not set, + vulkan issues

  1. XDG_RUNTIME_DIR is not set. It can be set manually by passing -e XDG_RUNTIME_DIR=/tmp
  2. vkcube and vulkaninfo both fail:
    (screenshots of the vkcube and vulkaninfo failures attached)
  3. SteamVR fails, probably due to the above ^ issue with Vulkan, but I'm not sure.

GPU is a 3090 on RunPod. Similar issues on Vast.ai.

Can't load fullscreen games

I'm able to get this container up and running and run any 64-bit application. But when I try running 32-bit applications, I can't get anything to display. For example, I tried running 0ad and although it prints out logs to the terminal, I see nothing on the screen. If I run Wine applications that are 32-bit (games mostly) they won't display anything either. Is there something I can do to enable the 32-bit drivers from Nvidia in this container?

For reference: https://github.com/nvidia/nvidia-container-runtime#nvidia_driver_capabilities

I tried setting the environment variable NVIDIA_DRIVER_CAPABILITIES=all but I did not see any effect. I'm not sure how to test whether or not the 32-bit libraries are being included in this container.
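Since NVIDIA_DRIVER_CAPABILITIES=all didn't visibly change anything, one way to test is to look for the i386 GLX driver library on disk. A sketch assuming Ubuntu's multiarch layout (`/usr/lib/i386-linux-gnu`); `check_32bit_gl` is a hypothetical helper:

```shell
# Look for the 32-bit NVIDIA GLX driver library in a multiarch lib dir
# (defaults to Ubuntu's i386 path; pass another dir to override).
check_32bit_gl() {
  libdir="${1:-/usr/lib/i386-linux-gnu}"
  if ls "$libdir"/libGLX_nvidia.so* >/dev/null 2>&1; then
    echo "32-bit NVIDIA GLX libraries found"
  else
    echo "32-bit NVIDIA GLX libraries missing"
  fi
}

check_32bit_gl
```

If they are missing, the 32-bit compatibility libraries were likely never installed in the image at all; on Ubuntu the usual route is `dpkg --add-architecture i386` plus installing the driver userspace with its compat32 files, though whether that applies to this image's driver-install flow is an assumption on my part.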

Coexist with Host X11 server

Is it possible to run this docker image on desktop Linux (which already has a running X11 server on the host)?

How should VIDEO_PORT be set in this case so that the X server in the container won't interfere with the host X11 server?
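Per the README, this image spawns its own fully isolated X server inside the container, so the display number inside the container does not clash with the host's; the contention is over the GPU's physical outputs, which is what VIDEO_PORT selects. If you ever do need to pick an unused display number on a shared /tmp/.X11-unix, a sketch (hypothetical helper, assuming the conventional socket and lock paths):

```shell
# Find the lowest X display number with neither a socket nor a lock file,
# using the conventional /tmp/.X11-unix/XN and /tmp/.XN-lock paths.
next_free_display() {
  d=0
  while [ -e "/tmp/.X11-unix/X${d}" ] || [ -e "/tmp/.X${d}-lock" ]; do
    d=$((d + 1))
  done
  echo "$d"
}

echo "first free display: :$(next_free_display)"
```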

Wine application or game issues

For example, when running winecfg I get this message:

00d0:err:winediag:nodrv_CreateWindow Application tried to create a window, but no driver could be loaded.
00d0:err:winediag:nodrv_CreateWindow L"The explorer process failed to start."

How can I fix it?

Startup fails because the nvidia part is unable to write into ro filesystem

Just tried to get this running with:

docker run --gpus all --rm -it --name cuda12_vls --tmpfs /dev/shm:rw -e TZ=UTC -e SIZEW=1920 -e SIZEH=1080 -e REFRESH=60 -e DPI=96 -e CDEPTH=24 -e VIDEO_PORT=DFP -e PASSWD=mypasswd -e WEBRTC_ENCODER=nvh264enc -e BASIC_AUTH_PASSWORD=mypasswd -p 8080:8080 ghcr.io/selkies-project/nvidia-glx-desktop:latest

After some time, messages like "... reaped unknown pid ... (exit status 1)" appeared.

Inside another Terminal:

docker exec -it cuda12_vls bash followed by tail -f /tmp/entrypoint-stdout---supervisor-fl6x5991.log reveals this:

...

  • Starting system message bus dbus
    ...done.
    Creating directory NVIDIA-Linux-x86_64-525.125.06
    Verifying archive integrity... OK
    Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 525.125.06 [progress dots trimmed]
    WARNING: You specified the '--no-kernel-modules' command line option, nvidia-installer will not install any kernel modules as part of this driver installation, and it will not remove existing NVIDIA kernel modules not part of an earlier NVIDIA driver installation. Please ensure that NVIDIA kernel modules matching this driver version are installed separately.
    ERROR: Unable to create '/lib/firmware/nvidia/525.125.06/gsp_tu10x.bin' for copying (Read-only file system)
    ERROR: Unable to create '/lib/firmware/nvidia/525.125.06/gsp_ad10x.bin' for copying (Read-only file system)
    ERROR: Unable to create '/usr/bin/nvidia-smi' for copying (Read-only file system)
    ERROR: Unable to create '/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.525.125.06' for copying (Read-only file system)
    ERROR: Unable to create '/usr/bin/nvidia-debugdump' for copying (Read-only file system)
    ERROR: Unable to create '/usr/lib/x86_64-linux-gnu/libcuda.so.525.125.06' for copying (Read-only file system)
    ERROR: Unable to create '/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.525.125.06' for copying (Read-only file system)
    ERROR: Unable to create '/usr/bin/nvidia-persistenced' for copying (Read-only file system)
    ...
    WARNING: Unable to locate/open X configuration file.
    Package xorg-server was not found in the pkg-config search path.
    Perhaps you should add the directory containing `xorg-server.pc'
    to the PKG_CONFIG_PATH environment variable
    No package 'xorg-server' found
    Option "ProbeAllGpus" "False" added to Screen "Screen0".
    Option "BaseMosaic" "False" added to Screen "Screen0".
    Option "AllowEmptyInitialConfiguration" "True" added to Screen "Screen0".
    New X configuration file written to '/etc/X11/xorg.conf'
    Waiting for X socket
    _XSERVTransmkdir: ERROR: euid != 0, directory /tmp/.X11-unix will not be created.
    X.Org X Server 1.21.1.4
    X Protocol Version 11, Revision 0
    Current Operating System: Linux 451b69ecc3cf 6.1.0-13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.55-1 (2023-09-29) x86_64
    Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.1.0-13-amd64 root=UUID=019fa11a-2577-4a77-ad19-36cff7b77e87 ro quiet
    xorg-server 2:21.1.4-2ubuntu1.7~22.04.5 (For technical support please see http://www.ubuntu.com/support)
    Current version of pixman: 0.40.0
    Before reporting problems, check http://wiki.x.org to make sure that you have the latest version.
    Markers: (--) probed, (**) from config file, (==) default setting, (++) from command line, (!!) notice, (II) informational, (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
    (==) Log file: "/home/user/.local/share/xorg/Xorg.0.log", Time: Tue Dec 26 14:18:35 2023
    (==) Using config file: "/etc/X11/xorg.conf"
    (==) Using system config directory "/usr/share/X11/xorg.conf.d"
    (EE) Fatal server error:
    (EE) no screens found(EE)
    (EE) Please consult the The X.Org Foundation support at http://wiki.x.org for help.
    ...

I tried starting without the "--tmpfs /dev/shm:rw" parameter, with the same result.
What's wrong? Any pointers?
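The `Read-only file system` errors mean the driver installer cannot write into `/usr/lib`, `/usr/bin`, and `/lib/firmware`, which happens when the container's root filesystem itself is mounted read-only (e.g. `docker run --read-only` or a Kubernetes `readOnlyRootFilesystem` policy), regardless of `--tmpfs /dev/shm`. A quick confirmation, sketched as a helper parsing `/proc/mounts`-format input (the function name is hypothetical):

```shell
# Report whether the root filesystem is mounted read-only, based on
# /proc/mounts-format lines supplied on stdin.
rootfs_is_ro() {
  awk '$2 == "/" {
         n = split($4, o, ",")
         for (i = 1; i <= n; i++)
           if (o[i] == "ro") { print "ro"; exit }
         print "rw"; exit
       }'
}

# Inside the container: rootfs_is_ro < /proc/mounts
printf 'overlay / overlay ro,relatime 0 0\n' | rootfs_is_ro
```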

VP8enc: selkies-gstreamer fails with ('Missing gstreamer plugins:', ['vpx'])

Hello,
when I change the encoding codec from e.g. x264enc to vp8enc and restart the container (docker-compose down && docker-compose up -d), selkies-gstreamer doesn't start up and fails with the following error:

INFO: gst-python install looks OK
Traceback (most recent call last):
  File "/usr/local/bin/selkies-gstreamer", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/selkies_gstreamer/__main__.py", line 508, in main
    app = GSTWebRTCApp(stun_servers, turn_servers, enable_audio, curr_fps, args.encoder, curr_video_bitrate, curr_audio_bitrate)
  File "/usr/local/lib/python3.8/dist-packages/selkies_gstreamer/gstwebrtc_app.py", line 74, in __init__
    self.check_plugins()
  File "/usr/local/lib/python3.8/dist-packages/selkies_gstreamer/gstwebrtc_app.py", line 633, in check_plugins
    raise GSTWebRTCAppError('Missing gstreamer plugins:', missing)
gstwebrtc_app.GSTWebRTCAppError: ('Missing gstreamer plugins:', ['vpx'])
Waiting for X socket
X socket is ready
sed: couldn't open temporary file /opt/gst-web/sed0pNTVu: Permission denied

How is it possible to get it to run?
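The traceback says the GStreamer `vpx` plugin (which provides vp8enc) isn't in the image's registry, which can be confirmed with `gst-inspect-1.0` before restarting the whole stack. A sketch (the helper name is hypothetical):

```shell
# Check whether a GStreamer plugin is visible in the registry.
has_gst_plugin() {
  if ! command -v gst-inspect-1.0 >/dev/null 2>&1; then
    echo "gst-inspect-1.0 not installed"
    return 1
  fi
  if gst-inspect-1.0 "$1" >/dev/null 2>&1; then
    echo "plugin '$1' present"
  else
    echo "plugin '$1' missing"
  fi
}

has_gst_plugin vpx || true
```

On Ubuntu the vpx elements normally ship in `gstreamer1.0-plugins-good`, so installing that package inside the image is the likely fix, though whether this image's GStreamer build picks up system plugin paths is an assumption on my part.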

Docker Compose Example

For my workflow, I prefer managing my containers with docker compose. Is it possible to support a docker-compose configuration? I can show you what I wrote up to get it working on my machine; I just want to make sure it's kept up to date as changes are made.

This basically takes the example docker run command you have in the README,

docker run --gpus 1 -it --tmpfs /dev/shm:rw -e TZ=UTC -e SIZEW=1920 -e SIZEH=1080 -e REFRESH=60 -e DPI=96 -e CDEPTH=24 -e VIDEO_PORT=DFP -e PASSWD=mypasswd -e WEBRTC_ENCODER=nvh264enc -e BASIC_AUTH_PASSWORD=mypasswd -p 8080:8080 ghcr.io/ehfd/nvidia-glx-desktop:latest

And translates it for docker compose:

version: '3.3'
services:
  ehfd:
    tmpfs: '/dev/shm:rw'
    environment:
      - TZ=UTC
      - SIZEW=1920
      - SIZEH=1080
      - REFRESH=60
      - DPI=96
      - CDEPTH=24
      - PASSWD=mypasswd
      - WEBRTC_ENCODER=nvh264enc
      - BASIC_AUTH_PASSWORD=mypasswd
      - NOVNC_ENABLE=true
    ports:
      - '8080:8080'
    image: 'ghcr.io/ehfd/nvidia-glx-desktop:latest'
    deploy:
      resources:
        reservations:
          devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]

Xorg fails to launch `(EE) No devices detected` when GPU is on nonzero PCI domain

On e.g. Azure A10 instances (Standard_NV36ads_A10_v5), Selkies does not start, with an error like the following:

[  6030.700] xf86EnableIO: failed to enable I/O ports 0000-03ff (Operation not permitted)
[  6030.700] (WW) Falling back to old probe method for modesetting
[  6030.700] (EE) open /dev/dri/card0: No such file or directory
[  6030.700] (WW) Falling back to old probe method for fbdev
[  6030.700] (II) Loading sub module "fbdevhw"
[  6030.700] (II) LoadModule: "fbdevhw"
[  6030.700] (II) Loading /usr/lib/xorg/modules/libfbdevhw.so
[  6030.700] (II) Module fbdevhw: vendor="X.Org Foundation"
[  6030.700]    compiled for 1.21.1.4, module version = 0.0.2
[  6030.700]    ABI class: X.Org Video Driver, version 25.2
[  6030.700] (EE) open /dev/fb0: No such file or directory
[  6030.700] (WW) Falling back to old probe method for modesetting
[  6030.700] (EE) open /dev/dri/card0: No such file or directory
[  6030.700] (WW) Falling back to old probe method for fbdev
[  6030.700] (II) Loading sub module "fbdevhw"
[  6030.700] (II) LoadModule: "fbdevhw"
[  6030.700] (II) Loading /usr/lib/xorg/modules/libfbdevhw.so
[  6030.700] (II) Module fbdevhw: vendor="X.Org Foundation"
[  6030.700]    compiled for 1.21.1.4, module version = 0.0.2
[  6030.700]    ABI class: X.Org Video Driver, version 25.2
[  6030.700] (EE) open /dev/fb0: No such file or directory
[  6030.700] (EE) No devices detected.
[  6030.700] (EE)
Fatal server error:
[  6030.700] (EE) no screens found(EE)
[  6030.700] (EE)
Please consult the The X.Org Foundation support
         at http://wiki.x.org
 for help.
[  6030.700] (EE) Please also check the log file at "/home/user/.local/share/xorg/Xorg.0.log" for additional information.
[  6030.700] (EE)
[  6030.700] (EE) Server terminated with error (1). Closing log file.

This occurs when the PCI domain of the GPU is nonzero. Here is an example with domain=2.

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05             Driver Version: 535.154.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A10-24Q                 On  | 00000002:00:00.0 Off |                    0 |
| N/A   N/A    P8              N/A /  N/A |      0MiB / 24512MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Selkies maps the GPU ordinal (via GPU_SELECT) to a bus ID and passes it to nvidia-xconfig. However, it constructs BUS_ID as follows:

HEX_ID="$(sudo nvidia-smi --query-gpu=pci.bus_id --id="$GPU_SELECT" --format=csv | sed -n 2p)"
IFS=":." ARR_ID=($HEX_ID)
unset IFS
BUS_ID="PCI:$((16#${ARR_ID[1]})):$((16#${ARR_ID[2]})):$((16#${ARR_ID[3]}))"

Note that the domain (in ARR_ID[0]) is completely ignored.

In the above example, this would construct PCI:0:0:0. This is partially qualified, but valid syntax; when the domain is omitted, it is assumed to be 0. However, this GPU is in PCI domain 2. In this case, the correct syntax is PCI:0@2:0:0.

By fully qualifying all BUS_IDs, GPUs on any arbitrary PCI domain can be supported.

I will submit a merge request that constructs the fully qualified BUS_ID.
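A sketch of that fully qualified construction, reordering nvidia-smi's `domain:bus:device.function` hex form into the `PCI:bus@domain:device:function` decimal form Xorg accepts (the helper name is mine, not the repo's):

```shell
# Convert nvidia-smi's pci.bus_id (e.g. 00000002:00:00.0, i.e.
# domain:bus:device.function in hex) into the fully qualified form
# Xorg accepts: PCI:bus@domain:device:function (decimal).
bus_id_for_xorg() {
  old_ifs=$IFS
  IFS=':.'
  set -- $1        # split into: domain bus device function
  IFS=$old_ifs
  printf 'PCI:%d@%d:%d:%d\n' "0x$2" "0x$1" "0x$3" "0x$4"
}

bus_id_for_xorg 00000002:00:00.0   # the Azure A10 example above -> PCI:0@2:0:0
```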

Run desktop on Windows Subsystem for Linux

thanks for these excellent resources

Is the GLX one working OK at the moment? It seems that with the latest Docker and an up-to-date card (3080 Ti), when running on a Windows host with --gpus all or --gpus 1, there are some faults:
On startup:
/proc/driver/nvidia/version is no longer present, so DRIVER_VERSION can't be found.
If you manually set DRIVER_VERSION to e.g. 510.47.03, the NVIDIA installer fails because the following files are preloaded into the container: libnvidia-ml.so.1, libcuda.so.1, libnvcuvid.so.1, libnvidia-encode.so.1, libnvidia-opticalflow.so.1

Finally, if you hack around all of those so it runs, the nvidia-xconfig command currently there gives 'no screens found' (though I suspect at this point any xconfig command would do the same).
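On WSL the driver version indeed can't be read from `/proc/driver/nvidia/version`, but `nvidia-smi --query-gpu=driver_version --format=csv,noheader` still reports it. A sketch of parsing the `/proc` line when it exists, with the nvidia-smi query as a fallback (helper names are hypothetical):

```shell
# Extract the NVIDIA driver version from /proc/driver/nvidia/version
# content supplied on stdin.
parse_nvrm_version() {
  sed -n 's/.*Kernel Module[[:space:]]*\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

detect_driver_version() {
  if [ -f /proc/driver/nvidia/version ]; then
    parse_nvrm_version < /proc/driver/nvidia/version
  else
    # WSL and some VM setups: ask the driver directly.
    nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
  fi
}

echo 'NVRM version: NVIDIA UNIX x86_64 Kernel Module  510.47.03  Mon Jan 24 22:58:54 UTC 2022' |
  parse_nvrm_version   # -> 510.47.03
```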

Docker Compose

I'm trying to convert your example of using a docker run command docker run --gpus 1 -it -e TZ=UTC -e SIZEW=1920 -e SIZEH=1080 -e SHARED=TRUE -e PASSWD=mypasswd -e VIDEO_PORT=DFP -p 8080:8080 ehfd/nvidia-glx-desktop:latest into a docker compose file.

Here is my file:

version: '3.8'
services:
  nvidia-glx-desktop:
    image: 'ehfd/nvidia-glx-desktop:latest'
    environment:
      - TZ=UTC
      - SIZEW=1920
      - SIZEH=1080
      - SHARED=TRUE
      - PASSWD=mypasswd
      - VIDEO_PORT=DFP
    ports:
      - '8080:8080'
    deploy:
      resources:
        reservations:
          devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu, utility]

When I run it with your command, xrandr returns the correct value, mimicking a screen. However, when I run it with docker compose, xrandr reports a virtual screen at something like 32000 x 32000. Is there some subtle difference I'm not understanding?
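As a first diagnostic (a sketch, not part of the image; the helper name is mine), the `current` resolution can be extracted from xrandr's header line in both containers and compared:

```shell
# Pull the "current W x H" resolution out of xrandr's header line.
current_mode() {
  sed -n 's/.*current \([0-9][0-9]*\) x \([0-9][0-9]*\).*/\1x\2/p'
}

# Inside either container: DISPLAY=:0 xrandr | head -n 1 | current_mode
echo 'Screen 0: minimum 8 x 8, current 32000 x 32000, maximum 32767 x 32767' |
  current_mode   # -> 32000x32000
```

Comparing `env | grep NVIDIA` in both containers is another cheap check; whether the compose file's `capabilities: [gpu, utility]` list (versus whatever `--gpus 1` implies) explains the discrepancy is a guess on my part.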

Fatal server error: (EE) Cannot run in framebuffer mode.

I run this command

docker run \
	--name glx \
	--gpus 1 \
	-it \
	\
    --privileged \
    --cap-add=NET_ADMIN \
    --cap-add=SYS_ADMIN \
    --cap-add=SYS_MODULE \
    --cap-add=SYS_NICE \
    --cap-add=SYS_PACCT \
    --cap-add=SYS_PTRACE \
    --cap-add=SYS_RAWIO \
    --cap-add=SYS_RESOURCE \
    --cap-add=SYS_TIME \
    --cap-add=SYS_TTY_CONFIG \
    \
	-v /usr/bin/nvidia-xconfig:/usr/bin/nvidia-xconfig \
    -e NVIDIA_DRIVER_CAPABILITIES=all \
	-e TZ=UTC \
	-e SIZEW=1920 \
	-e SIZEH=1080 \
	-e REFRESH=60 \
	-e DPI=96 \
	-e CDEPTH=24 \
	-e VIDEO_PORT=DFP \
	-e PASSWD=mypasswd \
	-e WEBRTC_ENCODER=nvh264enc \
	-e BASIC_AUTH_PASSWORD=mypasswd \
	-p 9363:8080 ghcr.io/ehfd/nvidia-glx-desktop:latest

but sadly with this error message

cat entrypoint-stdout---supervisor-qwlckvoy.log
 * Starting system message bus dbus
   ...done.
sudo nvidia-xconfig --virtual=1920x1080 --depth=24 --mode=1920x1080R --allow-empty-initial-configuration --no-probe-all-gpus --busid=PCI:2:0:0 --only-one-x-screen --connected-monitor=DFP

WARNING: Unable to locate/open X configuration file.

Option "ProbeAllGpus" "False" added to Screen "Screen0".
Option "AllowEmptyInitialConfiguration" "True" added to Screen "Screen0".
New X configuration file written to '/etc/X11/xorg.conf'

Waiting for X socket

X.Org X Server 1.20.11
X Protocol Version 11, Revision 0
Build Operating System: linux Ubuntu
Current Operating System: Linux 64e366348af6 3.10.0-1160.36.2.el7.x86_64 #1 SMP Wed Jul 21 11:57:15 UTC 2021 x86_64
Kernel command line: BOOT_IMAGE=/vmlinuz-3.10.0-1160.36.2.el7.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet LANG=en_US.UTF-8 nouveau.modeset=0 rd.driver.blacklist=nouveau plymouth.ignore-udev
Build Date: 06 July 2021  10:17:51AM
xorg-server 2:1.20.11-1ubuntu1~20.04.2 (For technical support please see http://www.ubuntu.com/support) 
Current version of pixman: 0.38.4
 Before reporting problems, check http://wiki.x.org
 to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
 (++) from command line, (!!) notice, (II) informational,
 (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Fri Dec 31 08:43:31 2021
(==) Using config file: "/etc/X11/xorg.conf"
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
vesa: Ignoring device with a bound kernel driver
(EE) 
Fatal server error:
(EE) Cannot run in framebuffer mode. Please specify busIDs        for all framebuffer devices
(EE) 
(EE) 
Please consult the The X.Org Foundation support 
  at http://wiki.x.org
 for help. 
(EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
(EE) 
(EE) Server terminated with error (1). Closing log file.
cat /var/log/Xorg.0.log
[2036415.242] 
X.Org X Server 1.20.11
X Protocol Version 11, Revision 0
[2036415.242] Build Operating System: linux Ubuntu
[2036415.242] Current Operating System: Linux 64e366348af6 3.10.0-1160.36.2.el7.x86_64 #1 SMP Wed Jul 21 11:57:15 UTC 2021 x86_64
[2036415.242] Kernel command line: BOOT_IMAGE=/vmlinuz-3.10.0-1160.36.2.el7.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet LANG=en_US.UTF-8 nouveau.modeset=0 rd.driver.blacklist=nouveau plymouth.ignore-udev
[2036415.242] Build Date: 06 July 2021  10:17:51AM
[2036415.242] xorg-server 2:1.20.11-1ubuntu1~20.04.2 (For technical support please see http://www.ubuntu.com/support) 
[2036415.242] Current version of pixman: 0.38.4
[2036415.242]  Before reporting problems, check http://wiki.x.org
 to make sure that you have the latest version.
[2036415.242] Markers: (--) probed, (**) from config file, (==) default setting,
 (++) from command line, (!!) notice, (II) informational,
 (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[2036415.242] (==) Log file: "/var/log/Xorg.0.log", Time: Fri Dec 31 08:43:31 2021
[2036415.242] (==) Using config file: "/etc/X11/xorg.conf"
[2036415.242] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[2036415.243] (==) ServerLayout "Layout0"
[2036415.243] (**) |-->Screen "Screen0" (0)
[2036415.243] (**) |   |-->Monitor "Monitor0"
[2036415.243] (**) |   |-->Device "Device0"
[2036415.243] (**) |-->Input Device "Keyboard0"
[2036415.243] (**) |-->Input Device "Mouse0"
[2036415.243] (==) Automatically adding devices
[2036415.243] (==) Automatically enabling devices
[2036415.243] (==) Automatically adding GPU devices
[2036415.243] (==) Automatically binding GPU devices
[2036415.243] (==) Max clients allowed: 256, resource mask: 0x1fffff
[2036415.243] (WW) The directory "/usr/share/fonts/X11/cyrillic" does not exist.
[2036415.243]  Entry deleted from font path.
[2036415.243] (WW) The directory "/usr/share/fonts/X11/100dpi/" does not exist.
[2036415.243]  Entry deleted from font path.
[2036415.244] (WW) The directory "/usr/share/fonts/X11/75dpi/" does not exist.
[2036415.244]  Entry deleted from font path.
[2036415.244] (WW) The directory "/usr/share/fonts/X11/100dpi" does not exist.
[2036415.244]  Entry deleted from font path.
[2036415.244] (WW) The directory "/usr/share/fonts/X11/75dpi" does not exist.
[2036415.244]  Entry deleted from font path.
[2036415.244] (==) FontPath set to:
 /usr/share/fonts/X11/misc,
 /usr/share/fonts/X11/Type1,
 built-ins
[2036415.244] (==) ModulePath set to "/usr/lib/xorg/modules"
[2036415.244] (WW) Hotplugging is on, devices using drivers 'kbd', 'mouse' or 'vmmouse' will be disabled.
[2036415.244] (WW) Disabling Keyboard0
[2036415.244] (WW) Disabling Mouse0
[2036415.244] (II) Loader magic: 0x564f6e1e1020
[2036415.244] (II) Module ABI versions:
[2036415.244]  X.Org ANSI C Emulation: 0.4
[2036415.244]  X.Org Video Driver: 24.1
[2036415.244]  X.Org XInput driver : 24.1
[2036415.244]  X.Org Server Extension : 10.0
[2036415.245] (++) using VT number 7

[2036415.245] (II) systemd-logind: logind integration requires -keeptty and -keeptty was not provided, disabling logind integration
[2036415.246] (II) xfree86: Adding drm device (/dev/dri/card0)
[2036415.282] (--) PCI: (2@0:0:0) 10de:2204:1458:403b rev 161, Mem @ 0xcf000000/16777216, 0x383fe0000000/268435456, 0x383ff0000000/33554432, I/O @ 0x00006000/128, BIOS @ 0x????????/524288
[2036415.283] (--) PCI: (3@0:0:0) 10de:2204:1458:403b rev 161, Mem @ 0xcd000000/16777216, 0x383fc0000000/268435456, 0x383fd0000000/33554432, I/O @ 0x00005000/128, BIOS @ 0x????????/524288
[2036415.283] (--) PCI:*(6@0:0:0) 1a03:2000:15d9:0852 rev 48, Mem @ 0xcb000000/16777216, 0xcc000000/131072, I/O @ 0x00004000/128
[2036415.283] (--) PCI: (131@0:0:0) 10de:2204:1458:403b rev 161, Mem @ 0xfa000000/16777216, 0x387fe0000000/268435456, 0x387ff0000000/33554432, I/O @ 0x0000d000/128, BIOS @ 0x????????/524288
[2036415.283] (--) PCI: (132@0:0:0) 10de:2204:1458:403b rev 161, Mem @ 0xf8000000/16777216, 0x387fc0000000/268435456, 0x387fd0000000/33554432, I/O @ 0x0000c000/128, BIOS @ 0x????????/524288
[2036415.283] (II) LoadModule: "glx"
[2036415.283] (II) Loading /usr/lib/xorg/modules/extensions/libglx.so
[2036415.285] (II) Module glx: vendor="X.Org Foundation"
[2036415.285]  compiled for 1.20.11, module version = 1.0.0
[2036415.285]  ABI class: X.Org Server Extension, version 10.0
[2036415.285] (II) LoadModule: "nvidia"
[2036415.285] (WW) Warning, couldn't open module nvidia
[2036415.285] (EE) Failed to load module "nvidia" (module does not exist, 0)
[2036415.285] (==) Matched ast as autoconfigured driver 0
[2036415.285] (==) Matched modesetting as autoconfigured driver 1
[2036415.285] (==) Matched fbdev as autoconfigured driver 2
[2036415.285] (==) Matched vesa as autoconfigured driver 3
[2036415.285] (==) Assigned the driver to the xf86ConfigLayout
[2036415.285] (II) LoadModule: "ast"
[2036415.286] (WW) Warning, couldn't open module ast
[2036415.286] (EE) Failed to load module "ast" (module does not exist, 0)
[2036415.286] (II) LoadModule: "modesetting"
[2036415.286] (II) Loading /usr/lib/xorg/modules/drivers/modesetting_drv.so
[2036415.286] (II) Module modesetting: vendor="X.Org Foundation"
[2036415.286]  compiled for 1.20.11, module version = 1.20.11
[2036415.286]  Module class: X.Org Video Driver
[2036415.286]  ABI class: X.Org Video Driver, version 24.1
[2036415.286] (II) LoadModule: "fbdev"
[2036415.286] (II) Loading /usr/lib/xorg/modules/drivers/fbdev_drv.so
[2036415.286] (II) Module fbdev: vendor="X.Org Foundation"
[2036415.286]  compiled for 1.20.1, module version = 0.5.0
[2036415.286]  Module class: X.Org Video Driver
[2036415.286]  ABI class: X.Org Video Driver, version 24.0
[2036415.286] (II) LoadModule: "vesa"
[2036415.286] (II) Loading /usr/lib/xorg/modules/drivers/vesa_drv.so
[2036415.286] (II) Module vesa: vendor="X.Org Foundation"
[2036415.286]  compiled for 1.20.4, module version = 2.4.0
[2036415.286]  Module class: X.Org Video Driver
[2036415.286]  ABI class: X.Org Video Driver, version 24.0
[2036415.286] (II) modesetting: Driver for Modesetting Kernel Drivers: kms
[2036415.286] (II) FBDEV: driver for framebuffer: fbdev
[2036415.286] (II) VESA: driver for VESA chipsets: vesa
[2036415.287] (WW) xf86OpenConsole: VT_GETSTATE failed: Inappropriate ioctl for device
[2036415.287] (WW) Falling back to old probe method for modesetting
[2036415.287] (II) modeset(1): using default device
[2036415.287] (II) Loading sub module "fbdevhw"
[2036415.287] (II) LoadModule: "fbdevhw"
[2036415.287] (II) Loading /usr/lib/xorg/modules/libfbdevhw.so
[2036415.287] (II) Module fbdevhw: vendor="X.Org Foundation"
[2036415.287]  compiled for 1.20.11, module version = 0.0.2
[2036415.287]  ABI class: X.Org Video Driver, version 24.1
[2036415.287] (WW) Falling back to old probe method for fbdev
[2036415.287] (II) Loading sub module "fbdevhw"
[2036415.287] (II) LoadModule: "fbdevhw"
[2036415.288] (II) Loading /usr/lib/xorg/modules/libfbdevhw.so
[2036415.288] (II) Module fbdevhw: vendor="X.Org Foundation"
[2036415.288]  compiled for 1.20.11, module version = 0.0.2
[2036415.288]  ABI class: X.Org Video Driver, version 24.1
[2036415.288] vesa: Ignoring device with a bound kernel driver
[2036415.288] (II) modeset(G0): using drv /dev/dri/card0
[2036415.288] (EE) Screen 0 deleted because of no matching config section.
[2036415.288] (II) UnloadModule: "modesetting"
[2036415.288] (EE) Screen 1 deleted because of no matching config section.
[2036415.288] (II) UnloadModule: "fbdev"
[2036415.288] (II) UnloadSubModule: "fbdevhw"
[2036415.288] (EE) Screen 1 deleted because of no matching config section.
[2036415.288] (II) UnloadModule: "vesa"
[2036415.288] (EE) 
Fatal server error:
[2036415.288] (EE) Cannot run in framebuffer mode. Please specify busIDs        for all framebuffer devices
[2036415.288] (EE) 
[2036415.288] (EE) 
Please consult the The X.Org Foundation support 
  at http://wiki.x.org
 for help. 
[2036415.288] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[2036415.288] (EE) 
[2036415.288] (EE) Server terminated with error (1). Closing log file.
nvidia-smi 
Fri Dec 31 08:46:06 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:02:00.0 Off |                  N/A |
| 30%   23C    P8    24W / 350W |   1997MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:03:00.0 Off |                  N/A |
| 30%   20C    P8    11W / 350W |   1450MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  Off  | 00000000:83:00.0 Off |                  N/A |
| 30%   21C    P8    13W / 350W |   3038MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  Off  | 00000000:84:00.0 Off |                  N/A |
| 30%   21C    P8    10W / 350W |  20331MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
docker version 
Client: Docker Engine - Community
 Version:           20.10.8
 API version:       1.41
 Go version:        go1.16.6
 Git commit:        3967b7d
 Built:             Fri Jul 30 19:55:49 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.8
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.6
  Git commit:       75249d8
  Built:            Fri Jul 30 19:54:13 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.9
  GitCommit:        e25210fe30a0a703442421b0f60afac609f950a3
 runc:
  Version:          1.0.1
  GitCommit:        v1.0.1-0-g4144b63
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
cat /etc/X11/xorg.conf
# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig:  version 470.63.01

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0"
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "Files"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/input/mice"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Modeline "1920x1080R"  138.50  1920 1968 2000 2080  1080 1083 1088 1111 +hsync -vsync
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    Option         "ModeValidation" "NoMaxPClkCheck, NoEdidMaxPClkCheck, NoMaxSizeCheck, NoHorizSyncCheck, NoVertRefreshCheck, NoVirtualSizeCheck, NoExtendedGpuCapabilitiesCheck, NoTotalSizeCheck, NoDualLinkDVICheck, NoDisplayPortBandwidthCheck, AllowNon3DVisionModes, AllowNonHDMI3DModes, AllowNonEdidModes, NoEdidHDMI2Check, AllowDpInterlaced"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:2:0:0"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "ProbeAllGpus" "False"
    Option         "AllowEmptyInitialConfiguration" "True"
    Option         "ConnectedMonitor" "DFP"
    SubSection     "Display"
        Virtual     1920 1080
        Depth       24
        Modes      "1920x1080R"
    EndSubSection
EndSection

Quick clarifications

Hi @ehfd, I appreciate all your work on this. I used an older version of this repo a while back and wanted to see how it has improved since then, so I ran some tests last night.

Quick questions:

  • Do you have to stop gdm/lightdm if it is already running on the host machine? This is not clear in the documentation, but I found that the container only worked after stopping it; when gdm was already running, the container would not work.
  • I was able to get the EGL version of this container running, but not all apps would work with it (such as Firefox). In the GLX version, however, I can clearly see Firefox running on the GPU by checking nvidia-smi for running processes. Does the EGL version only support certain software out of the box?
  • Is EGL the only version that supports multiple containers sharing the same GPU, or is that possible with GLX as well? If not, what stops it from working with GLX?
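
On the first question: the host display manager typically holds the GPU and its virtual terminal, which can conflict with the container's isolated X server, matching the behavior described above. A minimal sketch for checking whether one is running before launch (the service names below are common defaults and may differ per distribution; this is a hypothetical helper, not part of the project):

```shell
# Check for a running display manager before starting the container.
# Names below are common defaults; adjust for your distribution.
for dm in gdm gdm3 lightdm sddm; do
  if pgrep -x "$dm" >/dev/null 2>&1; then
    echo "display manager running: $dm"
    # sudo systemctl stop "$dm"   # stop it, then launch the container
  fi
done
```

If nothing is printed, no known display manager process was found and the container should not conflict with one.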

Connection failed in webrtc

I have no idea what's wrong. I can connect to WebRTC but not the VNC.

Run with this command (from the README):

docker run --gpus 1 -it -e TZ=UTC -e SIZEW=1920 -e SIZEH=1080 -e REFRESH=60 -e DPI=96 -e CDEPTH=24 -e VIDEO_PORT=DFP -e PASSWD=mypasswd -e WEBRTC_ENCODER=nvh264enc -e BASIC_AUTH_PASSWORD=mypasswd -p 127.0.0.1:8080:8080 ghcr.io/ehfd/nvidia-glx-desktop:latest

The GPU is a V100.
Here are the logs.
There are some ERRORs in entrypoint.log, but I cannot determine which one is critical.

entrypoint-stdout---supervisor-bgokahf1.log
supervisord.log
Xorg.0.log
selkies-gstreamer-stdout---supervisor-8u_8ckmh.log

touch does not support subdirectories

I'm attempting to launch the container; however, the script /etc/selkies-gstreamer-entrypoint.sh crashes with the error:

sudo touch /dev/input/{js0,js1,js2,js3}
touch: cannot touch '/dev/input/js0': No such file or directory
touch: cannot touch '/dev/input/js2': No such file or directory
touch: cannot touch '/dev/input/js3': No such file or directory

It seems that touch doesn't support creating files in subdirectories like this, since cd'ing into the directory and running touch {js0,js1,js2,js3} works fine.

(screenshot attached)

I'm running this on an Ubuntu 22.04 host with the NVIDIA Container Toolkit.
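
For what it's worth, brace expansion itself is not the limitation; `touch dir/{a,b}` works whenever the parent directory exists. The ENOENT errors above instead suggest that `/dev/input` (or the individual device nodes) is absent inside the container. A quick demonstration under a temporary directory:

```shell
# Brace expansion with subdirectory paths works when the parent exists.
tmp=$(mktemp -d)
mkdir -p "$tmp/input"
touch "$tmp"/input/{js0,js1,js2,js3}   # succeeds: parent directory exists
ls "$tmp/input"

# The same command fails with ENOENT when the parent is missing,
# matching the error in the report above.
touch "$tmp"/missing/{js0,js1} 2>/dev/null || echo "parent directory missing"
rm -rf "$tmp"
```

So the fix is making sure the directory (and, for real joystick interposers, the device nodes) exists before the touch, not changing how touch is invoked.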

Manually install the compat version of the CUDA toolkit in the container

Currently there are only four tags: 18.04, 20.04, 22.04, and latest. Whenever a new version is released, every tag is updated and the previous images are gone.

It would be nice to be able to keep referencing a specific version or to grab an old release. For example, you could keep 22.04-20230801 (never updated) alongside 22.04 (updated).

Case in point: it is impossible to get any Xfce images, since every tag has been overwritten since the switch to KDE.
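
Until dated tags exist, one workaround is to pin a deployment to an image digest, which stays immutable even after the tag is later overwritten. A sketch (the digest value below is a placeholder, not a real digest):

```shell
IMAGE="ghcr.io/selkies-project/nvidia-glx-desktop"

# After pulling a tag once, record its immutable digest:
#   docker pull "$IMAGE:22.04"
#   docker inspect --format '{{index .RepoDigests 0}}' "$IMAGE:22.04"

# Then reference that digest permanently instead of the mutable tag:
PINNED="$IMAGE@sha256:<digest-from-inspect>"   # placeholder digest
echo "$PINNED"
```

This only helps for images you pulled before the tag moved; it does not recover the already-overwritten Xfce images.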

NVIDIA 535.86 doesn't run headless Xorg servers (fixed in 535.129.03 and 545.29.02)

Hello. I'm trying to run this container in my home Kubernetes cluster on Talos Linux with an RTX 4090 GPU.
NVIDIA driver: 535.86.05

root@csgo-0:/tmp# cat /home/user/.local/share/xorg/Xorg.0.log
[  3108.301] _XSERVTransmkdir: ERROR: euid != 0,directory /tmp/.X11-unix will not be created.
[  3108.301] 
X.Org X Server 1.21.1.3
X Protocol Version 11, Revision 0
[  3108.301] Current Operating System: Linux csgo-0 6.1.35-talos #1 SMP PREEMPT_DYNAMIC Wed Jun 28 13:58:51 UTC 2023 x86_64
[  3108.301] Kernel command line: talos.platform=metal talos.config=none console=ttyS0 console=tty0 init_on_alloc=1 slab_nomerge pti=on consoleblank=0 nvme_core.io_timeout=4294967295 printk.devkmsg=on ima_template=ima-ng ima_appraise=fix ima_hash=sha512 mitigations=off cpufreq.default_governor=performance
[  3108.301] xorg-server 2:21.1.3-2ubuntu2.5 (For technical support please see http://www.ubuntu.com/support) 
[  3108.301] Current version of pixman: 0.40.0
[  3108.301]    Before reporting problems, check http://wiki.x.org
        to make sure that you have the latest version.
[  3108.301] Markers: (--) probed, (**) from config file, (==) default setting,
        (++) from command line, (!!) notice, (II) informational,
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[  3108.301] (==) Log file: "/home/user/.local/share/xorg/Xorg.0.log", Time: Tue Jul 25 13:36:43 2023
[  3108.301] (==) Using config file: "/etc/X11/xorg.conf"
[  3108.301] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[  3108.301] (==) ServerLayout "Layout0"
[  3108.301] (**) |-->Screen "Screen0" (0)
[  3108.301] (**) |   |-->Monitor "Monitor0"
[  3108.301] (**) |   |-->Device "Device0"
[  3108.301] (**) |-->Input Device "Keyboard0"
[  3108.301] (**) |-->Input Device "Mouse0"
[  3108.301] (**) Option "AutoAddGPU" "false"
[  3108.301] (==) Automatically adding devices
[  3108.301] (==) Automatically enabling devices
[  3108.301] (**) Not automatically adding GPU devices
[  3108.301] (==) Automatically binding GPU devices
[  3108.301] (==) Max clients allowed: 256, resource mask: 0x1fffff
[  3108.301] (WW) The directory "/usr/share/fonts/X11/cyrillic" does not exist.
[  3108.301]    Entry deleted from font path.
[  3108.301] (WW) The directory "/usr/share/fonts/X11/100dpi/" does not exist.
[  3108.301]    Entry deleted from font path.
[  3108.301] (WW) The directory "/usr/share/fonts/X11/75dpi/" does not exist.
[  3108.301]    Entry deleted from font path.
[  3108.301] (WW) The directory "/usr/share/fonts/X11/Type1" does not exist.
[  3108.301]    Entry deleted from font path.
[  3108.301] (WW) The directory "/usr/share/fonts/X11/100dpi" does not exist.
[  3108.301]    Entry deleted from font path.
[  3108.301] (WW) The directory "/usr/share/fonts/X11/75dpi" does not exist.
[  3108.301]    Entry deleted from font path.
[  3108.301] (==) FontPath set to:
        /usr/share/fonts/X11/misc,
        built-ins
[  3108.301] (==) ModulePath set to "/usr/lib/xorg/modules"
[  3108.301] (WW) Hotplugging is on, devices using drivers 'kbd', 'mouse' or 'vmmouse' will be disabled.
[  3108.301] (WW) Disabling Keyboard0
[  3108.301] (WW) Disabling Mouse0
[  3108.301] (II) Loader magic: 0x55f31992c020
[  3108.301] (II) Module ABI versions:
[  3108.301]    X.Org ANSI C Emulation: 0.4
[  3108.301]    X.Org Video Driver: 25.2
[  3108.301]    X.Org XInput driver : 24.4
[  3108.301]    X.Org Server Extension : 10.0
[  3108.303] (EE) systemd-logind: failed to get session: Launch helper exited with unknown return code 1
[  3108.303] (II) xfree86: Adding drm device (/dev/dri/card0)
[  3108.303] (II) Platform probe for /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card0
[  3108.304] (--) PCI:*(1@0:0:0) 10de:2684:10de:165b rev 161, Mem @ 0x93000000/16777216, 0x4000000000/34359738368, 0x4800000000/33554432, I/O @ 0x00006000/128, BIOS @ 0x????????/524288
[  3108.304] (II) LoadModule: "glx"
[  3108.305] (II) Loading /usr/lib/xorg/modules/extensions/libglx.so
[  3108.305] (II) Module glx: vendor="X.Org Foundation"
[  3108.305]    compiled for 1.21.1.3, module version = 1.0.0
[  3108.305]    ABI class: X.Org Server Extension, version 10.0
[  3108.305] (II) LoadModule: "nvidia"
[  3108.305] (II) Loading /usr/lib/xorg/modules/drivers/nvidia_drv.so
[  3108.305] (II) Module nvidia: vendor="NVIDIA Corporation"
[  3108.305]    compiled for 1.6.99.901, module version = 1.0.0
[  3108.305]    Module class: X.Org Video Driver
[  3108.305] (II) NVIDIA dlloader X Driver  535.86.05  Fri Jul 14 20:26:08 UTC 2023
[  3108.305] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[  3108.305] (II) Loading sub module "fb"
[  3108.305] (II) LoadModule: "fb"
[  3108.305] (II) Module "fb" already built-in
[  3108.305] (II) Loading sub module "wfb"
[  3108.305] (II) LoadModule: "wfb"
[  3108.305] (II) Loading /usr/lib/xorg/modules/libwfb.so
[  3108.305] (II) Module wfb: vendor="X.Org Foundation"
[  3108.305]    compiled for 1.21.1.3, module version = 1.0.0
[  3108.305]    ABI class: X.Org ANSI C Emulation, version 0.4
[  3108.305] (WW) VGA arbiter: cannot open kernel arbiter, no multi-card support
[  3108.305] (**) NVIDIA(0): Depth 24, (--) framebuffer bpp 32
[  3108.305] (==) NVIDIA(0): RGB weight 888
[  3108.305] (==) NVIDIA(0): Default visual is TrueColor
[  3108.305] (==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
[  3108.305] (**) NVIDIA(0): Option "ModeValidation" "NoMaxPClkCheck, NoEdidMaxPClkCheck, NoMaxSizeCheck, NoHorizSyncCheck, NoVertRefreshCheck, NoVirtualSizeCheck, NoExtendedGpuCapabilitiesCheck, NoTotalSizeCheck, NoDualLinkDVICheck, NoDisplayPortBandwidthCheck, AllowNon3DVisionModes, AllowNonHDMI3DModes, AllowNonEdidModes, NoEdidHDMI2Check, AllowDpInterlaced"
[  3108.305] (**) NVIDIA(0): Option "ProbeAllGpus" "False"
[  3108.305] (**) NVIDIA(0): Option "BaseMosaic" "False"
[  3108.305] (**) NVIDIA(0): Option "AllowEmptyInitialConfiguration" "True"
[  3108.305] (**) NVIDIA(0): Option "HardDPMS" "False"
[  3108.305] (**) NVIDIA(0): Option "ConnectedMonitor" "DFP"
[  3108.305] (**) NVIDIA(0): Enabling 2D acceleration
[  3108.305] (**) NVIDIA(0): ConnectedMonitor string: "DFP"
[  3108.305] (II) Loading sub module "glxserver_nvidia"
[  3108.305] (II) LoadModule: "glxserver_nvidia"
[  3108.305] (II) Loading /usr/lib/xorg/modules/extensions/libglxserver_nvidia.so
[  3108.309] (II) Module glxserver_nvidia: vendor="NVIDIA Corporation"
[  3108.309]    compiled for 1.6.99.901, module version = 1.0.0
[  3108.309]    Module class: X.Org Server Extension
[  3108.309] (II) NVIDIA GLX Module  535.86.05  Fri Jul 14 20:27:17 UTC 2023
[  3108.309] (II) NVIDIA: The X server supports PRIME Render Offload.
[  3108.322] (--) NVIDIA(0): Valid display device(s) on GPU-0 at PCI:1:0:0
[  3108.322] (--) NVIDIA(0):     DFP-0 (boot)
[  3108.322] (--) NVIDIA(0):     DFP-1
[  3108.322] (--) NVIDIA(0):     DFP-2
[  3108.322] (--) NVIDIA(0):     DFP-3
[  3108.322] (--) NVIDIA(0):     DFP-4
[  3108.322] (--) NVIDIA(0):     DFP-5
[  3108.322] (--) NVIDIA(0):     DFP-6
[  3108.322] (**) NVIDIA(0): Using ConnectedMonitor string "DFP-0".
[  3108.322] (II) NVIDIA(0): NVIDIA GPU NVIDIA GeForce RTX 4090 (AD102-A) at PCI:1:0:0
[  3108.322] (II) NVIDIA(0):     (GPU-0)
[  3108.322] (--) NVIDIA(0): Memory: 25153536 kBytes
[  3108.322] (--) NVIDIA(0): VideoBIOS: 95.02.20.00.01
[  3108.322] (II) NVIDIA(0): Detected PCI Express Link width: 16X
[  3108.365] (--) NVIDIA(GPU-0): LNX PiKVM (DFP-0): connected
[  3108.365] (--) NVIDIA(GPU-0): LNX PiKVM (DFP-0): Internal TMDS
[  3108.365] (--) NVIDIA(GPU-0): LNX PiKVM (DFP-0): 600.0 MHz maximum pixel clock
[  3108.365] (--) NVIDIA(GPU-0): 
[  3108.365] (--) NVIDIA(GPU-0): DFP-1: disconnected
[  3108.365] (--) NVIDIA(GPU-0): DFP-1: Internal DisplayPort
[  3108.365] (--) NVIDIA(GPU-0): DFP-1: 2670.0 MHz maximum pixel clock
[  3108.365] (--) NVIDIA(GPU-0): 
[  3108.365] (--) NVIDIA(GPU-0): DFP-2: disconnected
[  3108.365] (--) NVIDIA(GPU-0): DFP-2: Internal TMDS
[  3108.365] (--) NVIDIA(GPU-0): DFP-2: 165.0 MHz maximum pixel clock
[  3108.365] (--) NVIDIA(GPU-0): 
[  3108.365] (--) NVIDIA(GPU-0): DFP-3: disconnected
[  3108.365] (--) NVIDIA(GPU-0): DFP-3: Internal DisplayPort
[  3108.365] (--) NVIDIA(GPU-0): DFP-3: 2670.0 MHz maximum pixel clock
[  3108.365] (--) NVIDIA(GPU-0): 
[  3108.365] (--) NVIDIA(GPU-0): DFP-4: disconnected
[  3108.365] (--) NVIDIA(GPU-0): DFP-4: Internal TMDS
[  3108.365] (--) NVIDIA(GPU-0): DFP-4: 165.0 MHz maximum pixel clock
[  3108.365] (--) NVIDIA(GPU-0): 
[  3108.365] (--) NVIDIA(GPU-0): DFP-5: disconnected
[  3108.365] (--) NVIDIA(GPU-0): DFP-5: Internal DisplayPort
[  3108.365] (--) NVIDIA(GPU-0): DFP-5: 2670.0 MHz maximum pixel clock
[  3108.365] (--) NVIDIA(GPU-0): 
[  3108.365] (--) NVIDIA(GPU-0): DFP-6: disconnected
[  3108.365] (--) NVIDIA(GPU-0): DFP-6: Internal TMDS
[  3108.365] (--) NVIDIA(GPU-0): DFP-6: 165.0 MHz maximum pixel clock
[  3108.365] (--) NVIDIA(GPU-0): 
[  3108.365] (**) NVIDIA(GPU-0): Mode Validation Overrides for LNX PiKVM (DFP-0):
[  3108.365] (**) NVIDIA(GPU-0):     NoMaxSizeCheck
[  3108.365] (**) NVIDIA(GPU-0):     NoVirtualSizeCheck
[  3108.365] (**) NVIDIA(GPU-0):     NoMaxPClkCheck
[  3108.365] (**) NVIDIA(GPU-0):     NoEdidMaxPClkCheck
[  3108.365] (**) NVIDIA(GPU-0):     NoHorizSyncCheck
[  3108.365] (**) NVIDIA(GPU-0):     NoVertRefreshCheck
[  3108.365] (**) NVIDIA(GPU-0):     NoExtendedGpuCapabilitiesCheck
[  3108.365] (**) NVIDIA(GPU-0):     NoTotalSizeCheck
[  3108.365] (**) NVIDIA(GPU-0):     NoDualLinkDVICheck
[  3108.365] (**) NVIDIA(GPU-0):     NoDisplayPortBandwidthCheck
[  3108.365] (**) NVIDIA(GPU-0):     AllowNon3DVisionModes
[  3108.365] (**) NVIDIA(GPU-0):     AllowNonEdidModes
[  3108.365] (**) NVIDIA(GPU-0):     AllowNonHDMI3DModes
[  3108.365] (**) NVIDIA(GPU-0):     NoEdidHDMI2Check
[  3108.365] (**) NVIDIA(GPU-0):     AllowDpInterlaced
[  3108.366] (EE) NVIDIA(GPU-0): Unable to add conservative default mode "nvidia-auto-select".
[  3108.366] (EE) NVIDIA(GPU-0): Unable to add "nvidia-auto-select" mode to ModePool.
[  3108.366] (WW) NVIDIA(0): No valid modes for "DFP-0:1920x1080R"; removing.
[  3108.366] (WW) NVIDIA(0): 
[  3108.366] (WW) NVIDIA(0): Unable to validate any modes; falling back to the default mode
[  3108.366] (WW) NVIDIA(0):     "nvidia-auto-select".
[  3108.366] (WW) NVIDIA(0): 
[  3108.366] (WW) NVIDIA(0): No valid modes for "DFP-0:nvidia-auto-select"; removing.
[  3108.366] (EE) NVIDIA(0): Unable to use default mode "nvidia-auto-select".
[  3108.366] (EE) NVIDIA(0): Failing initialization of X screen
[  3108.427] (II) UnloadModule: "nvidia"
[  3108.427] (II) UnloadSubModule: "glxserver_nvidia"
[  3108.427] (II) Unloading glxserver_nvidia
[  3108.427] (II) UnloadSubModule: "wfb"
[  3108.427] (EE) Screen(s) found, but none have a usable configuration.
[  3108.427] (EE) 
Fatal server error:
[  3108.427] (EE) no screens found(EE) 
[  3108.427] (EE) 
Please consult the The X.Org Foundation support 
         at http://wiki.x.org
 for help. 
[  3108.427] (EE) Please also check the log file at "/home/user/.local/share/xorg/Xorg.0.log" for additional information.
[  3108.427] (EE) 
[  3108.427] (EE) Server terminated with error (1). Closing log file.

Multiple GPUs for multiple VNC desktops

We have one GPU server with 4 GPUs, and 4 developers who each want to use one of them. But we can only open one VNC desktop at a time: when we open a second, the first one goes black. The command I tried:
docker run --name glx1 -d --gpus "device=1" --privileged -it -e SIZEW=1920 -e SIZEH=1080 -e SHARED=TRUE -e VNCPASS=vncpasswd -p 5902:5901 ehfd/nvidia-glx-desktop:latest
With this command we can run programs on the second GPU, but it cannot work alongside the first container running on --gpus "device=0". The first container goes black when the second container starts, even though they are not on the same GPU.

I also tried the command without --privileged:
docker run --name glx1 -d --gpus "device=1" --device=/dev/tty1:rw -it -e SIZEW=1920 -e SIZEH=1080 -e SHARED=TRUE -e VNCPASS=vncpasswd -p 5902:5901 ehfd/nvidia-glx-desktop:latest
With this command, x11vnc cannot start up; the error is "couldn't open display".

I also tried the command in the README and attempted to open two VNC desktops on the same GPU 0, with no luck.

By the way, there is a small issue in bootstrap.sh: when Docker restarts the container, bootstrap gets stuck reinstalling the driver. We fixed it by commenting that section out.
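
Regarding launching one desktop per GPU: each container needs its own name, host port, and single-GPU `--gpus` selection. A dry-run sketch that only prints one command per GPU (the 8080 web port and image path are assumptions based on the current README; adjust the environment variables to your setup):

```shell
# Print one docker run command per GPU (dry run; drop the echo to execute).
for i in 0 1 2 3; do
  echo docker run --name "glx$i" -d --gpus "device=$i" \
       -e SIZEW=1920 -e SIZEH=1080 -e PASSWD=mypasswd \
       -p "$((8080 + i)):8080" \
       ghcr.io/selkies-project/nvidia-glx-desktop:latest
done
```

This addresses the port and device selection only; whether multiple X servers can coexist on one host also depends on the VT/modesetting conflicts shown in the logs in this thread.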

Xorg fatal error: (EE) no screens found(EE)

I have been trying to get this to work for a while now.

What seems to stand out:

[  2386.268] (EE) NVIDIA(GPU-0): Failed to acquire modesetting permission.
[  2386.268] (EE) NVIDIA(0): Failing initialization of X screen

Any ideas?

/var/log/Xorg.0.log
[  2386.258] 
X.Org X Server 1.20.9
X Protocol Version 11, Revision 0
[  2386.258] Build Operating System: Linux 4.15.0-140-generic x86_64 Ubuntu
[  2386.258] Current Operating System: Linux d515df6cb351 5.12.14-arch1-1 #1 SMP PREEMPT Thu, 01 Jul 2021 07:26:06 +0000 x86_64
[  2386.258] Kernel command line: initrd=\initramfs-linux.img root=PARTUUID=1fc6aac3-02 rootfstype=ext4 add_efi_memmap pcie_acs_override=downstream,multifunction systemd.unified_cgroup_hierarchy=0 nvidia-drm.modeset=1
[  2386.258] Build Date: 08 April 2021  12:29:22PM
[  2386.258] xorg-server 2:1.20.9-2ubuntu1.2~20.04.2 (For technical support please see http://www.ubuntu.com/support) 
[  2386.258] Current version of pixman: 0.38.4
[  2386.258] 	Before reporting problems, check http://wiki.x.org
	to make sure that you have the latest version.
[  2386.258] Markers: (--) probed, (**) from config file, (==) default setting,
	(++) from command line, (!!) notice, (II) informational,
	(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[  2386.258] (==) Log file: "/var/log/Xorg.0.log", Time: Tue Aug  3 10:21:30 2021
[  2386.258] (==) Using config file: "/etc/X11/xorg.conf"
[  2386.258] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[  2386.258] (==) ServerLayout "Layout0"
[  2386.258] (**) |-->Screen "Screen0" (0)
[  2386.258] (**) |   |-->Monitor "Monitor0"
[  2386.258] (**) |   |-->Device "Device0"
[  2386.259] (**) |-->Input Device "Keyboard0"
[  2386.259] (**) |-->Input Device "Mouse0"
[  2386.259] (==) Automatically adding devices
[  2386.259] (==) Automatically enabling devices
[  2386.259] (==) Automatically adding GPU devices
[  2386.259] (==) Automatically binding GPU devices
[  2386.259] (==) Max clients allowed: 256, resource mask: 0x1fffff
[  2386.259] (WW) The directory "/usr/share/fonts/X11/cyrillic" does not exist.
[  2386.259] 	Entry deleted from font path.
[  2386.259] (WW) The directory "/usr/share/fonts/X11/100dpi/" does not exist.
[  2386.259] 	Entry deleted from font path.
[  2386.259] (WW) The directory "/usr/share/fonts/X11/75dpi/" does not exist.
[  2386.259] 	Entry deleted from font path.
[  2386.259] (WW) The directory "/usr/share/fonts/X11/100dpi" does not exist.
[  2386.259] 	Entry deleted from font path.
[  2386.259] (WW) The directory "/usr/share/fonts/X11/75dpi" does not exist.
[  2386.259] 	Entry deleted from font path.
[  2386.259] (==) FontPath set to:
	/usr/share/fonts/X11/misc,
	/usr/share/fonts/X11/Type1,
	built-ins
[  2386.259] (==) ModulePath set to "/usr/lib/xorg/modules"
[  2386.259] (WW) Hotplugging is on, devices using drivers 'kbd', 'mouse' or 'vmmouse' will be disabled.
[  2386.259] (WW) Disabling Keyboard0
[  2386.259] (WW) Disabling Mouse0
[  2386.259] (II) Loader magic: 0x5582690b8020
[  2386.259] (II) Module ABI versions:
[  2386.259] 	X.Org ANSI C Emulation: 0.4
[  2386.259] 	X.Org Video Driver: 24.1
[  2386.259] 	X.Org XInput driver : 24.1
[  2386.259] 	X.Org Server Extension : 10.0
[  2386.259] (++) using VT number 7

[  2386.259] (II) systemd-logind: logind integration requires -keeptty and -keeptty was not provided, disabling logind integration
[  2386.259] (II) xfree86: Adding drm device (/dev/dri/card0)
[  2386.261] (--) PCI:*(10@0:0:0) 10de:1b81:1043:8598 rev 161, Mem @ 0xf6000000/16777216, 0xe0000000/268435456, 0xf0000000/33554432, I/O @ 0x0000e000/128, BIOS @ 0x????????/131072
[  2386.261] (II) LoadModule: "glx"
[  2386.261] (II) Loading /usr/lib/xorg/modules/extensions/libglx.so
[  2386.262] (II) Module glx: vendor="X.Org Foundation"
[  2386.262] 	compiled for 1.20.9, module version = 1.0.0
[  2386.262] 	ABI class: X.Org Server Extension, version 10.0
[  2386.262] (II) LoadModule: "nvidia"
[  2386.262] (II) Loading /usr/lib/xorg/modules/drivers/nvidia_drv.so
[  2386.262] (II) Module nvidia: vendor="NVIDIA Corporation"
[  2386.262] 	compiled for 1.6.99.901, module version = 1.0.0
[  2386.262] 	Module class: X.Org Video Driver
[  2386.262] (II) NVIDIA dlloader X Driver  465.31  Thu May 13 22:19:15 UTC 2021
[  2386.262] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[  2386.262] (WW) xf86OpenConsole: setpgid failed: Operation not permitted
[  2386.262] (WW) xf86OpenConsole: VT_GETSTATE failed: Inappropriate ioctl for device
[  2386.262] xf86EnableIOPorts: failed to set IOPL for I/O (Operation not permitted)
[  2386.262] (II) Loading sub module "fb"
[  2386.262] (II) LoadModule: "fb"
[  2386.262] (II) Loading /usr/lib/xorg/modules/libfb.so
[  2386.262] (II) Module fb: vendor="X.Org Foundation"
[  2386.262] 	compiled for 1.20.9, module version = 1.0.0
[  2386.262] 	ABI class: X.Org ANSI C Emulation, version 0.4
[  2386.262] (II) Loading sub module "wfb"
[  2386.262] (II) LoadModule: "wfb"
[  2386.262] (II) Loading /usr/lib/xorg/modules/libwfb.so
[  2386.262] (II) Module wfb: vendor="X.Org Foundation"
[  2386.262] 	compiled for 1.20.9, module version = 1.0.0
[  2386.262] 	ABI class: X.Org ANSI C Emulation, version 0.4
[  2386.262] (II) Loading sub module "ramdac"
[  2386.262] (II) LoadModule: "ramdac"
[  2386.262] (II) Module "ramdac" already built-in
[  2386.263] (WW) VGA arbiter: cannot open kernel arbiter, no multi-card support
[  2386.263] (**) NVIDIA(0): Depth 24, (--) framebuffer bpp 32
[  2386.263] (==) NVIDIA(0): RGB weight 888
[  2386.263] (==) NVIDIA(0): Default visual is TrueColor
[  2386.263] (==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
[  2386.263] (**) NVIDIA(0): Option "DPI" "96 x 96"
[  2386.263] (**) NVIDIA(0): Option "ModeValidation" "NoMaxPClkCheck, NoEdidMaxPClkCheck, NoMaxSizeCheck, NoHorizSyncCheck, NoVertRefreshCheck, NoVirtualSizeCheck, NoExtendedGpuCapabilitiesCheck, NoTotalSizeCheck, NoDualLinkDVICheck, NoDisplayPortBandwidthCheck, AllowNon3DVisionModes, AllowNonHDMI3DModes, AllowNonEdidModes, NoEdidHDMI2Check, AllowDpInterlaced"
[  2386.263] (**) NVIDIA(0): Option "ProbeAllGpus" "False"
[  2386.263] (**) NVIDIA(0): Option "AllowEmptyInitialConfiguration" "True"
[  2386.263] (**) NVIDIA(0): Option "ConnectedMonitor" "DP-0"
[  2386.263] (**) NVIDIA(0): Enabling 2D acceleration
[  2386.263] (**) NVIDIA(0): ConnectedMonitor string: "DP-0"
[  2386.263] (II) Loading sub module "glxserver_nvidia"
[  2386.263] (II) LoadModule: "glxserver_nvidia"
[  2386.263] (II) Loading /usr/lib/xorg/modules/extensions/libglxserver_nvidia.so
[  2386.266] (II) Module glxserver_nvidia: vendor="NVIDIA Corporation"
[  2386.266] 	compiled for 1.6.99.901, module version = 1.0.0
[  2386.266] 	Module class: X.Org Server Extension
[  2386.266] (II) NVIDIA GLX Module  465.31  Thu May 13 22:16:59 UTC 2021
[  2386.266] (II) NVIDIA: The X server supports PRIME Render Offload.
[  2386.268] (--) NVIDIA(0): Valid display device(s) on GPU-0 at PCI:10:0:0
[  2386.268] (--) NVIDIA(0):     DFP-0
[  2386.268] (--) NVIDIA(0):     DFP-1
[  2386.268] (--) NVIDIA(0):     DFP-2
[  2386.268] (--) NVIDIA(0):     DFP-3
[  2386.268] (--) NVIDIA(0):     DFP-4
[  2386.268] (--) NVIDIA(0):     DFP-5 (boot)
[  2386.268] (--) NVIDIA(0):     DFP-6
[  2386.268] (**) NVIDIA(0): Using ConnectedMonitor string "DFP-3".
[  2386.268] (WW) NVIDIA: No DRM device: No direct render devices found.
[  2386.268] (II) NVIDIA(0): NVIDIA GPU NVIDIA GeForce GTX 1070 (GP104-A) at PCI:10:0:0
[  2386.268] (II) NVIDIA(0):     (GPU-0)
[  2386.268] (--) NVIDIA(0): Memory: 8388608 kBytes
[  2386.268] (--) NVIDIA(0): VideoBIOS: 86.04.50.00.64
[  2386.268] (II) NVIDIA(0): Detected PCI Express Link width: 16X
[  2386.268] (EE) NVIDIA(GPU-0): Failed to acquire modesetting permission.
[  2386.268] (EE) NVIDIA(0): Failing initialization of X screen
[  2386.268] (II) UnloadModule: "nvidia"
[  2386.268] (II) UnloadSubModule: "glxserver_nvidia"
[  2386.268] (II) Unloading glxserver_nvidia
[  2386.268] (II) UnloadSubModule: "wfb"
[  2386.269] (II) UnloadSubModule: "fb"
[  2386.269] (EE) Screen(s) found, but none have a usable configuration.
[  2386.269] (EE) 
Fatal server error:
[  2386.269] (EE) no screens found(EE) 
[  2386.269] (EE) 
Please consult the The X.Org Foundation support 
	 at http://wiki.x.org
 for help. 
[  2386.269] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[  2386.269] (EE) 
[  2386.269] (EE) Server terminated with error (1). Closing log file.
