Comments (32)
It seems NCCL 2 is not built into the image when using the instructions:
(RayWorkerWrapper pid=6313) INFO 04-24 02:10:53 pynccl_utils.py:17] Failed to import NCCL library: Cannot find libnccl.so.2 in the system.
from vllm.
Can you try the latest main? It should install vllm-nccl-cu12, which should work and bring the correct NCCL version.
from vllm.
I tried:
DOCKER_BUILDKIT=1 docker build . --target vllm-openai --tag vllm/vllm-openai
or
DOCKER_BUILDKIT=1 docker build . --target vllm-openai --tag vllm/vllm-openai --build-arg max_jobs=20 --build-arg nvcc_threads=20
following the documentation here:
https://docs.vllm.ai/en/latest/serving/deploying_with_docker.html
Is this command wrong for nccl?
from vllm.
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A
Versions of relevant libraries:
[pip3] No relevant packages
[conda] No relevant packages
Your environment must be incorrect.
from vllm.
I'm confused. If I build the docker image, my environment is not relevant.
from vllm.
Please report the environment inside the docker.
from vllm.
I don't understand. I am running the docker command to build the image; there is nothing "inside" docker to run. It's being built.
I can of course run it after the fact, but that is not relevant to my environment, as building a Docker image should be independent of the host environment.
from vllm.
Attach a shell into the docker image and report the environment inside it.
from vllm.
ubuntu@compute-permanent-node-171:~/vllm$ docker run -ti --runtime=nvidia --gpus '"device=0,1,2,6"' --shm-size=10.24gb --entrypoint=bash -p 5004:5004 -e NCCL_IGNORE_DISABLED_P2P=1 -e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN --env "HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN" -v /etc/passwd:/etc/passwd:ro -v /etc/group:/etc/group:ro -u root:root -v "${HOME}"/.cache:$HOME/.cache/ -v "${HOME}"/.config:$HOME/.config/ -v "${HOME}"/.triton:$HOME/.triton/ --network host fee8ae2c9682
WARNING: Published ports are discarded when using host network mode
root@compute-permanent-node-171:/vllm-workspace# apt-get install wget
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
wget
0 upgraded, 1 newly installed, 0 to remove and 38 not upgraded.
Need to get 367 kB of archives.
After this operation, 1008 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/main amd64 wget amd64 1.21.2-2ubuntu1 [367 kB]
Fetched 367 kB in 1s (283 kB/s)
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package wget.
(Reading database ... 19331 files and directories currently installed.)
Preparing to unpack .../wget_1.21.2-2ubuntu1_amd64.deb ...
Unpacking wget (1.21.2-2ubuntu1) ...
Setting up wget (1.21.2-2ubuntu1) ...
root@compute-permanent-node-171:/vllm-workspace# wget https://raw.githubusercontent.com/vllm-project/vllm/main/collect_env.py
--2024-04-24 05:39:32-- https://raw.githubusercontent.com/vllm-project/vllm/main/collect_env.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 24877 (24K) [text/plain]
Saving to: 'collect_env.py'
collect_env.py 100%[=========================================================================================================================================>] 24.29K --.-KB/s in 0s
2024-04-24 05:39:32 (145 MB/s) - 'collect_env.py' saved [24877/24877]
root@compute-permanent-node-171:/vllm-workspace# /usr/bin/python3.10 collect_env.py
Collecting environment information...
PyTorch version: 2.2.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.29.2
Libc version: glibc-2.35
Python version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-6.5.0-1018-oracle-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA H100 80GB HBM3
GPU 1: NVIDIA H100 80GB HBM3
GPU 2: NVIDIA H100 80GB HBM3
GPU 3: NVIDIA H100 80GB HBM3
Nvidia driver version: 535.161.07
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 224
On-line CPU(s) list: 0-111
Off-line CPU(s) list: 112-223
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Platinum 8480+
CPU family: 6
Model: 143
Thread(s) per core: 1
Core(s) per socket: 56
Socket(s): 2
Stepping: 8
CPU max MHz: 3800.0000
CPU min MHz: 0.0000
BogoMIPS: 4000.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cat_l2 cdp_l3 invpcid_single intel_ppin cdp_l2 ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect avx_vnni avx512_bf16 wbnoinvd dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req vnmi avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid bus_lock_detect cldemote movdiri movdir64b enqcmd fsrm md_clear serialize tsxldtrk pconfig arch_lbr ibt amx_bf16 avx512_fp16 amx_tile amx_int8 flush_l1d arch_capabilities
Virtualization: VT-x
L1d cache: 5.3 MiB (112 instances)
L1i cache: 3.5 MiB (112 instances)
L2 cache: 224 MiB (112 instances)
L3 cache: 210 MiB (2 instances)
NUMA node(s): 2
NUMA node0 CPU(s): 0-55
NUMA node1 CPU(s): 56-111
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Vulnerability Spectre v2: Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Vulnerable
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-nccl-cu12==2.19.3
[pip3] torch==2.2.1
[pip3] triton==2.2.0
[pip3] vllm-nccl-cu12==2.18.1.0.3.0
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.4.1
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0 GPU1 GPU2 GPU3 NIC0 NIC1 NIC2 NIC3 NIC4 NIC5 NIC6 NIC7 NIC8 NIC9 NIC10 NIC11 NIC12 NIC13 NIC14 NIC15 NIC16 NIC17 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X NV18 NV18 NV18 PXB PXB NODE NODE NODE NODE NODE NODE NODE SYS SYS SYS SYS SYS SYS SYS SYS SYS 0-55 0 N/A
GPU1 NV18 X NV18 NV18 NODE NODE NODE PXB PXB NODE NODE NODE NODE SYS SYS SYS SYS SYS SYS SYS SYS SYS 0-55 0 N/A
GPU2 NV18 NV18 X NV18 NODE NODE NODE NODE NODE PXB PXB NODE NODE SYS SYS SYS SYS SYS SYS SYS SYS SYS 0-55 0 N/A
GPU3 NV18 NV18 NV18 X SYS SYS SYS SYS SYS SYS SYS SYS SYS NODE NODE NODE NODE NODE PXB PXB NODE NODE 56-111 1 N/A
NIC0 PXB NODE NODE SYS X PIX NODE NODE NODE NODE NODE NODE NODE SYS SYS SYS SYS SYS SYS SYS SYS SYS
NIC1 PXB NODE NODE SYS PIX X NODE NODE NODE NODE NODE NODE NODE SYS SYS SYS SYS SYS SYS SYS SYS SYS
NIC2 NODE NODE NODE SYS NODE NODE X NODE NODE NODE NODE NODE NODE SYS SYS SYS SYS SYS SYS SYS SYS SYS
NIC3 NODE PXB NODE SYS NODE NODE NODE X PIX NODE NODE NODE NODE SYS SYS SYS SYS SYS SYS SYS SYS SYS
NIC4 NODE PXB NODE SYS NODE NODE NODE PIX X NODE NODE NODE NODE SYS SYS SYS SYS SYS SYS SYS SYS SYS
NIC5 NODE NODE PXB SYS NODE NODE NODE NODE NODE X PIX NODE NODE SYS SYS SYS SYS SYS SYS SYS SYS SYS
NIC6 NODE NODE PXB SYS NODE NODE NODE NODE NODE PIX X NODE NODE SYS SYS SYS SYS SYS SYS SYS SYS SYS
NIC7 NODE NODE NODE SYS NODE NODE NODE NODE NODE NODE NODE X PIX SYS SYS SYS SYS SYS SYS SYS SYS SYS
NIC8 NODE NODE NODE SYS NODE NODE NODE NODE NODE NODE NODE PIX X SYS SYS SYS SYS SYS SYS SYS SYS SYS
NIC9 SYS SYS SYS NODE SYS SYS SYS SYS SYS SYS SYS SYS SYS X PIX NODE NODE NODE NODE NODE NODE NODE
NIC10 SYS SYS SYS NODE SYS SYS SYS SYS SYS SYS SYS SYS SYS PIX X NODE NODE NODE NODE NODE NODE NODE
NIC11 SYS SYS SYS NODE SYS SYS SYS SYS SYS SYS SYS SYS SYS NODE NODE X NODE NODE NODE NODE NODE NODE
NIC12 SYS SYS SYS NODE SYS SYS SYS SYS SYS SYS SYS SYS SYS NODE NODE NODE X PIX NODE NODE NODE NODE
NIC13 SYS SYS SYS NODE SYS SYS SYS SYS SYS SYS SYS SYS SYS NODE NODE NODE PIX X NODE NODE NODE NODE
NIC14 SYS SYS SYS PXB SYS SYS SYS SYS SYS SYS SYS SYS SYS NODE NODE NODE NODE NODE X PIX NODE NODE
NIC15 SYS SYS SYS PXB SYS SYS SYS SYS SYS SYS SYS SYS SYS NODE NODE NODE NODE NODE PIX X NODE NODE
NIC16 SYS SYS SYS NODE SYS SYS SYS SYS SYS SYS SYS SYS SYS NODE NODE NODE NODE NODE NODE NODE X PIX
NIC17 SYS SYS SYS NODE SYS SYS SYS SYS SYS SYS SYS SYS SYS NODE NODE NODE NODE NODE NODE NODE PIX X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_0
NIC1: mlx5_1
NIC2: mlx5_2
NIC3: mlx5_3
NIC4: mlx5_4
NIC5: mlx5_5
NIC6: mlx5_6
NIC7: mlx5_7
NIC8: mlx5_8
NIC9: mlx5_9
NIC10: mlx5_10
NIC11: mlx5_11
NIC12: mlx5_12
NIC13: mlx5_13
NIC14: mlx5_14
NIC15: mlx5_15
NIC16: mlx5_16
NIC17: mlx5_17
root@compute-permanent-node-171:/vllm-workspace#
from vllm.
Also:
root@compute-permanent-node-171:/vllm-workspace# /usr/bin/python3.10
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>>
Just seems the nccl part of the build is broken.
from vllm.
Actually the file is present, but not being found by vllm:
root@compute-permanent-node-171:/vllm-workspace# find / | grep libnccl
/usr/local/lib/python3.10/dist-packages/nvidia/nccl/lib/libnccl.so.2
/root/.config/vllm/nccl/cu12/libnccl.so.2.18.1
root@compute-permanent-node-171:/vllm-workspace#
from vllm.
Which user are you using to run the script? The path vLLM uses to find it is ~/.config/vllm/nccl/. It will change if you change the user.
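For illustration, here is a minimal sketch (a hypothetical helper, not the actual vLLM source) of why that lookup path depends on the user: `~` expands against the current user's home directory.

```python
import os

def vllm_nccl_dir() -> str:
    # Hypothetical helper: resolve the download directory under the
    # *current* user's home, so root and non-root users see different paths.
    return os.path.expanduser("~/.config/vllm/nccl/cu12")

os.environ["HOME"] = "/root"
print(vllm_nccl_dir())   # /root/.config/vllm/nccl/cu12
os.environ["HOME"] = "/home/ubuntu"
print(vllm_nccl_dir())   # /home/ubuntu/.config/vllm/nccl/cu12
```

On POSIX, os.path.expanduser reads the HOME environment variable, which is exactly what changes between root and a non-root docker user.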
from vllm.
I ran as root so I could install wget since that is not installed by default in the image you make for vllm.
Do you have a targeted question w.r.t. the actual issue of the vllm startup not finding nccl lib?
from vllm.
Try adding the environment variable export VLLM_NCCL_SO_PATH=/root/.config/vllm/nccl/cu12/libnccl.so.2.18.1
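The override can be sketched like this (an illustration of the lookup order as described in this thread, not the exact vLLM implementation):

```python
import os

def find_nccl_library() -> str:
    # Illustrative lookup order (assumed from this thread, not the exact
    # vLLM code): an explicit VLLM_NCCL_SO_PATH wins; otherwise fall back
    # to the per-user download location under ~/.config.
    so_path = os.environ.get("VLLM_NCCL_SO_PATH")
    if so_path:
        return so_path
    return os.path.expanduser("~/.config/vllm/nccl/cu12/libnccl.so.2.18.1")

os.environ["VLLM_NCCL_SO_PATH"] = "/root/.config/vllm/nccl/cu12/libnccl.so.2.18.1"
print(find_nccl_library())   # the env var value is returned as-is
```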
from vllm.
I don't know what the problem is on your side. You can try to debug this function:
Line 584 in 468d761
from vllm.
It's not the function. I showed it was not able to load the NCCL library: it finds it but is unable to use it.
The function error is just a cascade.
In case helps:
root@compute-permanent-node-171:/vllm-workspace# ldconfig -v /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1
/sbin/ldconfig.real: Path `/usr/local/cuda-12/targets/x86_64-linux/lib' given more than once
(from /etc/ld.so.conf.d/988_cuda-12.conf:1 and /etc/ld.so.conf.d/000_cuda.conf:1)
/sbin/ldconfig.real: Can't stat /usr/local/nvidia/lib: No such file or directory
/sbin/ldconfig.real: Can't stat /usr/local/nvidia/lib64: No such file or directory
/sbin/ldconfig.real: Can't stat /usr/local/lib/x86_64-linux-gnu: No such file or directory
/sbin/ldconfig.real: Path `/usr/lib/x86_64-linux-gnu' given more than once
(from /etc/ld.so.conf.d/x86_64-linux-gnu.conf:4 and /etc/ld.so.conf.d/x86_64-linux-gnu.conf:3)
/sbin/ldconfig.real: Path `/lib/x86_64-linux-gnu' given more than once
(from <builtin>:0 and /etc/ld.so.conf.d/x86_64-linux-gnu.conf:3)
/sbin/ldconfig.real: Path `/usr/lib/x86_64-linux-gnu' given more than once
(from <builtin>:0 and /etc/ld.so.conf.d/x86_64-linux-gnu.conf:3)
/sbin/ldconfig.real: Path `/usr/lib' given more than once
(from <builtin>:0 and <builtin>:0)
/root/.config/vllm/nccl/cu12/libnccl.so.2.18.1: (from <cmdline>:0)
/sbin/ldconfig.real: Can't open directory /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1: Not a directory
/usr/local/cuda/targets/x86_64-linux/lib: (from /etc/ld.so.conf.d/000_cuda.conf:1)
libcudart.so.12 -> libcudart.so.12.1.55
/usr/lib/x86_64-linux-gnu/libfakeroot: (from /etc/ld.so.conf.d/fakeroot-x86_64-linux-gnu.conf:1)
libfakeroot-0.so -> libfakeroot-tcp.so
/usr/local/lib: (from /etc/ld.so.conf.d/libc.conf:2)
/lib/x86_64-linux-gnu: (from /etc/ld.so.conf.d/x86_64-linux-gnu.conf:3)
libsqlite3.so.0 -> libsqlite3.so.0.8.6
libsasl2.so.2 -> libsasl2.so.2.0.25
libhistory.so.8 -> libhistory.so.8.1
libldap-2.5.so.0 -> libldap-2.5.so.0.1.11
liblber-2.5.so.0 -> liblber-2.5.so.0.1.11
libksba.so.8 -> libksba.so.8.14.0
libnpth.so.0 -> libnpth.so.0.1.2
libreadline.so.8 -> libreadline.so.8.1
libassuan.so.0 -> libassuan.so.0.8.5
libubsan.so.1 -> libubsan.so.1.0.0
libjbig.so.0 -> libjbig.so.0
libopcodes-2.38-system.so -> libopcodes-2.38-system.so
libquadmath.so.0 -> libquadmath.so.0.0.0
libgd.so.3 -> libgd.so.3.0.8
libmd.so.0 -> libmd.so.0.0.5
libpython3.10.so.1.0 -> libpython3.10.so.1.0
libfreetype.so.6 -> libfreetype.so.6.18.1
libasan.so.6 -> libasan.so.6.0.0
libXdmcp.so.6 -> libXdmcp.so.6.0.0
libtiff.so.5 -> libtiff.so.5.7.0
libgpm.so.2 -> libgpm.so.2
libnghttp2.so.14 -> libnghttp2.so.14.20.1
libexpat.so.1 -> libexpat.so.1.8.7
libgomp.so.1 -> libgomp.so.1.0.0
libmpfr.so.6 -> libmpfr.so.6.1.0
libwebp.so.7 -> libwebp.so.7.1.3
libdeflate.so.0 -> libdeflate.so.0
libbrotlicommon.so.1 -> libbrotlicommon.so.1.0.9
libmpdec++.so.3 -> libmpdec++.so.2.5.1
libedit.so.2 -> libedit.so.2.0.68
libmpdec.so.3 -> libmpdec.so.2.5.1
libjpeg.so.8 -> libjpeg.so.8.2.2
libctf-nobfd.so.0 -> libctf-nobfd.so.0.0.0
libitm.so.1 -> libitm.so.1.0.0
libisl.so.23 -> libisl.so.23.1.0
libcurl-gnutls.so.4 -> libcurl-gnutls.so.4.7.0
libtsan.so.0 -> libtsan.so.0.0.0
libctf.so.0 -> libctf.so.0.0.0
libX11.so.6 -> libX11.so.6.4.0
libsodium.so.23 -> libsodium.so.23.3.0
libbrotlidec.so.1 -> libbrotlidec.so.1.0.9
libmpc.so.3 -> libmpc.so.3.2.1
librtmp.so.1 -> librtmp.so.1
libbsd.so.0 -> libbsd.so.0.11.5
libXext.so.6 -> libXext.so.6.4.0
libcbor.so.0.8 -> libcbor.so.0.8.0
libbrotlienc.so.1 -> libbrotlienc.so.1.0.9
libexpatw.so.1 -> libexpatw.so.1.8.7
libatomic.so.1 -> libatomic.so.1.2.0
libxcb.so.1 -> libxcb.so.1.1.0
libperl.so.5.34 -> libperl.so.5.34.0
libpsl.so.5 -> libpsl.so.5.3.2
libfontconfig.so.1 -> libfontconfig.so.1.12.0
libXau.so.6 -> libXau.so.6.0.0
libbfd-2.38-system.so -> libbfd-2.38-system.so
libfido2.so.1 -> libfido2.so.1.10.0
libXmuu.so.1 -> libXmuu.so.1.0.0
libssh.so.4 -> libssh.so.4.8.7
liblsan.so.0 -> liblsan.so.0.0.0
libpng16.so.16 -> libpng16.so.16.37.0
libgdbm_compat.so.4 -> libgdbm_compat.so.4.0.0
libXpm.so.4 -> libXpm.so.4.11.0
libgdbm.so.6 -> libgdbm.so.6.0.0
libcc1.so.0 -> libcc1.so.0.0.0
libnvidia-pkcs11-openssl3.so.535.161.07 -> libnvidia-pkcs11-openssl3.so.535.161.07
libnvidia-allocator.so.1 -> libnvidia-allocator.so.535.161.07
libnvidia-pkcs11.so.535.161.07 -> libnvidia-pkcs11.so.535.161.07
libcuda.so.1 -> libcuda.so.535.161.07
libnvidia-ml.so.1 -> libnvidia-ml.so.535.161.07
libnvidia-opencl.so.1 -> libnvidia-opencl.so.535.161.07
libnvidia-cfg.so.1 -> libnvidia-cfg.so.535.161.07
libnvidia-nvvm.so.4 -> libnvidia-nvvm.so.535.161.07
libcudadebugger.so.1 -> libcudadebugger.so.535.161.07
libnvidia-ptxjitcompiler.so.1 -> libnvidia-ptxjitcompiler.so.535.161.07
libselinux.so.1 -> libselinux.so.1
libpthread.so.0 -> libpthread.so.0
libthread_db.so.1 -> libthread_db.so.1
libk5crypto.so.3 -> libk5crypto.so.3.1
libudev.so.1 -> libudev.so.1.7.2
libnsl.so.1 -> libnsl.so.1
libhogweed.so.6 -> libhogweed.so.6.4
libBrokenLocale.so.1 -> libBrokenLocale.so.1
libnettle.so.8 -> libnettle.so.8.4
libnss_dns.so.2 -> libnss_dns.so.2
libapt-pkg.so.6.0 -> libapt-pkg.so.6.0.0
libsemanage.so.2 -> libsemanage.so.2
libc.so.6 -> libc.so.6
libbz2.so.1.0 -> libbz2.so.1.0.4
libanl.so.1 -> libanl.so.1
libncurses.so.6 -> libncurses.so.6.3
liblz4.so.1 -> liblz4.so.1.9.3
libmenuw.so.6 -> libmenuw.so.6.3
libsmartcols.so.1 -> libsmartcols.so.1.1.0
libpcreposix.so.3 -> libpcreposix.so.3.13.3
libmvec.so.1 -> libmvec.so.1
libgcc_s.so.1 -> libgcc_s.so.1
libm.so.6 -> libm.so.6
libnss_compat.so.2 -> libnss_compat.so.2
libnss_files.so.2 -> libnss_files.so.2
libc_malloc_debug.so.0 -> libc_malloc_debug.so.0
libsepol.so.2 -> libsepol.so.2
libnsl.so.2 -> libnsl.so.2.0.1
libdl.so.2 -> libdl.so.2
libe2p.so.2 -> libe2p.so.2.3
libaudit.so.1 -> libaudit.so.1.0.0
libapt-private.so.0.0 -> libapt-private.so.0.0.0
libpamc.so.0 -> libpamc.so.0.82.1
libtinfo.so.6 -> libtinfo.so.6.3
librt.so.1 -> librt.so.1
libgssapi_krb5.so.2 -> libgssapi_krb5.so.2.2
libkeyutils.so.1 -> libkeyutils.so.1.9
libformw.so.6 -> libformw.so.6.3
libtic.so.6 -> libtic.so.6.3
libxxhash.so.0 -> libxxhash.so.0.8.1
libpam.so.0 -> libpam.so.0.85.1
libseccomp.so.2 -> libseccomp.so.2.5.3
libpcre.so.3 -> libpcre.so.3.13.3
libutil.so.1 -> libutil.so.1
libz.so.1 -> libz.so.1.2.11
libssl.so.3 -> libssl.so.3
libdebconfclient.so.0 -> libdebconfclient.so.0.0.0
/sbin/ldconfig.real: /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 is the dynamic linker, ignoring
ld-linux-x86-64.so.2 -> ld-linux-x86-64.so.2
libgcrypt.so.20 -> libgcrypt.so.20.3.4
libpcre2-8.so.0 -> libpcre2-8.so.0.10.4
libdb-5.3.so -> libdb-5.3.so
libcap-ng.so.0 -> libcap-ng.so.0.0.0
libprocps.so.8 -> libprocps.so.8.0.3
libkrb5.so.3 -> libkrb5.so.3.3
libncursesw.so.6 -> libncursesw.so.6.3
libuuid.so.1 -> libuuid.so.1.3.0
libss.so.2 -> libss.so.2.0
libcom_err.so.2 -> libcom_err.so.2.1
libform.so.6 -> libform.so.6.3
libpcprofile.so -> libpcprofile.so
libresolv.so.2 -> libresolv.so.2
libtirpc.so.3 -> libtirpc.so.3.0.0
libgpg-error.so.0 -> libgpg-error.so.0.32.1
libblkid.so.1 -> libblkid.so.1.1.0
libmount.so.1 -> libmount.so.1.1.0
libgmp.so.10 -> libgmp.so.10.4.1
libcrypt.so.1 -> libcrypt.so.1.1.0
libnss_hesiod.so.2 -> libnss_hesiod.so.2
libcrypto.so.3 -> libcrypto.so.3
libcap.so.2 -> libcap.so.2.44
libp11-kit.so.0 -> libp11-kit.so.0.3.0
libkrb5support.so.0 -> libkrb5support.so.0.1
libsystemd.so.0 -> libsystemd.so.0.32.0
libtasn1.so.6 -> libtasn1.so.6.6.2
libstdc++.so.6 -> libstdc++.so.6.0.30
libacl.so.1 -> libacl.so.1.1.2301
libffi.so.8 -> libffi.so.8.1.0
libpanel.so.6 -> libpanel.so.6.3
libidn2.so.0 -> libidn2.so.0.3.7
libpanelw.so.6 -> libpanelw.so.6.3
libattr.so.1 -> libattr.so.1.1.2501
libmemusage.so -> libmemusage.so
libunistring.so.2 -> libunistring.so.2.2.0
libext2fs.so.2 -> libext2fs.so.2.4
libpam_misc.so.0 -> libpam_misc.so.0.82.1
libgnutls.so.30 -> libgnutls.so.30.31.0
libmenu.so.6 -> libmenu.so.6.3
liblzma.so.5 -> liblzma.so.5.2.5
libzstd.so.1 -> libzstd.so.1.4.8
/lib: (from <builtin>:0)
root@compute-permanent-node-171:/vllm-workspace#
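A quicker sanity check than walking the ldconfig output is to try loading the library the same way vLLM ultimately does, via dlopen (through Python's ctypes); this catches a missing file, a permissions problem, or a broken dependency alike. A minimal sketch:

```python
import ctypes

def can_dlopen(path: str) -> bool:
    # Attempt to dlopen the shared object. An existing file can still fail
    # here if the current user cannot read it or a dependency is missing.
    try:
        ctypes.CDLL(path)
        return True
    except OSError:
        return False

print(can_dlopen("libc.so.6"))        # True on typical glibc Linux
print(can_dlopen("/no/such/lib.so"))  # False
```

Running this inside the container as the same user that starts vLLM, against each candidate libnccl.so path, would show directly which one is actually loadable.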
from vllm.
Is it possible to know your exact command for building the docker image?
Also, is there a docker image for the pre-release 0.4.1?
from vllm.
The relevant error after adding the env you suggested:
ubuntu@compute-permanent-node-171:~/vllm$ docker logs ac406d9bc86d
INFO 04-24 06:02:06 api_server.py:151] vLLM API server version 0.4.1
INFO 04-24 06:02:06 api_server.py:152] args: Namespace(host='0.0.0.0', port=5004, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, served_model_name=None, lora_modules=None, chat_template=None, response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], model='databricks/dbrx-instruct', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=True, download_dir='/home/ubuntu/.cache/huggingface/hub', load_format='auto', dtype='auto', kv_cache_dtype='auto', quantization_param_path=None, max_model_len=None, guided_decoding_backend='outlines', worker_use_ray=True, pipeline_parallel_size=1, tensor_parallel_size=4, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=16, enable_prefix_caching=False, use_v2_block_manager=False, num_lookahead_slots=0, seed=1234, swap_space=4, gpu_memory_utilization=0.98, num_gpu_blocks_override=None, max_num_batched_tokens=32768, max_num_seqs=256, max_logprobs=5, disable_log_stats=False, quantization=None, enforce_eager=True, max_context_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', max_cpu_loras=None, device='auto', image_input_type=None, image_token_id=None, image_input_shape=None, image_feature_size=None, scheduler_delay_factor=0.0, enable_chunked_prefill=False, speculative_model=None, num_speculative_tokens=None, speculative_max_model_len=None, model_loader_extra_config=None, engine_use_ray=False, disable_log_requests=False, max_log_len=100)
2024-04-24 06:02:28,968 INFO worker.py:1749 -- Started a local Ray instance.
INFO 04-24 06:02:29 llm_engine.py:98] Initializing an LLM engine (v0.4.1) with config: model='databricks/dbrx-instruct', speculative_config=None, tokenizer='databricks/dbrx-instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir='/home/ubuntu/.cache/huggingface/hub', load_format=auto, tensor_parallel_size=4, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=1234)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO 04-24 06:02:46 utils.py:598] Found nccl from environment variable VLLM_NCCL_SO_PATH=/root/.config/vllm/nccl/cu12/libnccl.so.2.18.1
ldd: /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1: No such file or directory
ERROR 04-24 06:02:46 pynccl.py:45] Failed to load NCCL library from /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1 .It is expected if you are not running on NVIDIA/AMD GPUs.Otherwise, the nccl library might not exist, be corrupted or it does not support the current platform Linux-6.5.0-1018-oracle-x86_64-with-glibc2.35.One solution is to download libnccl2 version 2.18 from https://developer.download.nvidia.com/compute/cuda/repos/ and extract the libnccl.so.2 file. If you already have the library, please set the environment variable VLLM_NCCL_SO_PATH to point to the correct nccl library path.
INFO 04-24 06:02:46 pynccl_utils.py:17] Failed to import NCCL library: Failed to load NCCL library from /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1 .
INFO 04-24 06:02:46 pynccl_utils.py:18] It is expected if you are not running on NVIDIA GPUs.
(RayWorkerWrapper pid=6433) INFO 04-24 06:02:46 utils.py:598] Found nccl from environment variable VLLM_NCCL_SO_PATH=/root/.config/vllm/nccl/cu12/libnccl.so.2.18.1
(RayWorkerWrapper pid=6433) ldd: /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1: No such file or directory
(RayWorkerWrapper pid=6433) ERROR 04-24 06:02:46 pynccl.py:45] Failed to load NCCL library from /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1 .It is expected if you are not running on NVIDIA/AMD GPUs.Otherwise, the nccl library might not exist, be corrupted or it does not support the current platform Linux-6.5.0-1018-oracle-x86_64-with-glibc2.35.One solution is to download libnccl2 version 2.18 from https://developer.download.nvidia.com/compute/cuda/repos/ and extract the libnccl.so.2 file. If you already have the library, please set the environment variable VLLM_NCCL_SO_PATH to point to the correct nccl library path.
(RayWorkerWrapper pid=6433) INFO 04-24 06:02:46 pynccl_utils.py:17] Failed to import NCCL library: Failed to load NCCL library from /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1 .
(RayWorkerWrapper pid=6433) INFO 04-24 06:02:46 pynccl_utils.py:18] It is expected if you are not running on NVIDIA GPUs.
INFO 04-24 06:02:47 selector.py:28] Using FlashAttention backend.
(RayWorkerWrapper pid=6433) INFO 04-24 06:02:47 selector.py:28] Using FlashAttention backend.
from vllm.
I'll try running as root with the env.
from vllm.
Why did ldd /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1 return an error? I don't know why your environment is broken.
from vllm.
I'm just building the image with the documented command. It has nothing to do with my environment. The same exact commands work with the release version 0.4.0.post1 docker image.
For non-root it only finds /usr/local/lib/python3.10/dist-packages/nvidia/nccl/lib/libnccl.so.2
so I'll set env to that.
from vllm.
Ok, that env worked for the non-root user with that non-root path:
docker run -d --runtime=nvidia --gpus '"device=0,1,2,6"' --shm-size=10.24gb -p 5004:5004 -e VLLM_NCCL_SO_PATH=/usr/local/lib/python3.10/dist-packages/nvidia/nccl/lib/libnccl.so.2 -e NCCL_IGNORE_DISABLED_P2P=1 -e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN --env "HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN" -v /etc/passwd:/etc/passwd:ro -v /etc/group:/etc/group:ro -u `id -u`:`id -g` -v "${HOME}"/.cache:$HOME/.cache/ -v "${HOME}"/.config:$HOME/.config/ -v "${HOME}"/.triton:$HOME/.triton/ --network host fee8ae2c9682 --port=5004 --host=0.0.0.0 --model=databricks/dbrx-instruct --seed 1234 --trust-remote-code --tensor-parallel-size=4 --max-num-batched-tokens=32768 --max-log-len=100 --trust-remote-code --worker-use-ray --enforce-eager --gpu-memory-utilization 0.98 --download-dir=$HOME/.cache/huggingface/hub &>> logs.vllm_server.dbrx.txt
I think there must be some bug in the docker image if this is required. You keep blaming my env, but with docker that can't be the case. I told you my command; it's the same as documented.
from vllm.
It might be the case that you are using a non-root user, which cannot access the /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1 file because it is created by the root user.
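This is easy to probe from the non-root side. A quick sketch (the path is the one from the logs; /root being mode 700 is an assumption about a typical setup):

```python
import os

path = "/root/.config/vllm/nccl/cu12/libnccl.so.2.18.1"
# If /root is mode 700 (typical), then for a non-root user the file is
# both invisible (exists -> False, since /root can't be traversed) and
# unreadable (access -> False). Run this as the user that starts vLLM.
print(os.path.exists(path), os.access(path, os.R_OK))
```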
from vllm.
I've always run prior docker images as non-root. If the build command is somehow wrong, or the image is wrong and isn't correctly supporting non-root users, there could be an issue, yes. But it has nothing to do with my env.
from vllm.
I think that is the key. You are running the image as a non-root user.
from vllm.
Thanks for your help. I hope the released version doesn't have the same problems.
from vllm.
The docker image is meant to be run as the root user. That's an issue we fight very hard with in nccl :(
from vllm.
@pseudotensor Can you please specify which version of flash attention you are working with?
from vllm.
@youkaichao Ok, but I've been running the releases of vllm and my own build of vllm inside h2ogpt as non-root for many months now. So this must be a new problem.
@ttbachyinsda Unsure what you mean, I'm building docker image using documented commands, so it's whatever is in the Dockerfile.
from vllm.
This is a new problem. Until the NCCL team addresses NVIDIA/nccl#1234, we will suffer a lot with nccl :(
from vllm.
Ok, no problem. I guess it would be good to document (in the docker run part) that env var and what to set it to for the two cases of running as root and running as a non-root user like I shared. Then the issue can be closed.
from vllm.
document (in the docker run part) that env and what to set it to for the 2 cases of running as root and running as some user like I shared
will do.
from vllm.