
Comments (6)

youkaichao commented on June 11, 2024

vllm-nccl-cu12==2.18.1.0.4.0 does not install to ./usr/local/lib/python3.11/site-packages/nvidia/nccl/lib/libnccl.so.2. It installs to ~/.config/vllm/nccl/cu12/libnccl.so.2.18.1. The home directory depends on which user installed it, so it is possible that your current user is different from the user who installed it.

./usr/local/lib/python3.11/site-packages/nvidia/nccl/lib/libnccl.so.2 is the library installed by PyTorch, and it has a known problem of increased memory usage, reported at NVIDIA/nccl#1234. That's why vLLM needs to change the NCCL dependency.

Unless either NVIDIA/nccl#1234 or pypi/support#3792 is resolved, we have no choice but to ship libnccl.so this way. Sorry for the trouble; this is not what we want either. We also hope to manage the dependency in the standard pip way.
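
For reference, a minimal check that the vllm-nccl-managed library exists for the current user (a sketch assuming the ~/.config path above; the exact filename may differ across vllm-nccl-cu12 versions):

from pathlib import Path

# Assumed location from this thread; adjust the filename for your vllm-nccl version.
candidate = Path.home() / ".config/vllm/nccl/cu12/libnccl.so.2.18.1"
if candidate.is_file():
    print(f"found vllm-managed NCCL at {candidate}")
else:
    print("not found under this user's home; it may have been installed by a different user")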


youkaichao commented on June 11, 2024

Can you run with export NCCL_DEBUG=TRACE? This might be an NCCL problem.


zhaoxf4 commented on June 11, 2024

Can you run with export NCCL_DEBUG=TRACE? This might be an NCCL problem.

After the command is executed, the complete log is as follows:

(vllm-test) $ export NCCL_DEBUG=TRACE

(vllm-test) $ CUDA_VISIBLE_DEVICES=2,3 python -m vllm.entrypoints.openai.api_server --model /data/zhaoxf4/pretrained/meta-llama/Meta-Llama-3-8B-Instruct --dtype half --tensor-parallel-size 2
INFO 04-24 09:39:20 api_server.py:151] vLLM API server version 0.4.1
INFO 04-24 09:39:20 api_server.py:152] args: Namespace(host=None, port=8000, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, served_model_name=None, lora_modules=None, chat_template=None, response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], model='/data/zhaoxf4/pretrained/meta-llama/Meta-Llama-3-8B-Instruct', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, download_dir=None, load_format='auto', dtype='half', kv_cache_dtype='auto', quantization_param_path=None, max_model_len=None, guided_decoding_backend='outlines', worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=2, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=16, enable_prefix_caching=False, use_v2_block_manager=False, num_lookahead_slots=0, seed=0, swap_space=4, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_seqs=256, max_logprobs=5, disable_log_stats=False, quantization=None, enforce_eager=False, max_context_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', max_cpu_loras=None, device='auto', image_input_type=None, image_token_id=None, image_input_shape=None, image_feature_size=None, scheduler_delay_factor=0.0, enable_chunked_prefill=False, speculative_model=None, num_speculative_tokens=None, model_loader_extra_config=None, engine_use_ray=False, disable_log_requests=False, max_log_len=None)
WARNING 04-24 09:39:20 config.py:948] Casting torch.bfloat16 to torch.float16.
2024-04-24 09:39:22,770 INFO worker.py:1749 -- Started a local Ray instance.
INFO 04-24 09:39:23 llm_engine.py:98] Initializing an LLM engine (v0.4.1) with config: model='/data/zhaoxf4/pretrained/meta-llama/Meta-Llama-3-8B-Instruct', speculative_config=None, tokenizer='/data/zhaoxf4/pretrained/meta-llama/Meta-Llama-3-8B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=2, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO 04-24 09:39:27 utils.py:598] Found nccl from environment variable VLLM_NCCL_SO_PATH=/usr/lib/x86_64-linux-gnu/libnccl.so.2
ERROR 04-24 09:39:27 pynccl.py:44] Failed to load NCCL library from /usr/lib/x86_64-linux-gnu/libnccl.so.2 .It is expected if you are not running on NVIDIA/AMD GPUs.Otherwise, the nccl library might not exist, be corrupted or it does not support the current platform Linux-4.15.0-136-generic-x86_64-with-glibc2.27.One solution is to download libnccl2 version 2.18 from https://developer.download.nvidia.com/compute/cuda/repos/ and extract the libnccl.so.2 file. If you already have the library, please set the environment variable VLLM_NCCL_SO_PATH to point to the correct nccl library path.
INFO 04-24 09:39:27 pynccl_utils.py:17] Failed to import NCCL library: Failed to load NCCL library from /usr/lib/x86_64-linux-gnu/libnccl.so.2 .
INFO 04-24 09:39:27 pynccl_utils.py:18] It is expected if you are not running on NVIDIA GPUs.
(RayWorkerWrapper pid=1177) INFO 04-24 09:39:27 utils.py:598] Found nccl from environment variable VLLM_NCCL_SO_PATH=/usr/lib/x86_64-linux-gnu/libnccl.so.2
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:27 pynccl.py:44] Failed to load NCCL library from /usr/lib/x86_64-linux-gnu/libnccl.so.2 .It is expected if you are not running on NVIDIA/AMD GPUs.Otherwise, the nccl library might not exist, be corrupted or it does not support the current platform Linux-4.15.0-136-generic-x86_64-with-glibc2.27.One solution is to download libnccl2 version 2.18 from https://developer.download.nvidia.com/compute/cuda/repos/ and extract the libnccl.so.2 file. If you already have the library, please set the environment variable VLLM_NCCL_SO_PATH to point to the correct nccl library path.
(RayWorkerWrapper pid=1177) INFO 04-24 09:39:27 pynccl_utils.py:17] Failed to import NCCL library: Failed to load NCCL library from /usr/lib/x86_64-linux-gnu/libnccl.so.2 .
(RayWorkerWrapper pid=1177) INFO 04-24 09:39:27 pynccl_utils.py:18] It is expected if you are not running on NVIDIA GPUs.
INFO 04-24 09:39:27 selector.py:65] Cannot use FlashAttention backend for Volta and Turing GPUs.
INFO 04-24 09:39:27 selector.py:33] Using XFormers backend.
(RayWorkerWrapper pid=1177) INFO 04-24 09:39:27 selector.py:65] Cannot use FlashAttention backend for Volta and Turing GPUs.
(RayWorkerWrapper pid=1177) INFO 04-24 09:39:27 selector.py:33] Using XFormers backend.
ERROR 04-24 09:39:29 worker_base.py:153] Error executing method init_device. This might cause deadlock in distributed execution.
ERROR 04-24 09:39:29 worker_base.py:153] Traceback (most recent call last):
ERROR 04-24 09:39:29 worker_base.py:153]   File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker_base.py", line 145, in execute_method
ERROR 04-24 09:39:29 worker_base.py:153]     return executor(*args, **kwargs)
ERROR 04-24 09:39:29 worker_base.py:153]   File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker.py", line 110, in init_device
ERROR 04-24 09:39:29 worker_base.py:153]     init_worker_distributed_environment(self.parallel_config, self.rank,
ERROR 04-24 09:39:29 worker_base.py:153]   File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker.py", line 301, in init_worker_distributed_environment
ERROR 04-24 09:39:29 worker_base.py:153]     pynccl_utils.init_process_group(
ERROR 04-24 09:39:29 worker_base.py:153]   File "/data/zhaoxf4/API/llama3/vllm/vllm/distributed/device_communicators/pynccl_utils.py", line 46, in init_process_group
ERROR 04-24 09:39:29 worker_base.py:153]     logger.info(f"vLLM is using nccl=={ncclGetVersion()}")
ERROR 04-24 09:39:29 worker_base.py:153] NameError: name 'ncclGetVersion' is not defined
Traceback (most recent call last):
  File "/home/zhaoxf4/miniconda3/envs/vllm-test/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/zhaoxf4/miniconda3/envs/vllm-test/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/data/zhaoxf4/API/llama3/vllm/vllm/entrypoints/openai/api_server.py", line 159, in <module>
    engine = AsyncLLMEngine.from_engine_args(
  File "/data/zhaoxf4/API/llama3/vllm/vllm/engine/async_llm_engine.py", line 361, in from_engine_args
    engine = cls(
  File "/data/zhaoxf4/API/llama3/vllm/vllm/engine/async_llm_engine.py", line 319, in __init__
    self.engine = self._init_engine(*args, **kwargs)
  File "/data/zhaoxf4/API/llama3/vllm/vllm/engine/async_llm_engine.py", line 437, in _init_engine
    return engine_class(*args, **kwargs)
  File "/data/zhaoxf4/API/llama3/vllm/vllm/engine/llm_engine.py", line 148, in __init__
    self.model_executor = executor_class(
  File "/data/zhaoxf4/API/llama3/vllm/vllm/executor/executor_base.py", line 41, in __init__
    self._init_executor()
  File "/data/zhaoxf4/API/llama3/vllm/vllm/executor/ray_gpu_executor.py", line 44, in _init_executor
    self._init_workers_ray(placement_group)
  File "/data/zhaoxf4/API/llama3/vllm/vllm/executor/ray_gpu_executor.py", line 181, in _init_workers_ray
    self._run_workers("init_device")
  File "/data/zhaoxf4/API/llama3/vllm/vllm/executor/ray_gpu_executor.py", line 323, in _run_workers
    driver_worker_output = self.driver_worker.execute_method(
  File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker_base.py", line 154, in execute_method
    raise e
  File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker_base.py", line 145, in execute_method
    return executor(*args, **kwargs)
  File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker.py", line 110, in init_device
    init_worker_distributed_environment(self.parallel_config, self.rank,
  File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker.py", line 301, in init_worker_distributed_environment
    pynccl_utils.init_process_group(
  File "/data/zhaoxf4/API/llama3/vllm/vllm/distributed/device_communicators/pynccl_utils.py", line 46, in init_process_group
    logger.info(f"vLLM is using nccl=={ncclGetVersion()}")
NameError: name 'ncclGetVersion' is not defined
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153] Error executing method init_device. This might cause deadlock in distributed execution.
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153] Traceback (most recent call last):
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153]   File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker_base.py", line 145, in execute_method
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153]     return executor(*args, **kwargs)
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153]   File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker.py", line 110, in init_device
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153]     init_worker_distributed_environment(self.parallel_config, self.rank,
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153]   File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker.py", line 301, in init_worker_distributed_environment
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153]     pynccl_utils.init_process_group(
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153]   File "/data/zhaoxf4/API/llama3/vllm/vllm/distributed/device_communicators/pynccl_utils.py", line 46, in init_process_group
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153]     logger.info(f"vLLM is using nccl=={ncclGetVersion()}")
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153] NameError: name 'ncclGetVersion' is not defined

It looks like no new log lines were added.
I found that my libnccl version is 2.15.1. Is this version likely to affect multi-GPU communication?

(vllm-test) $ ll /usr/lib/x86_64-linux-gnu/libnccl.so.2
lrwxrwxrwx 1 root root 17 Sep 20  2022 /usr/lib/x86_64-linux-gnu/libnccl.so.2 -> libnccl.so.2.15.1*
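
One way to compare this against the NCCL version PyTorch itself was built with (a rough check using torch.cuda.nccl.version(), which returns a version tuple on recent PyTorch builds):

import torch

# Rough check: the NCCL version PyTorch was built against, e.g. (2, 18, 1).
print("CUDA available:", torch.cuda.is_available())
print("PyTorch NCCL version:", torch.cuda.nccl.version())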


youkaichao commented on June 11, 2024

[pip3] vllm-nccl-cu12==2.18.1.0.3.0

If you have vllm-nccl-cu12 installed, you don't need to specify VLLM_NCCL_SO_PATH. It should just work.
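
Roughly, the lookup order discussed in this thread is: use VLLM_NCCL_SO_PATH if set, otherwise fall back to the library that vllm-nccl-cu12 places under the installing user's home directory. A hedged sketch of that order (for illustration only, not vLLM's actual find_nccl_library):

import os
from pathlib import Path

def locate_nccl() -> str:
    # Hypothetical helper, not vLLM's implementation.
    env_path = os.environ.get("VLLM_NCCL_SO_PATH")
    if env_path:
        return env_path
    vllm_nccl = Path.home() / ".config/vllm/nccl/cu12/libnccl.so.2.18.1"
    if vllm_nccl.is_file():
        return str(vllm_nccl)
    raise ValueError("Cannot find libnccl.so.2 in the system.")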


zhaoxf4 commented on June 11, 2024

[pip3] vllm-nccl-cu12==2.18.1.0.3.0

If you have vllm-nccl-cu12 installed, you don't need to specify VLLM_NCCL_SO_PATH. It should just work.

PR #4259 can avoid this problem, though it seems to just ignore the error rather than fix the underlying cause.
I'll try to reproduce the problem on another machine or in Docker.


kylejablon commented on June 11, 2024

Just chiming in to say I'm experiencing a similar issue. Perhaps it's just a problem with how my directories are set up. I have vllm-nccl-cu12==2.18.1.0.4.0 installed, and vLLM is still having trouble finding libnccl.

$ find . | grep libnccl

./usr/local/lib/python3.11/site-packages/nvidia/nccl/lib/libnccl.so.2

Digging through the latest version of find_nccl_library seemed to confirm the issue for me:

>>> find_nccl_library()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 18, in find_nccl_library
  File "<stdin>", line 25, in find_library
ValueError: Cannot find libnccl.so.2 in the system.

I think manually passing in the libnccl path may be the only feasible solution; a rough sketch is below.
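
A minimal workaround sketch, assuming the path that find reported above (set the variable in the environment that launches vLLM):

import os

# Point vLLM at an existing libnccl explicitly; equivalent to running
#   export VLLM_NCCL_SO_PATH=/usr/local/lib/python3.11/site-packages/nvidia/nccl/lib/libnccl.so.2
# before starting the API server.
os.environ["VLLM_NCCL_SO_PATH"] = (
    "/usr/local/lib/python3.11/site-packages/nvidia/nccl/lib/libnccl.so.2"
)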

