
Comments (3)

aojunzz commented on June 16, 2024

@reddiamond1234 we don't currently support the Windows platform.


randerzander commented on June 16, 2024

This problem isn't limited to Windows. On Ubuntu with a CUDA 12 driver:

(llama_adapter) dev@desktop:~/projects/LLaMA-Adapter$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.2 LTS
Release:        22.04
Codename:       jammy
(llama_adapter) dev@desktop:~/projects/LLaMA-Adapter$ torchrun --nproc_per_node 1 example.py \
         --ckpt_dir $TARGET_FOLDER/model_size \
         --tokenizer_path $TARGET_FOLDER/tokenizer.model \
         --adapter_path $ADAPTER_PATH
Traceback (most recent call last):
  File "example.py", line 114, in <module>                                                                                                                            
    fire.Fire(main)                                                                
  File "/home/dev/miniconda/envs/llama_adapter/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/dev/miniconda/envs/llama_adapter/lib/python3.8/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/dev/miniconda/envs/llama_adapter/lib/python3.8/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "example.py", line 88, in main
    local_rank, world_size = setup_model_parallel()
  File "example.py", line 35, in setup_model_parallel
    torch.distributed.init_process_group("nccl")
  File "/home/dev/miniconda/envs/llama_adapter/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 907, in init_process_group
    default_pg = _new_process_group_helper(
  File "/home/dev/miniconda/envs/llama_adapter/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1013, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2214009) of binary: /home/dev/miniconda/envs/llama_adapter/bin/python
Traceback (most recent call last):
  File "/home/dev/miniconda/envs/llama_adapter/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==2.0.1', 'console_scripts', 'torchrun')())
  File "/home/dev/miniconda/envs/llama_adapter/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/dev/miniconda/envs/llama_adapter/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/dev/miniconda/envs/llama_adapter/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/dev/miniconda/envs/llama_adapter/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/dev/miniconda/envs/llama_adapter/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
example.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-05-12_11:56:32
  host      : desktop
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 2214009)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
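
A quick way to confirm this diagnosis (a minimal check, not part of the original report) is to ask the installed wheel whether it was built with CUDA and NCCL support:

  python -c "import torch; print(torch.__version__, torch.version.cuda, torch.distributed.is_nccl_available())"

If torch.version.cuda prints None or is_nccl_available() returns False, pip installed the CPU-only build, which matches the error above.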


randerzander commented on June 16, 2024

In case it helps others, I worked around this problem by (a command sketch follows the list):

  1. Recreating the llama_adapter conda env
  2. First installing the appropriate torch build for my machine (install command picked from the PyTorch "Get Started" selector)
  3. Removing the torch entry from requirements.txt
  4. Running pip install -r requirements.txt
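
As a concrete sketch of those steps (the cu118 index URL is an assumption for illustration; pick the command matching your CUDA version from https://pytorch.org/get-started/locally/):

  conda env remove -n llama_adapter
  conda create -n llama_adapter python=3.8 -y
  conda activate llama_adapter
  # install a CUDA-enabled torch build first (cu118 shown as an example)
  pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118
  # after deleting the torch line from requirements.txt:
  pip install -r requirements.txt

Installing torch before the other requirements matters here: otherwise pip resolves the torch pin in requirements.txt from the default index, which can pull a build without NCCL.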

