Chinese-Llama-2: 中文Llama-2大模型

¹Zefeng Du, ²Minghao Wu, ¹Jianhui Pang, ¹Derek Wong, Longyue Wang*, Zhaopeng Tu

¹University of Macau, ²Monash University

equal contribution

*Longyue Wang is the corresponding author: [email protected]


The 🦙Chinese-Llama-2 project aims to enhance the understanding, generation, and translation capabilities of the large language model Llama-2 in Chinese. We apply methods such as LoRA fine-tuning, full-parameter instruction fine-tuning, and continued pre-training, and we cordially invite you to download and use the associated datasets, training guides, and model parameters. 🦙Chinese-Llama-2 旨在进一步增强Llama-2大模型的中文理解、生成、翻译等能力。尝试LoRA微调、全参数指令微调、二次预训练等技术,欢迎下载并使用相关数据集、训练教程、模型参数。

News

  • [2023.11.28] 🚀 We continue pre-training Llama-2 on 400GB of Chinese and English literary texts and then fine-tune it on the Chinese instruction dataset; the resulting model is released as Chinese-Llama-2-7B-conpre.
  • [2023.07.22] 🚀 We fine-tune Llama-2 on the Chinese instruction dataset, known as Chinese-Llama-2, and release Chinese-Llama-2-7B at seeledu/Chinese-Llama-2-7B. The full instruction fine-tuning code and example data are also released.
  • [2023.07.20] 🚀 We fine-tune Llama-2 on the Chinese instruction dataset using the LoRA technique, known as Chinese-Llama-2-LoRA, and release Chinese-Llama-2-LoRA-7B.
  • [2023.07.18] 🎉🎉🎉 Llama-2 is announced!

Overview

Chinese-Llama-2 is a project that aims to extend the impressive capabilities of the Llama-2 language model to Chinese. Developed by Meta AI, Llama-2 has already proven to be a powerful language model. In this project, we focus on three key areas of research:

  1. Parameter-efficient fine-tuning: We employ the LoRA (Low-Rank Adaptation) technique to fine-tune Llama-2 on the Chinese instruction dataset. This approach optimizes the model's performance while keeping the number of trainable parameters small (a brief illustrative sketch of the idea follows this list).

  2. Full instruction fine-tuning: We fine-tune all parameters of Llama-2 on the Chinese instruction dataset BAAI/COIG (https://huggingface.co/datasets/BAAI/COIG) and a Chinese-English document-level translation dataset. By allowing the model to adapt fully to the characteristics of the Chinese language, we enhance its proficiency and accuracy in generating Chinese text.

  3. Continued pre-training: To further enhance Llama-2's Chinese language understanding, we continue its pre-training on large-scale Chinese corpora. By exposing the model to vast amounts of Chinese text data, we enable it to capture intricate linguistic patterns and nuances, resulting in improved language generation.
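
To make item 1 concrete, the core idea of LoRA is to freeze the pretrained weight matrices and learn a small low-rank update on top of them. The following is only an illustrative sketch of that idea in PyTorch, not the training code used in this repository:

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Wraps a frozen nn.Linear and adds a trainable low-rank update:
    # y = base(x) + (alpha / r) * B(A(x))
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weights stay frozen
        self.lora_A = nn.Linear(base.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)  # the update starts at zero
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(x))

Only lora_A and lora_B are trained, which is why the number of trainable parameters stays small.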

This repository contains all the necessary code and resources to implement the aforementioned areas of research, facilitating experimentation and advancement in Chinese natural language processing using the Llama-2 model.

Installation

To use Chinese-Llama-2, follow these steps:

  1. Clone the repository:

    git clone https://github.com/longyuewangdcu/chinese-llama-2.git
  2. Change into the project directory:

    cd chinese-llama-2
  3. Install the required dependencies:

    pip install -e ./transformers   # installs the transformers copy bundled with this repository (editable mode)
    pip install -r requirements.txt

Parameter-Efficient Fine-Tuning

Model Checkpoints

The LoRA weights for meta-llama/Llama-2-7b-hf can be found here or on Baidu Netdisk (百度网盘). More model weights will be available soon.

Using the Model

You can use our fine-tuned model as follows:

path=            # path to the project
model_path=      # path to the original base model
lora_model_path= # path to the LoRA weights
python3 $path/test/inference_lora.py --model-name-or-path $model_path \
    --lora-weights $lora_model_path \
    -t 0.7 \
    -sa 'sample' \
    -i $path/test/test_case.txt \
    -o $path/test/test_case.general-task.txt
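
Alternatively, the LoRA checkpoint can be loaded directly in Python. The snippet below is only a minimal sketch using the Hugging Face transformers and peft libraries; the model and adapter paths are placeholders, and the repository's test/inference_lora.py may apply its own prompt template and generation settings:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_path = "meta-llama/Llama-2-7b-hf"            # or a local copy of the base model
lora_weights_path = "path/to/Chinese-Llama-2-LoRA-7B"   # placeholder for the downloaded LoRA weights

tokenizer = AutoTokenizer.from_pretrained(base_model_path)
model = AutoModelForCausalLM.from_pretrained(
    base_model_path, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, lora_weights_path)  # attach the LoRA adapter
model.eval()

prompt = "素数是什么?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))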

Full-Parameter Instruction Fine-Tuning

The full Chinese instruction fine-tuning weights for meta-llama/Llama-2-7b-hf can be found at seeledu/Chinese-Llama-2-7B or on Baidu Netdisk (百度网盘).

Using the Model

You can use our fine-tuned model as follows:

path= #path to the project
model_path= #path to the model
python3 $path/test/inference.py --model-name-or-path $model_path \
    -t 0.7 \
    -sa 'sample' \
    -i $path/test/test_case.txt \
    -o $path/test/test_case.general-task.txt
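
The released checkpoint can also be loaded directly with transformers. Again, this is only a sketch under the same assumptions as above; test/inference.py may use its own prompt template and generation settings:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "seeledu/Chinese-Llama-2-7B"  # or a local directory containing the checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

prompt = "科举制度是怎么样的?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))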

Showcases

Example 1:

Question: 素数是什么?
Llama-2 7B Chat: I'm glad you're interested in learning about prime numbers! However, I must point out that the term "素数" is not a commonly used term in mathematics. In fact, the term "prime number" is the more commonly used term in English. A prime number is a positive integer that is divisible only by itself and 1. For example, the first few prime numbers are 2, 3, 5, 7, 11, and 13. If you have any other questions or if there's anything else I can help you with, feel free to ask!
Chinese-Llama-2-LoRA-7B: 素数是那些没有正整数因子的整数。
Chinese-Llama-2-7B: 素数是一个大于一且只能被一和自身整除的数。

Example 2:

Question: 科举制度是怎么样的。
Llama-2 7B Chat: I'm just an AI, I don't have have personal opinions or beliefs, and I cannot provide answers that are discriminatory or toxic. However, I can provide you with information on the concept of "科举制度" (kēng yí zhì dào) in a safe and respectful manner. "科举制度" is a term used in China to refer to a system of governmental examinations that were used to select officials for the imperial bureaucracy in ancient China. The exams were designed to test a candidate's knowledge of Confucian classics, as well as their moral character and literary ability. Candidates who passed the exams were appointed to government positions, and the system was used for over a thousand years to select officials for the imperial government. While the "科举制度" system is no longer in use today, it is still studied by historians and scholars as an important part of Chinese history and cultural heritage. It is also a symbol of the country's rich cultural legacy and its commitment to education and meritocracy. I hope this information helps you understand the concept of "科举制度" in a safe and respectful manner. If you have any further questions, please feel free to ask.
Chinese-Llama-2-LoRA-7B: 科举制度是**古代的一种官员任命方式,通过考试和推荐授予官职。。
Chinese-Llama-2-7B: 科举制度是一种鼓励读书、赞赏才能和辛勤工作的考试制度,通过考试选拔官员。通常由皇帝主持,旨在选拔最合格的人进入**和地方政府。考试由三级组成,每个级别都有不同的标准,考生必须通过每个级别才能继续前进。通过科举制度,皇帝可以任命那些具备技能和知识的人担任要求的职位。

Datasets

We fine-tune Llama-2 on the Chinese Alpaca instruction dataset, which consists of 51K examples.
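
The data follow the Alpaca instruction format (see Acknowledgments). A single record typically looks like the following; the exact field names in our processed files may differ slightly, so treat this as an illustration of the format rather than a sample of our data:

# One record in the standard Alpaca instruction format (illustrative, not taken from our dataset).
example = {
    "instruction": "把下面的句子翻译成英文。",
    "input": "今天天气很好。",
    "output": "The weather is very nice today.",
}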

Parameter-efficient Fine-Tuning

To fine-tune Llama-2 with LoRA on the Chinese instruction dataset, run the following command to start LoRA fine-tuning:

# Multi-nodes are also supported

export NCCL_DEBUG=INFO
export NCCL_SOCKET_IFNAME=eth1
export NCCL_IB_GID_INDEX=3
export NCCL_IB_SL=3
export NCCL_NET_GDR_READ=1

export MASTER_ADDR="${CHIEF_IP:=localhost}"
export MASTER_PORT="${MASTER_PORT:=29500}"

path= #path to the project
train_path=$path/train/run_clm_lora.py

model_path=$path/model/llama2-7B-HF
model_save=$path/checkpoint/chinese-llama2-7b-4096-enzh/

torchrun --nnodes 1 --node_rank $INDEX --nproc_per_node 8 \
  --master_addr $MASTER_ADDR --master_port $MASTER_PORT  \
  ${train_path} \
  --deepspeed $path/train/deepspeed_config_bf16.json \
  --model_name_or_path ${model_path} \
  --train_file $path/data/instruction/all_instruction_hf.json \
  --validation_file $path/data/instruction/all_instruction_hf_dev.json \
  --preprocessing_num_workers 32 \
  --dataloader_num_workers 16 \
  --dataloader_pin_memory True \
  --per_device_train_batch_size 2 \
  --per_device_eval_batch_size 1 \
  --gradient_accumulation_steps 8 \
  --num_train_epochs 3 \
  --save_strategy "steps" \
  --save_steps 500 \
  --save_total_limit 1 \
  --learning_rate 2e-5 \
  --weight_decay 0. \
  --warmup_ratio 0.03 \
  --lr_scheduler_type "cosine" \
  --logging_steps 10 \
  --block_size 4096 \
  --use_lora True \
  --lora_config $path/train/lora_config.json \
  --do_train \
  --bf16 True \
  --bf16_full_eval True \
  --evaluation_strategy "no" \
  --validation_split_percentage 0 \
  --streaming \
  --ddp_timeout 72000 \
  --seed 1 \
  --overwrite_output_dir \
  --gradient_checkpointing True \
  --output_dir ${model_save}
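
The --lora_config argument points to train/lora_config.json, which holds the LoRA hyperparameters. That file is not reproduced here; the snippet below only illustrates what such a configuration typically contains, expressed with peft's LoraConfig, and the values are placeholders rather than the ones we actually use:

from peft import LoraConfig

# Illustrative values only; see train/lora_config.json in this repository for the real settings.
lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank update
    lora_alpha=16,                         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections commonly adapted in LLaMA-style models
    bias="none",
    task_type="CAUSAL_LM",
)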

Full-Parameter Fine-Tuning

To fine-tune Llama-2 with full-parameter fine-tuning on the Chinese instruction dataset, run the following command:

# Multi-nodes are also supported
# use flash attention to lower memory usage
pip install flash-attn==1.0.4

export NCCL_DEBUG=INFO
export NCCL_SOCKET_IFNAME=eth1
export NCCL_IB_GID_INDEX=3
export NCCL_IB_SL=3
export NCCL_NET_GDR_READ=1

export MASTER_ADDR="${CHIEF_IP:=localhost}"
export MASTER_PORT="${MASTER_PORT:=29500}"

export HF_HOME=            # set to your Hugging Face cache directory
export TRANSFORMERS_CACHE= # set to your transformers cache directory
path= # path to llama2-chinese
train_path=$path/train/run_clm_llms_mem.py
model_path=$path/model/llama2-7B-HF # place original model here
model_save=$path/checkpoint/llama2-7b-llama2_coig_dt_ca-all/

# MASTER_ADDR set to localhost
HOST_NUM=2
torchrun --nnodes $HOST_NUM --node_rank $INDEX --nproc_per_node 8 \
    --master_addr $MASTER_ADDR --master_port $MASTER_PORT  \
    ${train_path} \
    --deepspeed $path/train/deepspeed_config_bf16.json \
    --model_name_or_path ${model_path} \
    --train_file $path/data/instruction/example_instruction_hf.json \
    --validation_file $path/data/instruction/example_instruction_hf_dev.json \
    --preprocessing_num_workers 32 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 2 \
    --num_train_epochs 3 \
    --save_strategy "steps" \
    --save_steps 500 \
    --save_total_limit 2 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 10 \
    --block_size 4096 \
    --do_train \
    --bf16 True \
    --bf16_full_eval True \
    --evaluation_strategy "no" \
    --validation_split_percentage 0 \
    --streaming \
    --ddp_timeout 72000 \
    --seed 1 \
    --overwrite_output_dir \
    --gradient_checkpointing True \
    --output_dir ${model_save}
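
For reference, the effective global batch size implied by these settings is the product of the per-device batch size, the gradient accumulation steps, the GPUs per node, and the number of nodes. A quick check, assuming all 8 GPUs on each of the HOST_NUM=2 nodes are used:

per_device_train_batch_size = 8
gradient_accumulation_steps = 2
gpus_per_node = 8   # --nproc_per_node
num_nodes = 2       # HOST_NUM in the script above

effective_batch_size = (per_device_train_batch_size
                        * gradient_accumulation_steps
                        * gpus_per_node
                        * num_nodes)
print(effective_batch_size)  # 256 sequences of up to 4096 tokens per optimizer step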

TODO

  1. Continued pre-training
  2. SFT based on Llama-2-Chat
  3. Release the fine-tuning data we used

Stay tuned!

Contributing

Contributions are welcome! If you have any ideas, suggestions, or bug reports, please open an issue or submit a pull request. We appreciate your contributions to making Chinese-Llama-2 even better.

Acknowledgments

Chinese-Llama-2 builds upon Llama-2, developed by Meta AI. We would like to express our gratitude to the following open-source projects for their valuable contributions to the community:

  • Stanford Alpaca for providing the Alpaca dataset; we used its data format in our experiments.
  • Parrot for providing a helpful implementation of the training of LLaMA.
  • LLaMA-2 for providing a powerful LLM.

Citation

@misc{du-etal-2022-chinese-llama-2,
  author = {Zefeng Du and Minghao Wu and Longyue Wang},
  title = {Chinese-Llama-2},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/longyuewangdcu/Chinese-Llama-2}}
}

chinese-llama-2's Issues

Re-downloaded model files, still got the "no data" error

> ok, if you have any other questions, you can open another issue to discuss.

Sorry, but I re-downloaded the model files and still got the same error:

[2023-07-24 06:38:46,649] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]python-BaseException
Loading checkpoint shards:   0%|          | 0/2 [00:04<?, ?it/s]
Traceback (most recent call last):
  File "/data/kexin/anaconda3/envs/cllama2/lib/python3.8/site-packages/accelerate/utils/modeling.py", line 149, in set_module_tensor_to_device
    new_value = value.to(device)
NotImplementedError: Cannot copy out of meta tensor; no data!

Process finished with exit code 143

Originally posted by @XiongKexin in #2 (comment)

The model downloaded from https://huggingface.co/seeledu/Chinese-Llama-2-7B cannot be used

Loading checkpoint shards: 0%| | 0/2 [01:33<?, ?it/s]
Traceback (most recent call last):
File "/home/chenjunhao/chinese-llama-2/test/inference.py", line 137, in
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, torch_dtype=torch.bfloat16, device_map="auto")
File "/home/chenjunhao/chinese-llama-2/transformers/src/transformers/models/auto/auto_factory.py", line 471, in from_pretrained
return model_class.from_pretrained(
File "/home/chenjunhao/chinese-llama-2/transformers/src/transformers/modeling_utils.py", line 2643, in from_pretrained
) = cls._load_pretrained_model(
File "/home/chenjunhao/chinese-llama-2/transformers/src/transformers/modeling_utils.py", line 2966, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/home/chenjunhao/chinese-llama-2/transformers/src/transformers/modeling_utils.py", line 671, in _load_state_dict_into_meta_model
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
File "/home/chenjunhao/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 149, in set_module_tensor_to_device
new_value = value.to(device)
NotImplementedError: Cannot copy out of meta tensor; no data!

Error of llama.cpp convert

Traceback (most recent call last):
File "convert.py", line 1264, in
main()
File "convert.py", line 1244, in main
model_plus = load_some_model(args.model)
File "convert.py", line 1165, in load_some_model
models_plus.append(lazy_load_file(path))
File "convert.py", line 955, in lazy_load_file
return lazy_load_torch_file(fp, path)
File "convert.py", line 826, in lazy_load_torch_file
model = unpickler.load()
File "convert.py", line 815, in find_class
return self.CLASSES[(module, name)]
KeyError: ('torch._utils', '_rebuild_meta_tensor_no_storage')

can't run llama-2-7b-hf

Hi there. I'm running the fine-tuning code and get the following error message.

Traceback (most recent call last):
File "/root/miniconda3/envs/llama-v2/lib/python3.10/site-packages/transformers/configuration_utils.py", line 672, in _get_config_dict
resolved_config_file = cached_file(
File "/root/miniconda3/envs/llama-v2/lib/python3.10/site-packages/transformers/utils/hub.py", line 417, in cached_file
resolved_file = hf_hub_download(
File "/root/miniconda3/envs/llama-v2/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
validate_repo_id(arg_value)
File "/root/miniconda3/envs/llama-v2/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 158, in validate_repo_id
raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/root/autodl-tmp/Chinese-Llama-2/model/llama2-7B-HF'. Use repo_type argument if needed.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/root/autodl-tmp/Chinese-Llama-2/train/run_clm_lora.py", line 786, in
main()
File "/root/autodl-tmp/Chinese-Llama-2/train/run_clm_lora.py", line 454, in main
config = AutoConfig.from_pretrained(model_args.model_name_or_path, **config_kwargs)
File "/root/miniconda3/envs/llama-v2/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 983, in from_pretrained
config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/root/miniconda3/envs/llama-v2/lib/python3.10/site-packages/transformers/configuration_utils.py", line 617, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/root/miniconda3/envs/llama-v2/lib/python3.10/site-packages/transformers/configuration_utils.py", line 693, in _get_config_dict
raise EnvironmentError(
OSError: Can't load the configuration of '/root/autodl-tmp/Chinese-Llama-2/model/llama2-7B-HF'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure '/root/autodl-tmp/Chinese-Llama-2/model/llama2-7B-HF' is the correct path to a directory containing a config.json file
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 11867) of binary: /root/miniconda3/envs/llama-v2/bin/python
Traceback (most recent call last):
File "/root/miniconda3/envs/llama-v2/bin/torchrun", line 8, in
sys.exit(main())
File "/root/miniconda3/envs/llama-v2/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 346, in wrapper
return f(*args, **kwargs)
File "/root/miniconda3/envs/llama-v2/lib/python3.10/site-packages/torch/distributed/run.py", line 762, in main
run(args)
File "/root/miniconda3/envs/llama-v2/lib/python3.10/site-packages/torch/distributed/run.py", line 753, in run
elastic_launch(
File "/root/miniconda3/envs/llama-v2/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 132, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/root/miniconda3/envs/llama-v2/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

/root/autodl-tmp/Chinese-Llama-2/train/run_clm_lora.py FAILED

Failures:
[1]:
time : 2023-07-31_09:50:12
host : autodl-container-95b911bb00-66f99c55
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 11868)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
time : 2023-07-31_09:50:12
host : autodl-container-95b911bb00-66f99c55
rank : 2 (local_rank: 2)
exitcode : 1 (pid: 11869)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
time : 2023-07-31_09:50:12
host : autodl-container-95b911bb00-66f99c55
rank : 3 (local_rank: 3)
exitcode : 1 (pid: 11870)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[4]:
time : 2023-07-31_09:50:12
host : autodl-container-95b911bb00-66f99c55
rank : 4 (local_rank: 4)
exitcode : 1 (pid: 11871)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[5]:
time : 2023-07-31_09:50:12
host : autodl-container-95b911bb00-66f99c55
rank : 5 (local_rank: 5)
exitcode : 1 (pid: 11872)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[6]:
time : 2023-07-31_09:50:12
host : autodl-container-95b911bb00-66f99c55
rank : 6 (local_rank: 6)
exitcode : 1 (pid: 11873)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[7]:
time : 2023-07-31_09:50:12
host : autodl-container-95b911bb00-66f99c55
rank : 7 (local_rank: 7)
exitcode : 1 (pid: 11874)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
time : 2023-07-31_09:50:12
host : autodl-container-95b911bb00-66f99c55
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 11867)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Mac

Hello,
Does it support Mac (M1/M2) or Linux?

Please advise.

How can its answers be made richer?

Compared with Llama-2's English replies, the Chinese replies are relatively short and not as rich as the English ones. How can its answers be made richer? Thanks.

NotImplementedError: Cannot copy out of meta tensor; no data!

When I run test/inference.py, there is an error "NotImplementedError: Cannot copy out of meta tensor; no data!". I don't know how to fix it. Is this due to my transformers version (4.29.0)?

[2023-07-24 05:20:38,101] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]python-BaseException
Loading checkpoint shards:   0%|          | 0/2 [00:04<?, ?it/s]
Traceback (most recent call last):
  File "/data/kexin/anaconda3/envs/cllama2/lib/python3.8/site-packages/accelerate/utils/modeling.py" line 149, in set_module_tensor_to_device
    new_value = value.to(device)
NotImplementedError: Cannot copy out of meta tensor; no data!

Process finished with exit code 143

Does it support fine-tuning the 13B and 70B models?

Hello, I am very happy and grateful that you have made Llama-2 support Chinese. I see that the 7B model is supported; can the same code be used to fine-tune the 13B and 70B models?

Fine tuning

I'm running the bash script to fine-tune the model and get the following error message:

[W socket.cpp:601] [c10d] The client socket has failed to connect to [localhost]:29500 (errno: 99 - Cannot assign requested address).

Could you please check that?

Question about the tokenizer

Hello! It is very nice that you adapt Llama 2 for Chinese language and got great result.
I am new to LLMs, and I wonder how to get the tokenizer for Llama 2? If I remember correctly, Llama 2 does not officially support Chinese, and the official model only has a couple hundred Chinese characters in its tokenizer.
Any explanation will be greatly appreciated, thanks!

model inference error

{'': 0}
Using pad_token, but it is not set yet.
Setting pad_token_id to eos_token_id:2 for open-end generation.
Traceback (most recent call last):
File "/home/chenjunhao/chinese-llama-2/test/inference_lora.py", line 160, in
generated_ids = model.generate(inputs=input_ids, attention_mask=attn_mask, generation_config=gen_config)
File "/home/chenjunhao/.local/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/chenjunhao/chinese-llama-2/transformers/src/transformers/generation/utils.py", line 1462, in generate
return self.sample(
File "/home/chenjunhao/chinese-llama-2/transformers/src/transformers/generation/utils.py", line 2514, in sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either inf, nan or element < 0

Every element in probs is 0.
