lm-sys / FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
License: Apache License 2.0
Demo: https://chat.lmsys.org/
The demo shows "NETWORK ERROR. PLEASE REGENERATE OR REFRESH THIS PAGE."
gsutil cp gs://skypilot-chatbot/chatbot/13b/ckpt/added_tokens.json ./
AccessDeniedException: 403 [email protected] does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist)
Hi there,
I was looking at the README file and I noticed that the codebase is using Stanford Alpaca's fine-tuning code with some modifications. I also saw that you mentioned that the hyperparameters used for training are similar to those used in Stanford Alpaca.
I was wondering if the training script can be used for LoRA-based training. I believe the changes made to support gradient checkpointing and Flash Attention could make LoRA training much faster. Could you please confirm whether this is possible?
Thank you!
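For anyone exploring this, below is a minimal sketch of how the existing Trainer-based script could be wrapped with LoRA adapters via the peft library, assuming peft is installed; the rank, alpha, and target modules are illustrative choices, not the repo's settings.

# Sketch: wrap the base model with LoRA adapters before handing it to the Trainer.
# Assumes the peft library is installed; r/alpha/target_modules are illustrative values.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("/path/to/llama-13b-hf")
tokenizer = AutoTokenizer.from_pretrained("/path/to/llama-13b-hf")

lora_config = LoraConfig(
    r=8,                                  # adapter rank (assumption)
    lora_alpha=16,                        # scaling factor (assumption)
    target_modules=["q_proj", "v_proj"],  # LLaMA attention projections
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA weights require gradients
# The wrapped model can then be passed to the same transformers Trainer used by train.py.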
Hi there. I want to fine-tune not a LLaMA model but a GPT / BLOOM model in my language. Can I use the training functionality for that? Does it work? What do I need to consider besides translating the dataset? Thank you for your time!
https://chat.lmsys.org/ has "koala-13b" as a model option. I can't find any information about this one; can you explain this model?
I ran the fine-tuning script on a 4 x A100 80G machine and hit this issue; can someone help take a look?
root@1cafe085c343:~/FastChat# torchrun --nnodes=1 --nproc_per_node=4 --master_port=3124 fastchat/train/train_mem.py --model_name_or_path /root/llama-13b-hf --data_path /root/sharegpt_vicuna/sharegpt_20230401_clean_lang_split.json --bf16 True --output_dir ./checkpoints --num_train_epochs 3 --per_device_train_batch_size 4 --per_device_eval_batch_size 4 --gradient_accumulation_steps 2 --evaluation_strategy "no" --save_strategy "steps" --save_steps 1200 --save_total_limit 100 --learning_rate 2e-5 --weight_decay 0. --warmup_ratio 0.03 --lr_scheduler_type "cosine" --logging_steps 1 --fsdp "full_shard auto_wrap" --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' --tf32 True --model_max_length 2048 --gradient_checkpointing True --lazy_preprocess True
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
/transformers/src/transformers/training_args.py:1356: FutureWarning: using `--fsdp_transformer_layer_cls_to_wrap` is deprecated. Use fsdp_config instead
warnings.warn(
/transformers/src/transformers/training_args.py:1356: FutureWarning: using `--fsdp_transformer_layer_cls_to_wrap` is deprecated. Use fsdp_config instead
warnings.warn(
/transformers/src/transformers/training_args.py:1356: FutureWarning: using `--fsdp_transformer_layer_cls_to_wrap` is deprecated. Use fsdp_config instead
warnings.warn(
/transformers/src/transformers/training_args.py:1356: FutureWarning: using `--fsdp_transformer_layer_cls_to_wrap` is deprecated. Use fsdp_config instead
warnings.warn(
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:19<00:00, 6.46s/it]
Using pad_token, but it is not set yet.
Loading checkpoint shards: 33%|████████████████████████████████████████████ | 1/3 [00:15<00:30, 15.33s/it]WARNING:root:Loading data...
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:22<00:00, 7.59s/it]
Using pad_token, but it is not set yet.
WARNING:root:Formatting inputs...Skip in lazy mode
WARNING:root:Loading data...
Loading checkpoint shards: 67%|████████████████████████████████████████████████████████████████████████████████████████ | 2/3 [00:26<00:12, 12.95s/it]WARNING:root:Formatting inputs...Skip in lazy mode
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:34<00:00, 11.39s/it]
Using pad_token, but it is not set yet.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:37<00:00, 12.66s/it]
Using pad_token, but it is not set yet.
WARNING:root:Loading data...
WARNING:root:Loading data...
WARNING:root:Formatting inputs...Skip in lazy mode
WARNING:root:Formatting inputs...Skip in lazy mode
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 3
wandb: You chose "Don't visualize my results"
wandb: Tracking run with wandb version 0.14.0
wandb: W&B syncing is set to `offline` in this directory.
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
  0%|          | 0/11814 [00:00<?, ?it/s]
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
(the same warning and the traceback below are printed by each of the four ranks)
Traceback (most recent call last):
  File "/root/FastChat/fastchat/train/train_mem.py", line 12, in <module>
    train()
  File "/root/FastChat/fastchat/train/train.py", line 315, in train
    trainer.train()
  File "/transformers/src/transformers/trainer.py", line 1639, in train
    return inner_training_loop(
  File "/transformers/src/transformers/trainer.py", line 1904, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/transformers/src/transformers/trainer.py", line 2652, in training_step
    loss = self.compute_loss(model, inputs)
  File "/transformers/src/transformers/trainer.py", line 2684, in compute_loss
    outputs = model(**inputs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 748, in forward
    output = self._fsdp_wrapped_module(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/transformers/src/transformers/models/llama/modeling_llama.py", line 687, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/transformers/src/transformers/models/llama/modeling_llama.py", line 569, in forward
    layer_outputs = torch.utils.checkpoint.checkpoint(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/transformers/src/transformers/models/llama/modeling_llama.py", line 565, in custom_forward
    return module(*inputs, output_attentions, None)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 748, in forward
    output = self._fsdp_wrapped_module(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/transformers/src/transformers/models/llama/modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
TypeError: forward() got an unexpected keyword argument 'position_ids'
wandb: Waiting for W&B process to finish... (failed 1).
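For what it's worth, this error usually means the flash-attention monkey patch used by train_mem.py defines an attention forward() whose signature does not match the installed transformers version, which now passes position_ids into every decoder layer. A minimal sketch of the signature the patched method would need to accept (argument names follow modeling_llama.py; the actual flash-attention body is omitted):

# Sketch: the patched LlamaAttention.forward must accept the same keyword
# arguments that LlamaDecoderLayer passes in this transformers version,
# including position_ids; otherwise the call above raises this TypeError.
import torch
from typing import Optional, Tuple

def forward(
    self,
    hidden_states: torch.Tensor,
    attention_mask: Optional[torch.Tensor] = None,
    position_ids: Optional[torch.Tensor] = None,   # missing in the older patch
    past_key_value: Optional[Tuple[torch.Tensor]] = None,
    output_attentions: bool = False,
    use_cache: bool = False,
) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
    # ... flash-attention computation goes here, using position_ids for rotary embeddings ...
    raise NotImplementedError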
I've had a lot of confusion around this myself. I noticed the license for FastChat is Apache 2.0.
Vicuna is fine-tuned from LLaMA, which I believe is open for research but not for commercial use.
Are Vicuna and any derived products open source for any usage?
I am getting these errors; naturally, I do not have CUDA on a MacBook. Are there any steps to recompile torch with MPS enabled, or can FastChat perform checks for this?
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
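As an aside, below is a minimal sketch of the kind of device check that could avoid the hard CUDA assertion on a MacBook; the fallback order here is an assumption, not FastChat's current behavior.

# Sketch: pick an available backend instead of assuming CUDA.
import torch

def pick_device() -> torch.device:
    if torch.cuda.is_available():
        return torch.device("cuda")
    # MPS is PyTorch's Metal backend on Apple Silicon / recent macOS builds
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
print(f"Using device: {device}")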
Hello, I want to learn software development and would like to find someone willing to give lessons and help me.
I met a few issues in the evaluation.
ubuntu@152-70-114-166:~/FastChat/fastchat/eval$ python model_qa.py --model-name $HOME/llama-13b-hf/ --question-file tables/question.jsonl --answer-file table/answer/answer.jsonl
Traceback (most recent call last):
File "model_qa.py", line 61, in <module>
eval_model(args.model_name, args.question_file, args.answers_file)
AttributeError: 'Namespace' object has no attribute 'answers_file'
ubuntu@152-70-114-166:~/FastChat/fastchat/eval$ python model_qa.py --model-name $HOME/llama-7b-hf/ --question-file tables/question.jsonl --answer-file table/answer/answer_llama-7b.jsonl
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████| 2/2 [00:06<00:00, 3.16s/it]
Traceback (most recent call last):
File "model_qa.py", line 61, in <module>
eval_model(args.model_name, args.question_file, args.answer_file)
File "/usr/lib/python3/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "model_qa.py", line 22, in eval_model
ques_file = open(os.path.expanduser(questions_file), "r")
FileNotFoundError: [Errno 2] No such file or directory: 'tables/question.jsonl'
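The first traceback looks like a simple attribute-name mismatch in model_qa.py: the flag is --answer-file (stored by argparse as args.answer_file), but the call site reads args.answers_file. A hedged sketch of the consistent version, with illustrative defaults:

# Sketch: keep the argparse destination and the attribute access consistent.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--model-name", type=str, required=True)
parser.add_argument("--question-file", type=str, default="tables/question.jsonl")
parser.add_argument("--answer-file", type=str, default="table/answer/answer.jsonl")
args = parser.parse_args()

# argparse converts "--answer-file" into args.answer_file, so use that name everywhere:
# eval_model(args.model_name, args.question_file, args.answer_file)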
I am amazed at your accomplishments, and again amazed at the clean results of the demo you provided.
So I was looking forward to running this demo on my machine.
Unfortunately, no weights have been released yet, but I'm hopeful that they will be released soon.
Looking at the answer about releasing the model, it says you will release it after the light-weighting is complete. So will that be a different model from the one we're experiencing in the demo now, or the weights of Vicuna-13B?
Thanks a ton, team, for the release of the model. Is there any notebook for the demo? Some of the steps about specifying model paths are a bit confusing. Thanks.
I think the last commit (bf4e67e) broke the web UI pipeline: workers can't be found after some period of time. The error disappears after reverting this commit to a previous one.
Could you make a low-requirement version like Dalai? In addition, this coin limit is very annoying; could you remove it?
I just noticed a typo in the readme.
# Luanch a gradio web server.
I'm guessing it is meant to be launch.
Hi,
Could you add a colour feature to the webpage and a font-size feature as well?
Thx.
Thanks for this release.
When I try to launch the Gradio web UI, it throws this error. I am running it on Colab.
error while attempting to bind on address ('::1', 21001, 0, 0): cannot assign requested address
2023-04-01 09:04:18 | INFO | controller | Init controller
2023-04-01 09:04:18 | ERROR | stderr | INFO: Started server process [14623]
2023-04-01 09:04:18 | ERROR | stderr | INFO: Waiting for application startup.
2023-04-01 09:04:18 | ERROR | stderr | INFO: Application startup complete.
2023-04-01 09:04:18 | ERROR | stderr | ERROR: [Errno 99] error while attempting to bind on address ('::1', 21001, 0, 0): cannot assign requested address
2023-04-01 09:04:18 | ERROR | stderr | INFO: Waiting for application shutdown.
2023-04-01 09:04:18 | ERROR | stderr | INFO: Application shutdown complete.
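For context, ('::1', 21001) is the IPv6 loopback address, and Colab containers often cannot bind to it. One workaround, assuming the controller exposes the usual --host flag, is to bind an explicit IPv4 address instead, e.g.:
python3 -m fastchat.serve.controller --host 127.0.0.1
or, to accept external connections:
python3 -m fastchat.serve.controller --host 0.0.0.0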
Is the dataset public? Where can I find it?
Hello,
Thank you for sharing your awesome work!
I'm trying to train Vicuna on my own dataset. I walked through the installation process from source. I had to install PyTorch with CUDA 11.7.0 support instead of 11.6, since my server only supports CUDA 11.2.2/11.4.4/11.5.2/11.7.0/11.8.0 but not 11.6.
When I try to train the 13B model with flash attention, I get a CUDA OOM error even when per_device_train_batch_size is set to 1. I think there might be a memory leak. I also tried building flash attention from source and still got the same error.
I know this is probably a flash attention problem, but do you have any insights? Any guidance would be very much appreciated.
Best regards,
Hani
I'm not sure if langchain support is already possible with this model, but if it isn't, I would like to request that it be implemented. If it is already possible, I would like to request documentation explaining how to use it in combination with langchain.
Using langchain and llama-index with Vicuna would be a great option for many solutions that require a lot of context and are therefore too expensive to use with an LLM API like OpenAI's.
Thank you for open sourcing such a great model.
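Not an official answer, but one way this can already work is to wrap a locally merged Vicuna checkpoint in a transformers pipeline and hand it to langchain's HuggingFacePipeline wrapper; a rough sketch, where the model path and generation settings are assumptions:

# Sketch: expose a local Vicuna model to langchain via a transformers pipeline.
# Requires transformers, accelerate (for device_map="auto"), and langchain.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain.llms import HuggingFacePipeline

model_path = "/path/to/vicuna-13b"          # merged weights, illustrative path
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

generate = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=256,
)
llm = HuggingFacePipeline(pipeline=generate)
print(llm("What is the capital of France?"))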
Hi Vicuna team,
Thanks for the great work pushing LLM fine-tuning a step further; this is especially amazing as a student/research-led initiative.
I found that most of the evaluation targets helpfulness. Do you have plans to evaluate on untruthful, harmful, toxic, or sensitive questions? That is the main benefit of RLHF, so I'm curious whether simple supervised fine-tuning on a pre-aligned GPT could also inherit the human preferences learned from RLHF in the original model (GPT-3.5).
This is the output from testing 7B training with 8x A6000 48G. Notice the RAM usage: it appears to load the fp32 7B model directly into memory per process/GPU on a local instance. Using this script to launch a 30B model would require 128GB per GPU. Pretty crazy RAM usage.
On top of that, due to the crash below, 7B doesn't run, and forget about running 30B, as the memory requirement would exceed even this machine, which has 512GB of RAM.
env:
Ubuntu 22.04
Cuda 11.8
8x A6000 48G
512GB ram
top - 10:47:42 up 3 min, 0 users, load average: 5.27, 1.96, 0.75
Tasks: 25 total, 9 running, 16 sleeping, 0 stopped, 0 zombie
%Cpu(s): 24.9 us, 19.0 sy, 0.0 ni, 54.5 id, 0.8 wa, 0.0 hi, 0.9 si, 0.0 st
MiB Mem : 515671.4 total, 300058.6 free, 213289.8 used, 2323.0 buff/cache
MiB Swap: 8011.0 total, 8011.0 free, 0.0 used. 298580.4 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
549 root 20 0 60.7g 25.7g 220344 R 87.0 5.1 0:54.15 python
551 root 20 0 60.7g 25.7g 220340 R 87.0 5.1 0:54.22 python
550 root 20 0 60.8g 25.7g 219780 R 86.0 5.1 0:54.16 python
552 root 20 0 60.7g 25.7g 219372 R 86.0 5.1 0:54.30 python
555 root 20 0 61.1g 26.0g 219640 R 84.2 5.2 0:53.26 python
552 root 20 0 61.1g 26.0g 219372 R 83.2 5.2 0:53.44 python
553 root 20 0 61.1g 26.0g 219212 R 83.2 5.2 0:53.24 python
556 root 20 0 61.1g 26.0g 219820 R 83.2 5.2 0:53.20 python
torchrun --nnodes=1 --nproc_per_node=8 --master_port=12345 \
fastchat/train/train.py \
--model_name_or_path /root/llama-7b-hf \
--data_path '/root/alpaca_data_cleaned.json' \
--bf16 True \
--output_dir ./checkpoints \
--num_train_epochs 3 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 200 \
--save_total_limit 100 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--fsdp "full_shard auto_wrap" \
--fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
--tf32 True \
--model_max_length 2048 \
--gradient_checkpointing True \
--lazy_preprocess True
Furthermore, 7B does start on this machine, but then it crashes here:
Traceback (most recent call last):
File "/root/FastChat/fastchat/train/train.py", line 322, in <module>
train()
File "/root/FastChat/fastchat/train/train.py", line 315, in train
trainer.train()
File "/root/miniconda3/lib/python3.9/site-packages/transformers/trainer.py", line 1639, in train
return inner_training_loop(
File "/root/miniconda3/lib/python3.9/site-packages/transformers/trainer.py", line 1906, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/root/miniconda3/lib/python3.9/site-packages/transformers/trainer.py", line 2652, in training_step
loss = self.compute_loss(model, inputs)
File "/root/miniconda3/lib/python3.9/site-packages/transformers/trainer.py", line 2684, in compute_loss
outputs = model(**inputs)
File "/root/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 757, in forward
args, kwargs = _pre_forward(
File "/root/miniconda3/lib/python3.9/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 407, in _pre_forward
state._exec_order_data.record_pre_forward(handles, module.training)
File "/root/miniconda3/lib/python3.9/site-packages/torch/distributed/fsdp/_exec_order_utils.py", line 180, in record_pre_forward
self._check_order(handles_key, is_training)
File "/root/miniconda3/lib/python3.9/site-packages/torch/distributed/fsdp/_exec_order_utils.py", line 245, in _check_order
raise RuntimeError(
RuntimeError: Forward order differs across ranks: rank 0 is all-gathering 0 parameters while rank 3 is all-gathering 1 parameters
Traceback (most recent call last):
File "/root/FastChat/fastchat/train/train.py", line 322, in <module>
train()
File "/root/FastChat/fastchat/train/train.py", line 315, in train
trainer.train()
File "/root/miniconda3/lib/python3.9/site-packages/transformers/trainer.py", line 1639, in train
return inner_training_loop(
File "/root/miniconda3/lib/python3.9/site-packages/transformers/trainer.py", line 1906, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/root/miniconda3/lib/python3.9/site-packages/transformers/trainer.py", line 2652, in training_step
loss = self.compute_loss(model, inputs)
File "/root/miniconda3/lib/python3.9/site-packages/transformers/trainer.py", line 2684, in compute_loss
outputs = model(**inputs)
File "/root/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 757, in forward
args, kwargs = _pre_forward(
File "/root/miniconda3/lib/python3.9/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 407, in _pre_forward
state._exec_order_data.record_pre_forward(handles, module.training)
File "/root/miniconda3/lib/python3.9/site-packages/torch/distributed/fsdp/_exec_order_utils.py", line 180, in record_pre_forward
self._check_order(handles_key, is_training)
File "/root/miniconda3/lib/python3.9/site-packages/torch/distributed/fsdp/_exec_order_utils.py", line 245, in _check_order
raise RuntimeError(
RuntimeError: Forward order differs across ranks: rank 0 is all-gathering 1 parameters while rank 2 is all-gathering 0 parameters
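On the RAM observation: each rank calls from_pretrained independently, which materializes a full fp32 copy of the weights in host memory before FSDP shards them. Below is a hedged sketch of loading more frugally; these are standard transformers options, not necessarily what train.py does today.

# Sketch: reduce host-RAM pressure when each rank loads the checkpoint.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "/root/llama-7b-hf",
    torch_dtype=torch.bfloat16,   # load weights in bf16 instead of fp32 (roughly halves RAM)
    low_cpu_mem_usage=True,       # stream shards instead of allocating the full model twice
)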
I am wondering, can the ShareGPT data be released?
Release model weights for Vicuna-13B, so that one doesn't have to train a model themselves.
How can I get the Vicuna model?
After the install procedure I see only one model: opt-1.3b.
Have you considered using the RWKV model? It's open source and works almost as well as LLaMA, according to its author.
I'm assuming there was a bump in memory efficiency from implementing flash attention. Can I fine-tune on 8192-token sequence lengths efficiently now? I would assume so.
The chatbot has some amazing capabilities that are very close to the original ChatGPT, but it also inherited OpenAI's alignment tendencies, in the sense that responses on any controversial topic lean heavily toward one political viewpoint.
I hope you can fix that, or at least make it neutral.
The file https://github.com/lm-sys/FastChat/blob/main/fastchat/eval/eval_gpt_review.py expects a rule file.
FastChat/fastchat/eval/eval_gpt_review.py
Lines 77 to 87 in b720df1
It seems the rule file includes different prompt and role information. I would like to reproduce the evaluation, but I can't find this file. Can you give some pointers?
When I run "python3 -m fastchat.serve.cli --model-name facebook/opt-1.3b", nothing is output.
How do I serve using Vicuna?
Since the 13B model didn't cost a lot to train, will you consider releasing different weights to allow for more diverse setups/capabilities? For instance, 7B, 30B, 65B?
Why does the demo show "YOUR INPUT VIOLATES OPENAI CONTENT MODERATION API. PLEASE TRY AGAIN."? It's very strange that the word OPENAI appears in it.
This happens when you ask it for a few adult website URLs.
When I deploy the demo, there is a problem:
After running python3 -m fastchat.serve.model_worker, I can see that the model has been loaded onto the GPU, and after running python3 -m fastchat.serve.gradio_web_server it works on the web page. But when I refresh the page, the model is no longer listed on the web page, and I can't reach the model through python3 -m fastchat.serve.test_message either. Yet through nvidia-smi I can see that the model is still loaded on the GPU, so I have to kill the program and re-execute it.
How can I solve this problem?
Here's one way to get 24 from 20, 20, 2, 16 using simple addition, subtraction, multiplication, and division, using each number only once:
20 + 20 = 40
40 - 2 = 38
38 * 2 = 76
76 / 2 = 38
38 + 16 = 54
54 - 2 = 52
52 * 2 = 104
104 / 2 = 52
52 + 2 = 54
54 - 16 = 38
38 * 2 = 76
76 / 2 = 38
38 + 2 = 40
40 - 2 = 38
38 * 2 = 76
76 / 2 = 38
38 + 16 = 54
54 - 2 = 52
52 * 2 = 104
104 / 2 = 52
So, 24 can be obtained by using the following calculations:
20 + 20 = 40
40 - 2 = 38
38 * 2 = 76
76 / 2 = 38
38 + 16 = 54
54 - 2 = 52
52 * 2 = 104
104 / 2 = 52
52 + 2 = 54
54 - 16 = 38
38 * 2 = 76
76 / 2 = 38
38 + 2 = 40
40 - 2 = 38
38 * 2 = 76
76 / 2 = 38
38 + 16 = 54
54 - 2 = 52
52 * 2 = 104
104 / 2 = 52
I am getting a vocab size mismatch between the model (32001) and tokenizer.model (32000), as there is no added_tokens.json included with the delta weight release.
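One common workaround (not necessarily the official fix) is to add the missing pad token on the tokenizer side and resize the model's embeddings so that both report 32001 entries; a rough sketch with illustrative paths:

# Sketch: align tokenizer and model vocab sizes when added_tokens.json is missing.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("/path/to/vicuna-13b")             # illustrative path
tokenizer = AutoTokenizer.from_pretrained("/path/to/llama-13b-hf", use_fast=False)

num_added = tokenizer.add_special_tokens({"pad_token": "[PAD]"})
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))   # 32000 -> 32001
print(len(tokenizer), model.get_input_embeddings().weight.shape[0])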
Running this on Python 3.11.2 on Fedora 37.
During install, the -e (--editable) option fails:
... has a 'pyproject.toml' and its build backend is missing the 'build_editable' hook. Since it does not have a 'setup.py' nor a 'setup.cfg', it cannot be installed in editable mode. Consider using a build backend that supports PEP 660.
I'm not particularly familiar with PEP 660 or PEP 517, or else I'd just diagnose it myself. Likewise, this is apparently a rare enough issue that there's little to go off of. Is there a module I'm missing? I have the wheel module installed.
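For what it's worth, this usually means the build backend picked up for the editable install predates PEP 660 support (the build_editable hook was added in setuptools 64). Upgrading pip and setuptools is often enough, and a plain non-editable install avoids the hook entirely:
pip3 install --upgrade pip "setuptools>=64"
pip3 install -e .
pip3 install .   # fallback if editable mode still fails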
I actually ran the merge successfully on another machine. For this second run, I encountered some failures like the ones below. I think it's due to Python library version issues.
root@1b3744debaeb:~# python3 -m fastchat.model.apply_delta --base /root/llama-13b-hf --target /root/vicuna-13b --delta /root/vicuna-13b-delta-v0
Loading base model
Loading checkpoint shards: 100%|█████████████████████████████████| 3/3 [00:10<00:00, 3.42s/it]
Loading delta
Loading checkpoint shards: 100%|█████████████████████████████████| 3/3 [00:15<00:00, 5.23s/it]
Applying delta
Applying delta: 100%|████████████████████████████████████████| 403/403 [00:50<00:00, 7.92it/s]
Saving target model
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 441, in save
_save(obj, opened_zipfile, pickle_module, pickle_protocol)
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 668, in _save
zip_file.write_record(name, storage.data_ptr(), num_bytes)
RuntimeError: [enforce fail at inline_container.cc:471] . PytorchStreamWriter failed writing file data/126: file write failed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.10/dist-packages/fastchat/model/apply_delta.py", line 49, in <module>
apply_delta(args.base_model_path, args.target_model_path, args.delta_path)
File "/usr/local/lib/python3.10/dist-packages/fastchat/model/apply_delta.py", line 37, in apply_delta
base.save_pretrained(target_model_path)
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 1808, in save_pretrained
save_function(shard, os.path.join(save_directory, shard_file))
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 440, in save
with _open_zipfile_writer(f) as opened_zipfile:
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 291, in __exit__
self.file_like.write_end_of_file()
RuntimeError: [enforce fail at inline_container.cc:337] . unexpected pos 1963554048 vs 1963553936
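Not a confirmed diagnosis, but "PytorchStreamWriter failed writing file ... file write failed" during save_pretrained most often points to the target disk filling up mid-write; the merged 13B checkpoint is on the order of 25 GB in fp16. A quick pre-flight check, as a sketch:

# Sketch: verify there is enough free space before writing the merged checkpoint.
import shutil

free_gb = shutil.disk_usage("/root").free / 1024**3   # parent directory of the --target path
print(f"Free space: {free_gb:.1f} GB")
# The merged 13B checkpoint is roughly 25 GB (fp16), so anything close to or below
# that is a likely cause of the truncated-write error above.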
I have a few observations from testing.
My question is: could the maintainers give some insights?
Does this work on AMD cards? What are the GPU requirements for inference?
We would like to support ChatGLM-6B.
However, its interface is slightly different from other models.
According to its README, they implemented a custom interface for chat and stream chat.
This requires some generalization of our current implementation.
Also, we need to add a new prompt template here:
FastChat/fastchat/conversation.py
Lines 149 to 152 in 375b8c8
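As a starting point, here is a rough sketch of what such a template entry could look like, modeled on the existing Conversation dataclass in fastchat/conversation.py; the ChatGLM role names, separator, and registration key below are assumptions and would need to match the model's actual chat format:

# Sketch: a hypothetical ChatGLM prompt template registered alongside the existing ones.
# Field and symbol names follow fastchat/conversation.py; the concrete values are assumptions.
from fastchat.conversation import Conversation, SeparatorStyle, conv_templates

conv_chatglm = Conversation(
    system="",                      # ChatGLM does not use a system prompt here (assumption)
    roles=("问", "答"),             # question/answer roles (assumption)
    messages=(),
    offset=0,
    sep_style=SeparatorStyle.SINGLE,
    sep="\n",
)
conv_templates["chatglm"] = conv_chatglm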
resolved