lm-sys / fastchat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

License: Apache License 2.0

Python 90.83% Shell 0.72% Dockerfile 0.02% Jupyter Notebook 8.43%

fastchat's People

Contributors

aliasaria, andy-yang-1, babychousr, bofenghuang, codingwithtim, congchan, dachengli1, fozziethebeat, hzg0601, imoneoi, infwinston, jingsong-yan, jondurbin, leiwen83, lewtun, lisadunlap, liunux4odoo, merrymercy, michaelvll, mingfang, nielstron, siddartha-re, steve-tech, suquark, surak, thelinuxkid, wangshuai09, ying1123, zhisbug, zyhowell

fastchat's Issues

No permission to download the checkpoint

gsutil cp gs://skypilot-chatbot/chatbot/13b/ckpt/added_tokens.json ./
AccessDeniedException: 403 [email protected] does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist)

Can the training script be used for LoRA training?

Hi there,

I was looking at the README file and I noticed that the codebase is using Stanford Alpaca's fine-tuning code with some modifications. I also saw that you mentioned that the hyperparameters used for training are similar to those used in Stanford Alpaca.

I was wondering if the training script can be used for LoRA training. I believe the changes made to support gradient checkpointing and Flash Attention could make LoRA training much faster. Could you please confirm whether this is possible?

Thank you!
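
For reference, a minimal sketch of what attaching LoRA adapters before handing the model to the existing Trainer might look like, using the third-party peft package. The target modules and hyperparameters are illustrative assumptions, not FastChat's code:

# Minimal sketch: wrap the base model with LoRA adapters via `peft` before
# passing it to transformers.Trainer as train.py already does.
# The target module names are assumptions for a LLaMA-style attention block.
import torch
import transformers
from peft import LoraConfig, get_peft_model

model = transformers.AutoModelForCausalLM.from_pretrained(
    "/root/llama-13b-hf", torch_dtype=torch.bfloat16
)
lora_config = LoraConfig(
    r=8,                                  # low-rank dimension
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights require grad
# `model` can now be handed to transformers.Trainer as in train.py.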

Training Script also for GPT2 / BLOOM

Hi there. I don't want to fine-tune a LLaMA model, but rather GPT-2 / BLOOM in my language, so I was wondering if I could use the train functionality. Does it work? What do I need to consider, apart from translating the dataset? Thank you for your time!!
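
For reference, the training script appears to build the model and tokenizer through the generic transformers Auto* classes (an assumption worth verifying against train.py), so switching the base model should mostly be a matter of pointing --model_name_or_path at a different checkpoint. A small, purely illustrative sanity check:

# Illustrative sanity check: confirm the base model and tokenizer load through
# the generic Auto* classes the training script relies on. The checkpoint name
# below is just an example.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "bigscience/bloom-560m"  # any BLOOM / GPT-2 style causal LM
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

# Things to double-check before training on a non-LLaMA model:
# - the pad/eos tokens the script expects exist in this tokenizer
# - --model_max_length fits the model's context window (often smaller than
#   the 2048 used for LLaMA)
# - --fsdp_transformer_layer_cls_to_wrap names this architecture's decoder
#   block (it is LlamaDecoderLayer in the example runs above)
print(model.config.model_type, tokenizer.model_max_length)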

TypeError: forward() got an unexpected keyword argument 'position_ids'

I ran the fine-tuning script on a 4x A100 80G machine and hit this issue. Can someone take a look?

root@1cafe085c343:~/FastChat# torchrun --nnodes=1 --nproc_per_node=4 --master_port=3124     fastchat/train/train_mem.py     --model_name_or_path /root/llama-13b-hf     --data_path /root/sharegpt_vicuna/sharegpt_20230401_clean_lang_split.json     --bf16 True     --output_dir ./checkpoints     --num_train_epochs 3     --per_device_train_batch_size 4     --per_device_eval_batch_size 4     --gradient_accumulation_steps 2     --evaluation_strategy "no"     --save_strategy "steps"     --save_steps 1200     --save_total_limit 100     --learning_rate 2e-5     --weight_decay 0.     --warmup_ratio 0.03     --lr_scheduler_type "cosine"     --logging_steps 1     --fsdp "full_shard auto_wrap"     --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer'     --tf32 True     --model_max_length 2048     --gradient_checkpointing True     --lazy_preprocess True
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
/transformers/src/transformers/training_args.py:1356: FutureWarning: using `--fsdp_transformer_layer_cls_to_wrap` is deprecated. Use fsdp_config instead
  warnings.warn(
/transformers/src/transformers/training_args.py:1356: FutureWarning: using `--fsdp_transformer_layer_cls_to_wrap` is deprecated. Use fsdp_config instead
  warnings.warn(
/transformers/src/transformers/training_args.py:1356: FutureWarning: using `--fsdp_transformer_layer_cls_to_wrap` is deprecated. Use fsdp_config instead
  warnings.warn(
/transformers/src/transformers/training_args.py:1356: FutureWarning: using `--fsdp_transformer_layer_cls_to_wrap` is deprecated. Use fsdp_config instead
  warnings.warn(
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:19<00:00,  6.46s/it]
Using pad_token, but it is not set yet.
Loading checkpoint shards:  33%|████████████████████████████████████████████                                                                                        | 1/3 [00:15<00:30, 15.33s/it]WARNING:root:Loading data...
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:22<00:00,  7.59s/it]
Using pad_token, but it is not set yet.
WARNING:root:Formatting inputs...Skip in lazy mode
WARNING:root:Loading data...
Loading checkpoint shards:  67%|████████████████████████████████████████████████████████████████████████████████████████                                            | 2/3 [00:26<00:12, 12.95s/it]WARNING:root:Formatting inputs...Skip in lazy mode
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:34<00:00, 11.39s/it]
Using pad_token, but it is not set yet.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:37<00:00, 12.66s/it]
Using pad_token, but it is not set yet.
WARNING:root:Loading data...
WARNING:root:Loading data...
WARNING:root:Formatting inputs...Skip in lazy mode
WARNING:root:Formatting inputs...Skip in lazy mode
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 3
wandb: You chose "Don't visualize my results"
wandb: Tracking run with wandb version 0.14.0
wandb: W&B syncing is set to `offline` in this directory.
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
  0%|          | 0/11814 [00:00<?, ?it/s]
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... (printed once per rank)
Traceback (most recent call last):
  File "/root/FastChat/fastchat/train/train_mem.py", line 12, in <module>
    train()
  File "/root/FastChat/fastchat/train/train.py", line 315, in train
    trainer.train()
  File "/transformers/src/transformers/trainer.py", line 1639, in train
    return inner_training_loop(
  File "/transformers/src/transformers/trainer.py", line 1904, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/transformers/src/transformers/trainer.py", line 2652, in training_step
    loss = self.compute_loss(model, inputs)
  File "/transformers/src/transformers/trainer.py", line 2684, in compute_loss
    outputs = model(**inputs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 748, in forward
    output = self._fsdp_wrapped_module(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/transformers/src/transformers/models/llama/modeling_llama.py", line 687, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/transformers/src/transformers/models/llama/modeling_llama.py", line 569, in forward
    layer_outputs = torch.utils.checkpoint.checkpoint(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/transformers/src/transformers/models/llama/modeling_llama.py", line 565, in custom_forward
    return module(*inputs, output_attentions, None)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 748, in forward
    output = self._fsdp_wrapped_module(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/transformers/src/transformers/models/llama/modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
TypeError: forward() got an unexpected keyword argument 'position_ids'

(The same traceback is raised on all four ranks.)
wandb: Waiting for W&B process to finish... (failed 1).
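
A likely cause, not confirmed by the maintainers here: train_mem.py monkey-patches LlamaAttention.forward with a flash-attention version, and this TypeError appears when that patched forward was written against an older transformers release and does not accept the position_ids keyword that the installed modeling_llama.py now passes. A minimal sketch of the signature the patched forward would need, assuming that patching mechanism (mirror your local modeling_llama.py rather than copying this):

# Illustrative only: the patched attention forward must accept every keyword
# that the installed transformers version passes to LlamaAttention.forward.
from typing import Optional, Tuple
import torch

def flash_attn_forward(
    self,
    hidden_states: torch.Tensor,
    attention_mask: Optional[torch.Tensor] = None,
    position_ids: Optional[torch.LongTensor] = None,   # <- the missing kwarg
    past_key_value: Optional[Tuple[torch.Tensor]] = None,
    output_attentions: bool = False,
    use_cache: bool = False,
):
    ...  # flash-attention implementation goes here

Alternatively, pinning transformers to the version the patch was written for avoids the mismatch.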

Licensing Question

I've had a lot of confusion around this myself. I noticed the license for FastChat is Apache 2.0.

Vicuna is fine-tuned on LLaMA, which I believe is open for research but not for commercial use.

Are Vicuna and any derived products open source for any usage?

Can it run on a MacBook?

I am getting the errors below; naturally, I don't have CUDA on a MacBook. Are there any steps to recompile torch with MPS enabled, or could FastChat perform checks for this itself?

    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
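
FastChat's serving code assumes CUDA here. As an illustrative workaround rather than a supported code path, the device could be chosen with a fallback to Apple's MPS backend when the local PyTorch build provides it:

# Illustrative device selection for Apple Silicon: prefer CUDA, fall back to
# MPS (Metal) if this PyTorch build supports it, otherwise CPU.
import torch

if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")   # Apple Silicon Metal backend
else:
    device = torch.device("cpu")

print(f"using device: {device}")
# the model and input tensors would then be moved with .to(device)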

Hello friends

Hello, I want to learn software development and I'm looking for someone willing to help and give lessons.

Evaluation scripts cannot run successfully

I hit a few issues in the evaluation:

ubuntu@152-70-114-166:~/FastChat/fastchat/eval$ python model_qa.py --model-name $HOME/llama-13b-hf/ --question-file tables/question.jsonl --answer-file table/answer/answer.jsonl
Traceback (most recent call last):
  File "model_qa.py", line 61, in <module>
    eval_model(args.model_name, args.question_file, args.answers_file)
AttributeError: 'Namespace' object has no attribute 'answers_file'
ubuntu@152-70-114-166:~/FastChat/fastchat/eval$ python model_qa.py --model-name $HOME/llama-7b-hf/ --question-file tables/question.jsonl --answer-file table/answer/answer_llama-7b.jsonl
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████| 2/2 [00:06<00:00,  3.16s/it]
Traceback (most recent call last):
  File "model_qa.py", line 61, in <module>
    eval_model(args.model_name, args.question_file, args.answer_file)
  File "/usr/lib/python3/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "model_qa.py", line 22, in eval_model
    ques_file = open(os.path.expanduser(questions_file), "r")
FileNotFoundError: [Errno 2] No such file or directory: 'tables/question.jsonl'
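
The first failure looks like a plain attribute-name mismatch: argparse stores --answer-file under answer_file, but the call site reads args.answers_file. A minimal illustration of the assumed fix (the second failure is simply the relative path tables/question.jsonl not existing in the current working directory):

# Sketch of the apparent mismatch: argparse exposes `--answer-file` as
# `args.answer_file`, so the call site must use that attribute name.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--model-name", type=str, required=True)
parser.add_argument("--question-file", type=str, required=True)
parser.add_argument("--answer-file", type=str, required=True)
args = parser.parse_args()

# was: eval_model(args.model_name, args.question_file, args.answers_file)
# fix: eval_model(args.model_name, args.question_file, args.answer_file)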

Only releasing the lighter weights?

I am amazed at your accomplishments, and amazed again at the clean results of the demo you provided.

So I was looking forward to running this demo on my machine.
Unfortunately, no weights have been released yet, but I'm hopeful they will be soon.

Looking at the answer about releasing the model, it says you will release it after the light-weighting is complete. Will that be a different model from what we're experiencing in the demo now, or the weights of Vicuna-13B?

Colab notebook for demo

Thanks a ton, team, for releasing the model. Is there a notebook for the demo? Some of the steps about specifying model paths are a bit confusing. Thanks!

Connection errored out on a AWS Sagemaker notebook

When going through the steps for Web UI, I get the following output in the console:

[screenshot: console output]

When I open it in the browser, I get this:

[screenshot: browser error]

Any quick solutions for this?

I'm running it on an AWS SageMaker notebook with a GPU.

Dalai vs Vicuna

Could you make a low-requirement version like Dalai? In addition, the coin limit is very annoying; could you remove it?

Error while attempting to bind on address on Colab notebook

Thanks for this release.

When I try to launch the Gradio web UI, it throws this error. I am running it on Colab.
error while attempting to bind on address ('::1', 21001, 0, 0): cannot assign requested address

2023-04-01 09:04:18 | INFO | controller | Init controller
2023-04-01 09:04:18 | ERROR | stderr | INFO:     Started server process [14623]
2023-04-01 09:04:18 | ERROR | stderr | INFO:     Waiting for application startup.
2023-04-01 09:04:18 | ERROR | stderr | INFO:     Application startup complete.
2023-04-01 09:04:18 | ERROR | stderr | ERROR:    [Errno 99] error while attempting to bind on address ('::1', 21001, 0, 0): cannot assign requested address
2023-04-01 09:04:18 | ERROR | stderr | INFO:     Waiting for application shutdown.
2023-04-01 09:04:18 | ERROR | stderr | INFO:     Application shutdown complete.
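
For context, the error means the process cannot bind the IPv6 loopback address ::1, which many container and Colab environments do not provide; binding an IPv4 address such as 127.0.0.1 or 0.0.0.0 usually works (whether and how the host is configurable depends on the FastChat CLI flags, so check --help). A tiny self-contained illustration of the underlying difference:

# Tiny illustration: environments without an IPv6 loopback interface cannot
# bind ('::1', port), while IPv4 loopback binds fine.
import socket

def can_bind(family, addr):
    try:
        s = socket.socket(family, socket.SOCK_STREAM)
        s.bind(addr)
        s.close()
        return True
    except OSError:
        return False

print("::1       ->", can_bind(socket.AF_INET6, ("::1", 21001)))
print("127.0.0.1 ->", can_bind(socket.AF_INET, ("127.0.0.1", 21001)))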

CUDA OOM When Using Flash Attention

Hello,

Thank you for sharing your awesome work!

I'm trying to train Vicuna on my own dataset. I walked through the installation process from source. I had to install pytorch with cuda 11.7.0 support instead of 11.6. My server only supports cuda 11.2.2/11.4.4/11.5.2/11.7.0/11.8.0 but not 11.6.

When I try to train the 13B model with flash attention, I get a CUDA OOM error even when per_device_train_batch_size is set to 1. I think there might be a memory leak. I also tried building flash attention from source and still got the same error.

I know this is probably a flash attention problem, but do you have any insights? Any guidance will be very much appreciated.

Best regards,
Hani

Langchain Support

I'm not sure whether LangChain support is already possible with this model; if it isn't, I would like to request that it be implemented. If it is already possible, I would like to request documentation explaining how to use it in combination with LangChain.

Using LangChain and LlamaIndex with Vicuna would be a great option for many solutions that require a lot of context and are therefore too expensive to use with an LLM API like OpenAI's.

Thank you for open sourcing such a great model.

Evaluation on untruthful, harmful, toxic or sensitive questions

Hi Vicuna team,

Thanks for the great work pushing LLM fine-tuning a step further. This is especially amazing as a student/research-led initiative.

I found that most of the evaluation targets helpfulness. Do you plan to evaluate on untruthful, harmful, toxic, or sensitive questions? That is the main benefit of RLHF, so I'm curious whether simply supervised fine-tuning on a pre-aligned GPT could also inherit the human preference learned from RLHF in the original model (GPT-3.5).

7B/30B test training and massive RAM usage directly proportional to model size

This is the output from testing 7B training with 8x A6000 48G. Notice the RAM usage. It appears to load the full fp32 7B model directly into memory per process/GPU on a local instance. Launching the 30B model with this script would require 128GB per GPU, which is pretty crazy RAM usage.

As a result of the crash below, 7B doesn't run, and forget about running 30B, as the memory requirement exceeds even this machine's 512GB of RAM.

env:
Ubuntu 22.04
Cuda 11.8
8x A6000 48G
512GB ram
top - 10:47:42 up 3 min,  0 users,  load average: 5.27, 1.96, 0.75
Tasks:  25 total,   9 running,  16 sleeping,   0 stopped,   0 zombie
%Cpu(s): 24.9 us, 19.0 sy,  0.0 ni, 54.5 id,  0.8 wa,  0.0 hi,  0.9 si,  0.0 st
MiB Mem : 515671.4 total, 300058.6 free, 213289.8 used,   2323.0 buff/cache
MiB Swap:   8011.0 total,   8011.0 free,      0.0 used. 298580.4 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                               
    549 root      20   0   60.7g  25.7g 220344 R  87.0   5.1   0:54.15 python                                
    551 root      20   0   60.7g  25.7g 220340 R  87.0   5.1   0:54.22 python                                
    550 root      20   0   60.8g  25.7g 219780 R  86.0   5.1   0:54.16 python                                
    552 root      20   0   60.7g  25.7g 219372 R  86.0   5.1   0:54.30 python                                
    555 root      20   0   61.1g  26.0g 219640 R  84.2   5.2   0:53.26 python                                
    552 root      20   0   61.1g  26.0g 219372 R  83.2   5.2   0:53.44 python                                
    553 root      20   0   61.1g  26.0g 219212 R  83.2   5.2   0:53.24 python                                
    556 root      20   0   61.1g  26.0g 219820 R  83.2   5.2   0:53.20 python 
torchrun --nnodes=1 --nproc_per_node=8 --master_port=12345 \
    fastchat/train/train.py \
    --model_name_or_path /root/llama-7b-hf \
    --data_path '/root/alpaca_data_cleaned.json' \
    --bf16 True \
    --output_dir ./checkpoints \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 200 \
    --save_total_limit 100 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True

Furthermore, 7B runs on this machine but then crashes here:

Traceback (most recent call last):
  File "/root/FastChat/fastchat/train/train.py", line 322, in <module>
    train()
  File "/root/FastChat/fastchat/train/train.py", line 315, in train
    trainer.train()
  File "/root/miniconda3/lib/python3.9/site-packages/transformers/trainer.py", line 1639, in train
    return inner_training_loop(
  File "/root/miniconda3/lib/python3.9/site-packages/transformers/trainer.py", line 1906, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/root/miniconda3/lib/python3.9/site-packages/transformers/trainer.py", line 2652, in training_step
    loss = self.compute_loss(model, inputs)
  File "/root/miniconda3/lib/python3.9/site-packages/transformers/trainer.py", line 2684, in compute_loss
    outputs = model(**inputs)
  File "/root/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 757, in forward
    args, kwargs = _pre_forward(
  File "/root/miniconda3/lib/python3.9/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 407, in _pre_forward
    state._exec_order_data.record_pre_forward(handles, module.training)
  File "/root/miniconda3/lib/python3.9/site-packages/torch/distributed/fsdp/_exec_order_utils.py", line 180, in record_pre_forward
    self._check_order(handles_key, is_training)
  File "/root/miniconda3/lib/python3.9/site-packages/torch/distributed/fsdp/_exec_order_utils.py", line 245, in _check_order
    raise RuntimeError(
RuntimeError: Forward order differs across ranks: rank 0 is all-gathering 0 parameters while rank 3 is all-gathering 1 parameters
(An identical traceback is raised on another rank, ending with:)
RuntimeError: Forward order differs across ranks: rank 0 is all-gathering 1 parameters while rank 2 is all-gathering 0 parameters
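
For what it's worth, the roughly 26 GB resident set per rank is consistent with each launched process materializing the full fp32 checkpoint in host RAM before FSDP shards it. One possible mitigation, sketched below as an assumption rather than the repo's actual code, is to load the weights in bf16 with transformers' low-memory loading path (how this interacts with the full_shard auto_wrap FSDP config would still need testing):

# Illustrative: load the checkpoint in bf16 and avoid building a full fp32
# copy per process. `low_cpu_mem_usage=True` streams weights instead of
# allocating the whole model in host RAM first (requires `accelerate`).
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "/root/llama-7b-hf",             # path from the report, as an example
    torch_dtype=torch.bfloat16,      # ~2 bytes/param instead of 4
    low_cpu_mem_usage=True,
)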

Release model weights

Release model weights for Vicuna-13B, so that one doesn't have to train a model themselves.

RWKV

Have you considered using the RWKV model? It's open source and, according to its author, works almost as well as LLaMA.

Training on longer sequences

I'm assuming there was a memory improvement from implementing flash attention. Can I now fine-tune efficiently on 8192-token sequences? I would assume so.

Too woke

The chatbot has some amazing capabilities that are very close to the original ChatGPT, but it also inherited OpenAI's wokeness, in the sense that on any controversial topic it leans heavily to the left. I hope you can fix that, or at least make it neutral.

rule file to generate review with GPT-4 is missing

The https://github.com/lm-sys/FastChat/blob/main/fastchat/eval/eval_gpt_review.py script expects a rule file.

category = json.loads(ques_js)['category']
if category in rule_dict:
    rule = rule_dict[category]
else:
    rule = rule_dict['default']
prompt = rule['prompt']
role = rule['role']
content = (f'[Question]\n{ques}\n\n'
           f'[{role} 1]\n{ans1}\n\n[End of {role} 1]\n\n'
           f'[{role} 2]\n{ans2}\n\n[End of {role} 2]\n\n'
           f'[System]\n{prompt}\n\n')

It seems the rule file contains per-category prompt and role information. I would like to reproduce the evaluation, but I cannot find this file. Can you give some pointers?
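
From the snippet, the rule file appears to be a mapping from question category to a role and a judging prompt, with a 'default' fallback. A purely hypothetical minimal version that matches how the script reads it (the wording below is invented, not the authors' released file):

# Hypothetical minimal rule file matching how eval_gpt_review.py reads it:
# rule_dict[category]['role'] and rule_dict[category]['prompt'], with a
# 'default' fallback entry.
import json

rules = {
    "default": {
        "role": "Assistant",
        "prompt": "Rate the helpfulness, relevance, accuracy, and level of "
                  "detail of the two responses. Output two scores from 1 to 10.",
    },
    "coding": {
        "role": "Assistant",
        "prompt": "Evaluate the correctness and code quality of the two "
                  "responses. Output two scores from 1 to 10.",
    },
}

with open("rule.json", "w") as f:
    json.dump(rules, f, indent=2, ensure_ascii=False)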

Are you going to release other weights?

Since the 13B model didn't cost a lot to train, would you consider releasing different weights to allow for more diverse setups and capabilities? For instance, 7B, 30B, 65B?

The model is loaded, but not found on the web page

When I deploy the demo, there is a problem:
After python3 -m fastchat.serve.model_worker, I can see that the model has been loaded onto the GPU, and after python3 -m fastchat.serve.gradio_web_server the web page runs, but when I refresh the page the model is no longer listed. I also couldn't reach the model through python3 -m fastchat.serve.test_message, even though nvidia-smi showed the model loaded on the GPU, so I had to kill the program and re-run it.
How can I solve this problem?

Failing a simple arithmetic verification question

  • How to get 24 from 20,20,2,16 by simple addition, subtraction, multiplication and division, using each number only once.
Here's one way to get 24 from 20, 20, 2, 16 using simple addition, subtraction, multiplication, and division, using each number only once:
20 + 20 = 40
40 - 2 = 38
38 * 2 = 76
76 / 2 = 38
38 + 16 = 54
54 - 2 = 52
52 * 2 = 104
104 / 2 = 52
52 + 2 = 54
54 - 16 = 38
38 * 2 = 76
76 / 2 = 38
38 + 2 = 40
40 - 2 = 38
38 * 2 = 76
76 / 2 = 38
38 + 16 = 54
54 - 2 = 52
52 * 2 = 104
104 / 2 = 52

So, 24 can be obtained by using the following calculations:

20 + 20 = 40
40 - 2 = 38
38 * 2 = 76
76 / 2 = 38
38 + 16 = 54
54 - 2 = 52
52 * 2 = 104
104 / 2 = 52
52 + 2 = 54
54 - 16 = 38
38 * 2 = 76
76 / 2 = 38
38 + 2 = 40
40 - 2 = 38
38 * 2 = 76
76 / 2 = 38
38 + 16 = 54
54 - 2 = 52
52 * 2 = 104
104 / 2 = 52

Missing added_tokens.json

I am getting a vocab size mismatch between the model (32001) and tokenizer.model (32000) as there is no added_tokens.json included with the delta weight release
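
Until added_tokens.json ships with the release, one workaround consistent with the mismatch is to add a pad token to the tokenizer and resize the embeddings so both sides agree at 32001. This is an assumption about what the missing token is, not a confirmed fix; the exact special-token string the released weights expect may differ:

# Illustrative workaround: add a pad token and make the embedding matrix match,
# so tokenizer (32000 -> 32001) and model (32001) agree. The pad-token string
# and the paths below are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("/path/to/vicuna-13b")
tokenizer = AutoTokenizer.from_pretrained("/path/to/vicuna-13b", use_fast=False)

if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({"pad_token": "[PAD]"})
model.resize_token_embeddings(len(tokenizer))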

PEP 660 issue.

Running this on Python 3.11.2 in Fedora 37.

At install time, the -e (--editable) option fails.

... has a 'pyproject.toml' and its build backend is missing the 'build_editable' hook. Since it does not have a 'setup.py' nor a 'setup.cfg', it cannot be installed in editable mode. Consider using a build backend that supports PEP 660.

I'm not particularly familiar with PEP 660 or PEP 517, or else I'd just diagnose it myself. Likewise, this is apparently a rare enough issue that there's little to go on. Is there a module I'm missing? I have the wheel module installed.

delta weight merge failure

I actually ran the merge successfully on another machine. For the second run, I encountered the failure below. I think it's due to Python library version issues.

root@1b3744debaeb:~# python3 -m fastchat.model.apply_delta --base /root/llama-13b-hf --target /root/vicuna-13b --delta /root/vicuna-13b-delta-v0
Loading base model
Loading checkpoint shards: 100%|█████████████████████████████████| 3/3 [00:10<00:00,  3.42s/it]
Loading delta
Loading checkpoint shards: 100%|█████████████████████████████████| 3/3 [00:15<00:00,  5.23s/it]
Applying delta
Applying delta: 100%|████████████████████████████████████████| 403/403 [00:50<00:00,  7.92it/s]
Saving target model
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 441, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 668, in _save
    zip_file.write_record(name, storage.data_ptr(), num_bytes)
RuntimeError: [enforce fail at inline_container.cc:471] . PytorchStreamWriter failed writing file data/126: file write failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/fastchat/model/apply_delta.py", line 49, in <module>
    apply_delta(args.base_model_path, args.target_model_path, args.delta_path)
  File "/usr/local/lib/python3.10/dist-packages/fastchat/model/apply_delta.py", line 37, in apply_delta
    base.save_pretrained(target_model_path)
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 1808, in save_pretrained
    save_function(shard, os.path.join(save_directory, shard_file))
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 440, in save
    with _open_zipfile_writer(f) as opened_zipfile:
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 291, in __exit__
    self.file_like.write_end_of_file()
RuntimeError: [enforce fail at inline_container.cc:337] . unexpected pos 1963554048 vs 1963553936
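
A library-version problem is possible, but a PytorchStreamWriter "file write failed" during save_pretrained is most often a full disk or quota limit on the target filesystem rather than a code issue. A quick, illustrative pre-flight check:

# Quick pre-flight check (illustrative): a merged 13B model in fp16 needs
# roughly 26 GB of free space at the target path, plus temporary headroom.
import shutil

target_dir = "/root"                 # filesystem that will hold /root/vicuna-13b
usage = shutil.disk_usage(target_dir)
print(f"free: {usage.free / 1e9:.1f} GB")
if usage.free < 40e9:                # rough threshold, adjust to the model size
    print("probably not enough space to write the merged checkpoint")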

Why is Vicuna's Chinese language competence much better than Alpaca's?

I have a few observations from testing.

  1. Vicuna can correctly respond in Chinese when the input is Chinese, while Alpaca frequently generates English for Chinese questions.
  2. Vicuna seems to have been trained on some Chinese data, because it does give helpful information.

My questions are:

  1. ShareGPT only has 70k conversations, and I don't think many of them are Chinese. Why does Vicuna outperform Alpaca by so much?
  2. I personally fine-tuned Alpaca 13B on the Chinese dataset BELLE 0.5M (https://github.com/LianjiaTech/BELLE). Even so, I feel Vicuna is a little better (it gives detailed descriptions because it uses a 2048 token size). This is incredible, because Vicuna was not fine-tuned on a large Chinese dataset like BELLE at all.

Could the maintainers give some insights?

Vicuna

[screenshots: Vicuna's responses to Chinese prompts]

Alpaca

[screenshot: Alpaca's response to the same prompt]

AMD Support

Does this work on AMD cards? What are the GPU requirements for inference?

Support ChatGLM

We would like to support ChatGLM-6B.

However, its interface is slightly different from other models.
According to its README, they implemented a custom interface for chat, and stream chat.
This requires some generalization of our current implementation.

Also, add a new prompt template here

conv_templates = {
    "v1": conv_v1_2,
    "bair_v1": conv_bair_v1,
}
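
As a purely hypothetical sketch (field names follow the existing Conversation entries in fastchat/conversation.py; the actual ChatGLM prompt format would have to follow its own README), registering a new template might look like:

# Hypothetical example of adding a ChatGLM-style entry to the template table.
# The roles and separator below are placeholders, not ChatGLM's real format.
from fastchat.conversation import Conversation, SeparatorStyle, conv_templates

conv_templates["chatglm"] = Conversation(
    system="",
    roles=("问", "答"),
    messages=(),
    offset=0,
    sep_style=SeparatorStyle.SINGLE,
    sep="\n",
)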

Weird responses from Assistant

Thanks, team, for your efforts in this release. I am getting weird responses from the Assistant, and I am not sure if it is a bug. These responses are totally different from the responses in the demo. I am running it on Colab. Do you have a working demo version for Colab?
