lm-sys / fastchat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

License: Apache License 2.0

Python 90.83% Shell 0.72% Dockerfile 0.02% Jupyter Notebook 8.43%

fastchat's People

Contributors

aliasaria, andy-yang-1, babychousr, bofenghuang, codingwithtim, congchan, dachengli1, fozziethebeat, hzg0601, imoneoi, infwinston, jingsong-yan, jondurbin, leiwen83, lewtun, lisadunlap, liunux4odoo, merrymercy, michaelvll, mingfang, nielstron, siddartha-re, steve-tech, suquark, surak, thelinuxkid, wangshuai09, ying1123, zhisbug, zyhowell

fastchat's Issues

No permission to download the checkpoint

gsutil cp gs://skypilot-chatbot/chatbot/13b/ckpt/added_tokens.json ./
AccessDeniedException: 403 [email protected] does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist)

Can the training script be used for LoRA training?

Hi there,

I was looking at the README file and I noticed that the codebase is using Stanford Alpaca's fine-tuning code with some modifications. I also saw that you mentioned that the hyperparameters used for training are similar to those used in Stanford Alpaca.

I was wondering if the training script can be used for LoRA training. I believe the changes made to support gradient checkpointing and Flash Attention could make LoRA training much faster. Could you please confirm whether this is possible?

Thank you!
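
For reference, a minimal sketch of what attaching LoRA adapters before handing the model to the existing Trainer might look like, using the third-party peft package. The target modules and hyperparameters are illustrative assumptions, not FastChat's code:

# Minimal sketch: wrap the base model with LoRA adapters via `peft` before
# passing it to transformers.Trainer as train.py already does.
# The target module names are assumptions for a LLaMA-style attention block.
import torch
import transformers
from peft import LoraConfig, get_peft_model

model = transformers.AutoModelForCausalLM.from_pretrained(
    "/root/llama-13b-hf", torch_dtype=torch.bfloat16
)
lora_config = LoraConfig(
    r=8,                                  # low-rank dimension
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights require grad
# `model` can now be handed to transformers.Trainer as in train.py.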

Training Script also for GPT2 / BLOOM

Hi there. I don't want to fine-tune a LLaMA model, but rather GPT-2 / BLOOM in my language, so I was wondering if I could use the train functionality. Does it work? What do I need to consider, apart from translating the dataset? Thank you for your time!!
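
For reference, the training script appears to build the model and tokenizer through the generic transformers Auto* classes (an assumption worth verifying against train.py), so switching the base model should mostly be a matter of pointing --model_name_or_path at a different checkpoint. A small, purely illustrative sanity check:

# Illustrative sanity check: confirm the base model and tokenizer load through
# the generic Auto* classes the training script relies on. The checkpoint name
# below is just an example.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "bigscience/bloom-560m"  # any BLOOM / GPT-2 style causal LM
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

# Things to double-check before training on a non-LLaMA model:
# - the pad/eos tokens the script expects exist in this tokenizer
# - --model_max_length fits the model's context window (often smaller than
#   the 2048 used for LLaMA)
# - --fsdp_transformer_layer_cls_to_wrap names this architecture's decoder
#   block (it is LlamaDecoderLayer in the example runs above)
print(model.config.model_type, tokenizer.model_max_length)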

TypeError: forward() got an unexpected keyword argument 'position_ids'

I ran the fine-tuning script on a 4x A100 80G machine and hit this issue. Can someone take a look?

root@1cafe085c343:~/FastChat# torchrun --nnodes=1 --nproc_per_node=4 --master_port=3124     fastchat/train/train_mem.py     --model_name_or_path /root/llama-13b-hf     --data_path /root/sharegpt_vicuna/sharegpt_20230401_clean_lang_split.json     --bf16 True     --output_dir ./checkpoints     --num_train_epochs 3     --per_device_train_batch_size 4     --per_device_eval_batch_size 4     --gradient_accumulation_steps 2     --evaluation_strategy "no"     --save_strategy "steps"     --save_steps 1200     --save_total_limit 100     --learning_rate 2e-5     --weight_decay 0.     --warmup_ratio 0.03     --lr_scheduler_type "cosine"     --logging_steps 1     --fsdp "full_shard auto_wrap"     --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer'     --tf32 True     --model_max_length 2048     --gradient_checkpointing True     --lazy_preprocess True
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
/transformers/src/transformers/training_args.py:1356: FutureWarning: using `--fsdp_transformer_layer_cls_to_wrap` is deprecated. Use fsdp_config instead
  warnings.warn(
/transformers/src/transformers/training_args.py:1356: FutureWarning: using `--fsdp_transformer_layer_cls_to_wrap` is deprecated. Use fsdp_config instead
  warnings.warn(
/transformers/src/transformers/training_args.py:1356: FutureWarning: using `--fsdp_transformer_layer_cls_to_wrap` is deprecated. Use fsdp_config instead
  warnings.warn(
/transformers/src/transformers/training_args.py:1356: FutureWarning: using `--fsdp_transformer_layer_cls_to_wrap` is deprecated. Use fsdp_config instead
  warnings.warn(
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:19<00:00,  6.46s/it]
Using pad_token, but it is not set yet.
Loading checkpoint shards:  33%|████████████████████████████████████████████                                                                                        | 1/3 [00:15<00:30, 15.33s/it]WARNING:root:Loading data...
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:22<00:00,  7.59s/it]
Using pad_token, but it is not set yet.
WARNING:root:Formatting inputs...Skip in lazy mode
WARNING:root:Loading data...
Loading checkpoint shards:  67%|████████████████████████████████████████████████████████████████████████████████████████                                            | 2/3 [00:26<00:12, 12.95s/it]WARNING:root:Formatting inputs...Skip in lazy mode
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:34<00:00, 11.39s/it]
Using pad_token, but it is not set yet.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:37<00:00, 12.66s/it]
Using pad_token, but it is not set yet.
WARNING:root:Loading data...
WARNING:root:Loading data...
WARNING:root:Formatting inputs...Skip in lazy mode
WARNING:root:Formatting inputs...Skip in lazy mode
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 3
wandb: You chose "Don't visualize my results"
wandb: Tracking run with wandb version 0.14.0
wandb: W&B syncing is set to `offline` in this directory.
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
  0%|          | 0/11814 [00:00<?, ?it/s]
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... (printed once per rank)
Traceback (most recent call last):
  File "/root/FastChat/fastchat/train/train_mem.py", line 12, in <module>
    train()
  File "/root/FastChat/fastchat/train/train.py", line 315, in train
    trainer.train()
  File "/transformers/src/transformers/trainer.py", line 1639, in train
    return inner_training_loop(
  File "/transformers/src/transformers/trainer.py", line 1904, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/transformers/src/transformers/trainer.py", line 2652, in training_step
    loss = self.compute_loss(model, inputs)
  File "/transformers/src/transformers/trainer.py", line 2684, in compute_loss
    outputs = model(**inputs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 748, in forward
    output = self._fsdp_wrapped_module(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/transformers/src/transformers/models/llama/modeling_llama.py", line 687, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/transformers/src/transformers/models/llama/modeling_llama.py", line 569, in forward
    layer_outputs = torch.utils.checkpoint.checkpoint(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/transformers/src/transformers/models/llama/modeling_llama.py", line 565, in custom_forward
    return module(*inputs, output_attentions, None)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 748, in forward
    output = self._fsdp_wrapped_module(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/transformers/src/transformers/models/llama/modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
TypeError: forward() got an unexpected keyword argument 'position_ids'

(The same traceback is raised on all four ranks.)
wandb: Waiting for W&B process to finish... (failed 1).
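
A likely cause, not confirmed by the maintainers here: train_mem.py monkey-patches LlamaAttention.forward with a flash-attention version, and this TypeError appears when that patched forward was written against an older transformers release and does not accept the position_ids keyword that the installed modeling_llama.py now passes. A minimal sketch of the signature the patched forward would need, assuming that patching mechanism (mirror your local modeling_llama.py rather than copying this):

# Illustrative only: the patched attention forward must accept every keyword
# that the installed transformers version passes to LlamaAttention.forward.
from typing import Optional, Tuple
import torch

def flash_attn_forward(
    self,
    hidden_states: torch.Tensor,
    attention_mask: Optional[torch.Tensor] = None,
    position_ids: Optional[torch.LongTensor] = None,   # <- the missing kwarg
    past_key_value: Optional[Tuple[torch.Tensor]] = None,
    output_attentions: bool = False,
    use_cache: bool = False,
):
    ...  # flash-attention implementation goes here

Alternatively, pinning transformers to the version the patch was written for avoids the mismatch.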

Licensing Question

I've had a lot of confusion around this myself. I noticed the license for FastChat is Apache 2.0.

Vicuna is fine-tuned on LLaMA, which I believe is open for research but not for commercial use.

Are Vicuna and any derived products open source for any usage?

Can it run on a MacBook?

I am getting the errors below; naturally, I don't have CUDA on a MacBook. Are there any steps to recompile torch with MPS enabled, or could FastChat perform checks for this itself?

    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
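
FastChat's serving code assumes CUDA here. As an illustrative workaround rather than a supported code path, the device could be chosen with a fallback to Apple's MPS backend when the local PyTorch build provides it:

# Illustrative device selection for Apple Silicon: prefer CUDA, fall back to
# MPS (Metal) if this PyTorch build supports it, otherwise CPU.
import torch

if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")   # Apple Silicon Metal backend
else:
    device = torch.device("cpu")

print(f"using device: {device}")
# the model and input tensors would then be moved with .to(device)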

Hello friends

Hello, I want to learn software development and I'm looking for someone willing to help and give lessons.

Evaluation scripts cannot run successfully

I hit a few issues in the evaluation:

ubuntu@152-70-114-166:~/FastChat/fastchat/eval$ python model_qa.py --model-name $HOME/llama-13b-hf/ --question-file tables/question.jsonl --answer-file table/answer/answer.jsonl
Traceback (most recent call last):
  File "model_qa.py", line 61, in <module>
    eval_model(args.model_name, args.question_file, args.answers_file)
AttributeError: 'Namespace' object has no attribute 'answers_file'
ubuntu@152-70-114-166:~/FastChat/fastchat/eval$ python model_qa.py --model-name $HOME/llama-7b-hf/ --question-file tables/question.jsonl --answer-file table/answer/answer_llama-7b.jsonl
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████| 2/2 [00:06<00:00,  3.16s/it]
Traceback (most recent call last):
  File "model_qa.py", line 61, in <module>
    eval_model(args.model_name, args.question_file, args.answer_file)
  File "/usr/lib/python3/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "model_qa.py", line 22, in eval_model
    ques_file = open(os.path.expanduser(questions_file), "r")
FileNotFoundError: [Errno 2] No such file or directory: 'tables/question.jsonl'
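
The first failure looks like a plain attribute-name mismatch: argparse stores --answer-file under answer_file, but the call site reads args.answers_file. A minimal illustration of the assumed fix (the second failure is simply the relative path tables/question.jsonl not existing in the current working directory):

# Sketch of the apparent mismatch: argparse exposes `--answer-file` as
# `args.answer_file`, so the call site must use that attribute name.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--model-name", type=str, required=True)
parser.add_argument("--question-file", type=str, required=True)
parser.add_argument("--answer-file", type=str, required=True)
args = parser.parse_args()

# was: eval_model(args.model_name, args.question_file, args.answers_file)
# fix: eval_model(args.model_name, args.question_file, args.answer_file)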

Only releasing the lighter weights?

I am amazed at your accomplishments, and amazed again at the clean results of the demo you provided.

So I was looking forward to running this demo on my machine.
Unfortunately, no weights have been released yet, but I'm hopeful they will be soon.

Looking at the answer about releasing the model, it says you will release it after the light-weighting is complete. Will that be a different model from what we're experiencing in the demo now, or the weights of Vicuna-13B?

Colab notebook for demo

Thanks a ton, team, for releasing the model. Is there a notebook for the demo? Some of the steps about specifying model paths are a bit confusing. Thanks!

Connection errored out on a AWS Sagemaker notebook

When going through the steps for Web UI, I get the following output in the console:

[screenshot: console output]

When I open it in the browser, I get this:

[screenshot: browser error]

Any quick solutions for this?

I'm running it on an AWS SageMaker notebook with a GPU.

Dalai vs Vicuna

Could you make a low-requirement version like Dalai? In addition, the coin limit is very annoying; could you remove it?

Error while attempting to bind on address on Colab notebook

Thanks for this release.

When I try to launch the Gradio web UI, it throws this error. I am running it on Colab.
error while attempting to bind on address ('::1', 21001, 0, 0): cannot assign requested address

2023-04-01 09:04:18 | INFO | controller | Init controller
2023-04-01 09:04:18 | ERROR | stderr | INFO:     Started server process [14623]
2023-04-01 09:04:18 | ERROR | stderr | INFO:     Waiting for application startup.
2023-04-01 09:04:18 | ERROR | stderr | INFO:     Application startup complete.
2023-04-01 09:04:18 | ERROR | stderr | ERROR:    [Errno 99] error while attempting to bind on address ('::1', 21001, 0, 0): cannot assign requested address
2023-04-01 09:04:18 | ERROR | stderr | INFO:     Waiting for application shutdown.
2023-04-01 09:04:18 | ERROR | stderr | INFO:     Application shutdown complete.
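
For context, the error means the process cannot bind the IPv6 loopback address ::1, which many container and Colab environments do not provide; binding an IPv4 address such as 127.0.0.1 or 0.0.0.0 usually works (whether and how the host is configurable depends on the FastChat CLI flags, so check --help). A tiny self-contained illustration of the underlying difference:

# Tiny illustration: environments without an IPv6 loopback interface cannot
# bind ('::1', port), while IPv4 loopback binds fine.
import socket

def can_bind(family, addr):
    try:
        s = socket.socket(family, socket.SOCK_STREAM)
        s.bind(addr)
        s.close()
        return True
    except OSError:
        return False

print("::1       ->", can_bind(socket.AF_INET6, ("::1", 21001)))
print("127.0.0.1 ->", can_bind(socket.AF_INET, ("127.0.0.1", 21001)))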

CUDA OOM When Using Flash Attention

Hello,

Thank you for sharing your awesome work!

I'm trying to train Vicuna on my own dataset. I walked through the installation process from source. I had to install pytorch with cuda 11.7.0 support instead of 11.6. My server only supports cuda 11.2.2/11.4.4/11.5.2/11.7.0/11.8.0 but not 11.6.

When I try to train the 13B model with flash attention, I get a CUDA OOM error even when per_device_train_batch_size is set to 1. I think there might be a memory leak. I also tried building flash attention from source and still got the same error.

I know this is probably a flash attention problem, but do you have any insights? Any guidance will be very much appreciated.

Best regards,
Hani

Langchain Support

I'm not sure whether LangChain support is already possible with this model; if it isn't, I would like to request that it be implemented. If it is already possible, I would like to request documentation explaining how to use it in combination with LangChain.

Using LangChain and LlamaIndex with Vicuna would be a great option for many solutions that require a lot of context and are therefore too expensive to use with an LLM API like OpenAI's.

Thank you for open sourcing such a great model.

Evaluation on untruthful, harmful, toxic or sensitive questions

Hi Vicuna team,

Thanks for the great work pushing LLM fine-tuning a step further. This is especially amazing as a student/research-led initiative.

I found that most of the evaluation targets helpfulness. Do you plan to evaluate on untruthful, harmful, toxic, or sensitive questions? That is the main benefit of RLHF, so I'm curious whether simply supervised fine-tuning on a pre-aligned GPT could also inherit the human preference learned from RLHF in the original model (GPT-3.5).

7B/30B test training and massive RAM usage directly proportional to model size

This is the output from testing 7B training with 8x A6000 48G. Notice the RAM usage. It appears to load the full fp32 7B model directly into memory per process/GPU on a local instance. Launching the 30B model with this script would require 128GB per GPU, which is pretty crazy RAM usage.

As a result of the crash below, 7B doesn't run, and forget about running 30B, as the memory requirement exceeds even this machine's 512GB of RAM.

env:
Ubuntu 22.04
Cuda 11.8
8x A6000 48G
512GB ram
top - 10:47:42 up 3 min,  0 users,  load average: 5.27, 1.96, 0.75
Tasks:  25 total,   9 running,  16 sleeping,   0 stopped,   0 zombie
%Cpu(s): 24.9 us, 19.0 sy,  0.0 ni, 54.5 id,  0.8 wa,  0.0 hi,  0.9 si,  0.0 st
MiB Mem : 515671.4 total, 300058.6 free, 213289.8 used,   2323.0 buff/cache
MiB Swap:   8011.0 total,   8011.0 free,      0.0 used. 298580.4 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                               
    549 root      20   0   60.7g  25.7g 220344 R  87.0   5.1   0:54.15 python                                
    551 root      20   0   60.7g  25.7g 220340 R  87.0   5.1   0:54.22 python                                
    550 root      20   0   60.8g  25.7g 219780 R  86.0   5.1   0:54.16 python                                
    552 root      20   0   60.7g  25.7g 219372 R  86.0   5.1   0:54.30 python                                
    555 root      20   0   61.1g  26.0g 219640 R  84.2   5.2   0:53.26 python                                
    552 root      20   0   61.1g  26.0g 219372 R  83.2   5.2   0:53.44 python                                
    553 root      20   0   61.1g  26.0g 219212 R  83.2   5.2   0:53.24 python                                
    556 root      20   0   61.1g  26.0g 219820 R  83.2   5.2   0:53.20 python 
torchrun --nnodes=1 --nproc_per_node=8 --master_port=12345 \
    fastchat/train/train.py \
    --model_name_or_path /root/llama-7b-hf \
    --data_path '/root/alpaca_data_cleaned.json' \
    --bf16 True \
    --output_dir ./checkpoints \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 200 \
    --save_total_limit 100 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True

Furthermore, 7B runs on this machine but then crashes here:

Traceback (most recent call last):
  File "/root/FastChat/fastchat/train/train.py", line 322, in <module>
    train()
  File "/root/FastChat/fastchat/train/train.py", line 315, in train
    trainer.train()
  File "/root/miniconda3/lib/python3.9/site-packages/transformers/trainer.py", line 1639, in train
    return inner_training_loop(
  File "/root/miniconda3/lib/python3.9/site-packages/transformers/trainer.py", line 1906, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/root/miniconda3/lib/python3.9/site-packages/transformers/trainer.py", line 2652, in training_step
    loss = self.compute_loss(model, inputs)
  File "/root/miniconda3/lib/python3.9/site-packages/transformers/trainer.py", line 2684, in compute_loss
    outputs = model(**inputs)
  File "/root/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 757, in forward
    args, kwargs = _pre_forward(
  File "/root/miniconda3/lib/python3.9/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 407, in _pre_forward
    state._exec_order_data.record_pre_forward(handles, module.training)
  File "/root/miniconda3/lib/python3.9/site-packages/torch/distributed/fsdp/_exec_order_utils.py", line 180, in record_pre_forward
    self._check_order(handles_key, is_training)
  File "/root/miniconda3/lib/python3.9/site-packages/torch/distributed/fsdp/_exec_order_utils.py", line 245, in _check_order
    raise RuntimeError(
RuntimeError: Forward order differs across ranks: rank 0 is all-gathering 0 parameters while rank 3 is all-gathering 1 parameters
(An identical traceback is raised on another rank, ending with:)
RuntimeError: Forward order differs across ranks: rank 0 is all-gathering 1 parameters while rank 2 is all-gathering 0 parameters
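
For what it's worth, the roughly 26 GB resident set per rank is consistent with each launched process materializing the full fp32 checkpoint in host RAM before FSDP shards it. One possible mitigation, sketched below as an assumption rather than the repo's actual code, is to load the weights in bf16 with transformers' low-memory loading path (how this interacts with the full_shard auto_wrap FSDP config would still need testing):

# Illustrative: load the checkpoint in bf16 and avoid building a full fp32
# copy per process. `low_cpu_mem_usage=True` streams weights instead of
# allocating the whole model in host RAM first (requires `accelerate`).
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "/root/llama-7b-hf",             # path from the report, as an example
    torch_dtype=torch.bfloat16,      # ~2 bytes/param instead of 4
    low_cpu_mem_usage=True,
)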

Release model weights

Release model weights for Vicuna-13B, so that one doesn't have to train a model themselves.

RWKV

Have you considered using the RWKV model? It's open source and, according to its author, works almost as well as LLaMA.

Training on longer sequences

I'm assuming there was a memory improvement from implementing flash attention. Can I now fine-tune efficiently on 8192-token sequences? I would assume so.

Too woke

The chatbot has some amazing capabilities that are very close to the original ChatGPT, but it also inherited OpenAI's wokeness, in the sense that on any controversial topic it leans heavily to the left. I hope you can fix that, or at least make it neutral.

rule file to generate review with GPT-4 is missing

The https://github.com/lm-sys/FastChat/blob/main/fastchat/eval/eval_gpt_review.py script expects a rule file.

category = json.loads(ques_js)['category']
if category in rule_dict:
    rule = rule_dict[category]
else:
    rule = rule_dict['default']
prompt = rule['prompt']
role = rule['role']
content = (f'[Question]\n{ques}\n\n'
           f'[{role} 1]\n{ans1}\n\n[End of {role} 1]\n\n'
           f'[{role} 2]\n{ans2}\n\n[End of {role} 2]\n\n'
           f'[System]\n{prompt}\n\n')

It seems the rule file contains per-category prompt and role information. I would like to reproduce the evaluation, but I cannot find this file. Can you give some pointers?
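
From the snippet, the rule file appears to be a mapping from question category to a role and a judging prompt, with a 'default' fallback. A purely hypothetical minimal version that matches how the script reads it (the wording below is invented, not the authors' released file):

# Hypothetical minimal rule file matching how eval_gpt_review.py reads it:
# rule_dict[category]['role'] and rule_dict[category]['prompt'], with a
# 'default' fallback entry.
import json

rules = {
    "default": {
        "role": "Assistant",
        "prompt": "Rate the helpfulness, relevance, accuracy, and level of "
                  "detail of the two responses. Output two scores from 1 to 10.",
    },
    "coding": {
        "role": "Assistant",
        "prompt": "Evaluate the correctness and code quality of the two "
                  "responses. Output two scores from 1 to 10.",
    },
}

with open("rule.json", "w") as f:
    json.dump(rules, f, indent=2, ensure_ascii=False)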

Are you going to release other weights?

Since the 13B model didn't cost a lot to train, would you consider releasing different weights to allow for more diverse setups and capabilities? For instance, 7B, 30B, 65B?

The model is loaded, but not found on the web page

When I deploy the demo, there is a problem:
After python3 -m fastchat.serve.model_worker, I can see that the model has been loaded onto the GPU, and after python3 -m fastchat.serve.gradio_web_server the web page runs, but when I refresh the page the model is no longer listed. I also couldn't reach the model through python3 -m fastchat.serve.test_message, even though nvidia-smi showed the model loaded on the GPU, so I had to kill the program and re-run it.
How can I solve this problem?

Failing a simple arithmetic verification question

  • How to get 24 from 20,20,2,16 by simple addition, subtraction, multiplication and division, using each number only once.
Here's one way to get 24 from 20, 20, 2, 16 using simple addition, subtraction, multiplication, and division, using each number only once:
20 + 20 = 40
40 - 2 = 38
38 * 2 = 76
76 / 2 = 38
38 + 16 = 54
54 - 2 = 52
52 * 2 = 104
104 / 2 = 52
52 + 2 = 54
54 - 16 = 38
38 * 2 = 76
76 / 2 = 38
38 + 2 = 40
40 - 2 = 38
38 * 2 = 76
76 / 2 = 38
38 + 16 = 54
54 - 2 = 52
52 * 2 = 104
104 / 2 = 52

So, 24 can be obtained by using the following calculations:

20 + 20 = 40
40 - 2 = 38
38 * 2 = 76
76 / 2 = 38
38 + 16 = 54
54 - 2 = 52
52 * 2 = 104
104 / 2 = 52
52 + 2 = 54
54 - 16 = 38
38 * 2 = 76
76 / 2 = 38
38 + 2 = 40
40 - 2 = 38
38 * 2 = 76
76 / 2 = 38
38 + 16 = 54
54 - 2 = 52
52 * 2 = 104
104 / 2 = 52

Missing added_tokens.json

I am getting a vocab size mismatch between the model (32001) and tokenizer.model (32000) as there is no added_tokens.json included with the delta weight release
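
Until added_tokens.json ships with the release, one workaround consistent with the mismatch is to add a pad token to the tokenizer and resize the embeddings so both sides agree at 32001. This is an assumption about what the missing token is, not a confirmed fix; the exact special-token string the released weights expect may differ:

# Illustrative workaround: add a pad token and make the embedding matrix match,
# so tokenizer (32000 -> 32001) and model (32001) agree. The pad-token string
# and the paths below are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("/path/to/vicuna-13b")
tokenizer = AutoTokenizer.from_pretrained("/path/to/vicuna-13b", use_fast=False)

if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({"pad_token": "[PAD]"})
model.resize_token_embeddings(len(tokenizer))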

PEP 660 issue.

Running this on Python 3.11.2 in Fedora 37.

At install time, the -e (--editable) option fails.

... has a 'pyproject.toml' and its build backend is missing the 'build_editable' hook. Since it does not have a 'setup.py' nor a 'setup.cfg', it cannot be installed in editable mode. Consider using a build backend that supports PEP 660.

I'm not particularly familiar with PEP 660 or PEP 517, or else I'd just diagnose it myself. Likewise, this is apparently a rare enough issue that there's little to go on. Is there a module I'm missing? I have the wheel module installed.

delta weight merge failure

I actually ran the merge successfully on another machine. For the second run, I encountered the failure below. I think it's due to Python library version issues.

root@1b3744debaeb:~# python3 -m fastchat.model.apply_delta --base /root/llama-13b-hf --target /root/vicuna-13b --delta /root/vicuna-13b-delta-v0
Loading base model
Loading checkpoint shards: 100%|█████████████████████████████████| 3/3 [00:10<00:00,  3.42s/it]
Loading delta
Loading checkpoint shards: 100%|█████████████████████████████████| 3/3 [00:15<00:00,  5.23s/it]
Applying delta
Applying delta: 100%|████████████████████████████████████████| 403/403 [00:50<00:00,  7.92it/s]
Saving target model
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 441, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 668, in _save
    zip_file.write_record(name, storage.data_ptr(), num_bytes)
RuntimeError: [enforce fail at inline_container.cc:471] . PytorchStreamWriter failed writing file data/126: file write failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/fastchat/model/apply_delta.py", line 49, in <module>
    apply_delta(args.base_model_path, args.target_model_path, args.delta_path)
  File "/usr/local/lib/python3.10/dist-packages/fastchat/model/apply_delta.py", line 37, in apply_delta
    base.save_pretrained(target_model_path)
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 1808, in save_pretrained
    save_function(shard, os.path.join(save_directory, shard_file))
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 440, in save
    with _open_zipfile_writer(f) as opened_zipfile:
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 291, in __exit__
    self.file_like.write_end_of_file()
RuntimeError: [enforce fail at inline_container.cc:337] . unexpected pos 1963554048 vs 1963553936
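
A library-version problem is possible, but a PytorchStreamWriter "file write failed" during save_pretrained is most often a full disk or quota limit on the target filesystem rather than a code issue. A quick, illustrative pre-flight check:

# Quick pre-flight check (illustrative): a merged 13B model in fp16 needs
# roughly 26 GB of free space at the target path, plus temporary headroom.
import shutil

target_dir = "/root"                 # filesystem that will hold /root/vicuna-13b
usage = shutil.disk_usage(target_dir)
print(f"free: {usage.free / 1e9:.1f} GB")
if usage.free < 40e9:                # rough threshold, adjust to the model size
    print("probably not enough space to write the merged checkpoint")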

Why is Vicuna's Chinese language competence much better than Alpaca's?

I have a few observations from testing.

  1. Vicuna can correctly respond in Chinese when the input is Chinese, while Alpaca frequently generates English for Chinese questions.
  2. Vicuna seems to have been trained on some Chinese data, because it does give helpful information.

My questions are:

  1. ShareGPT only has 70k conversations, and I don't think many of them are Chinese. Why does Vicuna outperform Alpaca by so much?
  2. I personally fine-tuned Alpaca 13B on the Chinese dataset BELLE 0.5M (https://github.com/LianjiaTech/BELLE). Even so, I feel Vicuna is a little better (it gives detailed descriptions because it uses a 2048 token size). This is incredible, because Vicuna was not fine-tuned on a large Chinese dataset like BELLE at all.

Could the maintainers give some insights?

Vicuna

[screenshots: Vicuna's responses to Chinese prompts]

Alpaca

[screenshot: Alpaca's response to the same prompt]

AMD Support

Does this work on AMD cards? What are the GPU requirements for inference?

Support ChatGLM

We would like to support ChatGLM-6B.

However, its interface is slightly different from other models.
According to its README, they implemented a custom interface for chat, and stream chat.
This requires some generalization of our current implementation.

Also, add a new prompt template here

conv_templates = {
    "v1": conv_v1_2,
    "bair_v1": conv_bair_v1,
}
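
As a purely hypothetical sketch (field names follow the existing Conversation entries in fastchat/conversation.py; the actual ChatGLM prompt format would have to follow its own README), registering a new template might look like:

# Hypothetical example of adding a ChatGLM-style entry to the template table.
# The roles and separator below are placeholders, not ChatGLM's real format.
from fastchat.conversation import Conversation, SeparatorStyle, conv_templates

conv_templates["chatglm"] = Conversation(
    system="",
    roles=("问", "答"),
    messages=(),
    offset=0,
    sep_style=SeparatorStyle.SINGLE,
    sep="\n",
)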

Weird responses from Assistant

Thanks, team, for your efforts in this release. I am getting weird responses from the Assistant, and I am not sure if it is a bug. These responses are totally different from the responses in the demo. I am running it on Colab. Do you have a working demo version for Colab?
