tatsu-lab / stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
Home Page: https://crfm.stanford.edu/2023/03/13/alpaca.html
License: Apache License 2.0
Thanks for the great job making it easy to finetune LLaMA.
I found that the training data is single-turn only. Is there support for multi-turn data like OIG?
Any plans?
How much vRAM does finetuning LLaMa 7B require?
What was the hardware used to train Alpaca?
In the README.md I saw that the model is finetuned using Stanford's Hugging Face setup. I tried it but am getting this error. Could someone help with loading the LLaMA weights via Hugging Face?
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("Bitsy/llama-7b-hfcompatible-clean")
Error:
KeyError                                  Traceback (most recent call last)
      1 from transformers import AutoModelForCausalLM
      2
----> 3 model = AutoModelForCausalLM.from_pretrained("Bitsy/llama-7b-hfcompatible-clean")

/usr/local/lib/python3.9/dist-packages/transformers/models/auto/configuration_auto.py in __getitem__(self, key)
    577         return self._extra_content[key]
    578     if key not in self._mapping:
--> 579         raise KeyError(key)
    580     value = self._mapping[key]
    581     module_name = model_type_to_module_name(key)

KeyError: 'llama'
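(A hedged aside: KeyError: 'llama' usually means the installed transformers build predates LLaMA support; the snippet below is one way to check, assuming a release recent enough to expose LlamaConfig.)
import transformers
print(transformers.__version__)
# Versions that include the merged LLaMA port can import the config class;
# an ImportError here confirms the installed version is too old.
from transformers import LlamaConfig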
Thanks for the great work. I reproduced the training, but at inference time the model tends to generate shorter text. I am using:
generated = model.generate(batch["input_ids"], max_length=512)
Does the interface on the demo web page adjust other kwargs?
Thanks
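(Not the demo's actual settings, which are not published; a hedged sketch of generate kwargs that typically lengthen and vary outputs, with assumed values.)
generated = model.generate(
    batch["input_ids"],
    max_new_tokens=512,   # bound the newly generated tokens instead of the total length
    do_sample=True,       # greedy decoding often stops early; sampling tends to run longer
    temperature=0.7,
    top_p=0.9,
)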
Dear all,
Due to the OOM issues mentioned in previous issues, has anyone managed to finetune LLaMA with bitsandbytes in an 8-bit setting on a single 3090?
If yes, please share your experiments and experience.
Best regards,
Linh
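(A minimal sketch of 8-bit loading with bitsandbytes, assuming a converted HF checkpoint at a hypothetical local path; 8-bit loading alone does not make full fine-tuning work out of the box, and most reports pair it with adapter methods.)
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "<path_to_hf_converted_llama_7b>",  # hypothetical local path
    load_in_8bit=True,                  # requires bitsandbytes and accelerate
    device_map="auto",
)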
I encountered a CUDA OOM on a single A100 80G using your training code. Can I fix this by changing anything?
From your training code, the output labels and the input labels are the same. Where do you shift the output labels? Does this happen automatically inside the Trainer?
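(For reference, a hedged sketch of what Hugging Face causal-LM heads do internally when labels are passed; passing labels identical to input_ids is therefore expected.)
import torch
import torch.nn.functional as F

def causal_lm_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Position t predicts token t+1, so logits drop the last step and labels drop the first.
    shift_logits = logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=-100,  # IGNORE_INDEX positions contribute no loss
    )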
Hi, thanks for sharing such great work.
I've read your fine-tuning code and I'm a little confused about the inputs of the model.
From the code, the model input looks like, for example: "### Instruction: {instruction} ### Input: {input} ### Response: {response}", so input_ids = tokenizer(example), label_ids = tokenizer(example), and label_ids[:source_len] = IGNORE_INDEX.
I would like to ask: why do the input_ids contain the response token ids? Doesn't that leak the target?
I am looking forward to your reply. Thank you very much.
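(A hedged sketch of the masking described in this question, with hypothetical helper names: the response tokens are part of input_ids, but loss is only taken on them, and the per-position shift above means position t never sees token t+1.)
import copy

IGNORE_INDEX = -100

def build_example(source_ids: list, target_ids: list):
    # The full sequence (prompt + response) is fed to the model...
    input_ids = source_ids + target_ids
    labels = copy.deepcopy(input_ids)
    # ...but the prompt part is masked out of the loss.
    labels[: len(source_ids)] = [IGNORE_INDEX] * len(source_ids)
    return input_ids, labels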
Congratulations on the fine-tune! We have observed some fantastic performance through the provided web interface.
AFAIK the original LLaMA model was released under the GNU GPL, so you should be able to distribute derivative work respecting that original license, correct? (Even if the original model weights have not officially been distributed to the public yet.)
Will you provide some sort of wait-list to notify us when the model weights are made available?
I'm interested in as much information as you can share on this. Again, congratulations and thank you for your impressive work!
need help!
The exception can be fixed by replacing 'dict' with 'Dict':
from typing import Optional, Sequence, Union
...
def openai_completion(
    prompts: Union[str, Sequence[str], Sequence[dict[str, str]], dict[str, str]],
-->
from typing import Dict, Optional, Sequence, Union
...
def openai_completion(
    prompts: Union[str, Sequence[str], Sequence[Dict[str, str]], Dict[str, str]],
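(A hedged note: the original annotation only fails on Python < 3.9, where built-in generics such as dict[str, str] are not subscriptable; besides switching to typing.Dict, deferring annotation evaluation also works.)
from __future__ import annotations  # alternative fix: annotations are no longer evaluated at def time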
I have three questions regarding the fine-tuning process.
Thank you in advance.
Hi there, I just finished the finetuning process as introduced in train.py. However, I encountered one problem with trainer.evaluate().
{'loss': 0.3974, 'learning_rate': 3.5380966993958655e-11, 'epoch': 3.0}
{'loss': 0.4492, 'learning_rate': 0.0, 'epoch': 3.0}
{'train_runtime': 17758.138, 'train_samples_per_second': 8.785, 'train_steps_per_second': 0.069, 'train_loss': 0.7304400721402787, 'epoch': 3.0}
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1218/1218 [4:55:48<00:00, 14.57s/it]
Traceback (most recent call last):
File "/home/codes/finetune_llama/alpaca/train.py", line 233, in <module>
train()
File "/home/codes/finetune_llama/alpaca/train.py", line 227, in train
trainer.evaluate()
File "/home/anaconda3/envs/hawq/lib/python3.9/site-packages/transformers/trainer.py", line 2920, in evaluate
eval_dataloader = self.get_eval_dataloader(eval_dataset)
File "/home/anaconda3/envs/hawq/lib/python3.9/site-packages/transformers/trainer.py", line 934, in get_eval_dataloader
raise ValueError("Trainer: evaluation requires an eval_dataset.")
ValueError: Trainer: evaluation requires an eval_dataset.
Should I give an eval_dataset here?
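(Two hedged options, since the repo's data module does not build an eval split by default: drop the trainer.evaluate() call, or hold out part of the data yourself. The names below, train_dataset, eval_dataset, and data_collator, are assumed to come from your own split.)
from transformers import Trainer

trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,   # hypothetical held-out portion of alpaca_data.json
    data_collator=data_collator,
)
trainer.train()
trainer.evaluate()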
Environment: 6xA6000 48GB with Ubuntu 22.04, Pytorch 1.13.0
I ran into a generation problem after following your instructions to convert the LLaMA-7B weights using your attached script.
I simply used the following script to directly test generation after loading the converted LLaMA-7B model:
tokenizer.batch_decode(model.generate(**tokenizer('I want to ', return_tensors="pt")))
The output of above code is:
'I want to acoérницschutzirectorioieckťDEX threshold släktetolasĭüttpiel'
The problem happens both before and after following your README for instruction fine-tuning. (Note that the loss is decreasing over time during the fine-tuning stage, which seems OK.)
I have no problem running generation using the original code from LLaMA. May I know your generation script so that I can test what caused the problem? Thanks.
As an independent researcher I'm interested in knowing what generation parameters are used in the Gradio Web Demo, such as temperature and repetition penalty. If you have used more advanced samplers like Typical Sampling or Tail Free Sampling, I'd be interested to know that as well. From my brief testing it appears that some parameter or setting is hampering creativity; perhaps that is intentional for the demo?
Thanks in advance!
the version of transformers is https://github.com/huggingface/transformers/pull/21955/commits
Thanks for sharing this project. I have been trying to train the larger model for an offline-first, free education assistant for poor students preparing for competitive exams. Sharing the training code, even if only in a PR, would really help me fine-tune an education assistant.
The blog says the training recipe is also released in the code, but I cannot find it. Can you update the repo with the code used for training the model, along with the required dependencies, a guide, etc., to help us do the same, maybe with bigger models?
Thanks for this awesome repo.
Can this finetuning script fit on an A10, which only has 24GB of GPU memory? I am trying to fine-tune the model on 4 A10 GPUs with a batch size of 1, but I still get an OOM error.
Hi, devs at Stanford. Today I tried your project and ran the command to generate the data. After a while, it output a JSON file, regen.json, like below. So I am a little confused; forgive my ignorance, but I really don't know how to make something cool with this regen.json file. You know what I mean: I got a file, but what can I do with it? I'm guessing people might be able to create something similar to ChatGPT, but weaker. Please enlighten me, thanks.
Hi
Great work! In the README, you mention that 4 A100 80G can train this model, but when I try 8 40G A100s, I get a CUDA OOM error.
Hi, we found your work via the home page:
https://crfm.stanford.edu/2023/03/13/alpaca.html
This work has inspired us on how to adapt large language models in a good way.
Now we are wondering why such a small model can store enough world knowledge.
Best.
Hello, thank you for open-sourcing this work. We are now interested in generating our own instructions to fine-tune the Llama model based on your documentation and approach. Could you please advise on any resources or references we can use? Also, are these codes available on Hugging Face?
Thanks for sharing the training code. I've finished a 3-epoch finetuning.
However, I can't find the inference code.
Would you please give some advice on it, or share the inference code?
Thanks again!
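(Not the authors' script; a minimal inference sketch assuming the fine-tuned checkpoint was saved in HF format at a hypothetical path, using the repo's no-input prompt template and assumed sampling values.)
import transformers

MODEL_DIR = "<your_output_dir>"  # hypothetical path to the fine-tuned checkpoint
PROMPT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)

tokenizer = transformers.AutoTokenizer.from_pretrained(MODEL_DIR)
model = transformers.AutoModelForCausalLM.from_pretrained(MODEL_DIR).cuda().eval()

inputs = tokenizer(PROMPT.format(instruction="List three prime numbers."), return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))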
Hi guys,
This one is awesome. When do you plan to support airgapped installation? In other words, can the end user run it on their laptop or on any VM in a public cloud?
Hi,
What are the steps to train it with this specific Bible content?
Example:
https://raw.githubusercontent.com/tushortz/variety-bible-text/master/bibles/kjv.txt
Can you show me the steps to train it?
And the other question is: is the final file compatible with LLaMA?
Thanks.
Hi,
Can a consumer-level GPU run inference with the alpaca-7B model?
I am currently training the model, and I am hoping to compare it with others. I am using only 2 A100-80G.
Here is my wandb log:
https://wandb.ai/charliezjw/huggingface/runs/hil1q6lt
My first run of the trainer could not save the model because the evaluate() call fails. I have removed that method call and now would like to resume from the last checkpoint. However, I cannot seem to get that working. Is there some disparity between the model architecture and the checkpoint architecture? The change I made to accommodate checkpoint resumption and the error I get are shown below.
Change for checkpoint resumption
data_module = make_supervised_data_module(tokenizer=tokenizer, data_args=data_args)
trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
transformers.logging.set_verbosity_info()
trainer.train()
#trainer.train("output/checkpoint-18000")
#trainer.evaluate()
trainer.save_state()
safe_save_model_for_hf_trainer(trainer=trainer, output_dir=training_args.output_dir)
Error stacktrace
train.py FAILED
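(A hedged note: the Trainer's supported way to resume is the resume_from_checkpoint argument; whether it works here also depends on the checkpoint containing matching optimizer and FSDP state.)
trainer.train(resume_from_checkpoint="output/checkpoint-18000")
# or let it pick up the latest checkpoint under output_dir:
trainer.train(resume_from_checkpoint=True)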
Once you collected the 52k synthetic dataset, how did you plot the pie chart here?
Thanks!
In the provided training command:
torchrun --nproc_per_node=4 --master_port=<your_random_port> train.py \
--model_name_or_path <your_path_to_hf_converted_llama_ckpt_and_tokenizer> \
--data_path ./alpaca_data.json \
--bf16 True \
--output_dir <your_output_dir> \
--num_train_epochs 3 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 8 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 2000 \
--save_total_limit 1 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--fsdp "full_shard auto_wrap" \
--fsdp_transformer_layer_cls_to_wrap 'LLaMADecoderLayer' \
--tf32 True
Why is --bf16 used, if the model checkpoints were originally fp16? Is it simply overridden by the --tf32 flag later?
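(For what it's worth, the two flags are not mutually exclusive in PyTorch: --bf16 controls mixed-precision compute, while --tf32 only changes how fp32 matmuls are executed on Ampere GPUs; roughly, the latter toggles the backend flags below.)
import torch

torch.backends.cuda.matmul.allow_tf32 = True  # approximately what --tf32 True enables
torch.backends.cudnn.allow_tf32 = True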
Hi, in what form should generate_instruction_following_data be called to run the instruction generation?
prompt_batches: 0%| | 0/1 [00:00<?, ?it/s]WARNING:root:OpenAIError: This model's maximum context length is 4097 tokens, however you requested 4162 tokens (1090 in your prompt; 3072 for the completion). Please reduce your prompt; or completion length..
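(The arithmetic from the error message itself: the completion budget has to shrink so that prompt plus completion fits in the 4097-token window.)
max_context = 4097
prompt_tokens = 1090
requested_completion = 3072
print(prompt_tokens + requested_completion)  # 4162 > 4097, hence the error
print(max_context - prompt_tokens)           # 3007: the largest safe completion length here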
It seems there is no eval_dataset, and thus no checkpoint is stored?
(For privacy, I hid the absolute file paths and replaced them with <path>.)
Traceback (most recent call last):
File "<path>/stanford_alpaca/train.py", line 232, in <module>
train()
File "<path>/stanford_alpaca/train.py", line 226, in train
trainer.evaluate()
File "<path>/stanford_alpaca/transformers-68d640f7c368bcaaaecfc678f11908ebbd3d6176/src/transformers/trainer.py", line 2920, in evaluate
eval_dataloader = self.get_eval_dataloader(eval_dataset)
File "<path>/stanford_alpaca/transformers-68d640f7c368bcaaaecfc678f11908ebbd3d6176/src/transformers/trainer.py", line 934, in get_eval_dataloader
raise ValueError("Trainer: evaluation requires an eval_dataset.")
ValueError: Trainer: evaluation requires an eval_dataset.
As the title implies, can you share the training log?
Hello, thank you for open-sourcing your training details! I just tried your demo and found the responses surprisingly fluent.
I'm wondering whether your decision to train on a 52K instruction dataset was influenced by some criteria. Is there a floor below which you found responses to be qualitatively inferior, or did going beyond 52K not yield better results?
Dear Stanford researchers, professors, and students (all geniuses), thank you for your amazing work!
Would the tuning code you released in this repo (and the dataset) be fit for finetuning larger LLaMA models like 13b/30b/65b?
How would the computational effort scale with such models?
Hi, thanks for sharing your work, this is amazing!
Do you plan to release the web demo code?
Could you share the link to the adopted llama-7b model? I was trying the one from decapoda-research (https://huggingface.co/decapoda-research) (https://huggingface.co/decapoda-research/llama-7b-hf/discussions) but it looks like the model itself cannot be loaded.
gpt-3.5-turbo is cheaper and faster than davinci. I'm not 100% sure whether it will actually work better for Alpaca, but I figure it may be worth a trial. Any interest in taking a PR?
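(A hedged sketch of what such a swap involves with the openai<1.0 client used at the time: text-davinci-003 goes through the Completions endpoint, while gpt-3.5-turbo uses ChatCompletions with a messages list. The prompt text is illustrative.)
import openai

davinci = openai.Completion.create(
    model="text-davinci-003", prompt="Write 5 task instructions.", max_tokens=256
)
turbo = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Write 5 task instructions."}],
    max_tokens=256,
)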
Hello, first of all thank you for releasing the training code for Alpaca; we really appreciate it.
I am running the fine-tuning script on 4xA100-SXM4-80GB and currently getting a 24h ETA, which doesn't really square with the reported "3 hours on 8 80GB A100s" mentioned on https://crfm.stanford.edu/2023/03/13/alpaca.html. Shouldn't it be around 6 hours, or even 12 hours considering that the script "is not particularly optimized"?
Is anyone else encountering this issue? And if this is expected, what methods did you use to optimize the fine-tuning process?
Running on CUDA 12.1, Torch 1.13, and the transformers LLaMA fork at the commit you mentioned.
Thanks.
I am trying to run this on my laptop (macOS); just wondering, are there any particular hardware requirements?
Hi, dear authors,
could the model be open-sourced?
The blog post says $500 was spent producing the dataset.
The blog post also says $100 was spent on 3xA100 80GB for 3 hours.
The market rate for 4xA100 is around $8 per hour. (See vast.ai for example)
If the dataset is provided for fine-tuning, then Alpaca could be reproduced for just about $24 (3 hours at roughly $8/hour for 4xA100), and we would not have to wait for Facebook's response regarding sharing of the pre-trained model.
Hello, when finetuning LLaMA, are any specific layers frozen so that they are not updated during training?
@percyliang @guestrin @thashim @rtaori @lxuechen