Comments (11)
I could not get the finetune script in the directory to work on Mistral.
However, I ran the steps above to update the tokenizer of the model (Mistral fine-tuned on textbooks) I wished to further fine-tune, and then trained over the self-rag dataset with Axolotl. With my config I was able to complete two epochs in just a few hours. The resulting model is here. I measured an ARC Challenge score of 75% using the same args as described in the repo.
Great work self-rag team, this looks really impressive. I will have the full pipeline online and easy to access shortly.
EDIT: Doing two more epochs now to see how further tuning impacts the scores.
First pass is online now - https://www.reddit.com/r/LocalLLaMA/comments/17knjfz/update_from_sciphi_introducing/?rdt=55834.
The model is looking quite powerful for the size. I am hopeful that more people will continue to build on the self-rag work.
Hi @emrgnt-cmplxty, thank you for trying it out! I haven't tried Mistral myself yet, so I am not sure how it processes new special tokens... For Llama2-7B/13B or Llama1, we didn't have any issues adding special tokens.
Yes, I'd recommend double-checking whether the special tokens are properly added and used during fine-tuning (e.g., print out the tokenized output and see if the special tokens appear in the processed output).
Also, at one point my co-author @yizhongw and I found that the Llama2 tokenizer was adding [UNK] tokens a lot and hurting performance significantly when Yizhong was using a slightly older version of Hugging Face Transformers. It might also help to double-check that the tokenizer output does not contain any weird [UNK] tokens.
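For anyone wanting to run that check, a minimal sketch (the checkpoint path is a placeholder, and the token strings are the Self-RAG markup tokens; substitute your own):

```python
from transformers import AutoTokenizer

# Hypothetical checkpoint path; point this at your own fine-tuned model.
tokenizer = AutoTokenizer.from_pretrained("path/to/your-self-rag-checkpoint")

sample = "[Retrieval]<paragraph>Some retrieved passage.</paragraph>[Relevant]"
tokens = tokenizer.convert_ids_to_tokens(tokenizer(sample)["input_ids"])
print(tokens)

# Each special token should survive as a single piece, and nothing should
# have been mapped to the unknown token.
assert "<paragraph>" in tokens and "[Retrieval]" in tokens
assert tokenizer.unk_token not in tokens, "tokenizer is emitting [UNK] tokens"
```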
@AkariAsai Would it be a large hassle to outline how to extend the tokenizer with the process that you used? I think this would be very helpful for myself and for others. This would also allow us to use other training software, like Axolotl.
If you are loading model checkpoints from Hugging Face Transformers, it only requires two lines of code:
- tokenizer.add_special_tokens to expand the special tokens - https://github.com/AkariAsai/self-rag/blob/main/retrieval_lm/finetune.py#L460
- model.resize_token_embeddings to expand the embedding size - https://github.com/AkariAsai/self-rag/blob/main/retrieval_lm/finetune.py#L481
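Put together, a minimal sketch (the base checkpoint is an assumption, and the authoritative token list should be copied from finetune.py):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "mistralai/Mistral-7B-v0.1"  # assumption: substitute your base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Self-RAG's reflection/markup tokens; see finetune.py for the exact list.
special_tokens = [
    "[No Retrieval]", "[Retrieval]", "[Continue to Use Evidence]",
    "[Irrelevant]", "[Relevant]", "<paragraph>", "</paragraph>",
    "[Utility:1]", "[Utility:2]", "[Utility:3]", "[Utility:4]", "[Utility:5]",
    "[Fully supported]", "[Partially supported]", "[No support / Contradictory]",
]
tokenizer.add_special_tokens({"additional_special_tokens": special_tokens})

# Grow the embedding matrix to match the enlarged vocabulary.
model.resize_token_embeddings(len(tokenizer))
```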
But I am not sure why the Mistral-7B fine-tuning got lower scores... I can take a look at fine-tuning Mistral early next week.
Cool, congrats!! Thank you so much for all of the help & contributions!
If you are implementing your own fine-tuning script, another key thing for Self-RAG is to mark up the retrieved context by surrounding it with <paragraph> and </paragraph>.
I found that even without this our model often performs fine on open-domain QA or classification tasks, but for long-form QA this might be crucial (e.g., without it, the model starts generating paragraphs by itself).
https://github.com/AkariAsai/self-rag/blob/main/retrieval_lm/finetune.py#L274
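A minimal sketch of that markup step (the prompt template here is illustrative, not the repo's exact format; the <paragraph> wrapping is the point):

```python
def build_training_text(instruction: str, evidence: str, output: str) -> str:
    # Wrap the retrieved passage in Self-RAG's <paragraph> markup so the
    # model learns to condition on evidence it did not generate itself.
    return (
        f"### Instruction:\n{instruction}\n\n"
        f"### Response:\n[Retrieval]<paragraph>{evidence}</paragraph>{output}"
    )
```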
Oh, I see. I did not add this into my FT logic. I have set my completion rules to stop on <paragraph> tokens (sketched below) and everything seems to be working as expected. What is the impact of missing this logic? Is there any way to pre-compute this and then re-upload the data?
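For reference, a minimal sketch of such a stop rule with Hugging Face generate (the class and wiring here are illustrative assumptions, not the exact rule used above):

```python
import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnParagraph(StoppingCriteria):
    """Halt generation when <paragraph> is emitted, so the caller can run
    retrieval and splice in real evidence before continuing."""
    def __init__(self, paragraph_id: int):
        self.paragraph_id = paragraph_id

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        return input_ids[0, -1].item() == self.paragraph_id

# paragraph_id = tokenizer.convert_tokens_to_ids("<paragraph>")
# criteria = StoppingCriteriaList([StopOnParagraph(paragraph_id)])
# output = model.generate(**inputs, stopping_criteria=criteria)
```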
Either way, I will try to integrate the proper functionality for this into my workflow with Axolotl, though this is a framework I am still picking up.
Lastly, one thing I am noticing is that my FT'ed model attempts to retrieve after every completion when writing long-form content. Have you seen this before? Is it likely to be related to the failure to use the logic you outlined above?
EDIT - Disregard the first question. Reading through the code a second time, I now see that failing to mask the paragraph tokens will mean that the model is trained to predict them.
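For readers following along, a sketch of that masking (the -100 label convention is what Transformers' cross-entropy loss ignores; the span logic here is illustrative, see finetune.py for the real implementation):

```python
def mask_paragraph_spans(input_ids, labels, tokenizer):
    # Set labels to -100 between <paragraph> and </paragraph> so the
    # retrieved evidence is visible as input but excluded from the loss.
    start_id = tokenizer.convert_tokens_to_ids("<paragraph>")
    end_id = tokenizer.convert_tokens_to_ids("</paragraph>")
    inside = False
    for i, tok in enumerate(input_ids):
        if tok == start_id:
            inside = True
        if inside:
            labels[i] = -100  # ignored by the cross-entropy loss
        if tok == end_id:
            inside = False
    return labels
```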
This is fantastic news! Thank you so much for all the work! I'll add a mention of this model in our README.
I am closing this issue now, but feel free to reopen it!
> complete two epochs in just several hours
Hi, I appreciate your attempt! I am wondering what machines you used to run that script. I tried to run the same scripts on 8x V100 32GB, but it seems it would take hundreds of hours to complete my training. Also, because the V100 does not support bf16, I changed the accelerate argument to use fp16; I am not sure if I am doing it right.
Hello, I recently wanted to use the Axolotl library for further fine-tuning based on your SciPhi-Self-RAG-Mistral-7B-32k model for the medical field, but my current training does not converge. My parameters are as follows (can I ask about your training parameter settings?):
base_model: SciPhi-Self-RAG-Mistral-7B-32k
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
load_in_8bit: true
load_in_4bit: false
strict: false
datasets:
  - path: /home/zhongjin/mnt_data/cl/data/bio_new.jsonl
    type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0
output_dir: ./lora-out
adapter: lora
lora_model_dir:
sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true
lora_r: 64
lora_alpha: 128
lora_dropout: 0.1
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 16
micro_batch_size: 2
num_epochs: 5
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 0.00002
train_on_inputs: false
group_by_length: false
bf16: false
fp16: true
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention:
loss_watchdog_threshold: 10.0
loss_watchdog_patience: 3
warmup_ratio: 0.03
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config: