Where are the "preceding sentences" from? about self-rag HOT 5 CLOSED

akariasai commented on July 17, 2024

Where are the "preceding sentences" from?

from self-rag.

Comments (5)

AkariAsai commented on July 17, 2024

It will be added when you run the create_retrieval_data.py script! Essentially, what we did was select the sentences that come before the target output sentence y_t at the point where reflection token predictions are made!

https://github.com/AkariAsai/self-rag/blob/main/data_creation/generator/create_retrieval_data.py#L89

This discussion might be relevant: #7
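To illustrate the idea described above, here is a minimal sketch of how preceding sentences could be paired with each target sentence y_t. It is not the code from create_retrieval_data.py: the function name `build_multi_sentence_records` is hypothetical, the record layout is assumed (the keys `instruction` and `target_output` are borrowed from the prompt template quoted in this thread, `preceding_sentences` from the field name mentioned here), and the output is assumed to be split into sentences already.

```python
from typing import Dict, List


def build_multi_sentence_records(
    instruction: str, output_sentences: List[str]
) -> List[Dict[str, str]]:
    """Hypothetical helper: one record per target sentence y_t.

    Each record carries the sentences that come before y_t, joined into
    a single string. The key names are assumptions for illustration.
    """
    records = []
    for t, target in enumerate(output_sentences):
        # Everything generated before position t counts as "preceding".
        preceding = " ".join(output_sentences[:t])
        records.append(
            {
                "instruction": instruction,
                "preceding_sentences": preceding,
                "target_output": target,
            }
        )
    return records


records = build_multi_sentence_records(
    "Describe the water cycle.",
    ["Water evaporates.", "It condenses into clouds.", "It falls as rain."],
)
for record in records:
    print(repr(record["preceding_sentences"]), "->", record["target_output"])
```

Under these assumptions, the first target sentence has an empty preceding context, and each later sentence sees all sentences before it.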


Aman-4-Real commented on July 17, 2024

I want to build my own dataset to fine-tune a Critic model, but I found that the prompt in the code includes "preceding sentences", and I don't know where they come from or how they were generated.

PROMPT_DICT = {
    "context": (
        "Given an instruction, please make ......TL;DR...... is correct or not.\n"
        "###\nInstruction: {instruction}\n"
        "Evidence: {evidence}\n"
        "Output: {target_output}\n"
        "Rating: "
    )
}

I am dealing with the same thing while fine-tuning a Critic model. I suspect that the order in which the guideline tells you to run this step

self-rag/data_creation/generator/README.md

Lines 35 to 47 in ed22968

To add predictions for retrieval necessity given the input data only, please run the command below.

```
python run_reward_vllm.py \
  --input_file YOUR_INPUT_FILENAME \
  --model_name YOUR_CRITIC_MODEL_PATH \
  --task 'retrieval' \
  --inst_mode retrieval_instruction \
  --input_mode retrieval_input \
  --metric match \
  --result_fp INITIAL_RETRIEVAL_TOKEN_OUTPUT \
  --split test
```

may be wrong. Running this command directly leads to a missing "preceding sentences" error, as you mentioned.

What the author said here is helpful. So I ran the command from lines 35-47 after "Continuous retrieval (t>1)" and before "Create IsRel and IsSUp input data", and got the final constructed data in the correct format.
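Before running the retrieval step, it may help to confirm that the input records already carry the field. The following is only a sketch under assumptions: the helper name `find_records_missing_field` is hypothetical, and the input is assumed to be a list of dict-like records with a `preceding_sentences` key, which may not match the actual file layout used by the scripts.

```python
from typing import Dict, List


def find_records_missing_field(
    records: List[Dict[str, str]], field: str = "preceding_sentences"
) -> List[int]:
    """Hypothetical check: return indices of records that lack `field`."""
    return [index for index, record in enumerate(records) if field not in record]


# Assumed record layout, for illustration only.
sample_records = [
    {"instruction": "Q1", "target_output": "A1", "preceding_sentences": ""},
    {"instruction": "Q2", "target_output": "A2"},  # field not added yet
]

missing = find_records_missing_field(sample_records)
if missing:
    print(f"{len(missing)} record(s) lack 'preceding_sentences': {missing}")
```

If any indices are reported, the data has probably not yet gone through the step that adds the field.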


cgt-woailol commented on July 17, 2024

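It will be added when you run the create_retrieval_data.py script! Essentially, what we did was select the sentences that come before the target output sentence y_t at the point where reflection token predictions are made!

https://github.com/AkariAsai/self-rag/blob/main/data_creation/generator/create_retrieval_data.py#L89

This discussion might be relevant: #7

But I want to use gpt-4 to build my datasets. What you describe is the next step. So I am asking about the 'preceding sentences' in chatgpt_need_retrieval.py.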


AkariAsai commented on July 17, 2024

Could you elaborate on your question a bit more? preceding_sentences will be added if you run the create_retrieval_data.py script with the --multiple_sent option.


AkariAsai commented on July 17, 2024

I am closing this issue for now, but feel free to reopen it!


