
Comments (5)

AkariAsai commented on July 17, 2024

It will be added when you run the create_retrieval_data.py script! Essentially, what we do is select the sentences that precede the target output sentence y_t when it makes reflection token predictions!

https://github.com/AkariAsai/self-rag/blob/main/data_creation/generator/create_retrieval_data.py#L89
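For intuition, here is a minimal sketch (not the repo's exact code) of that idea: split the generator output into sentences and, for each target sentence y_t, keep everything that comes before it as preceding_sentences.

```python
# Minimal sketch, assuming the output has already been split into sentences;
# the real create_retrieval_data.py script handles more bookkeeping.
from typing import Dict, List


def build_preceding_sentences(output_sentences: List[str]) -> List[Dict[str, str]]:
    examples = []
    for t, y_t in enumerate(output_sentences):
        examples.append({
            "target_output": y_t,
            # Sentences before y_t form the context used when predicting
            # reflection tokens for y_t.
            "preceding_sentences": " ".join(output_sentences[:t]),
        })
    return examples


# Example: the first sentence has an empty preceding context.
for ex in build_preceding_sentences(["A first point.", "A second point.", "A conclusion."]):
    print(ex)
```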

This discussion might be relevant: #7


Aman-4-Real commented on July 17, 2024

I want to build my own dataset to fine-tune a Critic model, but I found that the prompt in the code expects preceding_sentences, and I don't know where that field comes from or how it is generated:

```
PROMPT_DICT = {
    "context": (
        "Given an instruction, please make ......TL;DR...... is correct or not.\n"
        "###\nInstruction: {instruction}\n"
        "Evidence: {evidence}\n"
        "Output: {target_output}\n"
        "Rating: "
    )
}
```
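For reference, a minimal sketch of filling that template for one example; only the key names come from the dict above, and the field values here are invented for illustration:

```python
# Hypothetical usage: format the (elided) "context" template with one example.
prompt = PROMPT_DICT["context"].format(
    instruction="What causes ocean tides?",
    evidence="Tides result from the gravitational pull of the Moon and the Sun.",
    target_output="Ocean tides are mainly caused by the Moon's gravity.",
)
print(prompt)  # ends with "Rating: ", ready for the Critic model to score
```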

I am dealing with the same thing while fine-tuning a Critic model. I suspect the order given in the guideline for running this step,

To add predictions for retrieval necessity given the input data only, please run the command below.
```
python run_reward_vllm.py \
--input_file YOUR_INPUT_FILENAME \
--model_name YOUR_CRITIC_MODEL_PATH \
--task 'retrieval' \
--inst_mode retrieval_instruction \
--input_mode retrieval_input \
--metric match \
--result_fp INITIAL_RETRIEVAL_TOKEN_OUTPUT \
--split test
```
may be wrong. Directly using this command will lead to a "preceding_sentences missing" error, as you mentioned.

What the author said here is helpful. So I ran #L35-L47 after doing "Continuous retrieval (t>1)" and before "Create IsRel and IsSup input data", and got the final constructed data in the correct format.


cgt-woailol commented on July 17, 2024

It will be added when you run the create_retrieval_data.py script! Essentially, we select the sentences before the target output sentence y_t when it makes reflection token predictions!

https://github.com/AkariAsai/self-rag/blob/main/data_creation/generator/create_retrieval_data.py#L89

This discussion might be relevant: #7

But I want to use GPT-4 to build my datasets, and what you describe is the next step. I mean the 'preceding_sentences' used in chatgpt_need_retrieval.py.


AkariAsai commented on July 17, 2024

Could you elaborate on your question a bit more? preceding_sentences will be added if you run the create_retrieval_data.py script with the --multiple_sent option.


AkariAsai commented on July 17, 2024

I am closing this issue for now, but feel free to reopen it!

