
How did you go about finetuning? · react · 9 comments · closed

ysymyth commented on April 17, 2024
How did you go about finetuning?

from react.

Comments (9)

ysymyth commented on April 17, 2024

Hi, what we did was use a prompted PaLM-540B to generate trajectories with correct answers, then fine-tune PaLM-8B/62B on them. These are not shared, as PaLM is a Google-internal model, but you can similarly use GPT-3 trajectories for fine-tuning.


ysymyth commented on April 17, 2024

> Was it that you had 3000 examples of these input and target pairs for fine-tuning the model?

Yes, that's right --- we used a prompted LLM to generate 3000 such examples with correct answers.
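
The generation recipe described above can be sketched as a simple rejection-sampling loop: a prompted large model produces a ReAct trajectory per question, and only trajectories whose final answer matches the gold answer are kept as fine-tuning examples. This is a minimal illustration, not the authors' actual code; `generate_trajectory` is a hypothetical stand-in for a few-shot prompted LLM call.

```python
def extract_answer(trajectory: str) -> str:
    """Pull the argument of the final Finish[...] action, if any."""
    marker = "Finish["
    start = trajectory.rfind(marker)
    if start == -1:
        return ""
    end = trajectory.find("]", start)
    return trajectory[start + len(marker):end].strip()

def build_finetuning_set(examples, generate_trajectory):
    """Keep only (question, trajectory) pairs whose final answer is correct.

    examples: iterable of (question, gold_answer) pairs.
    generate_trajectory: callable question -> trajectory text
        (assumed to wrap a few-shot prompted large model).
    """
    dataset = []
    for question, gold_answer in examples:
        trajectory = generate_trajectory(question)
        if extract_answer(trajectory) == gold_answer:  # rejection sampling
            dataset.append({"input": f"Question: {question}\n",
                            "target": trajectory})
    return dataset
```

Filtering on answer correctness is what makes the distilled data usable despite the teacher model making mistakes: incorrect trajectories are simply discarded.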


ysymyth commented on April 17, 2024

Yes, we believe fine-tuning is very promising --- you can also check out Toolformer (https://arxiv.org/abs/2302.04761), which is also about fine-tuning!


timothylimyl commented on April 17, 2024

Thanks for the reply. All the best with your next project!


timothylimyl commented on April 17, 2024

@ysymyth I was asking for specific details of the fine-tuning process. For example, did you fine-tune the model by separating the data into individual examples for every question, trajectory, and answer? I guess you would do this so that the data do not get mixed up during training, right?


ysymyth commented on April 17, 2024

> did you finetune the model by separating the data individually for every question, trajectories and answers?

Sorry, what do you mean exactly?


timothylimyl commented on April 17, 2024

Example:

Input given to the model:

Question: What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?

Target that you fine-tune the model on:

Thought 1: I need to search Colorado orogeny, find the area that the eastern sector of the Colorado orogeny extends into, then find the elevation range of the area.
Action 1: Search[Colorado orogeny]
Observation 1: The Colorado orogeny was an episode of mountain building (an orogeny) in Colorado and surrounding areas.
Thought 2: It does not mention the eastern sector. So I need to look up eastern sector.
Action 2: Lookup[eastern sector]
Observation 2: (Result 1 / 1) The eastern sector extends into the High Plains and is called the Central Plains orogeny.
Thought 3: The eastern sector of Colorado orogeny extends into the High Plains. So I need to search High Plains and find its elevation range.
Action 3: Search[High Plains]
Observation 3: High Plains refers to one of two distinct land regions:
Thought 4: I need to instead search High Plains (United States).
Action 4: Search[High Plains (United States)]
Observation 4: The High Plains are a subregion of the Great Plains. From east to west, the High Plains rise in elevation from around 1,800 to 7,000 ft (550 to 2,130 m).[3]
Thought 5: High Plains rise in elevation from around 1,800 to 7,000 ft, so the answer is 1,800 to 7,000 ft.
Action 5: Finish[1,800 to 7,000 ft] 
[ADD END TOKEN HERE]

Was it that you had 3000 examples of these input and target pairs for fine-tuning the model? I was just curious about how you went about fine-tuning the model.
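
The input/target split sketched above (question as input, full trajectory plus an end token as target) could be serialized for sequence-to-sequence fine-tuning along these lines. This is only an illustration of the format the comment describes; the `"</s>"` end token is an assumption and should be replaced with whatever your tokenizer actually uses as its end-of-sequence marker.

```python
import json

EOS = "</s>"  # assumed end-of-sequence token; depends on your tokenizer

def make_record(question: str, trajectory: str) -> str:
    """Serialize one fine-tuning example as a JSON line.

    Input is just the question; target is the full
    Thought/Action/Observation trajectory with the end token appended
    (the "[ADD END TOKEN HERE]" step in the example above).
    """
    return json.dumps({
        "input": f"Question: {question}",
        "target": trajectory.rstrip() + EOS,
    })
```

One such JSON line per (question, trajectory) pair yields a standard JSONL file that most fine-tuning pipelines can consume directly.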


timothylimyl commented on April 17, 2024

I see, thank you for the information!

It's very interesting to see knowledge distillation done for reasoning and acting in language models. You are prompt-engineering LLMs that have in-context learning (an emergent property of their size) to teach smaller models how to reason and act. From your paper, I believe ReAct could be another important task to add when instruction-tuning smaller language models to perform as well as LLMs.


Misterion777 commented on April 17, 2024

Hey, sorry for asking in a closed issue, but it's very relevant to the authors' question.
Is it enough to just have (<Question>, <Thought, Action, Observation chain>) pairs in the dataset, or do we also need to add intermediate pairs, like (<Question, Thought, Action, Observation>, <Thought, Action, Observation>)?
I haven't tried to fine-tune the model yet, but my guess is that the model will be able to start answering the question but might be bad at ReActing, i.e. taking previous observations into consideration. Am I wrong?
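
For concreteness, the "intermediate pairs" option asked about above could be constructed by splitting a trajectory at each Thought and pairing every prefix with the next step. This is purely a sketch of the question's proposal, not something the paper confirms doing:

```python
import re

def split_steps(trajectory: str):
    """Split a ReAct trajectory into Thought/Action/Observation steps.

    Assumes each step begins with a "Thought N:" line, as in the
    example trajectory earlier in this thread.
    """
    starts = [m.start() for m in re.finditer(r"Thought \d+:", trajectory)]
    starts.append(len(trajectory))
    return [trajectory[a:b].strip() for a, b in zip(starts, starts[1:])]

def intermediate_pairs(question: str, trajectory: str):
    """Yield (context, next_step) pairs for every prefix of the trajectory.

    context = question plus all steps so far; next_step = the following
    Thought/Action/Observation block the model should predict.
    """
    steps = split_steps(trajectory)
    for i, step in enumerate(steps):
        context = "\n".join([f"Question: {question}"] + steps[:i])
        yield context, step
```

Whether these prefix-conditioned pairs are needed in addition to full (question, trajectory) pairs is exactly the open question here; teacher-forced training on full trajectories already conditions each step on the previous observations.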

