
How did you go about finetuning? · react · 9 comments · closed

ysymyth commented on April 17, 2024
How did you go about finetuning?

from react.

Comments (9)

ysymyth commented on April 17, 2024

Hi, what we did was use a prompted PaLM-540B to generate trajectories with correct answers, then fine-tune PaLM-8B/62B on them. These are not shared, as PaLM is a Google-internal model, but you can similarly use GPT-3 trajectories for fine-tuning.


ysymyth commented on April 17, 2024

> Was it that you had 3000 examples of these input and target pairs for fine-tuning the model?

Yes, that's right --- we used a prompted LLM to generate 3000 such examples with correct answers.
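
The generation recipe described above can be sketched as a simple rejection-sampling loop: a prompted large model produces a ReAct trajectory per question, and only trajectories whose final answer matches the gold answer are kept as fine-tuning examples. This is a minimal illustration, not the authors' actual code; `generate_trajectory` is a hypothetical stand-in for a few-shot prompted LLM call.

```python
def extract_answer(trajectory: str) -> str:
    """Pull the argument of the final Finish[...] action, if any."""
    marker = "Finish["
    start = trajectory.rfind(marker)
    if start == -1:
        return ""
    end = trajectory.find("]", start)
    return trajectory[start + len(marker):end].strip()

def build_finetuning_set(examples, generate_trajectory):
    """Keep only (question, trajectory) pairs whose final answer is correct.

    examples: iterable of (question, gold_answer) pairs.
    generate_trajectory: callable question -> trajectory text
        (assumed to wrap a few-shot prompted large model).
    """
    dataset = []
    for question, gold_answer in examples:
        trajectory = generate_trajectory(question)
        if extract_answer(trajectory) == gold_answer:  # rejection sampling
            dataset.append({"input": f"Question: {question}\n",
                            "target": trajectory})
    return dataset
```

Filtering on answer correctness is what makes the distilled data usable despite the teacher model making mistakes: incorrect trajectories are simply discarded.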


ysymyth commented on April 17, 2024

Yes, we believe fine-tuning is very promising --- you can also check out Toolformer (https://arxiv.org/abs/2302.04761), which is also about fine-tuning!


timothylimyl commented on April 17, 2024

Thanks for the reply. All the best with your next project!


timothylimyl commented on April 17, 2024

@ysymyth I was asking for specific details of the fine-tuning process. For example, did you fine-tune the model by separating the data into individual examples for every question, trajectory, and answer? I guess you would do this so that the data do not get mixed up during training, right?


ysymyth commented on April 17, 2024

> did you finetune the model by separating the data individually for every question, trajectories and answers?

Sorry, what do you mean exactly?


timothylimyl commented on April 17, 2024

Example:

Input given to the model:

Question: What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?

Target that you fine-tune the model on:

Thought 1: I need to search Colorado orogeny, find the area that the eastern sector of the Colorado orogeny extends into, then find the elevation range of the area.
Action 1: Search[Colorado orogeny]
Observation 1: The Colorado orogeny was an episode of mountain building (an orogeny) in Colorado and surrounding areas.
Thought 2: It does not mention the eastern sector. So I need to look up eastern sector.
Action 2: Lookup[eastern sector]
Observation 2: (Result 1 / 1) The eastern sector extends into the High Plains and is called the Central Plains orogeny.
Thought 3: The eastern sector of Colorado orogeny extends into the High Plains. So I need to search High Plains and find its elevation range.
Action 3: Search[High Plains]
Observation 3: High Plains refers to one of two distinct land regions:
Thought 4: I need to instead search High Plains (United States).
Action 4: Search[High Plains (United States)]
Observation 4: The High Plains are a subregion of the Great Plains. From east to west, the High Plains rise in elevation from around 1,800 to 7,000 ft (550 to 2,130 m).[3]
Thought 5: High Plains rise in elevation from around 1,800 to 7,000 ft, so the answer is 1,800 to 7,000 ft.
Action 5: Finish[1,800 to 7,000 ft] 
[ADD END TOKEN HERE]

Was it that you had 3000 examples of these input and target pairs for fine-tuning the model? I was just curious about how you went about fine-tuning the model.
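
The input/target split sketched above (question as input, full trajectory plus an end token as target) could be serialized for sequence-to-sequence fine-tuning along these lines. This is only an illustration of the format the comment describes; the `"</s>"` end token is an assumption and should be replaced with whatever your tokenizer actually uses as its end-of-sequence marker.

```python
import json

EOS = "</s>"  # assumed end-of-sequence token; depends on your tokenizer

def make_record(question: str, trajectory: str) -> str:
    """Serialize one fine-tuning example as a JSON line.

    Input is just the question; target is the full
    Thought/Action/Observation trajectory with the end token appended
    (the "[ADD END TOKEN HERE]" step in the example above).
    """
    return json.dumps({
        "input": f"Question: {question}",
        "target": trajectory.rstrip() + EOS,
    })
```

One such JSON line per (question, trajectory) pair yields a standard JSONL file that most fine-tuning pipelines can consume directly.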


timothylimyl commented on April 17, 2024

I see, thank you for the information!

It's very interesting to see knowledge distillation done for reasoning and acting in language models. You are prompt-engineering LLMs that have in-context learning (an emergent property of their size) to teach smaller models how to reason and act. From your paper, I believe ReAct could be another important task to add when instruction-tuning smaller language models to perform as well as LLMs.


Misterion777 commented on April 17, 2024

Hey, sorry for asking in a closed issue, but it's very relevant to the authors' question.
Is it enough to just have (<Question>, <Thought, Action, Observation chain>) pairs in the dataset, or do we also need to add intermediate pairs, like (<Question, Thought, Action, Observation>, <Thought, Action, Observation>)?
I haven't tried to fine-tune the model yet, but my guess is that the model will be able to start answering the question but might be bad at ReActing, i.e. taking previous observations into consideration. Am I wrong?
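
For concreteness, the "intermediate pairs" option asked about above could be constructed by splitting a trajectory at each Thought and pairing every prefix with the next step. This is purely a sketch of the question's proposal, not something the paper confirms doing:

```python
import re

def split_steps(trajectory: str):
    """Split a ReAct trajectory into Thought/Action/Observation steps.

    Assumes each step begins with a "Thought N:" line, as in the
    example trajectory earlier in this thread.
    """
    starts = [m.start() for m in re.finditer(r"Thought \d+:", trajectory)]
    starts.append(len(trajectory))
    return [trajectory[a:b].strip() for a, b in zip(starts, starts[1:])]

def intermediate_pairs(question: str, trajectory: str):
    """Yield (context, next_step) pairs for every prefix of the trajectory.

    context = question plus all steps so far; next_step = the following
    Thought/Action/Observation block the model should predict.
    """
    steps = split_steps(trajectory)
    for i, step in enumerate(steps):
        context = "\n".join([f"Question: {question}"] + steps[:i])
        yield context, step
```

Whether these prefix-conditioned pairs are needed in addition to full (question, trajectory) pairs is exactly the open question here; teacher-forced training on full trajectories already conditions each step on the previous observations.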

