Get low accuracy with GPT-3.5. (react, closed)

ysymyth avatar ysymyth commented on April 28, 2024
Get low accuracy with GPT-3.5.

from react.

Comments (12)

Jiayi-Pan avatar Jiayi-Pan commented on April 28, 2024 1

Same trend on 2k-3k (each cell is avg reward / success rate):

Method   2000-2100        2000-3000
ReAct    0.5345 / 0.28    0.5735 / 0.328
Act      0.674 / 0.38     0.67 / 0.352

ysymyth avatar ysymyth commented on April 28, 2024

hi can you show your code and example trajectory?

Luoyang144 avatar Luoyang144 commented on April 28, 2024

I'm using this notebook with an API from Azure, so I changed the llm function to call GPT-3.5.
[image]
The final result:
[image]
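For anyone swapping the backend in the same way, here is a minimal sketch of a backend-agnostic `llm` function. It assumes the notebook calls `llm(prompt, stop=[...])` and expects plain completion text back. `make_llm`, `truncate_at_stop`, and the `complete` callable are hypothetical names, with `complete` standing in for whatever Azure client call is actually used. If the replacement endpoint does not apply the stop sequences, the model can run past a single Thought/Action step, so the sketch enforces them on the client side.

```python
from typing import Callable, Sequence


def truncate_at_stop(text: str, stop: Sequence[str]) -> str:
    """Cut text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for s in stop:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


def make_llm(complete: Callable[[str], str]) -> Callable[..., str]:
    """Wrap a backend call (hypothetical `complete`) as an llm(prompt, stop) function."""

    def llm(prompt: str, stop: Sequence[str] = ("\n",)) -> str:
        raw = complete(prompt)  # backend-specific request goes here
        return truncate_at_stop(raw, stop)

    return llm
```

Comparing a few raw and truncated outputs is a quick way to check whether the new backend is producing extra text that the original completion call would have cut off.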

ysymyth avatar ysymyth commented on April 28, 2024

can you show some trajs

zhiyuanc2001 avatar zhiyuanc2001 commented on April 28, 2024

Hi, I'm trying to run ReAct with GPT-3.5-Turbo on the HotpotQA dataset with the provided Jupyter notebook, but I only get 0.182 accuracy. Is that a reasonable result? It is much lower than the result shown in the paper.

Hi, I got similar results. I think the low score comes from the size of GPT-3.5-Turbo and the alignment tax. :-)
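When comparing numbers like these, it helps to be sure everyone scores answers the same way. The sketch below assumes "accuracy" means SQuAD-style exact match after answer normalization, which is a common choice for HotpotQA; the thread does not confirm the exact metric, and the function names are illustrative rather than taken from the notebook.

```python
import re
import string


def normalize_answer(s: str) -> str:
    """Lowercase, strip punctuation and articles, and collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, gold: str) -> bool:
    """True when the normalized prediction equals the normalized gold answer."""
    return normalize_answer(prediction) == normalize_answer(gold)


def accuracy(pairs) -> float:
    """Fraction of (prediction, gold) pairs that match exactly."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no examples to score")
    return sum(exact_match(p, g) for p, g in pairs) / len(pairs)
```

Under this metric a verbose but correct answer such as a full sentence counts as wrong, so chattier models can lose points without reasoning any worse.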

Luoyang144 avatar Luoyang144 commented on April 28, 2024

In fact, ReAct now scores lower than simply letting GPT-3.5 reason directly. Why does this happen?

ysymyth avatar ysymyth commented on April 28, 2024

can you show some trajectories? also, try the original text-davinci-002 and see if scores also become lower?

Jiayi-Pan avatar Jiayi-Pan commented on April 28, 2024

It looks like we observed the same phenomenon on at least a subset of tasks on the WebShop benchmark.

We ran ReAct/Act using the official code on WebShop tasks 2000-2100 with gpt-3.5-turbo-instruct.
The results are:

  • ReAct: 0.5345 avg reward, 0.28 success rate
  • Act: 0.674 avg reward, 0.38 success rate
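For reference, the two numbers per method can be computed from per-episode rewards roughly as follows. This is a sketch under the assumption that each episode yields a reward in [0, 1] and that an episode counts as a success only when its reward is exactly 1; `summarize_rewards` is a hypothetical helper, not a function from the official code.

```python
from typing import Sequence, Tuple


def summarize_rewards(rewards: Sequence[float]) -> Tuple[float, float]:
    """Return (average reward, success rate) over a list of episode rewards."""
    if not rewards:
        raise ValueError("no episodes to summarize")
    n = len(rewards)
    avg_reward = sum(rewards) / n
    # Assumption: success means a perfect reward of 1 for the episode.
    success_rate = sum(1 for r in rewards if r == 1) / n
    return avg_reward, success_rate
```

With only 100 tasks in the 2000-2100 slice, each episode moves the success rate by a full percentage point, so small gaps between methods should be read with care.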

You can find the raw trajectories here

Luoyang144 avatar Luoyang144 commented on April 28, 2024

Here is the running log of GPT-4 ReAct; it still gets a lower result (GPT-4 gets 0.33).
https://github.com/Luoyang144/share/blob/main/gpt4_hotpot_react.log

ysymyth avatar ysymyth commented on April 28, 2024

Interesting. Is it only on HotpotQA or on more tasks? Also, maybe check whether the text-davinci-002 result is reproducible?

https://github.com/Luoyang144/share/blob/main/gpt4_hotpot_react.log cannot be opened.

Luoyang144 avatar Luoyang144 commented on April 28, 2024

text-davinci-002 is not available now.
This link should be accessible now: https://github.com/Luoyang144/share/blob/main/gpt4_hotpot_react.log

ysymyth avatar ysymyth commented on April 28, 2024

My hypothesis is that later models after text-davinci-002 might be tuned on trajectories similar to Act, plus domains like QA have intuitive tools, and tasks like HotPotQA have intuitive reasoning patterns. On more out-of-distribution domains and tasks (e.g., WebShop, or AlfWorld), reasoning should still improve decision making generalization and transparency.

Closing this for now, but let me know if there are more findings or analysis on this.
