ysymyth / react

[ICLR 2023] ReAct: Synergizing Reasoning and Acting in Language Models

License: MIT License

Languages: Jupyter Notebook 99.37%, Python 0.63%

Topics: decision-making, large-language-models, llm, prompting, reasoning

react's People

Contributors

john-b-yang, ysymyth


react's Issues

I got a zero score running webshop.ipynb

I tried to run webshop.ipynb, and here is some of the output:

Observation: Invalid action!

Action: click[Add to Cart]
Observation: 

Action: click[Add to Cart]
Observation: 

Action: click[Checkout]
Observation: 

Action: click[Checkout]
Observation: 

Action: click[Proceed to checkout]
Observation: 

Action: click[Proceed to checkout]
Observation: 

Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!

Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!

Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!

Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!

Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!

Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!

Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!

1 0.0 0.0 0.0
-------------
-----------------
1
Action: reset
Observation: 

Action: click[Buy Now]
Observation: Invalid action!

Action: click[Add to Cart]
Observation: 

Action: click[Add to Cart]
Observation: 

Action: click[Checkout]
Observation: 

Action: click[Checkout]
Observation: 

Action: click[Proceed to checkout]
Observation: 

Action: click[Proceed to checkout]
Observation: 

Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!

Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!

Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!

Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!

Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!

Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!

Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!

2 0.0 0.0 0.0
-------------
-----------------

How can I get the right score? Thank you
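For reference, a minimal sanity check, assuming the action space described in webshop.ipynb and the WebShop paper: the environment only understands search[<query>] and click[<button or option shown in the current observation>], and a purchase is completed with click[Buy Now] on an item page; there is no cart, checkout page, or form to fill out. Free-form actions such as fill out form[...] therefore always come back as "Invalid action!", which is what drives the score to zero. The regex and helper below are illustrative, not part of the repo:

import re

# Hypothetical helper: checks only the surface form of an action. click targets
# must additionally be buttons/options visible in the current observation
# (they are shown in [brackets] in the page text).
VALID_ACTION = re.compile(r"^(search|click)\[.+\]$")

def looks_valid(action: str) -> bool:
    return bool(VALID_ACTION.match(action.strip()))

print(looks_valid("search[slim fit gray coat long sleeves x-large]"))  # True
print(looks_valid("click[Buy Now]"))                                   # True
print(looks_valid("fill out form[name: John Doe, ...]"))               # False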

Get low accuracy with GPT-3.5.

Hi, I'm trying to run ReAct with GPT-3.5-Turbo on the HotpotQA dataset using the provided Jupyter notebook, but I only get 0.182 accuracy. Is this a reasonable result? It seems much lower than the result reported in the paper.

Questions about the code

Hi, thanks for publishing this code. It is really helpful for me!

I have some questions about the code.

  1. In FeverWrapper.reset:
  def reset(self, seed=None, return_info=False, options=None, idx=None):
    self.env.reset(seed=seed, return_info=return_info, options=options)
    try:
      self.env.step('')
    except:
      pass
    self.env.reset(seed=seed, return_info=return_info, options=options)
    self.data_idx = int(np.random.randint(len(self.data))) if idx is None else idx
    observation = f"Claim: {self.data[self.data_idx][0]}"
    info = self._get_info()
    return (observation, info) if return_info else observation

I cannot figure out why we need this try-except code; it seems to do nothing. The second self.env.reset will reset the env anyway, so there is no need for the first reset.

  2. Still on the reset code: the return_info argument seems to always be False, so I think it can be dropped. Besides, the options and seed arguments are never actually used in WikiEnv.reset or in the wrappers' reset methods.
    def reset(self, seed=None, return_info=False, options=None, idx=None):

    def reset(self, seed=None, return_info=False, options=None):

    def reset(self, seed=None, return_info=False, options=None, idx=None):

    def reset(self, seed=None, return_info=False, options=None, idx=None):

  3. In WikiEnv.step, the reward is never changed after it is initialized and is never used, so why do we need this variable?
    Besides, in FeverWrapper.step the reward is obtained from self.get_reward, not from WikiEnv.step. In

    obs, _, done, info = self.env.step(action)

    you use _ to receive the reward from WikiEnv.step, which also shows that the reward returned by WikiEnv.step is not useful.

Thanks for your patience ~
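For what it's worth, here is a sketch of the simplification these questions seem to point at (a reader's sketch under those assumptions, not the authors' code): drop the first reset and the empty step, drop the unused seed/return_info/options arguments, and ignore the always-zero reward coming out of WikiEnv.step. The get_reward body is hypothetical.

import numpy as np

class SimplifiedFeverWrapper:
    def __init__(self, env, data):
        self.env = env    # a WikiEnv instance
        self.data = data  # list of (claim, label) pairs

    def reset(self, idx=None):
        self.env.reset()
        self.data_idx = int(np.random.randint(len(self.data))) if idx is None else idx
        return f"Claim: {self.data[self.data_idx][0]}"

    def step(self, action):
        obs, _, done, info = self.env.step(action)  # WikiEnv's reward is always 0, so ignore it
        reward = self.get_reward(info) if done else 0
        return obs, reward, done, info

    def get_reward(self, info):
        # Hypothetical: 1 if the predicted label matches the gold label, else 0
        return int(info.get("answer") == self.data[self.data_idx][1])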

How can I install ReAct?

Don't give me links to Alfworld! The installation instructions there don't work, and the support is nonexistent.
How can I install ReAct on my Ubuntu 22.04?

Questions on Table 3 (AlfWorld)

Hi,
Thanks for your great work! I have a question about Table 3, where the results of Act and ReAct are reported as avg/best of 6. I am wondering where the 6 comes from, given that the decoding strategy is greedy.
Thank you!

Old or New openai version

I used the code as-is for hotpotqa.ipynb and got the following error:

APIRemovedInV1 Traceback (most recent call last)
Cell In[53], line 10
8 old_time = time.time()
9 for i in idxs[:500]:
---> 10 r, info = webthink(i, to_print=True)
11 rs.append(info['em'])
12 infos.append(info)

Cell In[47], line 26
24 for i in range(1, 8):
25 n_calls += 1
---> 26 thought_action = llm(prompt + f"Thought {i}:", stop=[f"\nObservation {i}:"])
27 try:
28 thought, action = thought_action.strip().split(f"\nAction {i}: ")

Cell In[52], line 10
9 def llm(prompt, stop=["\n"]):
---> 10 response = openai.Completion.create(
11 model="text-davinci-002",
12 prompt=prompt,
13 temperature=0,
14 max_tokens=100,
15 top_p=1,
16 frequency_penalty=0.0,
17 presence_penalty=0.0,
18 stop=stop
19 )
20 return response["choices"][0]["text"]

File c:\Users\fattoh.alqershi\ReAct\ReAct\myvenv\lib\site-packages\openai\lib_old_api.py:39, in APIRemovedInV1Proxy.call(self, *_args, **_kwargs)
38 def call(self, *_args: Any, **_kwargs: Any) -> Any:
---> 39 raise APIRemovedInV1(symbol=self._symbol)

APIRemovedInV1:

You tried to access openai.Completion, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API.

You can run openai migrate to automatically upgrade your codebase to use the 1.0.0 interface.

Alternatively, you can pin your installation to the old version, e.g. pip install openai==0.28

The error seems to be version-related, so I went back to openai==0.28.

That raised another error about the client parameters: it expected messages (and other arguments), but there are no messages in this code.
Please support me.

Thanks.
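A possible fix sketch for openai>=1.0 (not the repo's official update): keep the notebook's completion-style llm helper, but route it through the new client. gpt-3.5-turbo-instruct is assumed here as a stand-in for the deprecated text-davinci-002; any completions-capable model should work.

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

def llm(prompt, stop=["\n"]):
    # Same signature as the notebook's helper, so webthink() can call it unchanged
    response = client.completions.create(
        model="gpt-3.5-turbo-instruct",  # assumption: stand-in for text-davinci-002
        prompt=prompt,
        temperature=0,
        max_tokens=100,
        top_p=1,
        frequency_penalty=0.0,
        presence_penalty=0.0,
        stop=stop,
    )
    return response.choices[0].text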

cot->react & react->cot

Hello, I would like to ask whether there is a code implementation of the CoT -> ReAct and ReAct -> CoT methods mentioned in the paper.

Why is `clean_str` present in `wikienv`?

I am finding that certain strings can break clean_str:

p = "This is a test string with unicode escape: \\u00e9"

This will break clean_str:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 43: unexpected end of data

Why do we need to convert the string to UTF-8? And if it's required, why not just ignore conversion errors?
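For context, clean_str in wikienv.py appears to round-trip the page text through unicode-escape, latin-1, and UTF-8 so that literal escape sequences like \u00e9 in the fetched text render as real characters. A minimal reproduction, plus one possible workaround (not the authors' fix), under that assumption:

def clean_str(p):
    # the round-trip suggested by the traceback above
    return p.encode().decode("unicode-escape").encode("latin1").decode("utf-8")

def clean_str_safe(p):
    # same round-trip, but drop bytes that are not valid UTF-8 instead of raising
    return p.encode().decode("unicode-escape").encode("latin1").decode("utf-8", errors="ignore")

p = "This is a test string with unicode escape: \\u00e9"
# clean_str(p)  # raises UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9
print(clean_str_safe(p))  # the undecodable byte is silently dropped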

Question on alfworld and textworld versions

When I run alfworld.ipynb, it returns:
Initializing AlfredTWEnv...
Checking for solvable games...
Overall we have 134 games
Evaluating with 134 games
Traceback (most recent call last):
File "/home/ict/ReAct/react.py", line 55, in
env = env.init_env(batch_size=1)
File "/home/ict/miniconda3/envs/react1/lib/python3.9/site-packages/alfworld/agents/environment/alfred_tw_env.py", line 224, in init_env
infos = textworld.EnvInfos(won=True, admissible_commands=True, expert_type=expert_type, expert_plan=expert_plan, extras=["gamefile"])
File "/home/ict/miniconda3/envs/react1/lib/python3.9/site-packages/textworld/core.py", line 109, in init
raise ValueError(msg)
ValueError: Unknown information requested: ['expert_plan', 'expert_type']. Available information are: ['admissible_commands', 'command_templates', 'description', 'entities', 'extras', 'facts', 'fail_facts', 'feedback', 'game', 'intermediate_reward', 'inventory', 'last_action', 'last_command', 'location', 'lost', 'max_score', 'moves', 'objective', 'policy_commands', 'score', 'verbs', 'win_facts', 'won']
It seems that textworld no longer works with this code.
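A small diagnostic sketch (an assumption about the cause, not a confirmed fix): the ValueError suggests the installed textworld does not provide the expert_type / expert_plan infos that this version of alfworld requests, i.e. the two packages are on mismatched versions. The check below just reports whether they agree before running alfworld.ipynb:

import textworld

try:
    # the two infos AlfredTWEnv asks for in the traceback above
    textworld.EnvInfos(expert_type="handcoded", expert_plan=True)
    print("textworld accepts expert_type/expert_plan -- env init should work")
except ValueError as err:
    print("textworld/alfworld version mismatch:", err)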

How did you go about finetuning?

Hi there, I cannot seem to find any information on the fine-tuning process in your paper or in this repository.

A snippet from your paper:

However, when finetuned with just 3,000 examples, ReAct becomes the best method among the four, with PaLM-8B finetuned ReAct outperforming all PaLM-62B prompting methods, and PaLM-62B finetuned ReAct outperforming all 540B prompting methods. In contrast, finetuning Standard or CoT is significantly worse than finetuning ReAct or Act for both PaLM-8/62B, as the former essentially teaches models to memorize (potentially hallucinated) knowledge facts, and the latter teaches models how to (reason and) act to access information from Wikipedia, a more generalizable skill for knowledge reasoning.

Question about webshopEnv

Hi! I'm replicating the ReAct results on WebShop, and I have several questions about webshopEnv in the Jupyter notebook.

  • It seems like you set the environment to only output the top 3 products (instead of the full 10; see the sketch below):
if prod_cnt >= 3:
    processed_t = ''

Is this also what you used in the paper?

  • There's also an assert False when the Next or Prev button is clicked. Is this also intentional?
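Here is the sketch referenced in the first bullet: an illustrative reconstruction of how the truncation plausibly sits inside the notebook's observation-processing loop. Only the quoted prod_cnt / processed_t lines are from the notebook; everything else is assumed.

# fake parsed page entries standing in for the real search-results text
results = [f"[B00000000{i}] Product {i} description" for i in range(10)]

prod_cnt = 0
observation_lines = []
for t in results:
    processed_t = t
    if prod_cnt >= 3:      # quoted fragment: blank out everything past the top 3
        processed_t = ''
    prod_cnt += 1
    observation_lines.append(processed_t)

observation = '\n'.join(line for line in observation_lines if line)
print(observation)  # only the first three products make it into the observation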

Also, I have results for ReAct on WebShop with session ids fixed_{1-500}, which I believe is the same setup as the paper, using this environment unmodified but with a different LLM (not PaLM-540B):

gpt-turbo-3.5
Act - Score: 64.99 Success Rate: 34.0
ReAct - Score: 59.9 Success Rate: 30.0

code-davinci-002
Act - Score: 64.99 Success Rate: 34.0
ReAct - Score: 65.60 Success Rate: 38.8

Is this to be expected? I'm wondering if you have any thoughts on this. After some research, I found people saying that chain-of-thought might not be as effective for models trained with RLHF, like ChatGPT, but I don't have a good explanation for why I'm not seeing a performance boost from Act to ReAct with Codex (code-davinci-002).

Thank you in advance! Love the simplicity of your work and I'm trying to come up with new ideas based off of this paper :)

Have you considered renaming this project?

Hello, thank you for this important work and project!
I'm already seeing many references to the paradigm. The problem is that there was already a massively popular project named React. This makes searches for ReAct somewhat difficult.

Potential Implementation error on Webshop

Hi, I'm trying to reproduce your ReAct results on Webshop using some LLM APIs. However, I sometimes encounter the following error.

Basically, sometimes after you select certain options and then click[Buy Now], it shows the error below:

Traceback (most recent call last):
  File "/home/user/anaconda3/envs/webshop/lib/python3.8/site-packages/flask/app.py", line 2095, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/user/anaconda3/envs/webshop/lib/python3.8/site-packages/flask/app.py", line 2080, in wsgi_app
    response = self.handle_exception(e)
  File "/home/user/anaconda3/envs/webshop/lib/python3.8/site-packages/flask/app.py", line 2077, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/user/anaconda3/envs/webshop/lib/python3.8/site-packages/flask/app.py", line 1525, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/user/anaconda3/envs/webshop/lib/python3.8/site-packages/flask/app.py", line 1523, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/user/anaconda3/envs/webshop/lib/python3.8/site-packages/flask/app.py", line 1509, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/home/user/webshop/web_agent_site/app.py", line 221, in done
    options = literal_eval(options)
  File "/home/user/anaconda3/envs/webshop/lib/python3.8/ast.py", line 59, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')
  File "/home/user/anaconda3/envs/webshop/lib/python3.8/ast.py", line 47, in parse
    return compile(source, filename, mode, flags,
  File "<unknown>", line 1
    {'color': '2
               ^
SyntaxError: EOL while scanning string literal
 


To reproduce the error, you can try this:
In the ReAct webshop notebook, select task id 83 ("i need a slim fit gray colored coat that has long sleeves. it should be in x-large size, and price lower than 40.00 dollars"), then do the following actions:

  1. search[slim fit gray coat long sleeves x-large]
  2. click[B09FF97YGV]
  3. click[2#gray]
  4. click[x-large]
  5. click[Buy Now]

Then the error occurs. When doing these actions directly on the website, there is no such error, so there may be something wrong in how the argument is passed to the environment.
(The errors I notice all occur when an option that contains '#' is selected; maybe that's useful.)
Could you please help check that? Thank you so much!
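A hedged guess at the cause plus a possible client-side workaround (not a confirmed fix): the copied traceback shows the server received the literal {'color': '2, i.e. everything after the '#' in the option 2#gray was lost. That is what you would expect if the serialized options dict is placed into the /done/<session_id>/<asin>/<options> URL path without escaping, since '#' starts a URL fragment. Percent-encoding the options before building the URL should avoid it; the session id and ASIN below are illustrative values from the repro steps above.

from urllib.parse import quote

options = {'color': '2#gray', 'size': 'x-large'}
asin, session_id = 'B09FF97YGV', 'fixed_83'

# unescaped: an HTTP client treats everything after '#' as a fragment and drops it
raw_path = f"/done/{session_id}/{asin}/{options}"

# percent-encoded: the '#' survives the round trip and literal_eval gets the full dict
safe_path = f"/done/{session_id}/{asin}/{quote(str(options), safe='')}"

print(raw_path)
print(safe_path)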

[Reproducing Results] on Alfworld

Dear Authors,

Thank you for the great work on introducing ReAct.

Since the original model you used, text-davinci-002, is deprecated on OpenAI, the two closest alternatives are gpt-3.5-turbo and davinci-002. The best performance we get on e.g. the first 10 environments is 0.3, while the reported results on the first 10 envs of Alfworld are 0.7.

Could you share the traces, or advise what your latest scores on this environment are, or how to reproduce your score of 0.7? @ysymyth @john-b-yang @descrip

Thanks.

Paper, Table 2

I am impressed with your research. Thank you for your good research.

But I have a question and would like to ask.

According to Table 2 of the paper, the results are divided into success and failure modes.

  1. What is the definition of a success mode and a failure mode?
  2. If a success mode is a successful case, it should not include false positives, because a false positive means predicting something wrong as right.
  3. Ultimately, hallucinated reasoning traces or facts appear in both the success and failure modes. I wonder why?
(screenshot of Table 2)

Thanks!

Davinci-002

Is davinci-002 referring to text-davinci-002 or to davinci-002 (the non-finetuned base model)?

New OAI Chat Endpoint

You can actually close this issue. Just in case anyone is looking to run it with the OAI Chat Completions endpoint:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
)

def llm(prompt, stop=["\n"]):
    # Same interface as the original notebook helper, but using the Chat Completions API
    response = client.chat.completions.create(
      model="gpt-4o",
      messages=[
        {
            "role": "user",
            "content": prompt,
        }
      ],
      temperature=0,
      max_tokens=100,
      top_p=1,
      frequency_penalty=0.0,
      presence_penalty=0.0,
      stop=stop
    )
    return response.choices[0].message.content
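Usage is unchanged from the notebook's original llm helper: call it with a prompt string and a stop list, e.g. llm(prompt + f"Thought {i}:", stop=[f"\nObservation {i}:"]), exactly as webthink already does.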

How to finetune the small REACT model

Hi, I was wondering how we could finetune the small ReAct model, given the prompts/trajectories generated by the prompted LLM.

  1. Should we use LoRA or P-Tuning for the finetuning step?

  2. How should the prompt data be used?
    (1) Let all the actions and thoughts be the input and the final action (the answer) be the output.
    (2) Parse the whole ReAct process and use the previous in-context info as input and the current action as output (a sketch of this option is given below).
    (3) Or some other way that you used?

Really appreciate your help.
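A sketch of interpretation (2) from the list above (one reading of it, not the authors' recipe): slice each ReAct trajectory into (context, next step) pairs, so the small model is trained to emit the next thought or action given everything that came before. The example trajectory is adapted from the HotpotQA prompt style; the observation text is illustrative.

import re

trajectory = """Question: Which magazine was started first, Arthur's Magazine or First for Women?
Thought 1: I need to find when each magazine was started and compare the dates.
Action 1: Search[Arthur's Magazine]
Observation 1: Arthur's Magazine was an American literary periodical first published in 1844.
Thought 2: Arthur's Magazine started in 1844. Now I need First for Women.
Action 2: Search[First for Women]
Observation 2: First for Women is a woman's magazine that was launched in 1989.
Thought 3: 1844 is earlier than 1989, so Arthur's Magazine was started first.
Action 3: Finish[Arthur's Magazine]"""

def to_pairs(traj):
    """Yield (input_context, target_step) pairs for supervised finetuning."""
    lines = traj.splitlines()
    for i, line in enumerate(lines):
        if re.match(r"^(Thought|Action) \d+:", line):
            yield "\n".join(lines[:i]) + "\n", line

pairs = list(to_pairs(trajectory))
print(len(pairs), "training pairs")  # 6 pairs: 3 thoughts + 3 actions
print(pairs[0][1])                   # Thought 1: ...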

Webshop experiment details for numbers in paper

Hi,

For webshop env, what was the number of retrieved items displayed per page?
As per the code, it seems item names indexed after 3 are purposefully omitted, which does not seem to be clarified in the actual paper.

Could you please clarify this setting explicitly, just so I am clear whether this was a small change for visualization in the code or whether it was used for all results reported in the paper?

I was looking through the earlier issues in the repo and couldn't find this resolved in the closed issues.

Thanks!

FEVER and WebShop code

Hello @ysymyth, thanks for sharing your code, excellent work! Is there any plan to release the code of FEVER and WebShop? Thank you!

Alfworld GPT-3 Results

Hi,
I wondered if you had more details or numbers from your GPT-3 results on Alfworld? For instance, do you have the splits of accuracy across the different subtasks (as in Table 3 in the paper)?

I would try to reproduce it, but I reckon the total cost would be > $100 and would like to avoid it if possible.
