GithubHelp home page GithubHelp logo

haotiansun14 / rci-agent Goto Github PK

View Code? Open in Web Editor NEW

This project forked from posgnu/rci-agent

0.0 0.0 0.0 2.18 MB

A codebase for "Language Models can Solve Computer Tasks"

Home Page: https://posgnu.github.io/rci-web/

License: MIT License

JavaScript 23.83% Python 11.31% CSS 3.10% HTML 61.76%

rci-agent's Introduction

RCI Agent for MiniWoB++

Welcome to the codebase for our paper, "Language Models can Solve Computer Tasks". In this codebase, you will find the implementation of our RCI agent, which uses a pre-trained language model to execute computer tasks in MiniWoB++ benchmark guided by natural language. The agent employs a simple RCI prompting scheme that allows it to improve its outputs.

overview

[Website] [Arxiv Paper] [PDF]

Dependencies

The RCI agent is implemented in Python 3.9 and requires the following dependencies:

  • gym
  • openai
  • selenium
  • Pillow
  • regex
pip install -r requirements.txt

Usage

Setup

To run the code, you must first install MiniWoB++ and configure your OpenAI API key. MiniWoB++ is integrated with the OpenAI Gym environment. Navigate to the computergym directory and execute the following command to install it:

cd computergym
pip install -e .

Once that's done, you need to write your OpenAI API key in the example_config.json file, then rename the file to config.json

Run

To run the code, simply execute the following command:

python main.py --env [TASK NAME] --llm [LLM NAME] --num-episodes [NUM EPISODES] --erci [NUM Explicit RCI] --irci [NUM Implicit RCI] --sgrounding

Here are the arguments you need to specify:

  • --env: MiniWoB++ task name

  • --llm: the name of language model. model name and the corresponding API name is specified below.

    • chatgpt: "gpt-3.5-turbo"
    • davinci: "text-davinci-003"
    • ada: "ada"
    • babbage: "babbage"
    • curie: "curie"
    • davinci1: "davinci"
    • davinci2: "text-davinci-002"
  • --env: Name of the MiniWoB++ task you want to run. You can see the list of available tasks in available_tasks.txt

  • --llm: Name of the language model you want to use. The model name and corresponding API name are specified below:

    • chatgpt: "gpt-3.5-turbo"
    • davinci: "text-davinci-003"
    • ada: "ada"
    • babbage: "babbage"
    • curie: "curie"
    • davinci1: "davinci"
    • davinci2: "text-davinci-002"
  • --num-episodes: Number of episodes to run the task

  • --erci: The number of explicit RCI loop for an action plan. -1 will remove the action plan sampling.

  • --irci: The number of implicit RCI loop for the agent grounding.

  • --sgrounding: If this is True, then the state grounding update is enabled.

  • --headless: If this is True, then the MiniWoB++ environment will run in headless mode.

Consider running the following command to verify if everything is functioning correctly:

python main.py --env choose-list --llm chatgpt --num-episodes 1 --irci 1 --sgrounding

Evaluation

Our project's approach has yielded impressive results, with our agent achieving the second-highest score out of all tested models. We have observed that our agent outperforms the baselines, with the exception of CC-Net (SL + RL), which uses dictionary-based typing actions.

What sets our RCI agent apart is that it accomplished this feat using 120 times fewer samples than WebN-T5-3B and 11,000 times fewer samples than CC-Net. Obtaining expert demonstrations and defining reward functions for computer tasks can be a daunting challenge, but our research highlights the potential of using LLMs to overcome these obstacles and achieve success in general computer tasks.

Check out our paper!

Our paper is available on Arxiv. If you use this code in your research, we kindly ask that you cite our paper.

@article{kim2023language,
      title={Language Models can Solve Computer Tasks}, 
      author={Geunwoo Kim and Pierre Baldi and Stephen McAleer},
      journal={arXiv preprint arXiv:2303.17491},
      year={2023},
}

rci-agent's People

Contributors

posgnu avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.