StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows

This repository contains the implementation for the COLM 2024 submission StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows. The implementation is built on AutoGen, an open-source LLM framework. This version of the code is provided to facilitate peer review of the COLM 2024 submission and will be removed afterward. We plan to release the code accompanying the formal publication of the paper.

Datasets

  • InterCode: InterCode provides interactive code environments for evaluating language agents that can code. From it, we evaluate StateFlow on two datasets:

    • (1) SQL: The InterCode-SQL dataset adapts the Spider dataset for MySQL and contains 1034 task instances. For each task, a MySQL interpreter is set up with all relevant tables inside a Docker container.
    • (2) Bash: The InterCode-Bash dataset has 200 task instances curated from the NL2Bash dataset.
  • ALFWorld: ALFWorld contains interactive TextWorld environments that parallel embodied worlds in the ALFRED dataset. The aligned environments allow agents to reason and learn high-level policies in an abstract space before solving embodied tasks through low-level actuation.

Experiments

We recommend creating separate environments for InterCode and ALFWorld.

Both benchmarks require the installation of AutoGen:

pip install pyautogen

Then create an "OAI_CONFIG_LIST" file and add your key. This file is used to access the LLM models:

[
    {
        "model": "gpt-35-turbo-1106",
        "api_key": "Your openai key here"
    },
    {
        "model": "gpt-35-turbo-1106",
        "api_key": "Your azure key",
        "api_type": "azure",
        "base_url": "Your base url here",
        "api_version": "Your api version here"
    }
]

When running the experiments, make sure to change the path to the OAI_CONFIG_LIST file in the corresponding Python files (e.g., ALFWorld/stateflow.py, InterCode/flow_bash.py, InterCode/flow_sql.py):

config_list = autogen.config_list_from_json(
    "Your path to OAI_CONFIG_LIST file here",
    filter_dict={"model": model},
)
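Because the config file must be strict JSON (a trailing comma after the last field of an entry is enough to make parsing fail), it can help to check the file before launching a long run. The following is a minimal sketch using only the standard library; `load_config_list` is a hypothetical helper written for illustration, not part of AutoGen, and it only approximates the model filtering that `filter_dict={"model": model}` requests above.

```python
import json
from pathlib import Path


def load_config_list(path, model):
    """Read an OAI_CONFIG_LIST-style JSON file and keep entries for one model.

    Hypothetical helper for sanity-checking the file; the experiment scripts
    themselves use autogen.config_list_from_json.
    """
    text = Path(path).read_text(encoding="utf-8")
    try:
        entries = json.loads(text)
    except json.JSONDecodeError as err:
        # Typical cause: a trailing comma inside an entry.
        raise ValueError(f"{path} is not valid JSON: {err}") from err
    if not isinstance(entries, list):
        raise ValueError(f"{path} must contain a JSON list of config entries")
    # Keep only the entries whose "model" field matches the requested model.
    matches = [entry for entry in entries if entry.get("model") == model]
    if not matches:
        raise ValueError(f"no entry in {path} has model {model!r}")
    return matches
```

Calling `load_config_list("OAI_CONFIG_LIST", "gpt-35-turbo-1106")` on the example file above would return both entries, since they share the same model name.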

Run InterCode

  1. Follow the instructions in the InterCode repository to download InterCode, using the build-from-source instructions:

    git clone https://github.com/princeton-nlp/intercode.git
    cd intercode
    conda env create -f environment.yml
    conda activate intercode
  2. From inside the intercode folder, copy the files from this repository's InterCode folder into it:

    bash ../InterCode/copy_files.sh

    We made some modifications to setup.sh and the Docker files:

    • Change the SQL Dockerfile path to ic_spider_dbs.sql.
    • Create 4 different Docker images for the 4 different Bash tasks.
  3. Run setup.sh to create the Docker images for the InterCode Bash and SQL environments.

    bash setup.sh
  4. Run StateFlow for InterCode SQL:

    bash scripts/stateflow.sh

Run ALFWorld

  1. Please follow the instructions in the ALFWorld repository to install the ALFWorld environment.

  2. Change the relevant path in stateflow.py:

    os.environ["ALFWORLD_DATA"] = "Your path to ALFWorld data here."
  3. Run StateFlow for ALFWorld:

    python stateflow.py
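A mistyped data path tends to surface only once the environment starts loading games, so it may be worth validating it up front. The sketch below shows one way to do that; `set_alfworld_data` is a hypothetical helper introduced here for illustration and is not part of stateflow.py or ALFWorld.

```python
import os
from pathlib import Path


def set_alfworld_data(data_dir):
    """Point ALFWORLD_DATA at data_dir after confirming the directory exists.

    Hypothetical helper; stateflow.py assigns os.environ["ALFWORLD_DATA"] directly.
    """
    resolved = Path(data_dir).expanduser().resolve()
    if not resolved.is_dir():
        raise FileNotFoundError(f"ALFWorld data directory not found: {resolved}")
    # Environment variables must be strings, so convert the Path.
    os.environ["ALFWORLD_DATA"] = str(resolved)
    return resolved
```

The helper expands `~` and resolves relative paths, so the value stored in the environment is always absolute.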

Results

Results on InterCode SQL:

[Figure: SQL]

Results on InterCode Bash:

[Figure: Bash]

Results on ALFWorld:

[Figure: ALFWorld]

Ablation of states on the InterCode SQL dataset with GPT-3.5-Turbo:

[Figure: ablation]

StateFlow + Reflexion on ALFWorld (with 6 iterations):

[Figure: Reflexion]

Contributors

  • yiranwu0
