StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows

This repository is the implementation of COLM 2024 submission StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows. This implementation is based on an open-source LLM framework AutoGen. This version of the code is made to facilitate the peer review of the COLM 2024 submission, and will be removed after. We plan to release the code accompanying the formal publication of the paper.

Datasets

InterCode: InterCode is designed as an interactive code environments to evaluate language agents that can code. From it, we evaluate StateFlow on two datasets:
- (1) SQL: The InterCode-SQL adapts the Spider dataset for MySQL, containing 1034 task instances. For each task, a MySQL interpreter is set up with all relevant tables within a docker container.
- (2) Bash: The InterCode-Bash dataset has 200 task instances curated from the NL2Bash dataset.
ALFWorld: ALFWorld contains interactive TextWorld environments that parallel embodied worlds in the ALFRED dataset. The aligned environments allow agents to reason and learn high-level policies in an abstract space before solving embodied tasks through low-level actuation.

Experiments

We recommend create separate environments for InterCode and ALFWorld.

Both benchmarks require the installation of AutoGen:

pip install pyautogen

Then, create a "OAI_CONFIG_LIST" file and add your key, this will be used to access the LLM models:

[
    {
        "model": "gpt-35-turbo-1106",
        "api_key": "Your openai key here",
    },
    {
         "model": "gpt-35-turbo-1106",
         "api_key": "Your azure key",
         "api_type": "azure",
         "base_url": "Your base url here",
         "api_version": "Your api version here",
    }
]

When running the experiments, make sure to change the path to the OAI_CONFIG_LIST file in corresponding python files (e.g., ALFWorld/stateflow.py, InterCode/flow_bash.py, InterCode/flow_sql.py):

config_list = autogen.config_list_from_json(
    "Your path to OAI_CONFIG_LIST file here",
    filter_dict={"model": model},
)

Run InterCode

Please follow the instructions in the InterCode repository to download intercode. Use the build from source instructions:

git clone https://github.com/princeton-nlp/intercode.git
cd intercode
conda env create -f environment.yml
conda activate intercode

After you are in intercode folder, copy files from InterCode folder to intercode folder:
```
bash ../InterCode/copy_files.sh
```
We did some modifications to the setup.sh and the docker files:
- Change sql dockerfile path to ic_spider_dbs.sql.
- Create 4 different docker images for the 4 different bash tasks.
Run setup.sh to create the docker images for the InterCode Bash and SQL environments.
```
bash setup.sh
```
Run StateFlow for InterCode SQL:
```
bash scripts/stateflow.sh
```

Run ALFWorld

Please follow the instructions in the ALFWorld repository to install the ALFWorld environment.

Change the relevant path in stateflow.py:

os.environ["ALFWORLD_DATA"] = "Your path to ALFWorld data here."

Run stateflow for ALFWorld:
```
python stateflow.py
```

Results

Results on InterCode SQL:

Results on InterCode Bash:

Results on ALFWorld:

Ablation of states on the InterCode SQL dataset with GPT-3.5-Turbo:

StateFlow + Reflexion on ALFWorld (with 6 iterations):

micateo / stateflow Goto Github PK

stateflow's Introduction

StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows

Datasets

Experiments

Run InterCode

Run ALFWorld

Results

stateflow's People

Contributors

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs