
AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents


๐ŸŒ Website | ๐Ÿ† Leaderboard | ๐Ÿ“š Data | ๐Ÿ“ƒ Paper | ๐Ÿ“Š Panel

What's New

  • [2024.01.15] 📣 AgentBoard is released.
  • [2024.03.11] 🥳 AgentBoard is accepted by LLMAgents @ ICLR 2024.

Introduction

AgentBoard emphasizes analytical evaluation for Large Language Models (LLMs) as generalist agents to perceive and act within various environments. It outlines four principles for constructing a benchmark to evaluate LLMs as generalist agents:

  1. Task Diversity: AgentBoard incorporates 9 distinct tasks to comprehensively assess the generalist ability of LLM agents, which builds upon LLMs' extensive knowledge base and exceptional scenario comprehension.
  2. Multi-round Interaction: AgentBoard provides multi-round interaction between agents and environments, which is necessary to reflect the evolutionary nature of human intelligence: continuously receiving information and adapting to the environment.
  3. Partially-Observable Environments: In AgentBoard, the complete state of the environment is not available to the agent, which assesses the agent's world-modeling ability, as additional knowledge needs to be acquired through online exploration.
  4. Analytical Evaluation: AgentBoard is a systematic evaluation platform: it includes a user-friendly script to construct goal-oriented reflex agents for a range of models, and features a panel for visualizing and interpreting results across multiple dimensions of agent proficiency, including fine-grained progress rates, grounding accuracy, performance breakdown for hard and easy examples, long-range interactions, detailed performance across various sub-skills, and trajectories with friendly visualization.

🚀 Quick Start

Here we provide a quick start guide to evaluate LLM agents on AgentBoard within 30 minutes.

Setup Environment

We provide both a local setup (recommended) and a Docker setup:

Click to expand local setup procedures (~15 minutes).

Set up with setup.sh:

Step 1. Create a conda environment

conda create -n ${YOUR_ENV_NAME} python=3.8.13  # python version should be 3.8.13
conda activate ${YOUR_ENV_NAME}

Step 2. Git clone this repo

git clone https://github.com/hkust-nlp/AgentBoard.git

Step 3. Download the data from huggingface

# Download the data and move it to the project root dir
cd AgentBoard
mkdir data
wget https://huggingface.co/datasets/hkust-nlp/agentboard/resolve/main/data.tar.gz
tar -zxvf data.tar.gz

Step 4. Set up the environment for all tasks except WebArena

INSTALL_WEBARENA=false bash ./setup.sh

# After running the above command, the env supports all tasks except WebArena

Step 5. Set up the environment for WebArena

# Please check whether dbus and Xvfb are installed before building
# For Ubuntu or Debian
dpkg -l | grep dbus  # should list the dbus packages
systemctl status dbus  # should report "active (running)"
dpkg -l | grep xvfb  # should list the xvfb package

#-----------------------------------------------------------------------#

# For CentOS
yum list installed | grep Xvfb  # should list the Xvfb package
systemctl status dbus  # should report "active (running)"
yum list installed | grep dbus  # should list the dbus packages

If they are installed, you can set up the WebArena environment directly.

INSTALL_WEBARENA=true bash ./setup.sh

If not, please jump to Step 6, or use the Docker installation.

(Additional) Step 6. Install dbus and Xvfb

# The following commands require sudo privileges:

# For Ubuntu or Debian
# Install and start the dbus service
sudo apt-get install dbus
sudo /etc/init.d/dbus start

# Install and start Xvfb
sudo apt-get update
sudo apt-get install xvfb

INSTALL_WEBARENA=true bash ./setup.sh
#--------------------------------------------------------#

# For CentOS
# Install and start the dbus service
sudo yum install -y dbus-x11
sudo /etc/init.d/dbus start

# Install and start Xvfb
sudo yum update
sudo yum install -y Xvfb

INSTALL_WEBARENA=true bash ./setup.sh
Click to expand Docker setup procedures (~12 GB image, 5 minutes).

Docker info: CentOS

Step 1. Pull the docker image and run docker locally

docker pull zzh202121/agentboard:0117
docker run -itd \
    --gpus all \
    --network host \
    --name agent_space \
    --shm-size 64gb \
    -v /MODEL_PATH:/model_download \
    -v /DATA_PATH:/data \
    zzh202121/agentboard:0117 \
    /bin/bash
docker attach agent_space # YOUR_CONTAINER_NAME

Step 2. Activate the environment

conda activate agentboard

Step 3. Download the code and data

git clone https://github.com/hkust-nlp/AgentBoard.git  # clone repo
# Download the data and move it to the project root dir
cd AgentBoard
mkdir data
wget https://huggingface.co/datasets/hkust-nlp/agentboard/resolve/main/data.tar.gz
tar -zxvf data.tar.gz

Step 4. Build the search engine index (for WebShop)

cd ./agentboard/environment/WebShop/search_engine
mkdir -p resources resources_100 resources_1k resources_100k
python convert_product_file_format.py # convert items.json => required doc format
mkdir -p indexes
./run_indexing.sh
cd ../../../

Step 5. Start the web service (for WebArena)

/etc/init.d/dbus start  # start dbus
Xvfb :99 -screen 0 1280x720x24 &  # start xvfb display
export DISPLAY=:99
python -m playwright install

Setup Environment Variables in AgentBoard/.env

Environment Variables needed for AgentBoard include:

PROJECT_PATH = {path to project}/AgentBoard

ANTHROPIC_API_KEY=...
OPENAI_API_KEY=...

TODO_KEY=...
MOVIE_KEY=...
SHEET_EMAIL=...

WANDB_API_KEY=...
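The evaluation scripts read these variables from AgentBoard/.env. As a minimal illustration of the expected file format, here is a stdlib sketch of a .env loader (this is not AgentBoard's actual loader, which may rely on a package such as python-dotenv; the variable names are those listed above):

```python
import os

def load_env_file(path=".env"):
    """Minimal .env parser: KEY=VALUE lines; blank lines and '#' comments ignored."""
    env = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip().strip('"')
    return env

def export_env(path=".env"):
    """Export variables from the file into os.environ without overwriting existing ones."""
    for key, value in load_env_file(path).items():
        os.environ.setdefault(key, value)
```

For example, a file containing `PROJECT_PATH = /path/to/AgentBoard` would yield `{"PROJECT_PATH": "/path/to/AgentBoard"}`.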
Click to expand API key setup procedures.

Variables 1: API keys for Tool tasks

Since API keys for Tool tasks are private, we do not provide them in this repo.

Please follow this detailed guide to get API keys for Tool tasks.

Variables 2: Weights & Biases key for AgentBoard Online Visualization

Please paste your WANDB_API_KEY (obtained from this guide) into the .env file to log in to Weights & Biases for AgentBoard visualization.

Variables 3: API keys for Proprietary models

โš ๏ธ You don't need to setup API keys for models you don't want to use.

If you use OpenAI models, please put your API keys in .env file.

OPENAI_API_TYPE="open_ai"
OPENAI_API_KEY=${YOUR_OPENAI_API_KEY}

If you use Anthropic models, please put your API keys in .env file.

ANTHROPIC_API_KEY=${YOUR_ANTHROPIC_API_KEY}

Evaluate Models

Example script for GPT-3.5-Turbo:

python agentboard/eval_main.py \
    --cfg-path eval_configs/main_results_all_tasks.yaml \
    --tasks alfworld \
    --model gpt-3.5-turbo-0613 \
    --wandb \
    --log_path ./results/gpt-3.5-turbo-0613 \
    --project_name evaluate-gpt-35-turbo-0613 \
    --baseline_dir ./data/baseline_results

We now offer configurations for 12 SOTA LLMs (gpt-4, gpt-3.5-turbo-0613, text-davinci-003, claude2, deepseek-67b, lemur-70b, mistral-7b, codellama-13b(34b), llama2-13b(70b), vicuna-13b-16k) and a simple reflex agent based on act-only prompting. You can also customize your own agents and LLMs. Models supported by vLLM are generally supported in AgentBoard, though different models may require specific prompt templates.

Launch AgentBoard Analytical Evaluation Panel

AgentBoard integrates illustrative Weights & Biases visualization to help researchers systematically analyze LLM agents. Simply turn on the --wandb switch in the arguments and customize the project_name and baseline_dir of your wandb project, as in the evaluation command above.

Before running, you need to set up wandb login or the corresponding environment variable as instructed in Quick Start. The visualization results are also stored offline at ./wandb. Normally, after executing the evaluation command, you can view the live AgentBoard panel online at https://wandb.ai/{your_wandb_id}/{project_name}. We provide example WandB logging pages for GPT-4, GPT-3.5-Turbo, and DeepSeek-67b.

Note that if your run is not logged online (e.g., on a cluster without internet access), you can later sync local runs to wandb online with wandb sync [OPTIONS] [PATH], as detailed in the wandb docs. For more information about the features of the AgentBoard panel, please check this Blog.

Local log files

In addition to online results viewing, local logs are automatically stored in {log_path}. In WebArena, we additionally support more detailed trajectory files, including web page screenshots and network traffic records.

Log file organization:
{log_path}
├── logs                    # detailed example-wise logs for each task
│  ├── webarena_tracks      # WebArena-provided rendered HTML files of the execution trace, plus a './trace' folder automatically generated with Playwright
│  │  ├── traces
│  │  │  ├── 102.zip
│  │  ├── render_102.html
│  │  ├── ...
│  ├── alfworld.jsonl       # each line is a JSON dictionary logging the statistics, trajectory, and prompt for each example
│  ├── babyai.jsonl
│  ├── ...
├── all_results.txt         # overall metrics for each task
├── dimension.txt           # agent capability dimension scores for the current LLM agent
├── alfworld.txt            # a general log of example-wise statistics for each task
├── babyai.txt
└── ...
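Since each line of a task's .jsonl log is a JSON dictionary of per-example statistics, a quick offline summary can be computed without the panel. A sketch, assuming the field names is_done and progress_rate that appear in the example logs (adjust to the actual schema):

```python
import json

def summarize_log(jsonl_path):
    """Aggregate per-example statistics from a task's .jsonl log file."""
    n = 0
    done = 0
    progress_sum = 0.0
    with open(jsonl_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            n += 1
            done += int(record.get("is_done", False))       # success flag
            progress_sum += record.get("progress_rate", 0.0)  # fine-grained progress
    return {
        "examples": n,
        "success_rate": done / n if n else 0.0,
        "avg_progress_rate": progress_sum / n if n else 0.0,
    }
```

For example, `summarize_log("results/gpt-3.5-turbo-0613/logs/alfworld.jsonl")` would report the number of examples, the fraction solved, and the mean progress rate.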

Data

Data Overview

AgentBoard is composed of 9 diverse tasks which can be divided into 4 types: Embodied AI, Game, Web, and Tool:

Embodied AI
  • AlfWorld
  • ScienceWorld
  • BabyAI

Game
  • Jericho
  • PDDL

Web
  • WebShop
  • WebArena

Tool
  • Tool-Query
  • Tool-Operation

To help researchers quickly understand the evaluation data of each task, we provide a Dataset Viewer at Hugging Face Datasets: 🤗 AgentBoard.

Note: Please download the dataset from the link provided below, because the data shown in the Dataset Viewer is not complete.

Download Link

You can download the whole evaluation data by running the following command:

wget https://huggingface.co/datasets/hkust-nlp/agentboard/resolve/main/data.tar.gz

Please uncompress the file and move the data to AgentBoard/data.

cd AgentBoard
mkdir data
tar -zxvf data.tar.gz

The file structure of evaluation data is as follows:

Click to expand the file structure
data
├── baseline_results
├── alfworld
│   ├── alfred.pddl # additional data for alfworld
│   ├── alfred.twl2 # additional data for alfworld
│   ├── json_2.1.1  # additional data for alfworld
│   └── test.jsonl
├── babyai
│   └── test.jsonl
├── jericho
│   ├── test.jsonl
│   └── z-machine-games-master  # additional data for jericho
├── pddl
│   └── test.jsonl
├── scienceworld
│   └── test.jsonl
├── tool-operation
│   └── test.jsonl
├── tool-query
│   ├── academia  # additional data for academia tool
│   └── test.jsonl
├── webarena
│   └── test.jsonl
└── webshop
    └── test.jsonl

**We also provide baseline run logs in data/baseline_results, which can be used for visualization in our panel.**
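After uncompressing, a quick sanity check that each task directory contains its test.jsonl can catch an incomplete download early. A hypothetical helper (not part of AgentBoard; the task names are taken from the tree above):

```python
import os

# Task directories expected under data/, per the file structure above
TASKS = ["alfworld", "babyai", "jericho", "pddl", "scienceworld",
         "tool-operation", "tool-query", "webarena", "webshop"]

def missing_task_files(data_root="data"):
    """Return the tasks whose <data_root>/<task>/test.jsonl is absent."""
    return [task for task in TASKS
            if not os.path.isfile(os.path.join(data_root, task, "test.jsonl"))]
```

An empty return value means all 9 task files are in place; otherwise, re-download and re-extract data.tar.gz.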

Evaluation Details

Evaluation Preparation

Internet Access

For regions with Internet restrictions: to evaluate the Tool-Query, Tool-Operation, and WebArena tasks, please make sure the machine can access the Internet.

You can check for network issues by observing the output during execution.

Environment Preparation

We provide two ways to install the environment of AgentBoard, as specified in QuickStart.

Running Proprietary Models

In this section, we provide a script to evaluate the closed-source models on each task.

Please do not forget to set the environment variables (e.g., OPENAI_API_KEY, ANTHROPIC_API_KEY) before running the following commands.

For Tasks except WebShop

We provide a quick-start script to evaluate the gpt-3.5-turbo-0613 model on the alfworld task.

python agentboard/eval_main.py \
    --cfg-path eval_configs/main_results_all_tasks.yaml \
    --tasks alfworld \
    --model gpt-3.5-turbo-0613 \
    --wandb \
    --log_path ./results/gpt-3.5-turbo-0613 \
    --project_name evaluate-gpt-35-turbo-0613 \
    --baseline_dir ./data/baseline_results

Parameters:

  • --cfg-path: The path of the config file, please refer to eval_configs/main_results_all_tasks.yaml for more details.
  • --tasks: The tasks to be evaluated, e.g. tool-query, tool-operation, webarena, alfworld, babyai, jericho, pddl, scienceworld.
  • --model: The LLM to be evaluated. We provide some LLM models, including:
    • gpt-3.5-turbo
    • gpt-3.5-turbo-16k
    • gpt-4
    • text-davinci-003
    • claude2
  • --wandb: Online visualization is launched when this flag is given. Remove it from the script if you don't need visualization, e.g. during debugging.
  • --log_path: Path to save logs, as specified here.
  • --project_name: Project name for Weights & Biases. This parameter is not needed when --wandb is not used.
  • --baseline_dir: Directory of result files for the baseline models you want to compare against during the run.

For WebShop

First, please start the WebShop server by running the following commands:

cd ./agentboard/environment/WebShop
bash ./run_dev.sh
cd ../../..

Then, run the following command to evaluate the gpt-3.5-turbo-0613 model on the webshop task.

python agentboard/eval_main.py \
    --cfg-path eval_configs/main_results_all_tasks.yaml \
    --tasks webshop \
    --model gpt-3.5-turbo-0613 \
    --wandb \
    --log_path ./results/gpt-3.5-turbo-0613 \
    --project_name evaluate-gpt-35-turbo-0613 \
    --baseline_dir ./data/baseline_results

Running Open-source Models

In AgentBoard, we pre-support the following 8 open-source models; by default we use vLLM to speed up inference.

  • llama2-13b
  • llama2-70b
  • codellama-13b
  • codellama-34b
  • vicuna-13b-16k
  • lemur-70b
  • deepseek-67b
  • mistral-7b

Please refer to eval_configs/main_results_all_tasks.yaml for more details about these models.

To evaluate these models, you can run the following command:

python agentboard/eval_main.py \
    --cfg-path eval_configs/main_results_all_tasks.yaml \
    --tasks ${TASK_NAME} \
    --model ${OPEN_SOURCE_MODEL_NAME}

We also provide LLM customizations, please refer to LLM Customization for more details.

LLM Customization

Please refer to llm_customization.md for more details about LLM customization.

Agent Customization

Please refer to agent_customization.md for more details about agent customization.

Runtime Estimation

The evaluation runtime for a language model depends on the device/API, model, and inference architecture used. For open-source LLMs, vLLM inference is approximately 10 times faster than the Hugging Face pipeline.

To estimate the total time needed for evaluation, you can run a few steps to measure the inference speed and multiply it by the total number of LLM inference rounds, which is at most around 15,000.

A rough rule of thumb for the total time is 4 h × inference speed (in s/round). Here are some examples of our runtimes:

| Model | Device/API | Inference Architecture | Inference Speed | Total Time |
|---|---|---|---|---|
| GPT-4 | Azure API | - | 1.5 s/round | 5.5 h |
| GPT-3.5-Turbo | Azure API | - | 1 s/round | 3 h |
| DeepSeek-67b | 8*V100 | vLLM | 5 s/round | 18.5 h |
| Llama2-70b | 8*V100 | vLLM | 8 s/round | 28 h |
| Llama2-70b | 4*A100 | vLLM | 4 s/round | 13.5 h |
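The estimate above can be scripted: multiply the measured per-round latency by the round budget (capped at roughly 15,000 rounds, per the text) and compare with the 4 h × speed rule of thumb. A sketch with illustrative numbers (the function names are ours, not AgentBoard's):

```python
def estimate_total_hours(seconds_per_round, total_rounds=15000):
    """Worst-case wall-clock estimate from measured inference speed."""
    return seconds_per_round * total_rounds / 3600.0

def rule_of_thumb_hours(seconds_per_round):
    """The rough '4h * speed' heuristic (speed in s/round)."""
    return 4.0 * seconds_per_round

# e.g. GPT-4 at ~1.5 s/round: upper bound 6.25 h, heuristic 6.0 h,
# both in the ballpark of the measured 5.5 h (not every round hits max latency).
```

The upper bound exceeds the observed totals because many examples terminate before exhausting the round budget.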

๏ธCitation

If you find this repository useful, please consider giving it a star and citing our paper:

@misc{ma2024agentboard,
      title={AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents}, 
      author={Chang Ma and Junlei Zhang and Zhihao Zhu and Cheng Yang and Yujiu Yang and Yaohui Jin and Zhenzhong Lan and Lingpeng Kong and Junxian He},
      year={2024},
      eprint={2401.13178},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

License

Apache-2.0 license

The AgentBoard codebase is licensed under an Apache-2.0 License.

GPL-2.0

The AgentBoard dataset is licensed under a GNU General Public License, version 2.

agentboard's People

Contributors

chang-github-00, jxhe, yc1999, zzh068


agentboard's Issues

Has the reactagent version not been uploaded yet?

As the title says, the reactagent version seems incorrect. Apart from Tools, will the ReAct versions for the other tasks be uploaded? (Also, it seems reactagent should be changed to extract actions with regular expressions.)

SR seems to have some bugs

jericho
"Interaction Turn 17": {
"Action": "Push other to sword",
"Observation": "(the gleaming silver sword to the gleaming silver sword) Dropped. You pass out due to lack of oxygen and never wake. *** You have died *** In that game you scored 0 out of a possible 90, in 10 turns, earning you the rank of Primordial Ooze Candidate. Would you like to RESTART, RESTORE a saved game, give the FULL score for that game or QUIT",
"Progress Rate": 0
}
Yet SR is given as 1.

GPT-4 model missing

python agentboard/eval_main.py --cfg-path eval_configs/main_results_all_tasks.yaml --tasks scienceworld --model gpt-4 --log_path ./results/gpt-4 --project_name evaluate-gpt-4

I try to test the tasks with GPT-4, but get the following errors:

File "/home/ycchen/miniconda3/envs/Agentboard/lib/python3.8/os.py", line 675, in __getitem__
raise KeyError(key) from None
KeyError: 'OPENAI_API_TYPE'

The check of valid actions for pddl and jericho may have a bug

The valid-action checks for pddl and jericho seem unable to guarantee consistency, which can cause large variance in the final results (some cases differ by 0.5 points in PR). As shown in the screenshot, with the same preceding actions, the order of the executable actions given here is different. I set export TOKENIZERS_PARALLELISM=false according to the warning; does this need to be changed? Or how can this inconsistency be resolved?

[Refactor Request] Structured objects/typing

Amazing work here! Having tests for agents is super valuable.

Just a note to encourage contributions from other developers -- it would help to have more typing, as well as base classes for things.

Trying to wrap llama-index in this framework has been slightly confusing, since there are no types and I have to read a ton of code and debug line-by-line to see what the inputs to things are :)

webshop also seems to have some bugs

{
"id": 3,
"task_name": "webshop",
"goal": "i need gluten free vegetarian smoked peppered bacon - 4 ounce (pack of 2), and price lower than 60.00 dollars",
"difficulty": "easy",
"is_done": false,
"progress_rate": 0.9166666666666666,
"grounding_acc": 1.0,
"score_change_record": [
[
1,
0.3333333333333333
],
[
2,
0.6666666666666666
],
[
4,
0.9166666666666666
]
],
"trajectory": {
"Interaction Turn 0": {
"Goal": "i need gluten free vegetarian smoked peppered bacon - 4 ounce (pack of 2), and price lower than 60.00 dollars",
"Observation": "WEB PAGE: {Let us begin shopping \n[Search]}",
"Action": "reset[]",
"Progress Rate": 0.0
},
"Interaction Turn 1": {
"Action": "search[gluten free vegetarian smoked peppered bacon 4 ounce pack of 2 price < 60.00]",
"Observation": "WEB PAGE: {\n[Back to Search] \nPage 1 (Total results: 15) \n[Next >] \n[B06Y96MXJV] \nSmoked Bacon Sea Salt 3-Pack: Smoked Bacon Chipotle, Smoked Bacon and Onion, Smoked Peppered Bacon - All-Natural Sea Salts Slowly Smoked Over Alderwood - No Gluten, No MSG, Non-GMO (12 total oz.) \n$29.99 \n[B06Y96N1KG] \nSpicy Hot Pepper Sea Salt 3-Pack: Ghost Pepper, Jalapeno, Habanero - All Natural, Delicious Blends of Peppers & Sea Salt - Gluten-Free, Kosher, No MSG, Non-GMO (12 total oz.) \n$29.99 \n[B07GJTKYJQ] \nLouisville Vegan Jerky - 5 Flavor Variety Pack, Non-GMO Soy Protein, Gluten-Free (Black Pepper, Buffalo Dill, Pepperoni, Maple Bacon, and Carolina BBQ, 3 Ounces) \n$42.99 }",
"Progress Rate": 0.3333333333333333
},
"Interaction Turn 2": {
"Action": "click[B06Y96MXJV]",
"Observation": "WEB PAGE: {\n[Back to Search] \n[< Prev] \nflavor [cocktail salts][dessert toppers][exotic][finishing salts][foodie gift][french blends][french classics][fruits & flowers][garlic lover's][gourmet grillmaster][hawaiian][herb garden][i love you & food][italian classics][italian flavors][keto smoked][mediterranean diet][natural salts][olive oil dippers][pacific cooking][paleo][pegan smoked][pizza salts][smoked #1][smoked bacon][smoked classics][smoked grillmaster][spicy chili peppers][steak rubs][summer fruits][taco seasonings][the pepper][truffle][tuxedo][vegan smoked][vegetarian smoked]\nsize [4 ounce (pack of 3)][3 piece assortment][4 ounce (pack of 2)][4 ounce (pack of 6)][4 ounce (3 count)][4 ounce (6 count)]\nSmoked Bacon Sea Salt 3-Pack: Smoked Bacon Chipotle, Smoked Bacon and Onion, Smoked Peppered Bacon - All-Natural Sea Salts Slowly Smoked Over Alderwood - No Gluten, No MSG, Non-GMO (12 total oz.) \nPrice: $29.99 \nRating: N.A. \n[Description] \n[Features] \n[Reviews] \n[Attributes] \n[Buy Now] }",
"Progress Rate": 0.6666666666666666
},
"Interaction Turn 3": {
"Action": "click[4 ounce (pack of 2)]",
"Observation": "WEB PAGE: {You have clicked 4 ounce (pack of 2).}",
"Progress Rate": 0.6666666666666666
},
"Interaction Turn 4": {
"Action": "click[Buy Now]",
"Observation": "WEB PAGE: {Result: [False]}",
"Progress Rate": 0.9166666666666666
}
}
}

{
"id": 3,
"task_name": "webshop",
"goal": "i need gluten free vegetarian smoked peppered bacon - 4 ounce (pack of 2), and price lower than 60.00 dollars",
"difficulty": "easy",
"is_done": false,
"progress_rate": 0.75,
"grounding_acc": 1.0,
"score_change_record": [
[
1,
0.3333333333333333
],
[
2,
0.6666666666666666
],
[
4,
0.75
]
],
"trajectory": {
"Interaction Turn 0": {
"Goal": "i need gluten free vegetarian smoked peppered bacon - 4 ounce (pack of 2), and price lower than 60.00 dollars",
"Observation": "WEB PAGE: {Let us begin shopping \n[Search]}",
"Action": "reset[]",
"Progress Rate": 0.0
},
"Interaction Turn 1": {
"Action": "search[gluten free vegetarian smoked peppered bacon 4 ounce pack of 2 price < 60.00]",
"Observation": "WEB PAGE: {\n[Back to Search] \nPage 1 (Total results: 15) \n[Next >] \n[B06Y96MXJV] \nSmoked Bacon Sea Salt 3-Pack: Smoked Bacon Chipotle, Smoked Bacon and Onion, Smoked Peppered Bacon - All-Natural Sea Salts Slowly Smoked Over Alderwood - No Gluten, No MSG, Non-GMO (12 total oz.) \n$29.99 \n[B06Y96N1KG] \nSpicy Hot Pepper Sea Salt 3-Pack: Ghost Pepper, Jalapeno, Habanero - All Natural, Delicious Blends of Peppers & Sea Salt - Gluten-Free, Kosher, No MSG, Non-GMO (12 total oz.) \n$29.99 \n[B07GJTKYJQ] \nLouisville Vegan Jerky - 5 Flavor Variety Pack, Non-GMO Soy Protein, Gluten-Free (Black Pepper, Buffalo Dill, Pepperoni, Maple Bacon, and Carolina BBQ, 3 Ounces) \n$42.99 }",
"Progress Rate": 0.3333333333333333
},
"Interaction Turn 2": {
"Action": "click[B06Y96MXJV]",
"Observation": "WEB PAGE: {\n[Back to Search] \n[< Prev] \nflavor [cocktail salts][dessert toppers][exotic][finishing salts][foodie gift][french blends][french classics][fruits & flowers][garlic lover's][gourmet grillmaster][hawaiian][herb garden][i love you & food][italian classics][italian flavors][keto smoked][mediterranean diet][natural salts][olive oil dippers][pacific cooking][paleo][pegan smoked][pizza salts][smoked #1][smoked bacon][smoked classics][smoked grillmaster][spicy chili peppers][steak rubs][summer fruits][taco seasonings][the pepper][truffle][tuxedo][vegan smoked][vegetarian smoked]\nsize [4 ounce (pack of 3)][3 piece assortment][4 ounce (pack of 2)][4 ounce (pack of 6)][4 ounce (3 count)][4 ounce (6 count)]\nSmoked Bacon Sea Salt 3-Pack: Smoked Bacon Chipotle, Smoked Bacon and Onion, Smoked Peppered Bacon - All-Natural Sea Salts Slowly Smoked Over Alderwood - No Gluten, No MSG, Non-GMO (12 total oz.) \nPrice: $29.99 \nRating: N.A. \n[Description] \n[Features] \n[Reviews] \n[Attributes] \n[Buy Now] }",
"Progress Rate": 0.6666666666666666
},
"Interaction Turn 3": {
"Action": "click[4 ounce (pack of 2)]",
"Observation": "WEB PAGE: {You have clicked 4 ounce (pack of 2).}",
"Progress Rate": 0.6666666666666666
},
"Interaction Turn 4": {
"Action": "click[Buy Now]",
"Observation": "WEB PAGE: {Result: [False]}",
"Progress Rate": 0.75
}
}
}

ๅŒๆ—ถsampleไธคๆฌก๏ผŒ่ฟ‡็จ‹ไธ€ๆ ท๏ผŒ็ป“ๆžœๅดไธไธ€ๆ ท๏ผŸ
