
Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs"

Home Page: https://allenai.github.io/lumos

License: MIT License



🪄 Agent Lumos: Unified and Modular Training for Open-Source Language Agents

🖋 Authors: Da Yin, Faeze Brahman, Abhilasha Ravichander, Khyathi Chandu, Kai-Wei Chang, Yejin Choi, Bill Yuchen Lin

We introduce 🪄Lumos, Language Agents with Unified Data Formats, Modular Design, and Open-Source LLMs. Lumos unifies a suite of complex interactive tasks and achieves competitive performance with GPT-4/3.5-based and larger open-source agents.

โ€ผ๏ธ Lumos has following features:

  • 🧩 Modular Architecture:
    • 🧩 Lumos consists of planning, grounding, and execution modules built on LLAMA-2-7B/13B and off-the-shelf APIs.
    • 🤗 Lumos utilizes a unified data format that encompasses multiple task types, enabling the agent framework to conveniently support a range of interactive tasks.
  • 🌍 Diverse Training Data:
    • 🌍 Lumos is trained with ~56K diverse high-quality subgoal/action annotations converted with GPT-4 from ground-truth reasoning steps in existing benchmarks.
    • ⚒️ Lumos data can be instrumental for future research on developing open-source agents for complex interactive tasks.
  • 🚀 Competitive Performance:
    • 🚀 Lumos matches or even beats GPT-series agents on the web and complex QA tasks Mind2Web and HotpotQA, and larger open agents on maths and multimodal tasks.
    • 🚀 Lumos exceeds contemporaneous agents fine-tuned with in-domain HotpotQA, Mind2Web, and ScienceQA annotations, such as FiReAct, AgentLM, and AutoAct.
    • 🚀 Lumos performs better than open agent baseline formulations, including chain-of-thoughts and integrated training.
    • 🚀 Lumos surpasses larger open LLM agents and domain-specific agents on the unseen tasks WebShop and InterCode_SQL.

🤩 Citation

If you find this work relevant to your research, please feel free to cite our work!

@article{yin2023lumos,
  title={{Agent Lumos: Unified and Modular Training for Open-Source Language Agents}},
  author={Yin, Da and Brahman, Faeze and Ravichander, Abhilasha and Chandu, Khyathi and Chang, Kai-Wei and Choi, Yejin and Lin, Bill Yuchen},
  journal={arXiv preprint arXiv:2311.05657},
  year={2023}
}

🔥 News

  • [2024, Mar 18] We release the latest Lumos version:
    • 📑 Lumos paper that covers new multimodal tasks and 13B-scale model experiments
    • 🤗 Lumos demo that illustrates the Lumos planning and grounding processes
  • [2023, Nov 8] We release the key items for training and evaluating Lumos:
    • 💻 Lumos code for annotation generation, training, and evaluation
    • 🤗 Lumos checkpoints with 7B model size
    • 🤗 Lumos training annotations and their raw data

🧩 Architecture

🛠️ Setup

./setup.sh

Please make sure that the cudatoolkit version in setup.sh matches your local CUDA version.
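
As a quick sanity check (assuming setup.sh installs PyTorch, which is an assumption here rather than something stated above), you can print the CUDA version PyTorch was built against and whether a GPU is visible:

python -c "import torch; print('torch CUDA build:', torch.version.cuda); print('CUDA available:', torch.cuda.is_available())"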

Training

📈 Training Data Download

We collect all the training annotations, raw data, and prompt-converted annotations in a single Google Drive folder. It can be downloaded by running:

cd data
python -c "import gdown; gdown.download_folder('https://drive.google.com/drive/folders/1ASFhOkhezgewVxR01dQg-8KUVR8IdBlY?usp=sharing', quiet=True)" 

We also provide the generated annotations for the planning and grounding modules as 🤗 Huggingface Datasets; a loading sketch follows the table below.

Dataset Names 🤗 Huggingface Links
lumos_complex_qa_iterative Planning, Grounding
lumos_complex_qa_onetime Planning, Grounding
lumos_web_agent_iterative Planning, Grounding
lumos_multimodal_iterative Planning, Grounding
lumos_maths_iterative Planning, Grounding
lumos_maths_onetime Planning, Grounding
lumos_unified_iterative Planning, Grounding
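
As a minimal sketch, an annotation set can be loaded with the datasets library; the repo ID below is a hypothetical placeholder, so substitute the exact dataset ID from the table links above.

# Minimal sketch: load one planning annotation set from the Hugging Face Hub.
# NOTE: the repo ID is a hypothetical placeholder -- use the IDs linked in the table above.
from datasets import load_dataset

plan_data = load_dataset("ai2lumos/lumos_complex_qa_plan_iterative", split="train")
print(plan_data[0])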

🧑‍🎓 Train Modules with Generated Annotations

./train.sh [MODULE] [FORMULATION]

[MODULE] can be either plan or ground, and [FORMULATION] can be either iterative or onetime.
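
For example, ./train.sh plan iterative fine-tunes the planning module on the iterative formulation.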

You can adjust the fine-tuning hyperparameters and the specific task you want to fine-tune on in the training scripts under scripts/train, such as finetune_llama2_plan_iterative.sh.

We also provide the fine-tuned planning and grounding module checkpoints on 🤗 Huggingface; a loading sketch follows the table below.

Model Names 🤗 Huggingface Links
lumos_complex_qa_iterative Planning, Grounding
lumos_complex_qa_iterative-13B Planning, Grounding
lumos_complex_qa_onetime Planning, Grounding
lumos_web_agent_iterative Planning, Grounding
lumos_web_agent_iterative-13B Planning, Grounding
lumos_maths_iterative Planning, Grounding
lumos_maths_onetime Planning, Grounding
lumos_maths_onetime-13B Planning, Grounding
lumos_unified_iterative Planning, Grounding
lumos_unified_iterative-13B Planning, Grounding
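
As a minimal sketch, a checkpoint can be loaded as a standard causal LM with transformers; the model ID below is a hypothetical placeholder, so substitute the exact model ID from the table links above.

# Minimal sketch: load a fine-tuned Lumos module for inference.
# NOTE: the model ID is a hypothetical placeholder -- use the IDs linked in the table above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai2lumos/lumos_maths_plan_iterative"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)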

✅ Evaluation

Evaluation scripts for different datasets are under scripts/eval. For example, you can evaluate Lumos on HotpotQA by running:

./scripts/eval/hotpotqa.sh

Others

📈 Data Annotation Generation

We provide the code for generating training annotations based on raw existing benchmarks from scratch.

Before generating annotations, we first need to download the existing benchmarks that provide ground-truth intermediate reasoning steps. The raw data can be downloaded via this Google Drive folder.

python -m data.prompt_convertion \
  --domain DOMAIN \
  --data_fn DATA_FN \
  --convert_all

domain covers maths, complex QA, web agent, and multimodal. data_fn is the path where the raw benchmarks are stored.
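
For example, a maths run might look like python -m data.prompt_convertion --domain maths --data_fn data/train/maths/raw_data --convert_all; the exact --domain string and raw-data path here are assumptions, so check the prompt conversion code and the downloaded folder layout for the accepted values.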

For multimodal task annotation generation, please download the COCO 2017 train images into data/train/multimodal/raw_data and unzip them.

โค๏ธ Acknowledgement

We greatly thank the Tulu team for providing awesome code to finetune LLAMA-2. We also sincerely appreciate the contributors of zeno-build, Mind2Web, and WebShop for providing fast GPT prompting, HTML preprocessing, and the evaluation docker environment.


lumos's Issues

Merge "Planning" and "Grounding" data

Thanks for the valuable work!

I noticed that there are distinct models for the planning and grounding tasks, specifically lumos_unified_plan_iterative and lumos_unified_ground_iterative.

I'm interested in knowing whether it's feasible to combine the datasets for planning and grounding and then use them to train a single model that can perform both functions. If this approach is viable, what specific adjustments would be necessary? And if it's not advisable, could you shed some light on why that might be the case?

How to generalize to ALFWorld

I am curious about how to organize the prompt when encountering tasks like ALFWorld. Several of the tasks mentioned in the paper do not depend strongly on environment feedback, whereas ALFWorld requires finding things based on environment feedback, and the number of actions the grounding module needs is not fixed. Therefore, I am unsure how to use the trained model from the paper to evaluate on ALFWorld.
