
This project forked from microsoft/tora


ToRA is a series of Tool-integrated Reasoning LLM Agents designed to solve challenging mathematical reasoning problems by interacting with tools.

Home Page: https://microsoft.github.io/ToRA/

License: MIT License


ToRA: A Tool-Integrated Reasoning Agent



[🌐 Website] • [📜 Paper] • [🤗 HF Models] • [🐱 GitHub]
[🐦 Twitter] • [💬 Reddit] • [🍀 Unofficial Blog]

Repo for "ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving"


Figure 1: Comparing ToRA with baselines on LLaMA-2 base models from 7B to 70B.

🔥 News

  • [2023/10/08] 🔥🔥🔥 All ToRA models released on 🤗 HuggingFace!
  • [2023/09/29] ToRA paper, repo, and website released.

💡 Introduction

ToRA is a series of Tool-integrated Reasoning Agents designed to solve challenging mathematical reasoning problems by interacting with tools such as computation libraries and symbolic solvers. The ToRA models interleave natural language reasoning with calls to external tools, combining the analytical strength of language with the computational efficiency of those tools.

| Model | Size | GSM8k | MATH | AVG@10 math tasks† |
|---|---|---|---|---|
| GPT-4 | - | 92.0 | 42.5 | 78.3 |
| GPT-4 (PAL) | - | 94.2 | 51.8 | 86.4 |
| ToRA-7B | 7B | 68.8 | 40.1 | 62.4 |
| ToRA-Code-7B | 7B | 72.6 | 44.6 | 66.5 |
| ToRA-Code-7B + self-consistency (k=50) | 7B | 76.8 | 52.5 | - |
| ToRA-13B | 13B | 72.7 | 43.0 | 65.9 |
| ToRA-Code-13B | 13B | 75.8 | 48.1 | 71.3 |
| ToRA-Code-13B + self-consistency (k=50) | 13B | 80.4 | 55.1 | - |
| ToRA-Code-34B* | 34B | 80.7 | 51.0 | 74.8 |
| ToRA-Code-34B + self-consistency (k=50) | 34B | 85.1 | 60.0 | - |
| ToRA-70B | 70B | 84.3 | 49.7 | 76.9 |
| ToRA-70B + self-consistency (k=50) | 70B | 88.3 | 56.9 | - |

  • *ToRA-Code-34B is currently the first and only open-source model to achieve over 50% accuracy (pass@1) on the MATH dataset. It significantly outperforms GPT-4's CoT result (51.0 vs. 42.5) and is competitive with GPT-4 solving problems with programs. By open-sourcing our code and models, we hope more breakthroughs will come!

  • †The 10 math tasks are GSM8k, MATH, GSM-Hard, SVAMP, TabMWP, ASDiv, SingleEQ, SingleOP, AddSub, and MultiArith.
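
The "self-consistency (k=50)" rows in the table report accuracy after sampling k answers per problem and keeping the most frequent one. The snippet below is a minimal sketch of that voting step only, assuming the final answers have already been extracted as strings; the function name `majority_vote` is hypothetical and is not taken from this repository.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(answers: Sequence[Optional[str]]) -> Optional[str]:
    """Return the most frequent non-empty answer among k sampled answers.

    Hypothetical helper for illustration; not part of the ToRA code base.
    """
    # Drop failed generations (None or blank) and strip surrounding whitespace.
    cleaned = [a.strip() for a in answers if a is not None and a.strip()]
    if not cleaned:
        return None
    # On a tie, Counter.most_common returns the answer seen first.
    return Counter(cleaned).most_common(1)[0][0]


# Six sampled answers for one problem (k=6 here instead of 50).
samples = ["12", "12 ", "15", None, "12", "15"]
print(majority_vote(samples))  # prints 12
```

In practice, answers that are mathematically equal but written differently (for example `0.5` and `1/2`) would need to be normalized before voting; this sketch only strips whitespace.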

Tool-Integrated Reasoning


Figure 2: A basic example of single-round tool interaction, which interleaves rationales with program-based tool use.
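
As a rough illustration of the single-round interaction in Figure 2, the sketch below takes a model response containing a rationale and a Python program, executes the program, and appends its printed result so that generation could continue from there. The helper names, the `output` block label, and the use of a bare `exec` are assumptions made for this example; it is not the repository's executor and provides no sandboxing or timeout.

```python
import contextlib
import io
import re
from typing import Optional

FENCE = "`" * 3  # code-fence marker, built here to avoid nesting literal fences


def extract_program(text: str) -> Optional[str]:
    """Return the body of the first python-fenced block in the text, if any."""
    pattern = re.escape(FENCE) + r"python\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1) if match else None


def run_program(code: str) -> str:
    """Execute code and return what it printed, or a one-line error summary."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})  # illustration only: untrusted code needs a sandbox
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return buffer.getvalue().strip()


def one_round(model_output: str) -> str:
    """Append the tool result to the model output (single interaction round)."""
    code = extract_program(model_output)
    if code is None:
        return model_output  # pure natural-language rationale, nothing to run
    result = run_program(code)
    return f"{model_output}\n{FENCE}output\n{result}\n{FENCE}\n"


# A made-up model response: rationale first, then a program.
response = (
    "We can sum the integers from 1 to 10 with a short program.\n"
    + FENCE + "python\nprint(sum(range(1, 11)))\n" + FENCE
)
print(one_round(response))
```

The returned transcript ends with the program's output, which is the point where the model would resume its natural-language reasoning.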

ToRA Training Pipeline


Figure 3: Training ToRA consists of ① imitation learning and ② output space shaping.

🚀 Quick Start

⚙️ Setup

We recommend using Conda to manage your environment. We use vLLM (0.1.4) to accelerate inference. Run the following commands to set up your environment:

git clone https://github.com/microsoft/ToRA.git && cd ToRA/src
conda create -n tora python=3.10
conda activate tora
pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118 # CUDA 11.8 for example
pip install -r requirements.txt

🪁 Inference

We provide a script for inference. Set MODEL_NAME_OR_PATH and DATA in src/scripts/infer.sh, then run the following command:

bash scripts/infer.sh

We also open-source the model outputs from our best models (ToRA-Code-34B and ToRA-70B) in the src/outputs/ folder.

⚖️ Evaluation

The src/eval/grader.py file contains the grading logic, which assesses the accuracy of a predicted answer by comparing it to the ground truth. The logic is based on Hendrycks' MATH grading system, and we have manually verified it on the MATH dataset to minimize false positives and false negatives.
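
To give a feel for what comparing a prediction to the ground truth involves, here is a deliberately simplified sketch. It is not the logic in src/eval/grader.py: the function names `normalize` and `is_equal` and the tolerance value are assumptions for this example, and it covers only string normalization and numeric closeness, whereas a real grader for MATH also has to handle symbolic equivalence.

```python
import math


def normalize(answer: str) -> str:
    """Strip formatting that should not affect correctness (hypothetical helper)."""
    answer = answer.strip().rstrip(".")  # trailing sentence period
    return answer.replace("$", "").replace(" ", "")  # math delimiters, spaces


def is_equal(prediction: str, reference: str, rel_tol: float = 1e-4) -> bool:
    """Return True if two answers match as strings or as close numbers."""
    pred, ref = normalize(prediction), normalize(reference)
    if pred == ref:
        return True
    try:
        # Remove thousands separators only for the numeric comparison,
        # so that tuples such as "(1,2)" are not altered above.
        pred_num = float(pred.replace(",", ""))
        ref_num = float(ref.replace(",", ""))
    except ValueError:
        return False  # non-numeric and not string-equal
    return math.isclose(pred_num, ref_num, rel_tol=rel_tol)


print(is_equal("$1,000$", "1000"))     # True: same number, different formatting
print(is_equal("3.14159", "3.1416"))   # True: within the relative tolerance
print(is_equal("0.5", "1/2"))          # False: this sketch has no symbolic check
```

The last call shows the kind of false negative that a purely string-and-float comparison produces, which is why manual verification of the grader on the dataset matters.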

To evaluate the predicted answer, run the following command:

python -m eval.evaluate \
    --data_name "math" \
    --prompt_type "tora" \
    --file_path "outputs/llm-agents/tora-code-34b-v1.0/math/test_tora_-1_seed0_t0.0_s0_e5000.jsonl" \
    --execute

This prints a summary such as:

Num samples: 5000
Num scores: 5000
Timeout samples: 0
Empty samples: 2
Mean score: [51.0]
Type scores: {'Algebra': 67.3, 'Counting & Probability': 42.2, 'Geometry': 26.1, 'Intermediate Algebra': 40.0, 'Number Theory': 59.3, 'Prealgebra': 63.8, 'Precalculus': 34.2}
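
The "Mean score" and "Type scores" lines are overall accuracy and per-subject accuracy in percent. The sketch below shows how such a summary could be computed from per-sample results; the record keys `type` and `score` and the function name `summarize` are assumptions for illustration and may not match the fields used by the evaluation script.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def summarize(samples: Iterable[Mapping]) -> Tuple[float, Dict[str, float]]:
    """Compute overall and per-type accuracy (in percent, one decimal place)."""
    per_type: Dict[str, List[bool]] = defaultdict(list)
    for sample in samples:
        # "type" and "score" are hypothetical keys chosen for this example.
        per_type[sample["type"]].append(bool(sample["score"]))

    all_scores = [s for scores in per_type.values() for s in scores]
    if not all_scores:
        return 0.0, {}

    mean_score = round(100 * sum(all_scores) / len(all_scores), 1)
    type_scores = {
        name: round(100 * sum(scores) / len(scores), 1)
        for name, scores in sorted(per_type.items())
    }
    return mean_score, type_scores


results = [
    {"type": "Algebra", "score": True},
    {"type": "Algebra", "score": False},
    {"type": "Geometry", "score": True},
    {"type": "Algebra", "score": True},
]
print(summarize(results))  # (75.0, {'Algebra': 66.7, 'Geometry': 100.0})
```

Note that the mean is taken over all samples, not over the per-type averages, so subjects with more problems weigh more heavily in the overall score.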

⚡️ Training

Due to some restrictions, ToRA-Corpus 16k is under review and cannot be released immediately. However, we open-source our complete training scripts as well as output space shaping pipelines for the community, and you may construct your own dataset for training.

To train a model, run the following command:

bash scripts/train.sh codellama 7b

☕️ Citation

If you find this repository helpful, please consider citing our paper:

@misc{gou2023tora,
      title={ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving}, 
      author={Zhibin Gou and Zhihong Shao and Yeyun Gong and yelong shen and Yujiu Yang and Minlie Huang and Nan Duan and Weizhu Chen},
      year={2023},
      eprint={2309.17452},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

🍀 Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

