FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models

arXiv preprint: https://arxiv.org/abs/2310.20410

We introduce FollowBench, a Multi-level Fine-grained Constraints Following Benchmark for LLMs.

FollowBench covers five types of fine-grained constraints: Content, Situation, Style, Format, and Example.
To enable a precise constraint following estimation on diverse difficulties, we introduce a Multi-level mechanism that incrementally adds a single constraint to the initial instruction at each increased level.
To evaluate whether LLMs' outputs have satisfied every individual constraint, we propose to prompt strong LLMs with constraint-evolution paths to handle challenging open-ended instructions.
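
The Multi-level mechanism can be illustrated with a minimal sketch. It assumes that each level simply appends one more constraint sentence to the previous level's instruction; the real data may rephrase the instruction rather than concatenate text. The function name `build_levels` and the example instruction and constraints are hypothetical and do not come from the repository.

```python
from typing import List


def build_levels(initial_instruction: str, constraints: List[str]) -> List[str]:
    """Return one prompt per level; level k carries the first k constraints.

    Hypothetical helper: plain concatenation stands in for however the
    benchmark actually evolves an instruction from one level to the next.
    """
    levels = []
    current = initial_instruction
    for constraint in constraints:
        # Each level adds exactly one constraint on top of the previous level.
        current = f"{current} {constraint}"
        levels.append(current)
    return levels


# Illustrative instruction and constraints (not taken from the dataset).
levels = build_levels(
    "Write a short product description for a reusable water bottle.",
    ["Keep it under 50 words.", "Use a playful tone.", "End with a question."],
)
for level, prompt in enumerate(levels, start=1):
    print(f"Level {level}: {prompt}")
```

Because every level extends the one before it, the sequence of prompts forms the constraint-evolution path that an evaluator can inspect to judge each constraint individually.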

Data

The FollowBench data can be found in the data/ directory.

How to Implement

Install Dependencies

conda create -n followbench python=3.10
conda activate followbench
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt

Model Inference

cd FollowBench/
python code/model_inference.py --model-path <model_name_or_path>

Evaluation

First, use the GPT-4 API to obtain the LLM-based evaluation results. Then organize and merge the rule-based and LLM-based evaluation results with the following script:

cd FollowBench/
python code/eval.py --model_names <a_list_of_evaluated_models>
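
As a rough picture of what merging the two result sources involves, the sketch below combines per-constraint verdicts from a rule-based checker and an LLM-based judge, then reports how often every constraint was satisfied at each level. This is not the logic of code/eval.py: the `(example_id, level)` key layout, the function names, and the sample verdicts are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical key layout: (example_id, level). The real scripts may differ.
Key = Tuple[str, int]
Verdicts = Dict[Key, List[bool]]


def merge_results(rule_based: Verdicts, llm_based: Verdicts) -> Verdicts:
    """Combine both sources, assuming each example is judged by exactly one."""
    overlap = rule_based.keys() & llm_based.keys()
    if overlap:
        raise ValueError(f"examples judged by both sources: {sorted(overlap)}")
    return {**rule_based, **llm_based}


def all_satisfied_rate_by_level(merged: Verdicts) -> Dict[int, float]:
    """Fraction of examples per level whose constraints were all satisfied."""
    totals: Dict[int, int] = {}
    passed: Dict[int, int] = {}
    for (_, level), verdicts in merged.items():
        totals[level] = totals.get(level, 0) + 1
        passed[level] = passed.get(level, 0) + int(all(verdicts))
    return {level: passed[level] / totals[level] for level in sorted(totals)}


# Made-up verdicts: one boolean per constraint present at that level.
rule_based = {("format-1", 1): [True], ("format-1", 2): [True, False]}
llm_based = {("style-1", 1): [True], ("style-1", 2): [True, True]}

rates = all_satisfied_rate_by_level(merge_results(rule_based, llm_based))
print(rates)
```

Keeping one boolean per constraint, rather than a single pass/fail flag per response, leaves room to compute stricter or more lenient aggregate scores from the same merged data.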

Experiments

By evaluating 10 popular closed-source and open-source LLMs on FollowBench, we highlight the weaknesses of LLMs in instruction following and point towards potential avenues for future work.

Citation

Please cite our paper if you use the code in this repo.

@misc{jiang2023followbench,
      title={FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models}, 
      author={Yuxin Jiang and Yufei Wang and Xingshan Zeng and Wanjun Zhong and Liangyou Li and Fei Mi and Lifeng Shang and Xin Jiang and Qun Liu and Wei Wang},
      year={2023},
      eprint={2310.20410},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
