HaELM

An automatic MLLM hallucination detection framework

1. Installing

Install peft

$ pip install git+https://gitclone.com/github.com/huggingface/peft.git -i https://pypi.mirrors.ustc.edu.cn/simple --trusted-host=pypi.mirrors.ustc.edu.cn

2. Preparing

Download the llama-7b-hf checkpoint.
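Once the download finishes, a quick check of the checkpoint directory can catch an incomplete copy before training starts. The sketch below is only an illustration: the function name `missing_checkpoint_files` and the path `./llama-7b-hf` are hypothetical, and the file names it looks for are the ones Hugging Face-format checkpoints usually contain, not a list confirmed by this repository.

```python
from pathlib import Path


def missing_checkpoint_files(checkpoint_dir):
    """Return descriptions of expected items that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    missing = []

    # Model configuration file.
    if not (root / "config.json").is_file():
        missing.append("config.json")

    # Either a SentencePiece model or a fast-tokenizer file is acceptable.
    if not any((root / name).is_file() for name in ("tokenizer.model", "tokenizer.json")):
        missing.append("tokenizer.model or tokenizer.json")

    # Weights may be sharded across several files, so match on extension only.
    if not any(root.glob("*.bin")) and not any(root.glob("*.safetensors")):
        missing.append("weight files (*.bin or *.safetensors)")

    return missing


if __name__ == "__main__":
    # Hypothetical path: replace it with wherever you saved the checkpoint.
    print(missing_checkpoint_files("./llama-7b-hf"))
```

An empty list means nothing obvious is missing. It does not prove the weights are intact.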

3. Training

We provide the hallucination training dataset in "data/train_data.jsonl" and the manually labeled validation set in "data/eval_data.jsonl". If you want to retrain the model, use a different LLaMA model size, use LLaMA-2, or train on additional data, follow these steps:

Modify the paths on lines 19-21 of finetune.py.
Run the command below:

python finetune.py
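Before training, especially when adding your own data, it can help to confirm that the JSONL files parse cleanly and share the same fields. The sketch below assumes only that each line is one JSON object. It does not assume any particular field names, since the schema is not described here, and `summarize_jsonl` is a hypothetical helper, not part of the repository.

```python
import json
import os
from collections import Counter


def summarize_jsonl(path):
    """Return (record_count, Counter of top-level keys) for a JSON Lines file."""
    count = 0
    keys = Counter()
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}: line {line_no} is not valid JSON") from exc
            count += 1
            if isinstance(record, dict):
                keys.update(record.keys())
    return count, keys


if __name__ == "__main__":
    for path in ("data/train_data.jsonl", "data/eval_data.jsonl"):
        if os.path.exists(path):
            n, keys = summarize_jsonl(path)
            print(path, n, dict(keys))
```

If a key appears fewer times than the record count, some records lack that field, which is worth checking in any data you add.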

4. Interface

We provide interface templates populated by the output of mPLUG-Owl in "LLM_output/mPLUG_caption.jsonl".

Modify the paths on lines 14-16 of interface.py.
Run the command below:

python interface.py

5. Citation

@article{wang2023evaluation,
  title={Evaluation and Analysis of Hallucination in Large Vision-Language Models},
  author={Wang, Junyang and Zhou, Yiyang and Xu, Guohai and Shi, Pengcheng and Zhao, Chenlin and Xu, Haiyang and Ye, Qinghao and Yan, Ming and Zhang, Ji and Zhu, Jihua and others},
  journal={arXiv preprint arXiv:2308.15126},
  year={2023}
}

haelm's Issues

How to use HaELM to evaluate hallucination in other MLLMs, such as LLaVA?

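Hello, I ran inference.py, and its outputs are all 'yes' or 'no'. The paper reports metrics such as F1 score; how are these computed? The prompt is prompt += "\nIs our caption accurate?\n", so should this be understood as accuracy = (yes)/(yes+no)? Also, could you provide a .py file so that we can evaluate more MLLMs, rather than only being able to use mplug.json?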
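As a rough illustration of the distinction raised in this question: the ratio yes/(yes+no) needs no reference labels and measures only the share of captions the judge accepted. Precision, recall and F1 instead compare the judge's answers with reference labels, such as a manually labeled set. The sketch below is a generic computation under those assumptions, not the authors' evaluation script. The function names are hypothetical, and treating "no" (a hallucinated caption) as the positive class is an assumption.

```python
def normalize(answer):
    """Map a raw answer to 'yes' or 'no' based on its first word."""
    words = answer.strip().lower().split()
    first = words[0].strip(".,!'\"") if words else ""
    if first not in ("yes", "no"):
        raise ValueError(f"unexpected answer: {answer!r}")
    return first


def yes_rate(predictions):
    """Share of captions judged accurate; needs no reference labels."""
    preds = [normalize(p) for p in predictions]
    if not preds:
        raise ValueError("no predictions given")
    return preds.count("yes") / len(preds)


def classification_metrics(predictions, labels, positive="no"):
    """Accuracy, precision, recall and F1 of predictions against reference labels."""
    preds = [normalize(p) for p in predictions]
    golds = [normalize(g) for g in labels]
    if not preds or len(preds) != len(golds):
        raise ValueError("predictions and labels must be non-empty and equal in length")

    pairs = list(zip(preds, golds))
    tp = sum(p == positive and g == positive for p, g in pairs)
    fp = sum(p == positive and g != positive for p, g in pairs)
    fn = sum(p != positive and g == positive for p, g in pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(p == g for p, g in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy values for illustration only, not real model outputs.
    predictions = ["yes", "no", "no", "yes"]
    labels = ["yes", "no", "yes", "yes"]
    print(yes_rate(predictions))
    print(classification_metrics(predictions, labels))
```

Passing positive="yes" reports the same metrics for the non-hallucinated class.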