chanliang / conner

The implementation for the EMNLP 2023 paper "Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators"

Home Page: https://arxiv.org/abs/2310.07289

Python 67.50% Shell 32.50%
llm-evaluation hallucinations emnlp2023 large-language-models factuality nlg-evaluation chatgpt llama

conner's Introduction

Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators

Welcome to the repository for our EMNLP 2023 paper, "Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators." In this work, we introduce CONNER (COmpreheNsive kNowledge Evaluation fRamework), a systematic framework for evaluating the knowledge generated by Large Language Models (LLMs) along six dimensions: Factuality, Relevance, Coherence, Informativeness, Helpfulness, and Validity.

This repository provides the code and resources needed to reproduce our findings and to further explore LLMs as knowledge generators. We hope it makes your own work on LLMs a little easier.

CONNER Framework

Intrinsic Evaluation

  • Factuality: Assessing the verifiability of the information against external evidence.
  • Relevance: Ensuring the knowledge aligns with the user's query intent.
  • Coherence: Evaluating the logical flow of information at both sentence and paragraph levels.
  • Informativeness: Measuring the novelty or unexpectedness of the knowledge provided.
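To make the Factuality dimension concrete, here is a minimal sketch of how NLI predictions for one generated sentence could be turned into a verdict once evidence passages have been retrieved (the model table lists ColBERTv2, a retriever, and NLI-RoBERTa-large, an NLI model, for this metric). The input format, the label names, and the "most confident entailment or contradiction wins" rule are assumptions for illustration, not the paper's exact formula.

```python
from typing import Dict, List

# Assumption: the NLI model yields, for each (evidence, sentence) pair,
# a probability per label: "entailment", "neutral", "contradiction".
NLIScores = Dict[str, float]


def sentence_factuality(evidence_scores: List[NLIScores]) -> str:
    """Label one generated sentence from its NLI scores against retrieved evidence.

    Assumed rule (illustrative only): each passage votes with its most
    probable label; among "entailment"/"contradiction" votes, the most
    confident one wins. If no passage takes a side, the sentence is
    treated as unverifiable.
    """
    best = {"entailment": 0.0, "contradiction": 0.0}
    for scores in evidence_scores:
        label = max(scores, key=scores.get)
        if label in best:
            best[label] = max(best[label], scores[label])
    if best["entailment"] == 0.0 and best["contradiction"] == 0.0:
        return "unverifiable"
    return "factual" if best["entailment"] >= best["contradiction"] else "non-factual"


def knowledge_factuality(sentences: List[List[NLIScores]]) -> float:
    """Hypothetical aggregate: fraction of sentences labelled "factual"."""
    if not sentences:
        return 0.0
    labels = [sentence_factuality(s) for s in sentences]
    return labels.count("factual") / len(labels)
```

Letting only non-neutral votes compete keeps an irrelevant passage (high "neutral") from masking a passage that clearly supports or refutes the sentence.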

Extrinsic Evaluation

  • Helpfulness: Gauging whether the knowledge improves performance on downstream tasks.
  • Validity: Verifying the factual accuracy of downstream task results produced with the knowledge.
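Helpfulness compares downstream performance with and without the generated knowledge. The sketch below assumes a question-answering task scored by normalized exact match and takes the downstream model's answers as given; the normalization rules and the function names are hypothetical, not taken from the repository.

```python
import re
import string
from typing import List


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match_accuracy(predictions: List[str], golds: List[List[str]]) -> float:
    """Share of predictions equal to any gold answer after normalization."""
    if not predictions:
        return 0.0
    hits = sum(
        any(normalize(pred) == normalize(g) for g in gold)
        for pred, gold in zip(predictions, golds)
    )
    return hits / len(predictions)


def helpfulness_gain(
    with_knowledge: List[str], without_knowledge: List[str], golds: List[List[str]]
) -> float:
    """Assumed definition: accuracy with knowledge minus accuracy without it."""
    return exact_match_accuracy(with_knowledge, golds) - exact_match_accuracy(
        without_knowledge, golds
    )
```

A positive gain means the knowledge helped on this sample; a negative gain means it misled the downstream model.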

Getting Started

Setting Up the Environment

Begin by creating a Conda environment from the provided env/environment.yaml file, which installs the required packages and dependencies.

conda env create -f env/environment.yaml -n CONNER
conda activate CONNER

If you run into any missing packages or dependencies, please install them as needed.
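To find missing packages before launching a long evaluation, a quick check like the one below can help. The module list is a hypothetical example; replace it with the packages in env/environment.yaml, and note that import names can differ from package names (e.g. scikit-learn is imported as sklearn).

```python
import importlib.util
from typing import Iterable, List


def missing_packages(modules: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found (nothing is imported)."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical list: adjust to match env/environment.yaml.
    required = ["torch", "transformers", "numpy"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```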

Evaluating Your LLMs

Run the evaluation script that corresponds to your dataset and chosen metric. Replace ${data} with your dataset choice (nq or wow) and ${metric} with one of the following metrics: factuality, relevance, info, coh_sent, coh_para, validity, helpfulness.

# Run evaluation script. Example usage:
# bash scripts/nq_factuality.sh
# bash scripts/wow_relevance.sh
bash scripts/${data}_${metric}.sh
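To run every dataset and metric in one go, a small driver can build the same `bash scripts/${data}_${metric}.sh` commands. This sketch assumes it is run from the repository root and that a script exists for all 14 combinations; it defaults to a dry run that only lists the commands, since some metrics (e.g. Helpfulness with LLaMA-65B) are expensive.

```python
import subprocess
from itertools import product
from typing import List

# Dataset and metric names as listed in this README.
DATASETS = ["nq", "wow"]
METRICS = ["factuality", "relevance", "info", "coh_sent", "coh_para", "validity", "helpfulness"]


def eval_command(data: str, metric: str) -> List[str]:
    """Build `bash scripts/{data}_{metric}.sh`, rejecting unknown names early."""
    if data not in DATASETS:
        raise ValueError(f"unknown dataset {data!r}; expected one of {DATASETS}")
    if metric not in METRICS:
        raise ValueError(f"unknown metric {metric!r}; expected one of {METRICS}")
    return ["bash", f"scripts/{data}_{metric}.sh"]


def run_all(dry_run: bool = True) -> List[str]:
    """Run (or, with dry_run=True, only list) every dataset/metric combination."""
    commands = []
    for data, metric in product(DATASETS, METRICS):
        cmd = eval_command(data, metric)
        commands.append(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first script that exits with an error.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for line in run_all(dry_run=True):
        print(line)
```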

Viewing Results

Once you have completed the evaluation, you can easily view the results with our provided script:

# Display the evaluation results. Example usage:
# bash scripts/nq_factuality_view.sh
# bash scripts/wow_relevance_view.sh
bash scripts/${data}_${metric}_view.sh
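Evaluation and viewing can also be chained so that results are only displayed after a successful run. The sketch assumes both scripts live under `scripts/` in the repository root and that the view script prints its results to stdout; the function name is hypothetical.

```python
import subprocess
from pathlib import Path


def evaluate_and_view(data: str, metric: str, repo_root: str = ".") -> str:
    """Run scripts/{data}_{metric}.sh, then return the stdout of the _view script."""
    root = Path(repo_root).resolve()
    eval_script = root / "scripts" / f"{data}_{metric}.sh"
    view_script = root / "scripts" / f"{data}_{metric}_view.sh"
    for script in (eval_script, view_script):
        if not script.is_file():
            raise FileNotFoundError(f"missing script: {script}")
    # check=True raises on a failed evaluation, so the view script never runs after it.
    subprocess.run(["bash", str(eval_script)], check=True, cwd=root)
    viewed = subprocess.run(
        ["bash", str(view_script)], check=True, cwd=root, capture_output=True, text=True
    )
    return viewed.stdout
```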

Model Sources

Below is a list of models utilized in our CONNER framework for each metric:

Metric                       Model                           Source
Factuality                   NLI-RoBERTa-large, ColBERTv2    Hugging Face, GitHub
Relevance                    BERT-ranking-large              GitHub
Sentence-level Coherence     GPT-neo-2.7B                    Hugging Face
Paragraph-level Coherence    Coherence-Momentum              Hugging Face
Informativeness              GPT-neo-2.7B                    Hugging Face
Helpfulness                  LLaMA-65B                       GitHub
Validity                     NLI-RoBERTa-large, ColBERTv2    Hugging Face, GitHub
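Informativeness is described as the novelty or unexpectedness of the knowledge, and the table lists GPT-neo-2.7B for it. Assuming the score is derived from how surprising the knowledge is to that language model (an assumption; see the paper for the exact definition), the core arithmetic is mean surprisal and perplexity over per-token log-probabilities. Obtaining those log-probabilities from the model is outside this sketch.

```python
import math
from typing import Sequence


def mean_surprisal(token_logprobs: Sequence[float]) -> float:
    """Average negative natural-log probability per token (in nats)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return -sum(token_logprobs) / len(token_logprobs)


def perplexity(token_logprobs: Sequence[float]) -> float:
    """exp(mean surprisal); higher means the text was less expected by the LM."""
    return math.exp(mean_surprisal(token_logprobs))
```

For example, four tokens each predicted with probability 0.5 give a perplexity of exactly 2.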

Citing Our Work

If you find our work helpful in your research, please cite our paper:

@misc{chen2023factuality,
      title={Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators}, 
      author={Liang Chen and Yang Deng and Yatao Bian and Zeyu Qin and Bingzhe Wu and Tat-Seng Chua and Kam-Fai Wong},
      year={2023},
      eprint={2310.07289},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

conner's People

Contributors

chanliang

Forkers

just4jc

conner's Issues

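Some questions about the formulas

Hello, thank you for your work. Does NLI() in the paper's formulas refer to a model like ColBERTv2? For example, how exactly is the Factuality formula computed? I am a bit confused and would greatly appreciate your guidance. Thanks!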

Code missing!

Thanks for sharing your work. Please let me know by when it will be possible to share the code.

Thanks.
