consistest

A conversational benchmark for factual consistency

This is the repository for the paper: What Was Your Name Again? Interrogating Generative Conversational Models For Factual Consistency Evaluation

Instructions

With this code you can evaluate the factual consistency of a generative conversational model, using the dataset and hybrid pipeline described in the paper.

Evaluate a model

  1. Fine-tune the model on the PersonaChat training set (optional, but recommended for a fair comparison).
  2. Run: python evaluate.py --model_checkpoint={path to your model} --model_sep_tokens={your model's special tokens}. The second parameter is the list of special tokens used during fine-tuning to separate agent and user utterances; it defaults to ["<user>", "<agent>"].
  3. The code first curates the Consistest dataset, then runs inference and evaluation. The results are saved in the results folder. Note that the code assumes your model and tokenizer can be loaded with the .from_pretrained() method of the transformers library.
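
The run in step 2 can also be launched from a Python script. The following is a minimal sketch and not part of the repository: build_eval_command is a hypothetical helper, and it uses only the two flags named above. How evaluate.py expects the separator-token list to be written on the command line is not specified here, so the helper passes that value through unchanged and omits the flag when no value is given, which leaves the default in place.

```python
import sys


def build_eval_command(model_checkpoint, model_sep_tokens=None):
    """Assemble the argument list for evaluate.py (hypothetical helper)."""
    # Use the current interpreter so the script runs in the same environment.
    command = [sys.executable, "evaluate.py", f"--model_checkpoint={model_checkpoint}"]
    if model_sep_tokens is not None:
        # Passed through verbatim; the expected format is left to the caller.
        command.append(f"--model_sep_tokens={model_sep_tokens}")
    return command
```

The returned list can be handed to subprocess.run(command, check=True), presumably from the repository root so that evaluate.py is found.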

Evaluate responses

You can also run the evaluation on already generated responses (as a .csv file): python evaluate.py --eval_only=True --responses_to_eval={path to response file}. The .csv file must follow the expected format, i.e. have the columns Question, Response, Reference, NLI_Ref, Source, Type, and Distance.
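
Before starting an evaluation-only run, it may help to confirm that the response file has the expected header. The sketch below is an illustration rather than code from the repository: missing_columns is a hypothetical helper, and it assumes a comma-separated, UTF-8 encoded file whose first row holds the column names. It checks only that the columns are present, not their order or contents.

```python
import csv

# Column names as listed above; their order is not checked here.
EXPECTED_COLUMNS = [
    "Question", "Response", "Reference", "NLI_Ref", "Source", "Type", "Distance",
]


def missing_columns(csv_path):
    """Return the expected columns that are absent from the file's header row."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # An empty file yields an empty header, so every column is reported.
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return [name for name in EXPECTED_COLUMNS if name not in present]
```

An empty return value means all seven columns were found in the header.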


In both cases, NLI inference uses the Trainer from the transformers library, so all available GPUs are used. To restrict this, prefix the python evaluate.py command with CUDA_VISIBLE_DEVICES={list of GPU ids}.
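
When the evaluation is started from a Python script instead of a shell, the same restriction can be applied through the child process environment. This is a sketch under assumptions: env_with_gpus and run_on_gpus are hypothetical helpers, and the command passed in is whatever evaluate.py call you would otherwise type.

```python
import os
import subprocess


def env_with_gpus(gpu_ids):
    """Copy the current environment and restrict CUDA to the given device ids."""
    env = os.environ.copy()
    # CUDA reads a comma-separated list of device indices from this variable.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def run_on_gpus(command, gpu_ids):
    """Run a command (for example the evaluate.py call) on the selected GPUs only."""
    return subprocess.run(command, env=env_with_gpus(gpu_ids), check=True)
```

For example, run_on_gpus(["python", "evaluate.py", "--eval_only=True", "--responses_to_eval=responses.csv"], [0]) would expose only GPU 0 to the run; responses.csv is a placeholder file name.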

Contributors

elotfi
