
Loki: Open-source solution designed to automate the process of verifying factuality

Home Page: https://loki.librai.tech/

License: MIT License

Languages: Python 91.64%, HTML 8.36%
Topics: ai, factuality, hallucination


Loki: An Open-source Tool for Fact Verification

Overview

Loki is our open-source solution designed to automate the process of verifying factuality. It provides a comprehensive pipeline for dissecting long texts into individual claims, assessing their worthiness for verification, generating queries for evidence search, crawling for evidence, and ultimately verifying the claims. This tool is especially useful for journalists, researchers, and anyone interested in the factuality of information. To stay updated, please subscribe to our newsletter at our website or join us on Discord!

Quick Start

Clone the repository and navigate to the project directory

git clone https://github.com/Libr-AI/OpenFactVerification.git
cd OpenFactVerification

Installation with poetry (option 1)

  1. Install Poetry by following its installation guide.
  2. Install all dependencies by running:
poetry install

Installation with pip (option 2)

  1. Create a Python environment at version 3.9 or newer and activate it.

  2. Navigate to the project directory and install the required packages:

pip install -r requirements.txt

Configure API keys

You can export the essential API keys to the environment:

export SERPER_API_KEY=... # required for evidence retrieval when Serper is used
export OPENAI_API_KEY=... # required for all tasks

Alternatively, you can configure API keys via a YAML file; see the user guide for more details.
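A minimal sketch of what such a key file could look like; the field names mirror the environment variables above and are illustrative, so consult the user guide and demo_data/api_config.yaml for the exact schema:

```yaml
# Illustrative API-key configuration (field names are assumptions,
# not the confirmed schema).
SERPER_API_KEY: your_serper_key_here
OPENAI_API_KEY: your_openai_key_here
```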


Usage

The main interface of the Loki fact-checker is located in factcheck/__init__.py, which contains the check_response method. This method integrates the complete fact verification pipeline, where each functionality is encapsulated in its own class, as described in the Features section.
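The pipeline stages described above can be sketched as follows; the function and parameter names here are illustrative stand-ins for Loki's internal classes, not the actual factcheck API:

```python
# Illustrative sketch of the fact-verification pipeline: decompose,
# filter for check-worthiness, generate queries, retrieve evidence, verify.
# All stage callables are hypothetical placeholders.
def check_response_sketch(text, decompose, checkworthy, gen_queries, retrieve, verify):
    claims = decompose(text)                          # split text into atomic claims
    worthy = [c for c in claims if checkworthy(c)]    # keep only verifiable claims
    queries = {c: gen_queries(c) for c in worthy}     # build search queries per claim
    evidence = {c: retrieve(qs) for c, qs in queries.items()}  # crawl for evidence
    return {c: verify(c, evidence[c]) for c in worthy}         # verdict per claim
```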

Used as a Library

from factcheck import FactCheck

factcheck_instance = FactCheck()

# Example text
text = "Your text here"

# Run the fact-check pipeline
results = factcheck_instance.check_response(text)
print(results)

Used as a Web App

python webapp.py --api_config demo_data/api_config.yaml

Multimodal Usage

# String
python -m factcheck --modal string --input "MBZUAI is the first AI university in the world"
# Text
python -m factcheck --modal text --input demo_data/text.txt
# Speech
python -m factcheck --modal speech --input demo_data/speech.mp3
# Image
python -m factcheck --modal image --input demo_data/image.webp
# Video
python -m factcheck --modal video --input demo_data/video.m4v

Customize Your Experience

For advanced usage, please see our user guide.

Ready for More?

💪 Join Our Journey to Innovation with the Supporter Edition

As we continue to evolve and enhance our fact-checking solution, we're excited to invite you to become an integral part of our journey. By registering for our Supporter Edition, you're not just unlocking a suite of advanced features and benefits; you're also fueling the future of trustworthy information.

Your support enables us to:

🚀 Innovate continuously: Develop new, cutting-edge features that keep you ahead in the fight against misinformation.

💡 Improve and refine: Enhance the user experience, making our app not just powerful, but also a joy to use.

🌱 Grow our community: Invest in the resources and tools our community needs to thrive and expand.

🎁 And as a token of our gratitude, registering now grants you complimentary token credits—a little thank you from us to you, for believing in our mission and supporting our growth!

Feature comparison between the Open-Source Edition and the Supporter Edition:

  • Trustworthy Verification Results
  • Diverse Evidence from the Open Web
  • Automated Correction of Misinformation
  • Privacy and Data Security
  • Multimodal Input
  • One-Stop Custom Solution
  • Customizable Verification Data Sources
  • Enhanced User Experience
  • Faster Efficiency and Higher Accuracy

TRY NOW!

Contributing to Loki project

Welcome and thank you for your interest in the Loki project! We welcome contributions and feedback from the community. To get started, please refer to our Contribution Guidelines.

Acknowledgments

  • Special thanks to all contributors who have helped in shaping this project.

Stay Connected and Informed

Don’t miss out on the latest updates, feature releases, and community insights! We invite you to subscribe to our newsletter and become a part of our growing community.

💌 Subscribe now at our website!

Star History

Star History Chart

Cite as

@misc{Loki,
  author       = {Wang, Hao and Wang, Yuxia and Wang, Minghan and Geng, Yilin and Zhao, Zhen and Zhai, Zenan and Nakov, Preslav and Baldwin, Timothy and Han, Xudong and Li, Haonan},
  title        = {Loki: An Open-source Tool for Fact Verification},
  month        = {04},
  year         = {2024},
  publisher    = {Zenodo},
  version      = {v0.0.2},
  doi          = {10.5281/zenodo.11004461},
  url          = {https://zenodo.org/records/11004461}
}


openfactverification's Issues

Resource punkt not found with llama3

I tried the project with llama3 using:
poetry run python -m factcheck --modal string --input "MBZUAI is the first AI university in the world" --client local_openai --model llama3 --prompt factcheck/config/sample_prompt.yaml
but I end up with a missing NLTK resource. Not sure if this is expected?

[2024-04-20 10:42:20 - httpx:1026 - INFO] HTTP Request: POST http://linuxmain.local:4000/chat/completions "HTTP/1.1 200 OK"
[2024-04-20 10:42:20 - openai._base_client:986 - DEBUG] HTTP Request: POST http://linuxmain.local:4000/chat/completions "200 OK"
[ERROR]2024-04-20 10:42:20,511 Decompose.py:60: Parse LLM response error eval() arg 1 must be a string, bytes or code object, response is: None
[2024-04-20 10:42:20 - FactCheck:60 - ERROR] Parse LLM response error eval() arg 1 must be a string, bytes or code object, response is: None
[ERROR]2024-04-20 10:42:20,511 Decompose.py:61: Parse LLM response error, prompt is: [[{'role': 'system', 'content': 'You are a helpful assistant designed to output JSON.'}, {'role': 'user', 'content': 'Your task is to decompose the text into atomic claims.\nThe answer should be a JSON with a single key "claims", with the value of a list of strings, where each string should be a context-independent claim, representing one fact.\nNote that:\n1. Each claim should be concise (less than 15 words) and self-contained.\n2. Avoid vague references like \'he\', \'she\', \'it\', \'this\', \'the company\', \'the man\' and using complete names.\n3. Generate at least one claim for each single sentence in the texts.\n\nFor example,\nText: Mary is a five-year old girl, she likes playing piano and she doesn\'t like cookies.\nOutput:\n{"claims": ["Mary is a five-year old girl.", "Mary likes playing piano.", "Mary doesn\'t like cookies."]}\n\nText: MBZUAI is the first AI university in the world\nOutput:'}]]
[2024-04-20 10:42:20 - FactCheck:61 - ERROR] Parse LLM response error, prompt is: [[{'role': 'system', 'content': 'You are a helpful assistant designed to output JSON.'}, {'role': 'user', 'content': 'Your task is to decompose the text into atomic claims.\nThe answer should be a JSON with a single key "claims", with the value of a list of strings, where each string should be a context-independent claim, representing one fact.\nNote that:\n1. Each claim should be concise (less than 15 words) and self-contained.\n2. Avoid vague references like \'he\', \'she\', \'it\', \'this\', \'the company\', \'the man\' and using complete names.\n3. Generate at least one claim for each single sentence in the texts.\n\nFor example,\nText: Mary is a five-year old girl, she likes playing piano and she doesn\'t like cookies.\nOutput:\n{"claims": ["Mary is a five-year old girl.", "Mary likes playing piano.", "Mary doesn\'t like cookies."]}\n\nText: MBZUAI is the first AI university in the world\nOutput:'}]]
[INFO]2024-04-20 10:42:20,511 Decompose.py:63: It does not output a list of sentences correctly, return self.doc2sent_tool split results.
[2024-04-20 10:42:20 - FactCheck:63 - INFO] It does not output a list of sentences correctly, return self.doc2sent_tool split results.
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/shuther/Documents/Projects/OpenFactVerification/factcheck/__main__.py", line 45, in <module>
    check(args)
  File "/home/shuther/Documents/Projects/OpenFactVerification/factcheck/__main__.py", line 30, in check
    res = factcheck.check_response(content)
  File "/home/shuther/Documents/Projects/OpenFactVerification/factcheck/__init__.py", line 76, in check_response
    claims = self.decomposer.getclaims(doc=response)
  File "/home/shuther/Documents/Projects/OpenFactVerification/factcheck/core/Decompose.py", line 64, in getclaims
    claims = self.doc2sent(doc)
  File "/home/shuther/Documents/Projects/OpenFactVerification/factcheck/core/Decompose.py", line 29, in _nltk_doc2sent
    sentences = nltk.sent_tokenize(text)
  File "/home/shuther/.cache/pypoetry/virtualenvs/openfactverification-3iFqEQnw-py3.10/lib/python3.10/site-packages/nltk/tokenize/__init__.py", line 106, in sent_tokenize
    tokenizer = load(f"tokenizers/punkt/{language}.pickle")
  File "/home/shuther/.cache/pypoetry/virtualenvs/openfactverification-3iFqEQnw-py3.10/lib/python3.10/site-packages/nltk/data.py", line 750, in load
    opened_resource = _open(resource_url)
  File "/home/shuther/.cache/pypoetry/virtualenvs/openfactverification-3iFqEQnw-py3.10/lib/python3.10/site-packages/nltk/data.py", line 876, in _open
    return find(path_, path + [""]).open()
  File "/home/shuther/.cache/pypoetry/virtualenvs/openfactverification-3iFqEQnw-py3.10/lib/python3.10/site-packages/nltk/data.py", line 583, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')
  
  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt/PY3/english.pickle

  Searched in:
    - '/home/shuther/nltk_data'
    - '/home/shuther/.cache/pypoetry/virtualenvs/openfactverification-3iFqEQnw-py3.10/nltk_data'
    - '/home/shuther/.cache/pypoetry/virtualenvs/openfactverification-3iFqEQnw-py3.10/share/nltk_data'
    - '/home/shuther/.cache/pypoetry/virtualenvs/openfactverification-3iFqEQnw-py3.10/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************

After the whole fact-checking process I am getting None

File "...\LibrAi\OpenFactVerification\factcheck\core\Retriever\SerperEvidenceRetrieve.py", line 77, in _retrieve_evidence_4_all_claim
if query != result.get("searchParameters").get("q"):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'get'

Is this just happening to me?
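This traceback suggests a Serper response in the batch can be None (e.g. a failed or rate-limited API call). A minimal defensive rewrite of the failing comparison, assuming result may be None or lack "searchParameters" (a sketch, not the project's actual code):

```python
def query_matches(query, result):
    # Guard against result being None or missing "searchParameters"
    # before chaining .get(), which is what raises the AttributeError.
    params = (result or {}).get("searchParameters") or {}
    return query == params.get("q")
```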

AttributeError in evidence retrieval

I'm not sure about the origin of the problem. With the mistral model it worked, but it didn't get that far.

poetry run python -m factcheck --modal string --input "MBZUAI is the first AI university in the world" --client local_openai --model wizardlm2 --prompt factcheck/config/sample_prompt.yaml
== Init decompose_model with model: wizardlm2
[INFO]2024-04-17 19:29:27,894 __init__.py:53: == Use specified client: local_openai
== Init checkworthy_model with model: wizardlm2
[INFO]2024-04-17 19:29:27,894 __init__.py:53: == Use specified client: local_openai
== Init query_generator_model with model: wizardlm2
[INFO]2024-04-17 19:29:27,894 __init__.py:53: == Use specified client: local_openai
== Init evidence_retrieval_model with model: wizardlm2
[INFO]2024-04-17 19:29:27,894 __init__.py:53: == Use specified client: local_openai
== Init claim_verify_model with model: wizardlm2
[INFO]2024-04-17 19:29:27,894 __init__.py:53: == Use specified client: local_openai
[INFO]2024-04-17 19:29:27,895 __init__.py:67: ===Sub-modules Init Finished===
[INFO]2024-04-17 19:29:27,895 multimodal.py:89: == Processing: Modal: string, Input: MBZUAI is the first AI university in the world
[INFO]2024-04-17 19:29:27,895 multimodal.py:103: == Processed: Modal: string, Input: MBZUAI is the first AI university in the world
[INFO]2024-04-17 19:31:01,724 __init__.py:78: == response claims 0: MBZUAI was established as the first artificial intelligence university globally.
[INFO]2024-04-17 19:31:01,724 __init__.py:78: == response claims 1: The Mohammed Bin Zayed University of Artificial Intelligence (MBZUAI) was founded.
[INFO]2024-04-17 19:33:42,643 __init__.py:86: == Check-worthy claims 0: The Mohammed Bin Zayed University of Artificial Intelligence (MBZUAI) was founded.
[INFO]2024-04-17 19:35:40,188 __init__.py:117: == Claim: The Mohammed Bin Zayed University of Artificial Intelligence (MBZUAI) was founded. --- Queries: ['The Mohammed Bin Zayed University of Artificial Intelligence (MBZUAI) was founded.', 'Who founded the Mohammed Bin Zayed University of Artificial Intelligence?', 'When was the Mohammed Bin Zayed University of Artificial Intelligence established?']
[INFO]2024-04-17 19:35:40,188 SerperEvidenceRetrieve.py:30: Collecting evidences ...
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/shuther/Documents/Projects/OpenFactVerification/factcheck/__main__.py", line 45, in <module>
    check(args)
  File "/home/shuther/Documents/Projects/OpenFactVerification/factcheck/__main__.py", line 30, in check
    res = factcheck.check_response(content)
  File "/home/shuther/Documents/Projects/OpenFactVerification/factcheck/__init__.py", line 122, in check_response
    claim_evidence_dict = self.evidence_crawler.retrieve_evidence(claim_query_dict=claim_query_dict)
  File "/home/shuther/Documents/Projects/OpenFactVerification/factcheck/core/Retriever/SerperEvidenceRetrieve.py", line 33, in retrieve_evidence
    evidence_list = self._retrieve_evidence_4_all_claim(
  File "/home/shuther/Documents/Projects/OpenFactVerification/factcheck/core/Retriever/SerperEvidenceRetrieve.py", line 72, in _retrieve_evidence_4_all_claim
    if query != result.get("searchParameters").get("q"):
AttributeError: 'NoneType' object has no attribute 'get'

Allow OpenAI-compatible endpoints

Would it be possible to make the OpenAI base URL an environment variable (as well as the model name), so we could experiment with self-hosted LLMs?
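A minimal sketch of what such configuration could look like, assuming hypothetical OPENAI_BASE_URL and OPENAI_MODEL environment variables (these are not currently read by factcheck):

```python
import os

def llm_client_config():
    # Hypothetical: allow pointing an OpenAI-compatible client at a
    # self-hosted endpoint; the variable names are illustrative only.
    return {
        "base_url": os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
        "model": os.environ.get("OPENAI_MODEL", "gpt-4-turbo"),
        "api_key": os.environ.get("OPENAI_API_KEY", ""),
    }
```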

Add token count to return of FactCheck().check_response

It would be nice to be able to track the tokens used by a fact-check call via the Python API, e.g. something like:

from factcheck import FactCheck
results = FactCheck().check_response("The sky is green")
print(results["token_count"])

which would contain prompt and completion token info, like

{'num_raw_tokens': 4, 'num_checkworthy_tokens': 5, 'total_prompt_tokens': 1748, 'total_completion_tokens': 231}

Hacky example implementation for openai client only here: #11
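One way this could be implemented is a small accumulator that each LLM call reports its usage to; the class below is a sketch under that assumption, not the linked PR's implementation:

```python
from dataclasses import dataclass

@dataclass
class TokenUsage:
    # Hypothetical accumulator for usage numbers reported by the LLM client.
    total_prompt_tokens: int = 0
    total_completion_tokens: int = 0

    def add(self, prompt_tokens: int, completion_tokens: int) -> None:
        # Each pipeline stage would call this after its LLM request.
        self.total_prompt_tokens += prompt_tokens
        self.total_completion_tokens += completion_tokens

    def as_dict(self) -> dict:
        # Merged into the check_response result, e.g. results["token_count"].
        return {
            "total_prompt_tokens": self.total_prompt_tokens,
            "total_completion_tokens": self.total_completion_tokens,
        }
```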
