
Loki: Open-source solution designed to automate the process of verifying factuality

Home Page: https://loki.librai.tech/

License: MIT License

Languages: Python 91.64%, HTML 8.36%
Topics: ai, factuality, hallucination


Loki: An Open-source Tool for Fact Verification

Overview

Loki is our open-source solution designed to automate the process of verifying factuality. It provides a comprehensive pipeline for dissecting long texts into individual claims, assessing their worthiness for verification, generating queries for evidence search, crawling for evidence, and ultimately verifying the claims. This tool is especially useful for journalists, researchers, and anyone interested in the factuality of information. To stay updated, please subscribe to our newsletter at our website or join us on Discord!

Quick Start

Clone the repository and navigate to the project directory

git clone https://github.com/Libr-AI/OpenFactVerification.git
cd OpenFactVerification

Installation with poetry (option 1)

  1. Install Poetry by following its installation guide.
  2. Install all dependencies by running:
poetry install

Installation with pip (option 2)

  1. Create a Python environment at version 3.9 or newer and activate it.

  2. Navigate to the project directory and install the required packages:

pip install -r requirements.txt

Configure API keys

You can export the essential API keys to the environment:

export SERPER_API_KEY=... # required for evidence retrieval when Serper is used
export OPENAI_API_KEY=... # required for all tasks

Alternatively, you can configure API keys via a YAML file; see the user guide for more details.
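A minimal sketch of what such a key file could look like; the field names mirror the environment variables above and are illustrative, so consult the user guide and demo_data/api_config.yaml for the exact schema:

```yaml
# Illustrative API-key configuration (field names are assumptions,
# not the confirmed schema).
SERPER_API_KEY: your_serper_key_here
OPENAI_API_KEY: your_openai_key_here
```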


Usage

The main interface of the Loki fact-checker is located in factcheck/__init__.py, which contains the check_response method. This method integrates the complete fact verification pipeline, where each functionality is encapsulated in its own class, as described in the Features section.
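The pipeline stages described above can be sketched as follows; the function and parameter names here are illustrative stand-ins for Loki's internal classes, not the actual factcheck API:

```python
# Illustrative sketch of the fact-verification pipeline: decompose,
# filter for check-worthiness, generate queries, retrieve evidence, verify.
# All stage callables are hypothetical placeholders.
def check_response_sketch(text, decompose, checkworthy, gen_queries, retrieve, verify):
    claims = decompose(text)                          # split text into atomic claims
    worthy = [c for c in claims if checkworthy(c)]    # keep only verifiable claims
    queries = {c: gen_queries(c) for c in worthy}     # build search queries per claim
    evidence = {c: retrieve(qs) for c, qs in queries.items()}  # crawl for evidence
    return {c: verify(c, evidence[c]) for c in worthy}         # verdict per claim
```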

Used as a Library

from factcheck import FactCheck

factcheck_instance = FactCheck()

# Example text
text = "Your text here"

# Run the fact-check pipeline
results = factcheck_instance.check_response(text)
print(results)

Used as a Web App

python webapp.py --api_config demo_data/api_config.yaml

Multimodal Usage

# String
python -m factcheck --modal string --input "MBZUAI is the first AI university in the world"
# Text
python -m factcheck --modal text --input demo_data/text.txt
# Speech
python -m factcheck --modal speech --input demo_data/speech.mp3
# Image
python -m factcheck --modal image --input demo_data/image.webp
# Video
python -m factcheck --modal video --input demo_data/video.m4v

Customize Your Experience

For advanced usage, please see our user guide.

Ready for More?

💪 Join Our Journey to Innovation with the Supporter Edition

As we continue to evolve and enhance our fact-checking solution, we're excited to invite you to become an integral part of our journey. By registering for our Supporter Edition, you're not just unlocking a suite of advanced features and benefits; you're also fueling the future of trustworthy information.

Your support enables us to:

🚀 Innovate continuously: Develop new, cutting-edge features that keep you ahead in the fight against misinformation.

💡 Improve and refine: Enhance the user experience, making our app not just powerful, but also a joy to use.

🌱 Grow our community: Invest in the resources and tools our community needs to thrive and expand.

🎁 And as a token of our gratitude, registering now grants you complimentary token credits—a little thank you from us to you, for believing in our mission and supporting our growth!

Feature comparison between the Open-Source Edition and the Supporter Edition:

  • Trustworthy Verification Results
  • Diverse Evidence from the Open Web
  • Automated Correction of Misinformation
  • Privacy and Data Security
  • Multimodal Input
  • One-Stop Custom Solution
  • Customizable Verification Data Sources
  • Enhanced User Experience
  • Faster Efficiency and Higher Accuracy

TRY NOW!

Contributing to Loki project

Welcome and thank you for your interest in the Loki project! We welcome contributions and feedback from the community. To get started, please refer to our Contribution Guidelines.

Acknowledgments

  • Special thanks to all contributors who have helped in shaping this project.

Stay Connected and Informed

Don’t miss out on the latest updates, feature releases, and community insights! We invite you to subscribe to our newsletter and become a part of our growing community.

💌 Subscribe now at our website!

Star History

Star History Chart

Cite as

@misc{Loki,
  author       = {Wang, Hao and Wang, Yuxia and Wang, Minghan and Geng, Yilin and Zhao, Zhen and Zhai, Zenan and Nakov, Preslav and Baldwin, Timothy and Han, Xudong and Li, Haonan},
  title        = {Loki: An Open-source Tool for Fact Verification},
  month        = {04},
  year         = {2024},
  publisher    = {Zenodo},
  version      = {v0.0.2},
  doi          = {10.5281/zenodo.11004461},
  url          = {https://zenodo.org/records/11004461}
}


openfactverification's Issues

Resource punkt not found with llama3

I tried the project with llama3 using:
poetry run python -m factcheck --modal string --input "MBZUAI is the first AI university in the world" --client local_openai --model llama3 --prompt factcheck/config/sample_prompt.yaml
but I end up with a missing NLTK resource. Not sure if this is expected?

[2024-04-20 10:42:20 - httpx:1026 - INFO] HTTP Request: POST http://linuxmain.local:4000/chat/completions "HTTP/1.1 200 OK"
[2024-04-20 10:42:20 - openai._base_client:986 - DEBUG] HTTP Request: POST http://linuxmain.local:4000/chat/completions "200 OK"
[ERROR]2024-04-20 10:42:20,511 Decompose.py:60: Parse LLM response error eval() arg 1 must be a string, bytes or code object, response is: None
[2024-04-20 10:42:20 - FactCheck:60 - ERROR] Parse LLM response error eval() arg 1 must be a string, bytes or code object, response is: None
[ERROR]2024-04-20 10:42:20,511 Decompose.py:61: Parse LLM response error, prompt is: [[{'role': 'system', 'content': 'You are a helpful assistant designed to output JSON.'}, {'role': 'user', 'content': 'Your task is to decompose the text into atomic claims.\nThe answer should be a JSON with a single key "claims", with the value of a list of strings, where each string should be a context-independent claim, representing one fact.\nNote that:\n1. Each claim should be concise (less than 15 words) and self-contained.\n2. Avoid vague references like \'he\', \'she\', \'it\', \'this\', \'the company\', \'the man\' and using complete names.\n3. Generate at least one claim for each single sentence in the texts.\n\nFor example,\nText: Mary is a five-year old girl, she likes playing piano and she doesn\'t like cookies.\nOutput:\n{"claims": ["Mary is a five-year old girl.", "Mary likes playing piano.", "Mary doesn\'t like cookies."]}\n\nText: MBZUAI is the first AI university in the world\nOutput:'}]]
[2024-04-20 10:42:20 - FactCheck:61 - ERROR] Parse LLM response error, prompt is: [[{'role': 'system', 'content': 'You are a helpful assistant designed to output JSON.'}, {'role': 'user', 'content': 'Your task is to decompose the text into atomic claims.\nThe answer should be a JSON with a single key "claims", with the value of a list of strings, where each string should be a context-independent claim, representing one fact.\nNote that:\n1. Each claim should be concise (less than 15 words) and self-contained.\n2. Avoid vague references like \'he\', \'she\', \'it\', \'this\', \'the company\', \'the man\' and using complete names.\n3. Generate at least one claim for each single sentence in the texts.\n\nFor example,\nText: Mary is a five-year old girl, she likes playing piano and she doesn\'t like cookies.\nOutput:\n{"claims": ["Mary is a five-year old girl.", "Mary likes playing piano.", "Mary doesn\'t like cookies."]}\n\nText: MBZUAI is the first AI university in the world\nOutput:'}]]
[INFO]2024-04-20 10:42:20,511 Decompose.py:63: It does not output a list of sentences correctly, return self.doc2sent_tool split results.
[2024-04-20 10:42:20 - FactCheck:63 - INFO] It does not output a list of sentences correctly, return self.doc2sent_tool split results.
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/shuther/Documents/Projects/OpenFactVerification/factcheck/__main__.py", line 45, in <module>
    check(args)
  File "/home/shuther/Documents/Projects/OpenFactVerification/factcheck/__main__.py", line 30, in check
    res = factcheck.check_response(content)
  File "/home/shuther/Documents/Projects/OpenFactVerification/factcheck/__init__.py", line 76, in check_response
    claims = self.decomposer.getclaims(doc=response)
  File "/home/shuther/Documents/Projects/OpenFactVerification/factcheck/core/Decompose.py", line 64, in getclaims
    claims = self.doc2sent(doc)
  File "/home/shuther/Documents/Projects/OpenFactVerification/factcheck/core/Decompose.py", line 29, in _nltk_doc2sent
    sentences = nltk.sent_tokenize(text)
  File "/home/shuther/.cache/pypoetry/virtualenvs/openfactverification-3iFqEQnw-py3.10/lib/python3.10/site-packages/nltk/tokenize/__init__.py", line 106, in sent_tokenize
    tokenizer = load(f"tokenizers/punkt/{language}.pickle")
  File "/home/shuther/.cache/pypoetry/virtualenvs/openfactverification-3iFqEQnw-py3.10/lib/python3.10/site-packages/nltk/data.py", line 750, in load
    opened_resource = _open(resource_url)
  File "/home/shuther/.cache/pypoetry/virtualenvs/openfactverification-3iFqEQnw-py3.10/lib/python3.10/site-packages/nltk/data.py", line 876, in _open
    return find(path_, path + [""]).open()
  File "/home/shuther/.cache/pypoetry/virtualenvs/openfactverification-3iFqEQnw-py3.10/lib/python3.10/site-packages/nltk/data.py", line 583, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')
  
  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt/PY3/english.pickle

  Searched in:
    - '/home/shuther/nltk_data'
    - '/home/shuther/.cache/pypoetry/virtualenvs/openfactverification-3iFqEQnw-py3.10/nltk_data'
    - '/home/shuther/.cache/pypoetry/virtualenvs/openfactverification-3iFqEQnw-py3.10/share/nltk_data'
    - '/home/shuther/.cache/pypoetry/virtualenvs/openfactverification-3iFqEQnw-py3.10/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************

After the whole fact-checking process I am getting None

File "...\LibrAi\OpenFactVerification\factcheck\core\Retriever\SerperEvidenceRetrieve.py", line 77, in _retrieve_evidence_4_all_claim
if query != result.get("searchParameters").get("q"):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'get'

Is this just happening to me?
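This traceback suggests a Serper response in the batch can be None (e.g. a failed or rate-limited API call). A minimal defensive rewrite of the failing comparison, assuming result may be None or lack "searchParameters" (a sketch, not the project's actual code):

```python
def query_matches(query, result):
    # Guard against result being None or missing "searchParameters"
    # before chaining .get(), which is what raises the AttributeError.
    params = (result or {}).get("searchParameters") or {}
    return query == params.get("q")
```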

AttributeError in evidence retrieval

I'm not sure about the origin of the problem. With the mistral model it worked, but it didn't get that far.

poetry run python -m factcheck --modal string --input "MBZUAI is the first AI university in the world" --client local_openai --model wizardlm2 --prompt factcheck/config/sample_prompt.yaml
== Init decompose_model with model: wizardlm2
[INFO]2024-04-17 19:29:27,894 __init__.py:53: == Use specified client: local_openai
== Init checkworthy_model with model: wizardlm2
[INFO]2024-04-17 19:29:27,894 __init__.py:53: == Use specified client: local_openai
== Init query_generator_model with model: wizardlm2
[INFO]2024-04-17 19:29:27,894 __init__.py:53: == Use specified client: local_openai
== Init evidence_retrieval_model with model: wizardlm2
[INFO]2024-04-17 19:29:27,894 __init__.py:53: == Use specified client: local_openai
== Init claim_verify_model with model: wizardlm2
[INFO]2024-04-17 19:29:27,894 __init__.py:53: == Use specified client: local_openai
[INFO]2024-04-17 19:29:27,895 __init__.py:67: ===Sub-modules Init Finished===
[INFO]2024-04-17 19:29:27,895 multimodal.py:89: == Processing: Modal: string, Input: MBZUAI is the first AI university in the world
[INFO]2024-04-17 19:29:27,895 multimodal.py:103: == Processed: Modal: string, Input: MBZUAI is the first AI university in the world
[INFO]2024-04-17 19:31:01,724 __init__.py:78: == response claims 0: MBZUAI was established as the first artificial intelligence university globally.
[INFO]2024-04-17 19:31:01,724 __init__.py:78: == response claims 1: The Mohammed Bin Zayed University of Artificial Intelligence (MBZUAI) was founded.
[INFO]2024-04-17 19:33:42,643 __init__.py:86: == Check-worthy claims 0: The Mohammed Bin Zayed University of Artificial Intelligence (MBZUAI) was founded.
[INFO]2024-04-17 19:35:40,188 __init__.py:117: == Claim: The Mohammed Bin Zayed University of Artificial Intelligence (MBZUAI) was founded. --- Queries: ['The Mohammed Bin Zayed University of Artificial Intelligence (MBZUAI) was founded.', 'Who founded the Mohammed Bin Zayed University of Artificial Intelligence?', 'When was the Mohammed Bin Zayed University of Artificial Intelligence established?']
[INFO]2024-04-17 19:35:40,188 SerperEvidenceRetrieve.py:30: Collecting evidences ...
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/shuther/Documents/Projects/OpenFactVerification/factcheck/__main__.py", line 45, in <module>
    check(args)
  File "/home/shuther/Documents/Projects/OpenFactVerification/factcheck/__main__.py", line 30, in check
    res = factcheck.check_response(content)
  File "/home/shuther/Documents/Projects/OpenFactVerification/factcheck/__init__.py", line 122, in check_response
    claim_evidence_dict = self.evidence_crawler.retrieve_evidence(claim_query_dict=claim_query_dict)
  File "/home/shuther/Documents/Projects/OpenFactVerification/factcheck/core/Retriever/SerperEvidenceRetrieve.py", line 33, in retrieve_evidence
    evidence_list = self._retrieve_evidence_4_all_claim(
  File "/home/shuther/Documents/Projects/OpenFactVerification/factcheck/core/Retriever/SerperEvidenceRetrieve.py", line 72, in _retrieve_evidence_4_all_claim
    if query != result.get("searchParameters").get("q"):
AttributeError: 'NoneType' object has no attribute 'get'

Allow OpenAI-compatible endpoints

Would it be possible to make the OpenAI base URL an environment variable (as well as the model name), so we could experiment with self-hosted LLMs?
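A minimal sketch of what such configuration could look like, assuming hypothetical OPENAI_BASE_URL and OPENAI_MODEL environment variables (these are not currently read by factcheck):

```python
import os

def llm_client_config():
    # Hypothetical: allow pointing an OpenAI-compatible client at a
    # self-hosted endpoint; the variable names are illustrative only.
    return {
        "base_url": os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
        "model": os.environ.get("OPENAI_MODEL", "gpt-4-turbo"),
        "api_key": os.environ.get("OPENAI_API_KEY", ""),
    }
```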

Add token count to return of FactCheck().check_response

It would be nice to be able to track the tokens used by a fact-check call via the Python API, e.g. something like:

from factcheck import FactCheck
results = FactCheck().check_response("The sky is green")
print(results["token_count"])

which would contain prompt and completion token info, like

{'num_raw_tokens': 4, 'num_checkworthy_tokens': 5, 'total_prompt_tokens': 1748, 'total_completion_tokens': 231}

Hacky example implementation for openai client only here: #11
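One way this could be implemented is a small accumulator that each LLM call reports its usage to; the class below is a sketch under that assumption, not the linked PR's implementation:

```python
from dataclasses import dataclass

@dataclass
class TokenUsage:
    # Hypothetical accumulator for usage numbers reported by the LLM client.
    total_prompt_tokens: int = 0
    total_completion_tokens: int = 0

    def add(self, prompt_tokens: int, completion_tokens: int) -> None:
        # Each pipeline stage would call this after its LLM request.
        self.total_prompt_tokens += prompt_tokens
        self.total_completion_tokens += completion_tokens

    def as_dict(self) -> dict:
        # Merged into the check_response result, e.g. results["token_count"].
        return {
            "total_prompt_tokens": self.total_prompt_tokens,
            "total_completion_tokens": self.total_completion_tokens,
        }
```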
