GithubHelp home page GithubHelp logo

fact-checking's Introduction

✅ Fact-checking ⁉️

This repository contains a project realized for an assignment of the Natural Language Processing course of the Master's degree in Artificial Intelligence, University of Bologna.

Description

Fact checking is a popular NLP task which consists in verify the reliability of some statement, by comparing it with a given knowledge base. In this experiment, we’ll use the FEVER dataset to train a model able to understand whether a fact is verifiable. The model consists in a neural network, structured in different ways, working on embeddings obtained with GloVE. A voting mechanism is then required to make predictions.

Dataset

The FEVER dataset is about facts taken from Wikipedia documents that have to be verified. In particular, facts could face manual modifications in order to define fake information or to give different formulations of the same concept.

The dataset consists of 185,445 claims manually verified against the introductory sections of Wikipedia pages and classified as Supported, Refuted or NotEnoughInfo. For the first two classes, systems and annotators need to also return the combination of sentences forming the necessary evidence supporting or refuting the claim.

An already pre-precessed version of the dataset is been used, in order to concentrate on the classification pipeline (pre-processing, model definition, evaluation and training).

Request and solution proposed

The task to comply with is described in the assignment description. In order to have a better understanging of our proposed solution, take a look to the notebook and the report.

Model

What the neural network does is to encode two different inputs (claim and evidence), merge them in some way, and output a single value, representing the probability that the claim is correct. The following simplified schema shows the model architecture:

Two different models are been trained:

  • a base one that use the last state of a LSTM layer for the sentence embedding and combine the two sentences through an Add layer
  • the other one is just an extension using the same configuration of the previous and adding the cosine similarity between claim and evidence to the input of the network.

The trained models are available at the following link.

Results

Two evaluation strategies have been used: multi-input evaluation (standard approach in classification) and claim evaluation (majority voting). As seen in the table below, performances were better for the extended model.

Dataset split Accuracy base model Accuracy extension model
Validation set 0.7609 0.7611
Test set (normal) 0.735 0.743
Test set (majority vote) 0.710 0.718

This is an example of the prediction made by the extended model on a test set of pairs claim-evidence:

CLAIM:  Scream has some level of success.
EVIDENCES:
	1. (SUPPORTS) The first series entry , Scream , was released on December 20 , 1996 and is currently the highest-grossing slasher film in the United States 
	2. (SUPPORTS) It received several awards and award nominations 
	3. (SUPPORTS) The film went on to financial and critical acclaim , earning $ 173 million worldwide , and became the highest-grossing slasher film in the US in unadjusted dollars 
PREDICTION:
	1. SUPPORTED (Confidence 98%)
	2. SUPPORTED (Confidence 99%)
	3. SUPPORTED (Confidence 98%)

Resources & Libraries

  • NLTK
  • Tensorflow + Keras

Versioning

We use Git for versioning.

Group members

Reg No. Name Surname Email Username
1005271 Giuseppe Boezio [email protected] giuseppeboezio
983806 Simone Montali [email protected] montali
997317 Giuseppe Murro [email protected] gmurro

License

This project is licensed under the MIT License - see the LICENSE file for details

fact-checking's People

Contributors

giuseppeboezio avatar gmurro avatar montali avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.