
master-thesis's Introduction

Master's thesis

Trabajo de Fin de Máster (Master's thesis): Fake News Detection

(Detailed description first; the author's shorter summary follows.)

The code developed for the master's thesis (TFM) will be added here. The dataset is available on Zenodo: Proppy Corpus 1.0

The code was developed with Python 3.8.2 on Ubuntu 20.04 LTS. The framework used is TensorFlow, specifically version 2.2.0.

A baseline model was developed using a BiLSTM with GloVe/FastText embeddings; FastText gave better results than GloVe.

Attention models were also developed: one based on the attention used in transformers (attention_model.py) and another that uses both local and global attention (mean_model.py).

A transformer-based model (modeltransformer.py) was also developed, using only the transformer encoder. Its use was not explored in depth, so it is not a well-prepared model, but it provides a base for future work.

BERT was chosen for fine-tuning on fake news prediction. It can also serve as a base for using ALBERT or other pre-trained models without many changes. The file for this model is bertmodel.py

To run this code you need the GloVe and FastText embeddings. Once you have the files, update the path in 'embeddings.py'.

Master's thesis

This master's thesis addresses a fake news detection problem known as propaganda detection. The news articles are not 100% fake, but they try to persuade readers to adopt a particular view. To that end, their authors use misinformation when needed to win more people to their side. The dataset I'm working with contains news from the 2016 U.S. elections, labeled as propaganda or non-propaganda. The data is available on Zenodo: Proppy Corpus 1.0.

To solve this problem I developed a baseline model using a BiLSTM with two types of embeddings, GloVe and FastText. FastText gave better results than GloVe.
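As a rough illustration of such a baseline, here is a minimal Keras sketch of a BiLSTM classifier on top of frozen pre-trained embeddings. It is not the code from this repository: the function name `build_bilstm`, the layer sizes, and the random matrix standing in for real GloVe/FastText vectors are all assumptions made for the example.

```python
import numpy as np
import tensorflow as tf

def build_bilstm(embedding_matrix, lstm_units=64):
    # embedding_matrix: (vocab_size, dim) array of pre-trained word vectors.
    vocab_size, dim = embedding_matrix.shape
    model = tf.keras.Sequential([
        # Frozen embedding layer initialised from the pre-trained vectors.
        tf.keras.layers.Embedding(
            vocab_size,
            dim,
            embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
            trainable=False,
        ),
        # Reads the token sequence in both directions and keeps the final states.
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(lstm_units)),
        # Single sigmoid unit: probability that the article is propaganda.
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Hypothetical stand-in for a real embedding matrix (50 words, 8 dimensions).
embedding_matrix = np.random.rand(50, 8).astype("float32")
model = build_bilstm(embedding_matrix)

# Two fake documents of 12 token ids each, just to check the shapes.
batch = np.random.randint(1, 50, size=(2, 12))
probs = model.predict(batch, verbose=0)
```

Swapping GloVe for FastText in a setup like this only changes the matrix passed to `build_bilstm`, which makes the two embeddings easy to compare.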

To improve the BiLSTM, I added attention mechanisms: local, global and self-attention. The best model is the one that uses local attention (mean_model.py). I also built a model based on transformer-style attention (attention_model.py), but it did not work well for this problem.
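The difference between global and local attention over BiLSTM outputs can be sketched with plain NumPy. This is only an illustration under assumptions, not the logic of mean_model.py: the dot-product scoring, the `query` vector, and the fixed window defined by `center` and `half_width` are all hypothetical choices.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def global_attention(hidden, query):
    # hidden: (timesteps, units) sequence outputs; query: (units,) scoring vector.
    scores = hidden @ query            # one score per timestep
    weights = softmax(scores)          # weights over all timesteps sum to 1
    return weights @ hidden, weights   # weighted sum of states, shape (units,)

def local_attention(hidden, query, center, half_width):
    # Same scoring, but restricted to a window of timesteps around `center`.
    start = max(0, center - half_width)
    stop = min(len(hidden), center + half_width + 1)
    context, window_weights = global_attention(hidden[start:stop], query)
    weights = np.zeros(len(hidden))    # timesteps outside the window get weight 0
    weights[start:stop] = window_weights
    return context, weights

rng = np.random.default_rng(0)
hidden = rng.normal(size=(6, 4))       # 6 timesteps, 4 units (stand-in for BiLSTM outputs)
query = rng.normal(size=4)

g_ctx, g_w = global_attention(hidden, query)
l_ctx, l_w = local_attention(hidden, query, center=2, half_width=1)
```

Global attention spreads weight over the whole article, while the local variant only looks at timesteps 1 to 3 in this example.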

I also applied transfer learning by fine-tuning BERT for the propaganda detection problem. The code for BERT is in bertmodel.py. It could also serve as a base architecture for other pre-trained models based on BERT, such as DistilBERT or ALBERT.

To run the code, you need to download the GloVe/FastText embeddings and set the correct path in the embeddings file (embeddings.py).
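For orientation, here is a minimal sketch of how text-format GloVe or FastText vectors are commonly read into an embedding matrix. It does not reproduce embeddings.py: the helper names `load_vectors` and `build_embedding_matrix`, the `word_index` mapping, and the in-memory sample standing in for a downloaded file are assumptions.

```python
import io
import numpy as np

def load_vectors(handle):
    # Parse lines of the form "word v1 v2 ... vn" into a dict of vectors.
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) == 2:
            # FastText .vec files start with a "<vocab_size> <dim>" header line.
            continue
        vectors[parts[0]] = np.array([float(x) for x in parts[1:]], dtype="float32")
    return vectors

def build_embedding_matrix(word_index, vectors, dim):
    # Row 0 is reserved for padding; words without a vector stay as zeros.
    matrix = np.zeros((len(word_index) + 1, dim), dtype="float32")
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None:
            matrix[idx] = vec
    return matrix

# Tiny in-memory sample in FastText .vec layout (header plus three words).
sample = io.StringIO("3 2\nnews 0.1 0.2\nfake 0.3 0.4\nvote 0.5 0.6\n")
vectors = load_vectors(sample)

# Hypothetical tokenizer vocabulary: word -> integer id (ids start at 1).
word_index = {"fake": 1, "news": 2, "propaganda": 3}
matrix = build_embedding_matrix(word_index, vectors, dim=2)
```

With a real file, the `io.StringIO` sample would be replaced by something like `open(path, encoding="utf-8")`, where `path` points to the downloaded embeddings.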
