miguelglopes / clickbaitdetection

Click Bait Detection Algorithm

Python 0.39% Jupyter Notebook 99.61%
deep-learning deep-neural-networks clickbait-detection

clickbaitdetection's Introduction

Detecting clickbait in Tweets

This project aims to identify whether a news post on Twitter is clickbait or not. Clickbait is a form of false advertisement that uses text and/or images designed to attract attention and entice users to follow a link and read, view, or listen to the linked online content. Its defining characteristic is that it is deceptive, typically sensationalized or misleading.

Many papers have been written about clickbait detection techniques, but many of them fail to integrate the image as an input when detecting clickbait in posts. Vaibhaw et al. (2018) created a model that not only incorporates textual features, modeled using a BiLSTM augmented with an attention mechanism, but also considers related images for clickbait detection. This work sets out to improve on that model. To do so, we applied a bidirectional LSTM with an attention mechanism to capture the effect each word has on classifying a post as clickbait or not; a Siamese net to capture the similarity between the text of the post and the content of the article it refers to; and a CNN model for the images. Finally, the concatenated outputs of the three models serve as input to a fully connected layer. This model achieved an RMSE of 0.118 and an accuracy of 83.2%.
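
The two metrics reported above can be computed as in the following minimal sketch. It assumes that labels and predictions are scores in [0, 1] and that accuracy is obtained by thresholding both at 0.5; neither assumption is stated in this README, and the numbers used are made up for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def accuracy(y_true, y_pred, threshold=0.5):
    """Share of posts whose thresholded score matches the thresholded label.

    The 0.5 threshold is an assumption made for this sketch.
    """
    hits = sum((t >= threshold) == (p >= threshold) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Made-up scores, for illustration only (not results from this project).
truth = [0.0, 1.0, 0.8, 0.2]
preds = [0.1, 0.7, 0.4, 0.3]

print("RMSE:", round(rmse(truth, preds), 3))
print("Accuracy:", accuracy(truth, preds))
```

RMSE rewards scores that are close to the annotated clickbait strength, while accuracy only checks which side of the threshold a post falls on, so the two numbers can move independently.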

The following should be the folder structure of the project:

App root
    ├── model.ipynb
    ├── preprocessing.ipynb
    ├── visualization.ipynb
    │
    ├── data
    │     ├── instances.jsonl
    │     ├── truth.jsonl
    │     └── media
    │           └── (image files)
    │
    ├── binaries
    │     ├── GoogleNews-vectors-negative300.bin
    │     └── (other relevant binaries: .pkl or .h5)
    │
    └── libraries
          └── attention.py
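
Before running the notebooks, it can help to confirm that the layout matches the structure above. The following is a small sketch of such a check; the helper name `missing_paths` is hypothetical and not part of the repository, and the generated `.pkl`/`.h5` files are left out because their names are not listed here.

```python
from pathlib import Path

# Paths taken from the folder structure above, relative to the app root.
EXPECTED = [
    "model.ipynb",
    "preprocessing.ipynb",
    "visualization.ipynb",
    "data/instances.jsonl",
    "data/truth.jsonl",
    "data/media",
    "binaries/GoogleNews-vectors-negative300.bin",
    "libraries/attention.py",
]


def missing_paths(root):
    """Return the expected entries that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    print("Layout OK" if not missing else "Missing: " + ", ".join(missing))
```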

Our source code consists essentially of three notebooks:

  • preprocessing.ipynb
  • model.ipynb
  • visualization.ipynb

The preprocessing notebook should be run first. It writes the required pickles to the binaries folder. These pickles are then imported as input by both the model and the visualization notebooks.
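
The hand-off between notebooks can be pictured with the following sketch of a pickle round trip. The helper names, the file name `example_tokens.pkl`, and the stored dictionary are all hypothetical; the actual pickles written by the preprocessing notebook are not listed in this README.

```python
import pickle
import tempfile
from pathlib import Path


def save_pickle(obj, path):
    """Serialize obj to path, creating the parent folder if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(path):
    """Load an object previously written by save_pickle."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


# Round trip in a temporary folder standing in for binaries/.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "binaries" / "example_tokens.pkl"  # hypothetical file name
    save_pickle({"post_ids": [1, 2], "tokens": [["so", "cute"], ["breaking"]]}, target)
    restored = load_pickle(target)

print(restored["post_ids"])
```

Pickle files can execute arbitrary code when loaded, so only load pickles that you generated yourself.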

After running the preprocessing, one can run either the model or the visualization notebook.

The visualization notebook contains charts that we considered relevant to explore when designing the model architecture.

The model notebook contains all the code needed to create, train, evaluate, and optimize our clickbait prediction model.

Regarding the local libraries, we had to create a custom Keras attention layer, inspired by the work of Yang et al. For that, we created a class AttentionLayer that inherits from keras.layers.Layer. This was needed since Keras doesn't have a built-in attention layer. This local module is imported in the model notebook.
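
To make the idea concrete, here is a framework-free sketch of the attention pooling described by Yang et al.: each hidden state is projected through a tanh layer, scored against a context vector, and the softmax of those scores weights the sum of the hidden states. It is not the code in attention.py; the function name `attention_pool` and all parameter values are assumptions chosen for illustration, and in a Keras layer `W`, `b`, and `u` would be trainable weights.

```python
import math


def attention_pool(hidden, W, b, u):
    """Attention-weighted sum of hidden states.

    hidden: list of T vectors of size d (e.g. BiLSTM outputs per word)
    W:      projection matrix as a list of d_a rows, each of size d
    b:      bias vector of size d_a
    u:      context vector of size d_a
    Returns (weights, pooled): T attention weights and a vector of size d.
    """
    scores = []
    for h in hidden:
        # u_t = tanh(W h_t + b)
        proj = [
            math.tanh(sum(w * x for w, x in zip(row, h)) + bias)
            for row, bias in zip(W, b)
        ]
        # score_t = u_t . u
        scores.append(sum(p * c for p, c in zip(proj, u)))

    # Softmax over time steps, shifted by the max for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # pooled = sum_t alpha_t * h_t
    dim = len(hidden[0])
    pooled = [sum(a * h[i] for a, h in zip(weights, hidden)) for i in range(dim)]
    return weights, pooled


# Toy input: three "words" with 2-dimensional hidden states (made-up values).
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]  # identity projection, for readability
b = [0.0, 0.0]
u = [1.0, 1.0]

weights, pooled = attention_pool(hidden_states, W, b, u)
print([round(w, 3) for w in weights])
print([round(v, 3) for v in pooled])
```

The returned weights are what make the layer useful for inspecting which words push a post toward the clickbait class.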

To get the original data we used in this work, use this link (big dataset) or this link (small dataset). To download the GoogleNews pre-trained model, use this link.
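
Once downloaded, the two `.jsonl` files can be read line by line and paired up, as in the sketch below. It assumes that every record in both files carries a shared `id` field; that field name, the helper names, and the sample records are assumptions for illustration and should be checked against the actual dataset.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Parse a JSON Lines file: one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def join_on_id(instances, truths, key="id"):
    """Pair each instance with its truth record (the shared key is an assumption)."""
    truth_by_id = {record[key]: record for record in truths}
    return [
        (inst, truth_by_id[inst[key]])
        for inst in instances
        if inst[key] in truth_by_id
    ]


# Tiny made-up files standing in for data/instances.jsonl and data/truth.jsonl.
with tempfile.TemporaryDirectory() as tmp:
    inst_path = Path(tmp) / "instances.jsonl"
    truth_path = Path(tmp) / "truth.jsonl"
    sample_instances = [{"id": "1", "text": "a"}, {"id": "2", "text": "b"}]
    sample_truths = [{"id": "2", "label": 0.1}]
    inst_path.write_text(
        "\n".join(json.dumps(r) for r in sample_instances), encoding="utf-8"
    )
    truth_path.write_text(
        "\n".join(json.dumps(r) for r in sample_truths), encoding="utf-8"
    )
    pairs = join_on_id(read_jsonl(inst_path), read_jsonl(truth_path))

print(len(pairs))
```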
