
DISCO: Comprehensive and Explainable Disinformation Detection, CIKM 2022


DISCO: Comprehensive and Explainable Disinformation Detection

"DISCO" is a disinformation detection toolkit. An online demo video is available here, and a preprint paper is available here.

1. Function of DISCO

  • Input: A batch of suspicious information (news articles to be checked)

  • Output:

    • The fake news probability and real news probability for a news article query
    • A misleading degree ranking of each word in that query article

2. Required Libraries

  • numpy 1.20.1
  • scipy 1.6.2
  • pandas 1.2.4
  • nltk 3.6.2
  • gensim 4.0.1
  • scikit-learn 0.24.1 (imported as sklearn)

3. Quick Start

  • Download the code
  • Download the pre-trained word2vec model (here or here) and put it in the "pretrained-word2vec" folder
  • Run "gui_disco.py" to launch the software shown in the demo video
  • [Optional]: You can train DISCO from scratch as follows
    • First, put the raw fake news data and raw real news data in the "raw-dataset" folder and run "data_preprocessing.py". The files feature_matrix.pkl and label_matrix.pkl will then be saved automatically in the "preprocessed-dataset" folder.
    • Then, run "model_training.py" to obtain the inner classifier of DISCO, which will be saved automatically in the "trained-classifier" folder.
    • You now have the complete DISCO and can run "gui_disco.py" to launch the software shown in the demo video.

4. Technical Logic of DISCO

  • Building Word Graph. We construct an undirected word graph for each input news article. Briefly, if two words co-occur within a sliding window of a specified length, an edge connects them. For example, given "I eat an apple" and a window length of 3, the edges are {I-eat, I-an, eat-an, eat-apple, an-apple} (with stop words kept). More details on constructing a word graph can be found in TextRank.
  • Geometric Feature Extraction. We use the idea of SDG to obtain node embeddings. Briefly, a node's representation is an aggregation of its neighbours' features, weighted by the node's personalized PageRank vector. We then apply a pooling function (such as sum pooling or mean pooling) to aggregate the node embeddings into a graph-level representation vector for each constructed word graph.
  • Neural Detection. We train a model-agnostic classification module as the inner classifier of DISCO.
  • Misleading Degree Analysis. With the support of SDG, we can mask any word node in the constructed word graph and quickly track the new personalized PageRank to get the new graph-level embedding vector. Without fine-tuning the inner classifier of DISCO, we can then investigate each word's contribution (positive or negative) towards the predicted probability of the ground-truth label.
  • [Optional]: You can access our additional repository for a more thorough disinformation study, such as different inner classifiers, truncated feature dimensions, label noise injection, etc.
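
The four steps above can be illustrated with a minimal, standard-library-only sketch. It is not the repository's implementation: the function names, the 2-dimensional toy features (standing in for word2vec vectors), the restart probability of 0.15, and the fixed-weight `toy_classifier` (standing in for the trained inner classifier) are all assumptions made for illustration. In particular, the sketch recomputes personalized PageRank from scratch after masking a word, whereas SDG tracks the change incrementally.

```python
from itertools import combinations
from math import exp


def build_word_graph(tokens, window=3):
    """Link every pair of distinct words that share a sliding window."""
    edges = set()
    for start in range(max(1, len(tokens) - window + 1)):
        for u, v in combinations(tokens[start:start + window], 2):
            if u != v:
                edges.add(frozenset((u, v)))
    return edges


def personalized_pagerank(nodes, edges, source, alpha=0.15, iters=50):
    """Power iteration for the PPR vector of `source` (restart probability alpha)."""
    neighbours = {n: [] for n in nodes}
    for edge in edges:
        u, v = tuple(edge)
        neighbours[u].append(v)
        neighbours[v].append(u)
    p = {n: float(n == source) for n in nodes}
    for _ in range(iters):
        nxt = {n: alpha * (n == source) for n in nodes}
        for u in nodes:
            mass = (1 - alpha) * p[u]
            if neighbours[u]:
                for v in neighbours[u]:
                    nxt[v] += mass / len(neighbours[u])
            else:
                nxt[source] += mass  # isolated node: send its mass back to the source
        p = nxt
    return p


def graph_embedding(nodes, edges, features, masked=None):
    """Mean-pool PPR-weighted node features; optionally drop one word node."""
    nodes = [n for n in nodes if n != masked]
    edges = {e for e in edges if masked not in e}
    dim = len(next(iter(features.values())))
    pooled = [0.0] * dim
    for n in nodes:
        ppr = personalized_pagerank(nodes, edges, n)
        for m, weight in ppr.items():
            for k in range(dim):
                pooled[k] += weight * features[m][k] / len(nodes)
    return pooled


def toy_classifier(embedding, weights=(1.5, -1.0)):
    """Hypothetical stand-in for the trained inner classifier: P(label | graph)."""
    return 1.0 / (1.0 + exp(-sum(w * x for w, x in zip(weights, embedding))))


def word_contributions(tokens, features, window=3):
    """Change in predicted probability when each word node is masked."""
    nodes = list(dict.fromkeys(tokens))  # unique words, first-seen order
    edges = build_word_graph(tokens, window)
    base = toy_classifier(graph_embedding(nodes, edges, features))
    scores = {
        w: base - toy_classifier(graph_embedding(nodes, edges, features, masked=w))
        for w in nodes
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy run on the example sentence; the feature vectors are made up.
tokens = "I eat an apple".split()
toy_features = {"I": [1.0, 0.0], "eat": [0.0, 1.0], "an": [0.5, 0.5], "apple": [1.0, 1.0]}
edges = build_word_graph(tokens, window=3)
ranking = word_contributions(tokens, toy_features)
```

Here `edges` holds the five word pairs from the "I eat an apple" example, and `ranking` lists each word with the change in predicted probability caused by masking it. A positive value means the word pushed the prediction towards the label, and a negative value means it pushed away from it.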

Reference

If you use the materials from this repository, please cite our paper.

@inproceedings{DBLP:conf/cikm/FuBTMH22,
  author    = {Dongqi Fu and
               Yikun Ban and
               Hanghang Tong and
               Ross Maciejewski and
               Jingrui He},
  editor    = {Mohammad Al Hasan and
               Li Xiong},
  title     = {{DISCO:} Comprehensive and Explainable Disinformation Detection},
  booktitle = {Proceedings of the 31st {ACM} International Conference on Information
               {\&} Knowledge Management, Atlanta, GA, USA, October 17-21, 2022},
  pages     = {4848--4852},
  publisher = {{ACM}},
  year      = {2022},
  url       = {https://doi.org/10.1145/3511808.3557202},
  doi       = {10.1145/3511808.3557202},
  timestamp = {Wed, 19 Oct 2022 17:09:02 +0200},
  biburl    = {https://dblp.org/rec/conf/cikm/FuBTMH22.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
