This project forked from yunzhusong/aard

Codes for the paper: Adversary-Aware Rumor Detection

Adversary-Aware Rumor Detection (AARD), ACL 2021 Findings

[Paper][Dataset]

Note:

  1. The file TD_RvNN.vol_5000.txt in the Dataset is obtained from https://github.com/majingCUHK/Rumor_RvNN.

If you use any source code or dataset included in this repo, please cite this paper:

@inproceedings{song2021adversary,
  title={Adversary-Aware Rumor Detection},
  author={Song, Yun-Zhu and Chen, Yi-Syuan and Chang, Yi-Ting and Weng, Shao-Yu and Shuai, Hong-Han},
  booktitle={ACL-IJCNLP: Findings},
  year={2021}
}

Introduction

Many rumor detection models have been proposed to automatically detect rumors based on their content and propagation paths. However, most previous work does not account for malicious attacks, e.g., framing. We therefore propose a novel rumor detection framework, Adversary-Aware Rumor Detection, to reduce the vulnerability of detection models. It consists of a Weighted-Edge Transformer-Graph Network and a Position-aware Adversarial Response Generator. To the best of our knowledge, this is the first work that generates adversarial responses while taking the response position into account. Even without the adversarial learning process, our detection model (Weighted-Edge Transformer-Graph Network) is a strong baseline for rumor detection on Twitter15, Twitter16 and Pheme.

Performance

Getting started

Requirements

The detailed environment is listed in requirement.txt.

Dataset and Model Preparation

  1. We collected the user comments in accordance with Twitter's policy; the processed dataset is available here. Place the dataset in ./dataset/
  2. Training the generator requires a pretrained model, which can be downloaded here. Place the pretrained generation model in ./results/pretrain/

The data preprocessing follows BiGAN. The raw datasets, excluding the comments, can be downloaded from raw_pheme, provided by Zubiaga et al., 2016, and raw_twitter15_twitter16, provided by Ma et al., 2017.

Code structure

|_src\
      |_run.sh  -> script to run the code
      |_main.py
      |_models\
            |_trainer_gen.py    -> wrapping different experiments
            |_trainer.py        -> model trainer
            |_model.py          -> main class for AARD model
            |_model_detector.py -> for supporting model.py
            |_model_decoder.py  -> for supporting model.py
            |_predictor.py      -> for decoding from generator
      |_data\   -> for splitting 5-fold and building datagraph
      |_eval\   -> defines evaluation metrics (Recall, Precision and F-score of each class)
      |_others\ -> defines loss, logging info
|_dataset\
      |_Pheme\
      |_Phemetextgraph\     -> can be generated automatically by data/getgraph.py
      |_twitter15\
      |_twitter15textgraph\ -> can be generated automatically by data/getgraph.py
      |_twitter16\
      |_twitter16textgraph\ -> can be generated automatically by data/getgraph.py
|_results\
      |_pretrain\
            |_XSUM_BertExtAbs\
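
The eval\ directory above is described as defining Recall, Precision and F-score for each class. As a rough illustration of what such per-class metrics compute, here is a minimal sketch. It is not the repository's implementation; the function name `per_class_prf` and the integer label encoding are assumptions made for this example.

```python
from collections import Counter


def per_class_prf(y_true, y_pred, num_classes):
    """Return {class: (precision, recall, f1)}, computed one-vs-rest.

    Hypothetical helper: labels are assumed to be integers in
    range(num_classes).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted, but wrongly
            fn[t] += 1  # true class t was missed
    scores = {}
    for c in range(num_classes):
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[c] = (precision, recall, f1)
    return scores
```

Reporting the scores per class rather than as a single accuracy figure makes it visible when a model does well on one rumor category and poorly on another.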

How to run the code

Three-stage training

python main.py \
  -train_detector \
  -train_adv \
  -fold '0,1,2,3,4' \
  -dataset_dir '../dataset/Pheme' \
  -savepath '../results/Pheme' \
  -batch_size 48 \
  -filter True \
  -log_tensorboard \
  -warmup_steps 100

Only train detector

python main.py \
  -train_detector \
  -fold '0,1,2,3,4' \
  -dataset_dir '../dataset/Pheme' \
  -savepath '../results/Pheme' \
  -filter True \
  -batch_size 48 \
  -log_tensorboard \
  -warmup_steps 100

Evaluate detector

python main.py \
  -test_detector \
  -fold '0,1,2,3,4' \
  -dataset_dir '../dataset/Pheme' \
  -savepath '../results/Pheme' \
  -filter True \
  -batch_size 48 \
  -log_tensorboard \
  -warmup_steps 100

Evaluate detector under adversarial attack

python main.py \
  -test_detector \
  -test_gen \
  -fold '0,1,2,3,4' \
  -dataset_dir '../dataset/Pheme' \
  -savepath '../results/Pheme' \
  -filter True \
  -batch_size 48 \
  -log_tensorboard \
  -warmup_steps 100

Other experiments in paper

Early rumor detection (only testing)

Test the model on the data available up to different time stamps.

python main.py \
  -early '0,6,12,18,24,30,36,42,48,54,60,120' \
  -fold '0,1,2,3,4' \
  -dataset_dir '../dataset/twitter15' \
  -savepath '../results/twitter15/early_detection' \
  -filter True \
  -batch_size 48

python main.py \
  -early '0,6,12,18,24,30,36,42,48,54,60,120' \
  -fold '0,1,2,3,4' \
  -dataset_dir '../dataset/twitter16' \
  -savepath '../results/twitter16/early_detection' \
  -filter True \
  -batch_size 48

python main.py \
  -early '0,60,120,240,480,720,1440,2880' \
  -fold '0,1,2,3,4' \
  -dataset_dir '../dataset/Pheme' \
  -savepath '../results/Pheme/early_detection' \
  -filter True \
  -batch_size 48
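
To illustrate the idea behind early detection, the sketch below truncates a conversation thread at each checkpoint, so that only the responses that arrived by that time are kept. It is a guess at the general technique, not the code in this repository. The function names, the `(delay, text)` tuple layout, and the time unit are all assumptions, since the unit of the `-early` values is not stated here.

```python
def parse_checkpoints(arg):
    """Parse a comma-separated string such as '0,6,12' into a list of ints."""
    return [int(x) for x in arg.split(",") if x.strip()]


def truncate_thread(responses, deadline):
    """Keep only the responses whose delay since the source post is <= deadline.

    Hypothetical data layout: each response is a (delay, text) tuple, with
    delay expressed in the same (unspecified) unit as the checkpoints.
    """
    return [r for r in responses if r[0] <= deadline]


if __name__ == "__main__":
    thread = [(0, "source post"), (5, "reply a"), (30, "reply b")]
    for deadline in parse_checkpoints("0,6,12,60"):
        visible = truncate_thread(thread, deadline)
        print(deadline, len(visible))
```

A trained detector would then be evaluated on each truncated thread, which shows how quickly its predictions become reliable as responses accumulate.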

Data scarcity experiment (require training)

Train the models on different quantities of data, ranging from 5% to 100% of the training data, and evaluate them on the same test set.

python main.py \
  -train_detector \
  -quat '5,10,25,50,75,100' \
  -fold '0,1,2,3,4' \
  -dataset_dir '../dataset/Pheme' \
  -savepath '../results/pheme/data_scarcity' \
  -filter True \
  -batch_size 48 \
  -log_tensorboard
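
One way to build such reduced training sets is to draw a seeded random subset of the training examples and leave the test split untouched. The sketch below is an assumption about how this could be done, not the repository's code; the function name `subsample` and its parameters are hypothetical.

```python
import random


def subsample(train_ids, percent, seed=0):
    """Return a reproducible random subset holding `percent`% of train_ids.

    Hypothetical helper: a fixed seed makes the subset repeatable across runs.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ids = list(train_ids)
    random.Random(seed).shuffle(ids)
    keep = max(1, round(len(ids) * percent / 100))
    return ids[:keep]


if __name__ == "__main__":
    train_ids = range(200)
    for percent in (5, 10, 25, 50, 75, 100):
        print(percent, len(subsample(train_ids, percent)))
```

Because the same seed yields the same shuffle, each smaller subset in this sketch is a prefix of the larger ones, so results at different percentages differ only in the amount of data.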
