detecting-incongruity

This repository contains the source code & data corpus used in the following paper,

Detecting Incongruity Between News Headline and Body Text via a Deep Hierarchical Encoder, AAAI-19, paper

Requirements

  tensorflow==1.4 (tested on cuda-8.0, cudnn-6.0)
  python==2.7
  scikit-learn==0.20.0
  nltk==3.3

Download Dataset

download preprocessed dataset with the following script

cd data
sh download_processed_dataset_aaai-19.sh

the downloaded dataset will be placed into the following path of the project

/data/aaai-19/para
/data/aaai-19/whole

format (example)

test_title.npy: [100000, 49] - (#samples, #token (index))
test_body: [100000, 1200] - (#samples, #token (index))
test_label: [100000] - (#samples)
dic_mincutN.txt: dictionary
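
As a quick sanity check after downloading, the arrays can be loaded with NumPy and their shapes compared against the example above. This is a minimal sketch and not part of the repository: the helper name `load_split` is hypothetical, and it assumes that the body and label files carry the same `.npy` suffix as `test_title.npy`.

```python
import os

import numpy as np


def load_split(data_dir, split="test"):
    """Load the title, body and label arrays of one split (hypothetical helper)."""
    # Assumption: all three files use the ".npy" suffix shown for test_title.npy.
    title = np.load(os.path.join(data_dir, split + "_title.npy"))
    body = np.load(os.path.join(data_dir, split + "_body.npy"))
    label = np.load(os.path.join(data_dir, split + "_label.npy"))
    # Every array is indexed by sample along its first axis.
    if not (len(title) == len(body) == len(label)):
        raise ValueError("sample counts differ across title/body/label")
    return title, body, label
```

Going by the format example above, the test split would be expected to give shapes `(100000, 49)`, `(100000, 1200)` and `(100000,)`.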

Source Code

choose the source folder according to the training method

whole-type: use the code in ./src_whole
para-type: use the code in ./src_para

Training Phase

each source code folder contains a reference script for training the model

train_reference_scripts.sh

<< for example >>
train dataset with AHDE model and "whole" method

python AHDE_Model.py --batch_size 256 --encoder_size 80 --context_size 10 --encoderR_size 49 --num_layer 1 --hidden_dim 300  --num_layer_con 1 --hidden_dim_con 300 --embed_size 300 --lr 0.001 --num_train_steps 100000 --is_save 1 --graph_prefix 'ahde' --corpus 'aaai-19_whole' --data_path '../data/target_aaai-19_whole/'

Results will be displayed in the console
The final test result will be stored in "./TEST_run_result.txt"
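
When several runs with different settings are needed, the long command line can be assembled from a list of hyper parameters instead of being edited by hand. The sketch below only rebuilds the example command shown above; the helper name `build_train_command` is hypothetical, and the flag names and values are copied from that example.

```python
def build_train_command(script, params):
    """Turn (flag, value) pairs into an argv list for the training script."""
    cmd = ["python", script]
    for name, value in params:
        cmd += ["--" + name, str(value)]
    return cmd


# Values copied from the "whole" example above.
params = [
    ("batch_size", 256), ("encoder_size", 80), ("context_size", 10),
    ("encoderR_size", 49), ("num_layer", 1), ("hidden_dim", 300),
    ("num_layer_con", 1), ("hidden_dim_con", 300), ("embed_size", 300),
    ("lr", 0.001), ("num_train_steps", 100000), ("is_save", 1),
    ("graph_prefix", "ahde"), ("corpus", "aaai-19_whole"),
    ("data_path", "../data/target_aaai-19_whole/"),
]

cmd = build_train_command("AHDE_Model.py", params)
print(" ".join(cmd))
# To launch the run from inside ./src_whole, pass the list to
# subprocess.check_call(cmd) from the standard library.
```

Keeping the parameters as a list of pairs preserves the flag order of the reference script and makes it easy to change one value per run.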

※ hyper parameters

major parameters: edit them in the training script
other parameters: edit them in "./params.py"

Inference Phase

each source code folder contains an inference script
before running it, set "model_path" in "eval_AHDE.sh" to the path of the saved model you want to evaluate

<< for example >>
evaluate test dataset with AHDE model and "whole" method

	src_whole$ sh eval_AHDE.sh

Results will be displayed in the console
scores for the test set will be stored in "./output.txt"
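
The stored scores can be post-processed outside the model code, for example to compute accuracy at a fixed threshold. The layout of "./output.txt" is not described here, so the sketch below makes loud assumptions: one floating-point score per line, in the same order as the test labels, with higher scores meaning label 1. The helper names `read_scores` and `accuracy` are hypothetical.

```python
def read_scores(path):
    """Read one float per non-empty line (assumed layout of the score file)."""
    with open(path) as handle:
        return [float(line) for line in handle if line.strip()]


def accuracy(scores, labels, threshold=0.5):
    """Fraction of samples whose thresholded score equals the 0/1 label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("no scores given")
    # Assumption: a score at or above the threshold predicts label 1.
    hits = sum(1 for s, y in zip(scores, labels) if int(s >= threshold) == int(y))
    return float(hits) / len(scores)
```

If the file turns out to hold a different layout, only `read_scores` needs to change.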

Dataset Statistics

whole case

data     samples      tokens (avg) headline    tokens (avg) body text
train    1,700,000    13.71                    499.81
dev      100,000      13.69                    499.03
test     100,000      13.55                    769.23

Note

We crawled articles for the "dev" and "test" datasets from different media outlets.
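
Average token counts like the ones in the table can be recomputed from the index arrays by counting the non-padding entries in each row. This is a minimal sketch under an assumption that is not confirmed here: sequences are padded with index 0. The helper name `average_token_count` is hypothetical, and it accepts any iterable of rows, including a 2-D NumPy array.

```python
def average_token_count(rows, pad_index=0):
    """Mean number of non-padding token indices per row."""
    total = 0
    count = 0
    for row in rows:
        # Assumption: pad_index marks padding and never a real token.
        total += sum(1 for token in row if token != pad_index)
        count += 1
    if count == 0:
        raise ValueError("no rows given")
    return float(total) / count


# Three toy rows with 2, 1 and 4 real tokens.
toy = [[5, 3, 0, 0], [7, 0, 0, 0], [1, 2, 3, 4]]
print(average_token_count(toy))
```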

Newly introduced dataset (English version)

We created an English version of the dataset, nela-17, using NELA 2017 data. Please refer to the dataset repository [link].
If you want to run our model (AHDE) with the nela-17 data, you can use the preprocessed dataset that is compatible with our code.

cd data
sh download_processed_dataset_nela-17.sh

training script (refer to the "train_reference_scripts.sh")

python AHDE_Model.py --batch_size 64 --encoder_size 200 --context_size 50 --encoderR_size 25 --num_layer 1 --hidden_dim 100  --num_layer_con 1 --hidden_dim_con 100 --embed_size 300 --use_glove 1 --lr 0.001 --num_train_steps 100000 --is_save 1 --graph_prefix 'ahde' --corpus 'nela-17_whole' --data_path '../data/target_nela-17_whole/'

Other implementation (pytorch version)

Pytorch implementation [link] by M. Lee
compatible with the preprocessed dataset

cite

Please cite our paper when you use our code, dataset, or model.

@inproceedings{yoon2019detecting,
title={Detecting Incongruity between News Headline and Body Text via a Deep Hierarchical Encoder},
author={Yoon, Seunghyun and Park, Kunwoo and Shin, Joongbo and Lim, Hongjun and Won, Seungpil and Cha, Meeyoung and Jung, Kyomin},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={33},
pages={791--800},
year={2019}
}
