This project forked from david-yoon/detecting-incongruity

TensorFlow implementation of "Detecting Incongruity Between News Headline and Body Text via a Deep Hierarchical Encoder," AAAI-19

Home Page: https://wvvw.aaai.org/ojs/index.php/AAAI/article/view/3756

License: MIT License

detecting-incongruity

This repository contains the source code & data corpus used in the following paper,

"Detecting Incongruity Between News Headline and Body Text via a Deep Hierarchical Encoder," AAAI-19

Requirements

  tensorflow==1.4 (tested on cuda-8.0, cudnn-6.0)
  python==2.7
  scikit-learn==0.20.0
  nltk==3.3

Download Dataset

  • download preprocessed dataset with the following script

    cd data
    sh download_processed_dataset_aaai-19.sh

  • the downloaded dataset will be placed into the following path of the project

    /data/aaai-19/para
    /data/aaai-19/whole

  • format (example)

    test_title.npy: [100000, 49] - (#samples, #token (index))
    test_body: [100000, 1200] - (#samples, #token (index))
    test_label: [100000] - (#samples)
    dic_mincutN.txt: dictionary
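The shapes listed above imply that the title, body, and label arrays of one split share the same number of samples. A minimal sketch of that sanity check follows. The function name `check_split_shapes` is hypothetical and not part of this repository; in practice the shape tuples would come from the `.shape` attribute of the arrays after loading them with NumPy.

```python
def check_split_shapes(title_shape, body_shape, label_shape):
    """Return the sample count if the three arrays of one split agree on it.

    Hypothetical helper, not part of the repository. Expects
    title/body as (#samples, #tokens) and label as (#samples,).
    """
    if len(title_shape) != 2 or len(body_shape) != 2 or len(label_shape) != 1:
        raise ValueError("expected 2-D title/body arrays and a 1-D label array")
    n_samples = title_shape[0]
    if body_shape[0] != n_samples or label_shape[0] != n_samples:
        raise ValueError(
            "sample counts differ: title=%d, body=%d, label=%d"
            % (title_shape[0], body_shape[0], label_shape[0])
        )
    return n_samples


# Shapes taken from the format example above.
n_samples = check_split_shapes((100000, 49), (100000, 1200), (100000,))
print(n_samples)
```

Running such a check right after downloading can catch a truncated or mismatched file before a long training run starts.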

Source Code

  • choose the source folder according to the training method

    whole-type: use the code in ./src_whole
    para-type: use the code in ./src_para

Training Phase

  • each source code folder contains a reference script for training the model

    train_reference_scripts.sh

    << for example >>
    train the AHDE model on the dataset with the "whole" method

    python AHDE_Model.py --batch_size 256 --encoder_size 80 --context_size 10 --encoderR_size 49 --num_layer 1 --hidden_dim 300 --num_layer_con 1 --hidden_dim_con 300 --embed_size 300 --lr 0.001 --num_train_steps 100000 --is_save 1 --graph_prefix 'ahde' --corpus 'aaai-19_whole' --data_path '../data/target_aaai-19_whole/'

  • Results will be displayed in the console
  • The final test result will be stored in "./TEST_run_result.txt"
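When sweeping over hyperparameters, it can be easier to assemble the training command from a dictionary than to edit the long command line by hand. The following is a minimal sketch, assuming Python 3.7+ for the launcher (the repository itself targets Python 2.7). The helper `build_train_command` is hypothetical; the flag names and values are copied from the "whole" example above.

```python
import shlex


def build_train_command(params, script="AHDE_Model.py"):
    """Return the argument list for one training run (hypothetical helper)."""
    argv = ["python", script]
    for name, value in params.items():
        argv += ["--" + name, str(value)]
    return argv


# Values copied from the "whole" reference command above.
whole_params = {
    "batch_size": 256,
    "encoder_size": 80,
    "context_size": 10,
    "encoderR_size": 49,
    "num_layer": 1,
    "hidden_dim": 300,
    "num_layer_con": 1,
    "hidden_dim_con": 300,
    "embed_size": 300,
    "lr": 0.001,
    "num_train_steps": 100000,
    "is_save": 1,
    "graph_prefix": "ahde",
    "corpus": "aaai-19_whole",
    "data_path": "../data/target_aaai-19_whole/",
}

command = build_train_command(whole_params)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(arg) for arg in command))
```

The resulting list can be passed to `subprocess.run(command, cwd="src_whole")` to launch training from the matching source folder.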

※ hyperparameters

  • major parameters: edit them in the training script
  • other parameters: edit them in "./params.py"

Inference Phase

  • each source code folder contains an inference script
  • you need to modify the "model_path" in the "eval_AHDE.sh" to a proper path

    << for example >>
    evaluate test dataset with AHDE model and "whole" method

    src_whole$ sh eval_AHDE.sh

  • Results will be displayed in the console
  • Scores for the test set will be stored in "./output.txt"

Dataset Statistics

  • whole case

    data     samples      tokens (avg), headline    tokens (avg), body text
    train    1,700,000    13.71                     499.81
    dev      100,000      13.69                     499.03
    test     100,000      13.55                     769.23

  • Note

    We crawled the articles for the "dev" and "test" datasets from different media outlets.
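An average token count like the ones reported above can be recomputed from padded index arrays by counting the non-padding entries in each row. The sketch below assumes that index 0 marks padding; this is an assumption, not something this README states, and it is not known whether the reported statistics were computed this way. The function name `average_token_count` is hypothetical.

```python
def average_token_count(rows, pad_index=0):
    """Mean number of non-padding tokens per row.

    pad_index=0 is an assumption about how sequences are padded.
    """
    counts = [sum(1 for token in row if token != pad_index) for row in rows]
    return sum(counts) / len(counts)


# Toy padded headlines with 3, 2, and 4 real tokens respectively.
toy_titles = [
    [12, 7, 3, 0, 0],
    [5, 9, 0, 0, 0],
    [4, 4, 8, 2, 0],
]
avg = average_token_count(toy_titles)
print(avg)
```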

Newly introduced dataset (English version)

  • We created an English version of the dataset, nela-17, using NELA 2017 data. Please refer to the dataset repository.
  • If you want to run our model (AHDE) with the nela-17 data, you can use the preprocessed dataset that is compatible with our code.

    cd data
    sh download_processed_dataset_nela-17.sh

  • training script (refer to "train_reference_scripts.sh")

    python AHDE_Model.py --batch_size 64 --encoder_size 200 --context_size 50 --encoderR_size 25 --num_layer 1 --hidden_dim 100 --num_layer_con 1 --hidden_dim_con 100 --embed_size 300 --use_glove 1 --lr 0.001 --num_train_steps 100000 --is_save 1 --graph_prefix 'ahde' --corpus 'nela-17_whole' --data_path '../data/target_nela-17_whole/'

Other implementation (pytorch version)

Cite

  • Please cite our paper when you use our code, dataset, or model.

@inproceedings{yoon2019detecting,
title={Detecting Incongruity between News Headline and Body Text via a Deep Hierarchical Encoder},
author={Yoon, Seunghyun and Park, Kunwoo and Shin, Joongbo and Lim, Hongjun and Won, Seungpil and Cha, Meeyoung and Jung, Kyomin},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={33},
pages={791--800},
year={2019}
}
