This project forked from ringbdstack/htcinfomax

The code for our NAACL 2021 paper "HTCInfoMax: A Global Model for Hierarchical Text Classification via Information Maximization".

HTCInfoMax

Requirements

  • Python >= 3.6
  • torch >= 0.4.1
  • numpy >= 1.17.4
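
A quick way to confirm that an environment satisfies the minimum versions listed above is sketched below. This is not part of the repository: the helper names (`version_tuple`, `meets_minimum`, `check_requirements`) are hypothetical, and the comparison only looks at the leading numeric part of each version string.

```python
import importlib
import re
import sys

# Minimum versions copied from the requirements list above.
MINIMUMS = {"torch": "0.4.1", "numpy": "1.17.4"}


def version_tuple(version):
    """Turn '1.17.4' or '1.13.1+cu117' into a tuple of ints such as (1, 13, 1)."""
    release = re.match(r"\d+(?:\.\d+)*", version)
    if release is None:
        raise ValueError("unrecognized version string: %r" % version)
    return tuple(int(part) for part in release.group(0).split("."))


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum` (missing parts count as 0)."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements():
    """Return {name: True/False/None}; None means the package cannot be imported."""
    results = {"python": sys.version_info >= (3, 6)}
    for name, minimum in MINIMUMS.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            results[name] = None
            continue
        results[name] = meets_minimum(str(module.__version__), minimum)
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "too old", None: "not installed"}
    for name, ok in check_requirements().items():
        print(name, labels[ok])
```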

Preparation before training the model

Data preprocessing

Dataset

  • Obtain the original RCV1-V2 and WoS datasets.
  • Use data.preprocess_rcv1_train.py and data.preprocess_rcv1_test.py to preprocess the RCV1-V2 dataset for hierarchical text classification.
  • Use data.preprocess_wos.py to preprocess the WoS dataset for hierarchical text classification.

Generate prior probability

  • Run helper.hierarchy_tree_statistic_rcv1.py to generate the prior probabilities between parent-child pairs of the label hierarchy from the RCV1-V2 training set.
  • Run helper.hierarchy_tree_statistic_wos.py to generate the prior probabilities between parent-child pairs of the label hierarchy from the WoS training set.
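
To illustrate what a parent-child prior probability can look like, the sketch below estimates P(child | parent) as the fraction of training documents labeled with a parent that are also labeled with a given child. This is only one plausible estimator, written under that assumption; the helper scripts may compute the statistic differently. The function name `parent_child_priors` and the toy labels are hypothetical.

```python
from collections import Counter


def parent_child_priors(label_sets, children_of):
    """Estimate P(child | parent) from the label sets of training documents.

    label_sets:  iterable of label collections, one per training document.
    children_of: dict mapping each parent label to a list of its child labels.
    """
    parent_count = Counter()  # documents that carry the parent label
    pair_count = Counter()    # documents that carry both parent and child

    for labels in label_sets:
        labels = set(labels)
        for parent, children in children_of.items():
            if parent not in labels:
                continue
            parent_count[parent] += 1
            for child in children:
                if child in labels:
                    pair_count[(parent, child)] += 1

    # Parents that never occur in the training data are left out.
    return {
        parent: {
            child: pair_count[(parent, child)] / parent_count[parent]
            for child in children
        }
        for parent, children in children_of.items()
        if parent_count[parent] > 0
    }


# Toy example with made-up labels (not taken from RCV1-V2 or WoS).
hierarchy = {"CS": ["ML", "DB"]}
documents = [{"CS", "ML"}, {"CS", "ML"}, {"CS", "DB"}, {"Medical"}]
priors = parent_child_priors(documents, hierarchy)
print(priors)
```

In this toy example, three documents carry the parent label, two of which also carry "ML" and one "DB", so the estimated priors are 2/3 and 1/3.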

Train

To train the model on the RCV1-V2 or WoS dataset, use the configuration file "htcinfomax-rcv1-v2.json" or "htcinfomax-wos.json" under the "config" folder. Specifically, modify line 162/163 in train.py so that it points to the configuration file for the dataset you want to train on.

Then run the train.py file as follows:

python train.py
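
If you would rather not edit train.py each time you switch datasets, a small lookup like the one below could be used to pick the configuration file by dataset name. This is a hypothetical helper, not something the repository provides: the names `CONFIG_FILES`, `config_path`, and `load_config` are assumptions, and only the two file names and the "config" folder come from the instructions above.

```python
import json
import os

# File names as given in the Train section; the dictionary keys are our own choice.
CONFIG_FILES = {
    "rcv1-v2": "htcinfomax-rcv1-v2.json",
    "wos": "htcinfomax-wos.json",
}


def config_path(dataset, config_dir="config"):
    """Return the path of the configuration file for `dataset` (case-insensitive)."""
    key = dataset.lower()
    if key not in CONFIG_FILES:
        raise ValueError(
            "unknown dataset %r, expected one of %s" % (dataset, sorted(CONFIG_FILES))
        )
    return os.path.join(config_dir, CONFIG_FILES[key])


def load_config(dataset, config_dir="config"):
    """Read the JSON configuration for `dataset` into a plain dict."""
    with open(config_path(dataset, config_dir), encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    print(config_path("WoS"))
```

The path returned by `config_path` could then be substituted for the hard-coded configuration path in train.py.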

Citation

If you find our paper or code helpful for your work, please consider citing our NAACL 2021 paper, which is available at: https://www.aclweb.org/anthology/2021.naacl-main.260/

@inproceedings{deng-etal-2021-htcinfomax,
    title = "{HTCI}nfo{M}ax: A Global Model for Hierarchical Text Classification via Information Maximization",
    author = "Deng, Zhongfen  and
      Peng, Hao  and
      He, Dongxiao  and
      Li, Jianxin  and
      Yu, Philip",
    booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
    month = jun,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.naacl-main.260",
    doi = "10.18653/v1/2021.naacl-main.260",
    pages = "3259--3265",
}

Acknowledgements

Our code is based on HiAGM. We thank the authors of HiAGM for their open-source code.
