asirem16 / topically-driven-language-model

This project forked from jhlau/topically-driven-language-model

Tensorflow code to train TDLM

License: Apache License 2.0

topically-driven-language-model's Introduction

Requirements

  • Python 2.7 (a development Python 3 version is available in the python3 branch; that code still requires testing)
  • gensim: pip install gensim
  • tensorflow 0.8-0.12

Data Format

Running the code (example.sh)

Train a word2vec model using gensim. This step is optional; you only need it if you want to initialise TDLM with pre-trained embeddings. The word2vec model settings are defined in the Python script itself (word2vec_train.py).

python word2vec_train.py

Train a model; configurations/hyper-parameters are defined in tdlm_config.py

python tdlm_train.py

All test inferences are invoked with tdlm_test.py. For example, to compute language and topic model perplexity:

python tdlm_test.py -m output/toy-model/ -d data/toy-valid.txt --print_perplexity
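
For readers unfamiliar with the metric, perplexity is conventionally the exponential of the average negative log-likelihood per token. The sketch below illustrates that generic definition only; the function name `perplexity` and its input format are assumptions made for illustration, and this is not necessarily how tdlm_test.py computes or reports its numbers.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for natural-log token probabilities.

    Hypothetical helper for illustration; not part of the TDLM code base.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / float(len(token_log_probs))
    return math.exp(mean_nll)


# A model that assigns probability 0.25 to each of four tokens.
example = [math.log(0.25)] * 4
print(perplexity(example))
```

A model that is uniformly unsure between four choices at every step has a perplexity of about 4, which is why lower values indicate a better fit to the test documents.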

Print topics (to topics.txt)

python tdlm_test.py -m output/toy-model/ -d data/toy-valid.txt --output_topic topics.txt

Infer topic distribution in documents (saved as a npy file)

python tdlm_test.py -m output/toy-model/ -d data/toy-valid.txt --output_topic_dist topic-dist.npy
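
The saved file can be read back with NumPy. The sketch below assumes the array is two-dimensional with one row per document and one column per topic; that layout is an assumption, not something this README states, so check `dist.shape` on your own output first. The helper name `dominant_topics` is hypothetical.

```python
import numpy as np


def dominant_topics(npy_path):
    """Load a saved topic-distribution array and return the top topic index per row.

    Assumes shape (num_docs, num_topics); this layout is an assumption.
    """
    dist = np.load(npy_path)
    if dist.ndim != 2:
        raise ValueError("expected a 2-D array, got shape %s" % (dist.shape,))
    return dist.argmax(axis=1)


# Usage, once tdlm_test.py has written the file:
# print(dominant_topics("topic-dist.npy"))
```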

Generate sentences conditioned on topics

python tdlm_test.py -m output/toy-model/ -d data/toy-valid.txt --gen_sent_on_topic topic-sents.txt

tdlm_test.py arguments:

usage: tdlm_test.py [-h] -m MODEL_DIR [-d INPUT_DOC] [-l INPUT_LABEL]
                    [-t INPUT_TAG] [--print_perplexity] [--print_acc]
                    [--output_topic OUTPUT_TOPIC]
                    [--output_topic_dist OUTPUT_TOPIC_DIST]
                    [--output_tag_embedding OUTPUT_TAG_EMBEDDING]
                    [--gen_sent_on_topic GEN_SENT_ON_TOPIC]
                    [--gen_sent_on_doc GEN_SENT_ON_DOC]

Given a trained TDLM model, perform various test inferences

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL_DIR, --model_dir MODEL_DIR
                        directory of the saved model
  -d INPUT_DOC, --input_doc INPUT_DOC
                        input file containing the test documents
  -l INPUT_LABEL, --input_label INPUT_LABEL
                        input file containing the test labels
  -t INPUT_TAG, --input_tag INPUT_TAG
                        input file containing the test tags
  --print_perplexity    print topic and language model perplexity of the input
                        test documents
  --print_acc           print supervised classification accuracy
  --output_topic OUTPUT_TOPIC
                        output file to save the topics (prints top-N words of
                        each topic)
  --output_topic_dist OUTPUT_TOPIC_DIST
                        output file to save the topic distribution of input
                        docs (npy format)
  --output_tag_embedding OUTPUT_TAG_EMBEDDING
                        output tag embeddings to file (npy format)
  --gen_sent_on_topic GEN_SENT_ON_TOPIC
                        generate sentences conditioned on topics
  --gen_sent_on_doc GEN_SENT_ON_DOC
                        generate sentences conditioned on input test documents
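
When running several inferences from a script, it may be convenient to assemble the command line programmatically. The following is a minimal sketch that uses only flags listed in the usage text above; the helper `build_tdlm_test_cmd` is hypothetical, and whether several output flags can be combined in a single invocation is an assumption that should be checked against tdlm_test.py.

```python
import subprocess


def build_tdlm_test_cmd(model_dir, input_doc=None, print_perplexity=False,
                        output_topic=None, output_topic_dist=None):
    """Build an argument list for tdlm_test.py (hypothetical helper)."""
    cmd = ["python", "tdlm_test.py", "-m", model_dir]
    if input_doc:
        cmd += ["-d", input_doc]
    if print_perplexity:
        cmd.append("--print_perplexity")
    if output_topic:
        cmd += ["--output_topic", output_topic]
    if output_topic_dist:
        cmd += ["--output_topic_dist", output_topic_dist]
    return cmd


cmd = build_tdlm_test_cmd("output/toy-model/", "data/toy-valid.txt",
                          print_perplexity=True)
print(" ".join(cmd))
# prints: python tdlm_test.py -m output/toy-model/ -d data/toy-valid.txt --print_perplexity

# To actually run it from the repository root:
# subprocess.check_call(cmd)
```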

Publication

Jey Han Lau, Timothy Baldwin and Trevor Cohn (2017). Topically Driven Neural Language Model. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), Vancouver, Canada, pp. 355--365.
