alexflanker / marseille

This project forked from vene/marseille

License: BSD 3-Clause "New" or "Revised" License

marseille

Mining argument structures with expressive inference (with linear and LSTM engines)

What is it?

Marseille learns to predict argumentative proposition types and the support relations between them, as inference in an expressive factor graph.

Read more about it in our paper:

Vlad Niculae, Joonsuk Park, Claire Cardie. Argument Mining with Structured SVMs and RNNs. In: Proceedings of ACL, 2017.

If you find this project useful, you may cite us using:

    @inproceedings{niculae17marseille,
      author={Vlad Niculae and Joonsuk Park and Claire Cardie},
      title={{Argument Mining with Structured SVMs and RNNs}},
      booktitle={Proceedings of ACL},
      year=2017
    }
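For intuition about what joint inference over proposition types and support links means, here is a toy sketch. It is not marseille's model or solver: the two types (claim, premise), all scores, and the "each proposition supports at most one other" constraint are made-up assumptions, and exhaustive enumeration is only feasible for a handful of propositions. What it does show is that types and links are decided together, so a compatibility factor can pull a proposition's type toward what its links imply.

```python
import itertools

# Toy setup (hypothetical): two proposition types and hand-picked scores.
PROP_TYPES = ("claim", "premise")


def map_inference(unary, link, compat):
    """Return the highest-scoring (score, types, links) by brute force.

    unary[i][t]    -- score for proposition i having type t
    link[(i, j)]   -- score for "i supports j" (i != j)
    compat[(s, t)] -- score added to every link from a type-s to a type-t proposition
    Assumed constraint: each proposition supports at most one other.
    """
    n = len(unary)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    best = (float("-inf"), None, None)
    for types in itertools.product(PROP_TYPES, repeat=n):
        for active in itertools.product((False, True), repeat=len(pairs)):
            links = [p for p, on in zip(pairs, active) if on]
            sources = [i for i, _ in links]
            if len(sources) != len(set(sources)):
                continue  # some proposition would support two others
            score = sum(unary[i][t] for i, t in enumerate(types))
            score += sum(link[i, j] + compat[types[i], types[j]] for i, j in links)
            if score > best[0]:
                best = (score, types, links)
    return best


unary = [{"claim": 2.0, "premise": 0.0},
         {"claim": 0.5, "premise": 1.0},
         {"claim": 0.2, "premise": 0.8}]
link = {(i, j): -1.0 for i in range(3) for j in range(3) if i != j}
link.update({(1, 0): 0.5, (2, 0): 0.3, (2, 1): -0.2})
compat = {(s, t): -1.0 for s in PROP_TYPES for t in PROP_TYPES}
compat["premise", "claim"] = 1.0  # premises supporting claims is rewarded

score, types, links = map_inference(unary, link, compat)
print(types, links)  # ('claim', 'premise', 'premise') [(1, 0), (2, 0)]
```

Enumeration grows exponentially with the number of propositions, which is why a real system needs a dedicated inference procedure rather than this loop.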

Requirements

Usage

(replace $ds with cdcp or ukp)

  0. download the data from http://joonsuk.org/ and unzip it in the subdirectory data, so that the path ./data/process/erule/train/ is valid.

  1. extract the relevant subset of the GloVe embeddings:

    python -m marseille.preprocess embeddings $ds --glove-file=/p/glove.840B.300d.txt

  2. extract features:

    python -m marseille.features $ds

    # (for cdcp only:)
    python -m marseille.features cdcp-test

  3. generate the vectorized train-test split (for baselines only):

    mkdir data/process/.../
    python -m marseille.vectorize split cdcp

  4. run the chosen model, for example:

    python -m experiments.exp_train_test $ds --method rnn-struct --model strict

(for DyNet models, set --dynet-seed=42 for exact reproducibility)

  5. compare results:

    python -m experiments.plot_test_results $ds
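The embedding-extraction step above keeps only the GloVe vectors the datasets need, instead of loading the full glove.840B.300d.txt file. A minimal sketch of that idea, assuming the vocabulary is already available as a set of tokens; the function name `glove_subset` is hypothetical, and this is not the code of marseille.preprocess, whose vocabulary construction and output format are not described here. It relies on the standard GloVe text format: one word per line followed by its space-separated vector components.

```python
import io


def glove_subset(lines, vocab, dim=300):
    """Keep only the GloVe vectors whose word is in `vocab`.

    The word is taken to be everything before the last `dim` fields, so
    tokens that themselves contain spaces are still parsed correctly.
    """
    subset = {}
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        if len(parts) <= dim:
            continue  # malformed or truncated line
        word = " ".join(parts[:-dim])
        if word in vocab:
            subset[word] = [float(x) for x in parts[-dim:]]
    return subset


# Tiny in-memory example with 3-dimensional vectors instead of 300.
fake_glove = io.StringIO("the 0.1 0.2 0.3\nargument 0.4 0.5 0.6\ncat 0.7 0.8 0.9\n")
vectors = glove_subset(fake_glove, vocab={"the", "argument"}, dim=3)
print(sorted(vectors))  # ['argument', 'the']

# On the real file (path as in the README):
# with open("/p/glove.840B.300d.txt", encoding="utf-8") as f:
#     vectors = glove_subset(f, vocab)
```

Streaming the file line by line keeps memory proportional to the vocabulary rather than to the full embedding file.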

To reproduce cross-validation model selection, you would also need to run:

    python -m marseille.vectorize folds $ds
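The Usage commands above can also be chained from Python, for example to process both datasets in one go. A minimal sketch, assuming it is launched from the repository root so that `python -m` can import `marseille` and `experiments`; `usage_commands` and `run` are hypothetical helpers, not part of marseille. It leaves out the baseline-only split, whose `mkdir` path is elided in the README, invokes `experiments.plot_test_results` by module name (`python -m` takes a module, not a file name), and places the optional cross-validation folds right after feature extraction on the assumption that they only need the extracted features. With `dry_run=True` (the default) it only prints the commands.

```python
import subprocess
import sys


def usage_commands(ds, glove_file, method="rnn-struct", model="strict",
                   dynet_seed=None, cv_folds=False):
    """Build the README's Usage commands for ds ("cdcp" or "ukp") as argv lists."""
    if ds not in ("cdcp", "ukp"):
        raise ValueError("ds must be 'cdcp' or 'ukp'")
    py = [sys.executable, "-m"]
    cmds = [
        py + ["marseille.preprocess", "embeddings", ds, "--glove-file=" + glove_file],
        py + ["marseille.features", ds],
    ]
    if ds == "cdcp":
        cmds.append(py + ["marseille.features", "cdcp-test"])
    if cv_folds:
        # Only needed to reproduce cross-validation model selection.
        cmds.append(py + ["marseille.vectorize", "folds", ds])
    train = py + ["experiments.exp_train_test", ds, "--method", method, "--model", model]
    if dynet_seed is not None:
        train.append(f"--dynet-seed={dynet_seed}")  # for DyNet models only
    cmds.append(train)
    cmds.append(py + ["experiments.plot_test_results", ds])
    return cmds


def run(cmds, dry_run=True):
    """Print each command; also execute it (stopping on failure) unless dry_run."""
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for ds in ("cdcp", "ukp"):
        run(usage_commands(ds, "/p/glove.840B.300d.txt", dynet_seed=42))
```

Keeping each command as an argv list (rather than a shell string) avoids quoting issues with paths and lets `check=True` stop the pipeline at the first failing step.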

Running a model on your own data:

If you have documents, e.g. F.txt and G.txt, that you would like to run a pretrained model on, read on.

  0. download the required preprocessing toolkits, Stanford CoreNLP (tested with version 3.6.0) and the WING-NUS PDTB discourse parser (tested with this commit), and configure their paths:

    export MARSEILLE_CORENLP_PATH=/home/vlad/corenlp  #  path to CoreNLP
    export MARSEILLE_WINGNUS_PATH=/home/vlad/wingnus  #  path to WING-NUS parser

Note: if you have already generated F.txt.json with CoreNLP and F.txt.pipe with the WING-NUS parser (e.g., on a different computer), you may skip this step; marseille will detect those files automatically.

Otherwise, these files are generated the first time a UserDoc object is instantiated for a given document. In particular, the feature extraction step below does this automatically.

  1. extract the features:

    python -m marseille.features user F G  # raw input must be in F.txt & G.txt

This is needed for the RNN models too, because the feature files encode some metadata about the document structure.

  2. predict, e.g. using the model saved in step 4 above:

    python -m experiments.predict_pretrained --method=rnn-struct \
        test_results/exact=True_cdcp_rnn-struct_strict F G
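For several documents, the steps above can be wrapped in a small script. A sketch under stated assumptions: `preflight` and `predict_on_texts` are hypothetical helpers, not part of marseille. The check treats a missing F.txt.json as needing CoreNLP and a missing F.txt.pipe as needing the WING-NUS parser, independently of each other; that is an assumption based on the note about cached files above, not on marseille's code. For real runs, `workdir` should be the repository root (where the README's commands are run) and `dry_run=False` actually launches the commands.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Assumed mapping from cached parser output to the tool that would produce it.
PARSERS = {".txt.json": "MARSEILLE_CORENLP_PATH",   # CoreNLP output
           ".txt.pipe": "MARSEILLE_WINGNUS_PATH"}   # WING-NUS parser output


def preflight(names, workdir="."):
    """List problems that would likely stop feature extraction for these documents."""
    workdir = Path(workdir)
    problems = []
    for name in names:
        if not (workdir / f"{name}.txt").exists():
            problems.append(f"{name}.txt: raw input not found")
        for suffix, var in PARSERS.items():
            # Assumption: each cached file is checked independently.
            if not (workdir / f"{name}{suffix}").exists() and not os.environ.get(var):
                problems.append(f"{name}{suffix} missing and {var} is not set")
    return problems


def predict_on_texts(texts, model_dir, method="rnn-struct", workdir=".", dry_run=True):
    """Write texts[name] to <name>.txt, then extract features and predict."""
    workdir = Path(workdir)
    for name, text in texts.items():
        (workdir / f"{name}.txt").write_text(text, encoding="utf-8")
    names = list(texts)
    for problem in preflight(names, workdir):
        print("warning:", problem)
    cmds = [
        [sys.executable, "-m", "marseille.features", "user", *names],
        [sys.executable, "-m", "experiments.predict_pretrained",
         f"--method={method}", model_dir, *names],
    ]
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True, cwd=workdir)
    return cmds


# Dry run in a throwaway directory: writes F.txt and G.txt, then only prints.
demo_dir = tempfile.mkdtemp()
cmds = predict_on_texts({"F": "First document.", "G": "Second document."},
                        "test_results/exact=True_cdcp_rnn-struct_strict",
                        workdir=demo_dir)
```

Running the preflight before launching anything surfaces a missing input file or an unset parser path immediately, instead of partway through feature extraction.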
