stanojevic / fast-mst-algorithm

Implementation of fast algorithms for Maximum Spanning Tree (MST) parsing that includes fast ArcMax+Reweighting+Tarjan algorithm for single-root dependency parsing.

License: MIT License

Fast MST Algorithm

Implementation of the fast algorithm for Single-Root Maximum Spanning Tree by Stanojević and Cohen (EMNLP 2021).

New

A much faster implementation of this algorithm is available in the SynJax package. SynJax depends on JAX and other libraries, but running its non-projective spanning tree algorithm requires only Numba and NumPy. The SynJax implementation is essentially the same as the one in this repository, except for the Numba annotations that compile the algorithm to machine code. The SynJax module that performs maximum spanning tree parsing is synjax/_src/deptree_algorithms/deptree_non_proj_argmax.py. If you want a pure Python version without any Numba dependency, this repository is probably what you need. For the speed improvements of SynJax, see Figure 3 in the SynJax paper.

Installation

pip install git+https://github.com/stanojevic/Fast-MST-Algorithm

Usage

The implementation finds the maximum spanning tree. To obtain a minimum spanning tree instead, provide negated weights. The implementation contains three components:

  • Tarjan's algorithm for finding the unconstrained MST
  • Reweighting meta-algorithm for constraining the MST to have only one ROOT edge (see reference below)
  • ArcMax optimization that speeds up easy inputs
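
The three components can be illustrated on tiny inputs. The following is a minimal sketch, not this repository's code: `brute_force_mst` is a hypothetical exhaustive search swapped in for Tarjan's algorithm (exponential, so only usable for a handful of tokens), `arcmax_shortcut` assumes that ArcMax amounts to accepting each dependent's best head whenever those choices already form a valid single-root tree, and the penalty in `reweight_root_arcs` is a conservative constant chosen for this sketch that may differ from the one in the paper. All function names are hypothetical; weights use the `W[head, dependent]` layout that `fast_parse` expects.

```python
import itertools

import numpy as np


def is_tree(heads):
    """True if every token reaches token 0 by following heads (no cycles)."""
    for start in range(1, len(heads)):
        seen, node = set(), start
        while node != 0:
            if node in seen:
                return False  # cycle (self-loops included)
            seen.add(node)
            node = heads[node]
    return True


def brute_force_mst(W):
    """Hypothetical stand-in for Tarjan's algorithm: exhaustive search over all
    head assignments. Exponential, so only usable for a handful of tokens."""
    n = W.shape[0]
    deps = np.arange(1, n)
    best, best_score = None, -np.inf
    # Each dependent j = 1..n-1 may pick any head except itself.
    choices = [[h for h in range(n) if h != j] for j in range(1, n)]
    for combo in itertools.product(*choices):
        heads = np.array((-1,) + combo)
        if is_tree(heads):
            score = W[heads[deps], deps].sum()
            if score > best_score:
                best, best_score = heads, score
    return best


def arcmax_shortcut(W, one_root=True):
    """Assumed ArcMax-style shortcut: give every dependent its best head and
    accept the result only if it is already a valid (single-root) tree."""
    scores = np.array(W, dtype=float)  # copy, so W is left untouched
    np.fill_diagonal(scores, -np.inf)  # forbid self-loops
    heads = np.concatenate(([-1], scores[:, 1:].argmax(axis=0)))
    if not is_tree(heads):
        return None  # greedy choice contains a cycle: use a full solver
    if one_root and np.count_nonzero(heads[1:] == 0) != 1:
        return None  # several ROOT edges: fall back to reweighting
    return heads


def reweight_root_arcs(W):
    """Subtract a penalty from every ROOT arc so that the best unconstrained
    tree of the new matrix has exactly one ROOT edge."""
    n = W.shape[0]
    W2 = np.array(W, dtype=float)
    # Every tree has n - 1 arcs, so two trees differ by at most
    # (n - 1) * (max - min); a larger penalty makes any extra ROOT arc a loss.
    penalty = (n - 1) * (W2.max() - W2.min()) + 1.0
    W2[0, 1:] -= penalty
    return W2


def single_root_parse(W):
    """Hypothetical pipeline: ArcMax shortcut first, reweighting + MST otherwise."""
    heads = arcmax_shortcut(W, one_root=True)
    if heads is None:
        heads = brute_force_mst(reweight_root_arcs(W))
    return heads


if __name__ == "__main__":
    W = np.random.default_rng(0).random((5, 5))
    print(single_root_parse(W))
```

The reweighting step works because every spanning tree over n tokens has exactly n - 1 arcs. Subtracting from each ROOT arc a penalty larger than the largest possible score gap between two trees makes any tree with two or more ROOT edges lose to the best single-root tree, while all single-root trees pay the same penalty and keep their original ranking.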

Everything relevant for MST dependency parsing can be accessed through the fast_parse function, as shown here:

>>> from mst import fast_parse
>>> import numpy as np

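>>> W = np.random.rand(5, 5)  # random scores: your heads will differ

>>> fast_parse(W, one_root=False)  # may attach several tokens to ROOT
array([-1,  2,  0,  4,  0])

>>> fast_parse(W, one_root=True)  # exactly one token attached to ROOT
array([-1,  2,  0,  4,  2])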

The input matrix entry W[i, j] is interpreted as the weight of the arc going from i to j (i is the head and j is the dependent). Token 0 is treated as the root node of the MST (it has no incoming arc). Note that this head/dependent order differs from the presentation in the paper.
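
The head/dependent layout is easy to get backwards, so here is a minimal numpy-only sketch of it. Judging from the example output above, it assumes that `fast_parse` returns an array whose entry `j` is the head of token `j`, with `-1` for token 0; `tree_score` and `root_children` are hypothetical helpers, not part of the package.

```python
import numpy as np


def tree_score(W, heads):
    """Total weight of a tree: sum of W[head, dep] over tokens 1..n-1."""
    deps = np.arange(1, len(heads))
    return float(W[heads[deps], deps].sum())


def root_children(heads):
    """Tokens whose head is token 0 (ROOT)."""
    return [j for j in range(1, len(heads)) if heads[j] == 0]


# Toy input with ROOT (token 0) and three words. Rows are heads, columns are
# dependents, so W[0, 2] is the weight of the arc ROOT -> token 2.
W = np.zeros((4, 4))
W[0, 2] = 5.0  # ROOT -> token 2
W[2, 1] = 3.0  # token 2 -> token 1
W[2, 3] = 4.0  # token 2 -> token 3

# Assumed output format: heads[j] is the head of token j, heads[0] == -1.
heads = np.array([-1, 2, 0, 2])
print(tree_score(W, heads))  # 12.0
print(root_children(heads))  # [2]  (a single ROOT edge)

# Scores laid out dependent-first, S[dep, head], need a transpose: W = S.T.
# For a minimum spanning tree, pass -W instead of W: every tree's score flips sign.
print(tree_score(-W, heads))  # -12.0
```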

References

@inproceedings{stanojevic-cohen-2021-root,
    title = "A Root of a Problem: Optimizing Single-Root Dependency Parsing",
    author = "Stanojevi{\'c}, Milo{\v{s}}  and Cohen, Shay B.",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.823",
    pages = "10540--10557",
    abstract = "We describe two approaches to single-root dependency parsing that yield significant speed ups in such parsing. One approach has been previously used in dependency parsers in practice, but remains undocumented in the parsing literature, and is considered a heuristic. We show that this approach actually finds the optimal dependency tree. The second approach relies on simple reweighting of the inference graph being input to the dependency parser and has an optimal running time. Here, we again show that this approach is fully correct and identifies the highest-scoring parse tree. Our experiments demonstrate a manyfold speed up compared to a previous graph-based state-of-the-art parser without any loss in accuracy or optimality.",
}
