vene / sparse-structured-attention

Sparse and structured neural attention mechanisms

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%
attention-mechanism attention-mechanisms fused-lasso deep-learning deeplearning deep-neural-networks sparsity sparse segmentation

sparse-structured-attention's Introduction

Sparse and structured attention mechanisms



Efficient implementation of structured sparsity-inducing attention mechanisms: fusedmax, oscarmax, and sparsemax.

Note: If you are just looking for sparsemax, I recommend the implementation in the entmax package.

Currently available for pytorch >= 0.4.1. (For older versions, use a previous release of this package.) Requires python >= 2.7, cython, numpy, scipy.

Usage example:

In [1]: import torch
In [2]: import torchsparseattn
In [3]: a = torch.tensor([1, 2.1, 1.9], dtype=torch.double)
In [4]: lengths = torch.tensor([3])
In [5]: fusedmax = torchsparseattn.Fusedmax(alpha=.1)
In [6]: fusedmax(a, lengths)
Out[6]: tensor([0.0000, 0.5000, 0.5000], dtype=torch.float64)
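
The other mechanisms follow the same call pattern. Here is a minimal sketch, assuming the Sparsemax and Oscarmax classes take the same (values, lengths) call signature as Fusedmax above (class names inferred from the mechanism names listed earlier); for this input, the sparsemax projection works out to [0.0, 0.6, 0.4]:

import torch
import torchsparseattn

a = torch.tensor([1, 2.1, 1.9], dtype=torch.double)
lengths = torch.tensor([3])

# Sparsemax: Euclidean projection onto the probability simplex;
# low-scoring entries are set exactly to zero.
sparsemax = torchsparseattn.Sparsemax()
p_sparse = sparsemax(a, lengths)

# Oscarmax: the OSCAR penalty encourages groups of equal attention
# weights (larger alpha means more clustering).
oscarmax = torchsparseattn.Oscarmax(alpha=0.1)
p_oscar = oscarmax(a, lengths)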

For details, check out our paper:

Vlad Niculae and Mathieu Blondel. A Regularized Framework for Sparse and Structured Neural Attention. In: Proceedings of NIPS, 2017. https://arxiv.org/abs/1705.07704

See also:

André F. T. Martins and Ramón Fernandez Astudillo. From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification. In: Proceedings of ICML, 2016. https://arxiv.org/abs/1602.02068

X. Zeng and M. Figueiredo. The ordered weighted L1 norm: Atomic formulation, dual norm, and projections. eprint: http://arxiv.org/abs/1409.4271

sparse-structured-attention's People

Contributors: vene

sparse-structured-attention's Issues

How to run tests

Please add some docs about how to run the tests presented in the paper.
Thank you!

Support for PyTorch 1.6.0

Hi,
Thanks for sharing your code!
I ran the tests with PyTorch 1.6.0 and quite a lot of them fail. It might be because PyTorch changed how it handles bool tensors starting with version 1.1.0.

It would be nice if you had time to update your code to support the newest version of PyTorch.

Best,
Pierre

About compiling in latest pytorch

Hi, thank you for sharing the code.
I am considering recompiling the project using the latest PyTorch and other related packages.

It seems that there are some pre-compiled files, such as _fused_jv.pyx, without source files.
My questions are: Are they important? Is it possible to recompile while simply ignoring them, without affecting the modules' functionality?

Thank you very much!

Comparison to softmax with temperature

Very interesting work.
However, I noticed that neither the paper nor the repo has results for softmax with a tunable temperature "T" (written as "gamma" in the paper's notation). Setting T < 1 will result in a sparser softmax output, and the optimal value of T can be determined on a held-out validation set. This type of temperature-softened (or hardened) softmax has often been used in areas like knowledge distillation.

I wonder if this is a meaningful baseline to compare sparsemax against?
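
For reference, a minimal sketch of the temperature-scaled softmax baseline described in this issue; T is the issue's notation and is not a parameter of this package:

import torch

def softmax_with_temperature(scores, T=1.0):
    # T < 1 sharpens (peaks) the distribution; T > 1 flattens it.
    # Unlike sparsemax or fusedmax, the weights never become exactly zero.
    return torch.softmax(scores / T, dim=-1)

scores = torch.tensor([1.0, 2.1, 1.9])
print(softmax_with_temperature(scores, T=0.5))   # more peaked
print(softmax_with_temperature(scores, T=1.0))   # standard softmax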

Different results of SparseMax

Hi, thank you very much for providing the open source code!
However, I ran into a problem when using this code.
My model does not converge when I replace the softmax in the attention mechanism with the Sparsemax function from this package.
Next, I tried the Sparsemax implementation from https://github.com/msobroza/SparsemaxPytorch, and the model converges quickly.
So, I want to know whether there are errors in this code.
Thank you!
