
separius / awesome-sentence-embedding


A curated list of pretrained sentence and word embedding models

License: GNU General Public License v3.0

Python 100.00%
wordembedding word-embeddings sentence-embeddings nlp pretrained-embedding awesome-list pretrained-models unsupervised-learning sentence-representations contextualized-representation

awesome-sentence-embedding's People

Contributors

basicv8vc, oborchers, separius, zhanqiuzhang


awesome-sentence-embedding's Issues

Sentence embeddings for Arabic

Hello @Separius,
We've leveraged SIF and AraVec to compute sentence embeddings for Arabic, an approach we call AraSIF. The paper has yet to appear in the official ACL-2019 proceedings, but please feel free to add it to this list. You can find the BibTeX entry in the AraSIF repo.

Thanks!
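
For readers unfamiliar with SIF (smooth inverse frequency weighting), here is a minimal sketch of the general technique, not AraSIF's actual code. It assumes pretrained word vectors (such as AraVec) are already loaded into a dict and that unigram probabilities are available. The function name `sif_embeddings`, its parameters, and the default `a=1e-3` are illustrative assumptions.

```python
import numpy as np

def sif_embeddings(sentences, word_vectors, word_probs, a=1e-3):
    """Hypothetical helper: SIF sentence embeddings.

    sentences:    list of token lists
    word_vectors: dict mapping token -> 1-D np.ndarray (assumed preloaded)
    word_probs:   dict mapping token -> unigram probability (assumed given)
    a:            smoothing constant (an assumed default)
    """
    dim = len(next(iter(word_vectors.values())))
    emb = np.zeros((len(sentences), dim))
    for i, tokens in enumerate(sentences):
        # Skip out-of-vocabulary tokens; a sentence with none stays all-zero.
        known = [t for t in tokens if t in word_vectors]
        if not known:
            continue
        # Frequent words get small weights, rare words get weights near 1.
        weights = np.array([a / (a + word_probs.get(t, 0.0)) for t in known])
        vectors = np.stack([word_vectors[t] for t in known])
        emb[i] = weights @ vectors / len(known)
    # Remove the projection onto the first right singular vector,
    # the "common component" shared by all sentences.
    _, _, vt = np.linalg.svd(emb, full_matrices=False)
    pc = vt[0]
    return emb - np.outer(emb @ pc, pc)
```

Calling `sif_embeddings(tokenized_sentences, vectors, probs)` returns one row per sentence. Tokenization and the probability estimates are left to the caller.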

New Model: LaBSE

I recently came across a sentence embedding model that isn't on this list. If you think it's interesting, it might be worthwhile to include it 🙂

  • Name: LaBSE
  • Blog Post: https://ai.googleblog.com/2020/08/language-agnostic-bert-sentence.html
  • Summary: For LaBSE, we leverage recent advances on language model pre-training, including MLM and TLM, on a BERT-like architecture and follow this with fine-tuning on a translation ranking task. A 12-layer transformer with a 500k token vocabulary pre-trained using MLM and TLM on 109 languages is used to increase the model and vocabulary coverage. The resulting LaBSE model offers extended support to 109 languages in a single model.
  • Pretrained Model: https://tfhub.dev/google/LaBSE/1

Thanks for creating this resource! 😄

New Model: BERT-Flow

I just found a recent sentence embedding model that doesn't show up on this list. If you think it's interesting enough, it might make sense to include it here 🙂

  • Name: BERT-Flow
  • Paper: https://arxiv.org/abs/2011.05864
  • Summary (from abstract): Transforms anisotropic sentence embedding distribution from BERT to a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective. Experimental results show that our proposed BERT-flow method obtains significant performance gains over the state-of-the-art sentence embeddings on a variety of semantic textual similarity tasks.
  • Code: https://github.com/bohanli/BERT-flow

Add a new sentence embedding method: DeCLUTR

Hi there,

Not sure if this list is still being maintained but if so, might I (shamelessly) recommend adding DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations. It is an unsupervised method for learning high-quality sentence embeddings that we recently developed. It is similar to Sentence Transformers in that it pre-trains a transformer-based language model, but because it is unsupervised, you do not need any labels!

CLIP?

I'm curious how CLIP performs when treated simply as a sentence embedding. Is it competitive?

General Framework

Hi Separius,
As you have described how sentence embeddings work, I have some questions about fitting specific models into the framework. For example, for doc2vec, what is the encoder that generates the contextualized embeddings, and what pooling method turns those embeddings into the sentence embedding? Also, for a CNN encoder, which output of the encoder can be viewed as the contextualized embeddings? If the framework does not apply to these models, may I ask why?
Thanks.
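
To make the "encoder, then pooling" framing in this question concrete, here is a minimal sketch of the pooling step alone. It assumes some encoder has already produced one vector per token, and it does not answer how doc2vec or a CNN encoder fits the framework. The function name `pool`, the `mask` argument, and the two pooling choices are illustrative assumptions.

```python
import numpy as np

def pool(token_embeddings, mask, method="mean"):
    """Hypothetical helper: collapse per-token vectors into one sentence vector.

    token_embeddings: array-like of shape (seq_len, dim), one row per token
    mask:             array-like of shape (seq_len,), truthy for real tokens,
                      falsy for padding
    method:           "mean" or "max"
    """
    token_embeddings = np.asarray(token_embeddings, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    real = token_embeddings[mask]  # drop padding rows before pooling
    if real.shape[0] == 0:
        raise ValueError("sentence has no real tokens to pool")
    if method == "mean":
        return real.mean(axis=0)
    if method == "max":
        return real.max(axis=0)
    raise ValueError(f"unknown pooling method: {method!r}")
```

Under this framing, any model that emits a sequence of per-position vectors can feed `pool`. A model that produces a single vector directly has no separate pooling step to speak of.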
