
arnabkar / bert_mask


This project forked from ajitrajasekharan/bert_mask


This is an example program illustrating BERT's masked language model.

License: MIT License

Python 99.00% Shell 1.00%


bert_mask

This is a sample program illustrating BERT's masked language model. Given a sentence as input, we can mask any term (which may be a subword of a word) and examine its neighbors, where the neighbors are terms in BERT's vocabulary. This is useful for a variety of tasks:

  • To fill in missing punctuation in a sentence.
  • To harvest phrases of a particular entity type. Phrases belonging to the same entity type are likely to share neighbor terms among their top k neighbors when masked in sentences where a term of that type occurs.
  • In general, any task where the sentence context of a word or phrase is useful.
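
As a rough illustration of the ideas above, the sketch below ranks vocabulary terms for a masked position and measures how many neighbors two masked positions share. It assumes the neighbors are the top k terms by masked-language-model score, which this README does not spell out. The function names and the toy scores are invented for illustration and are not taken from mask_word.py or from real model output.

```python
import math


def top_k_neighbors(scores, k):
    """Softmax raw per-term scores and return the k most probable (term, prob) pairs."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {term: math.exp(s - peak) for term, s in scores.items()}
    total = sum(exps.values())
    ranked = sorted(exps.items(), key=lambda kv: kv[1], reverse=True)
    return [(term, e / total) for term, e in ranked[:k]]


def neighbor_overlap(neighbors_a, neighbors_b):
    """Jaccard overlap between two neighbor lists (0.0 = disjoint, 1.0 = identical)."""
    a, b = set(neighbors_a), set(neighbors_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy scores, made up for illustration only (not real BERT output).
toy_scores = {"room": 2.0, "block": 1.0, "phone": -1.0}
neighbors = [term for term, _ in top_k_neighbors(toy_scores, 2)]
```

A high overlap between the neighbor lists of two masked phrases would then be a hint, not proof, that the phrases belong to the same entity type.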

Install steps

Usage

  • python mask_word.py (uses the bert-base-cased model by default. To use a custom or different model, see python mask_word.py -h for options)

  • To mask a word, type "entity" in the sentence in place of the word (this is especially useful if the input word could break into subwords)

  • To mask a phrase, type "entity" in the sentence in place of the phrase

  • To mask a specific subword of a word, type in the full sentence, then use the displayed tokenized output to choose the specific subword to mask
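
The two masking styles described above can be sketched roughly as follows. This is not the code in mask_word.py; the helper names are hypothetical, and the use of "[MASK]" as the mask token is an assumption based on BERT's usual convention. The token list in the usage comment is a hand-written WordPiece-style example, not actual tokenizer output.

```python
def mask_placeholder(sentence, placeholder="entity", mask_token="[MASK]"):
    """Swap the first whitespace-delimited placeholder word for the mask token."""
    words = sentence.split()
    if placeholder not in words:
        raise ValueError(f"placeholder {placeholder!r} not found in sentence")
    words[words.index(placeholder)] = mask_token
    return " ".join(words)


def mask_token_at(tokens, index, mask_token="[MASK]"):
    """Return a copy of a token list with the token at `index` masked."""
    if not 0 <= index < len(tokens):
        raise IndexError(f"index {index} is outside the token list")
    return tokens[:index] + [mask_token] + tokens[index + 1:]


# Word or phrase masking: the user types "entity" where the term would go.
masked = mask_placeholder("He went to prison entity with his cell phone")

# Subword masking: pick a position from a tokenized view of the sentence.
# These tokens are hand-written for illustration, not real tokenizer output.
masked_tokens = mask_token_at(["play", "##ing", "chess"], 1)
```

Typing the placeholder avoids having to know how a word splits into subwords, while the index-based form gives control over exactly one subword.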

Sample outputs

The sentence "He went to prison cell with his cell phone to extract blood cell samples from inmates" uses the word "cell" in three different senses.

Output of mask_word.py - 1 of 3

Output of mask_word.py - 2 of 3

Output of mask_word.py - 3 of 3

The neighbors for the word "cell" in the sentence above differ with context. Note that all displayed neighbors are words in BERT's vocabulary. This test used the pretrained model bert-base-cased.
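
A similar experiment can be approximated outside mask_word.py with the Hugging Face transformers fill-mask pipeline. The sketch below is an assumption about how one might reproduce it, not the script's own implementation: mask_nth_occurrence and top_neighbors are hypothetical helper names, and the second one needs the transformers package plus a downloadable bert-base-cased model.

```python
import re

SENTENCE = ("He went to prison cell with his cell phone "
            "to extract blood cell samples from inmates")


def mask_nth_occurrence(sentence, word, n, mask_token="[MASK]"):
    """Replace the n-th (0-based) whole-word occurrence of `word` with the mask token."""
    matches = list(re.finditer(r"\b" + re.escape(word) + r"\b", sentence))
    if not 0 <= n < len(matches):
        raise ValueError(f"{word!r} has {len(matches)} occurrence(s); got n={n}")
    match = matches[n]
    return sentence[:match.start()] + mask_token + sentence[match.end():]


def top_neighbors(masked_sentence, model_name="bert-base-cased", k=10):
    """Return the k highest-scoring vocabulary terms for the single masked position."""
    # Imported lazily so the string helper above works without the dependency.
    from transformers import pipeline

    # Building the pipeline on every call is slow; reuse it in real code.
    fill = pipeline("fill-mask", model=model_name)
    return [pred["token_str"] for pred in fill(masked_sentence, top_k=k)]
```

Calling top_neighbors(mask_nth_occurrence(SENTENCE, "cell", i)) for i in 0, 1 and 2 gives one neighbor list per sense of "cell", which can then be compared side by side.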

Data files

BERT vector and vocab files

https://drive.google.com/file/d/1X1mE8OZVnYZnFXgZx7Wfaop7_pOLDGnP/view?usp=sharing
https://drive.google.com/file/d/1vBEOR25_ajAoNmtgoy-TFJBX-4wj5WF_/view?usp=sharing

RoBERTa vector and vocab files

https://drive.google.com/file/d/1izKfjzqCf1QEifSMsnaDmZt7NVe0UfpU/view?usp=sharing
https://drive.google.com/file/d/1z8gz1MPS4AmagriKlw7cMVPGIpQrJW-e/view?usp=sharing

License

MIT License

Contributors

ajitrajasekharan
